How to Set Up a Hadoop Cluster Using Ansible

Hello readers, today I am going to show you how to set up a Hadoop distributed storage cluster using an Ansible playbook. Before starting, let me introduce you to the key terms of this blog.

Source: https://thealchemist627.medium.com/

Ansible: It is configuration management software, now maintained by Red Hat, widely used to configure large numbers of devices using simple files called playbooks. These playbooks are declarative in nature and written in YAML (originally "Yet Another Markup Language", now "YAML Ain't Markup Language").

Hadoop: It is open source software by Apache used for distributed storage and distributed computing. It is mostly used to handle the problem of big data. It works on the principle of master and slave, where one node manages the other instances. The master node is known as the namenode and the workers (slave nodes) are known as datanodes.

Now let’s start creating our own Hadoop cluster using Ansible.

Prerequisites:

  • Ansible should be installed on one of your systems.
  • To install Ansible, run “pip3 install ansible”.
  • There should be SSH connectivity between all the nodes.
  • SSH keys should be copied to all the nodes for password-free login.
  • To copy an SSH key, use “ssh-copy-id -i key IP”, where key is your public key file (default: id_rsa.pub).
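
The prerequisites above can be sketched as a small dry-run script. It only prints the commands so you can review them before running anything. The node IPs (placeholders from the 192.0.2.0/24 documentation range) and the root login user are my assumptions, not values from the repo.

```shell
#!/bin/sh
# Dry-run sketch: prints the setup commands instead of executing them.
# NODES and REMOTE_USER are placeholder assumptions; replace them with your own values.
NODES="192.0.2.10 192.0.2.11 192.0.2.12"
REMOTE_USER="root"
PUBKEY="$HOME/.ssh/id_rsa.pub"

print_setup_commands() {
    # Install Ansible on the control node.
    echo "pip3 install ansible"
    # Copy the public key to every node for password-free login.
    for node in $NODES; do
        echo "ssh-copy-id -i $PUBKEY $REMOTE_USER@$node"
    done
}

SETUP_COMMANDS="$(print_setup_commands)"
echo "$SETUP_COMMANDS"
```

Once the printed commands look right for your nodes, you can run them one by one on the Ansible node.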

Step 1: Use git to download the given repo to your Ansible node.

Step 2: Go to the downloaded repo using the change directory command.

cd playtosetupHadoopCluster

Step 3: Edit the inventory.txt file inside it and enter the IPs of the nodes on which you want to set up the Hadoop cluster.

[Screenshot: inventory.txt file]

Enter all the IPs below the HaddopCluster group, the IP of the node on which you want to set up the namenode under the namenode group, and all the remaining IPs under the datanode group.
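
As a rough illustration of that layout, the sketch below writes a sample inventory to a temporary directory, so it does not touch the repo's own inventory.txt. The IPs are placeholders from the 192.0.2.0/24 documentation range, and the group names simply follow the wording above; they are assumptions, so match them to the names actually used in the repo's file.

```shell
#!/bin/sh
# Writes a SAMPLE inventory to a temp directory; the repo's inventory.txt is left untouched.
# IPs are placeholders; group names follow the article's wording and may differ in the repo.
SAMPLE_DIR="$(mktemp -d)"
SAMPLE_INVENTORY="$SAMPLE_DIR/inventory.sample.txt"

cat > "$SAMPLE_INVENTORY" <<'EOF'
[HaddopCluster]
192.0.2.10
192.0.2.11
192.0.2.12

[namenode]
192.0.2.10

[datanode]
192.0.2.11
192.0.2.12
EOF

# Show the result so it can be compared with the real inventory file.
cat "$SAMPLE_INVENTORY"
```

Every node appears once in the cluster-wide group and once in either the namenode or the datanode group.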

Step 4: Run the Ansible playbook “hadoop.yml” using the following command.

ansible-playbook hadoop.yml

[Screenshot: running the playbook]

You can now see that your playbook has launched and started configuring the Hadoop cluster.

[Screenshot: playbook executing]
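
Before launching the playbook in Step 4, a quick pre-flight check can catch SSH or syntax problems early. This is only a sketch: it assumes you run it from inside the repo directory and that inventory.txt and hadoop.yml sit there, and it passes the inventory explicitly with -i, which may be unnecessary if the repo already configures it.

```shell
#!/bin/sh
# Pre-flight sketch: run from inside the repo directory.
# Assumes inventory.txt and hadoop.yml (names from the article) are in the current directory.
INVENTORY="inventory.txt"
PLAYBOOK="hadoop.yml"

if ! command -v ansible-playbook >/dev/null 2>&1 || [ ! -f "$INVENTORY" ] || [ ! -f "$PLAYBOOK" ]; then
    # Ansible or one of the files is missing, so there is nothing to check.
    PREFLIGHT_STATUS="skipped"
elif ansible all -i "$INVENTORY" -m ping \
    && ansible-playbook -i "$INVENTORY" "$PLAYBOOK" --syntax-check; then
    # Every node answered the ping module and the playbook parsed cleanly.
    PREFLIGHT_STATUS="ok"
else
    PREFLIGHT_STATUS="failed"
fi

echo "pre-flight: $PREFLIGHT_STATUS"
```

The ping module checks that Ansible can log in to each node, and --syntax-check parses the playbook without executing it.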

NOTE: If you are running the playbook on cloud instances, enter 0.0.0.0 as the IP for the namenode instance, because a cloud instance by default does not know its own public IP. When asked for the IP on the datanode, enter the public IP of your cloud instance.

[Screenshot: entering the public IP]

After the playbook executes successfully, check your cluster by using the command “hadoop dfsadmin -report” or by entering the IP of the namenode in your browser with port number 50070.

[Screenshot: cluster report using the CLI]
[Screenshot: Hadoop dashboard]
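
The two checks above can also be scripted. In this sketch NAMENODE_IP is a placeholder assumption; the report command only runs if hadoop is on the PATH (for example on a cluster node), and the dashboard probe only runs if curl is installed.

```shell
#!/bin/sh
# Verification sketch. NAMENODE_IP is a placeholder; replace it with your namenode's IP.
NAMENODE_IP="192.0.2.10"
DASHBOARD_URL="http://$NAMENODE_IP:50070"

# CLI check: print the cluster report if the hadoop command is available here.
if command -v hadoop >/dev/null 2>&1; then
    hadoop dfsadmin -report
else
    echo "hadoop is not on PATH; run 'hadoop dfsadmin -report' on a cluster node"
fi

# Dashboard check: see whether the web UI answers within 5 seconds.
if command -v curl >/dev/null 2>&1; then
    if curl -s -o /dev/null --max-time 5 "$DASHBOARD_URL"; then
        echo "dashboard reachable at $DASHBOARD_URL"
    else
        echo "dashboard not reachable at $DASHBOARD_URL"
    fi
fi
```

If the dashboard is not reachable on a cloud instance, it is worth checking that port 50070 is open in the instance's firewall rules.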

Thanks for reading this blog. I hope you find it helpful.

If you have any queries, I am available on LinkedIn.

Hey readers, I am a tech enthusiast and a computer science student. Here I share various industry use cases and their solutions.

Abhishek Sahu