How to Set Up a Hadoop Cluster Using Ansible

Abhishek Sahu
3 min read · Mar 23, 2021

Hello readers! Today I am going to show you how you can set up a Hadoop distributed-storage cluster using an Ansible playbook. Before starting, let me introduce the key terms of this blog.


Ansible: A configuration-management tool developed by Red Hat, widely used to configure large numbers of devices with simple code files called playbooks. Playbooks are declarative in nature and written in YAML (YAML Ain't Markup Language) format.
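For a flavour of what a playbook looks like, here is a minimal, hypothetical example that installs Java (a Hadoop prerequisite) on every node; the package name assumes a RHEL-family system:

---
- hosts: all
  become: yes
  tasks:
    # Hadoop needs a JDK on every node in the cluster
    - name: Install Java 8
      package:
        name: java-1.8.0-openjdk
        state: present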

Hadoop: Open-source software from Apache for distributed storage and distributed computing, used mainly to handle the problem of BIG DATA. It works on a master/slave principle, where one node manages the other instances: the master node is known as the NameNode and the workers are DataNodes (slave nodes).

Now let’s start creating our own Hadoop cluster using Ansible.

Prerequisites:

  • Ansible should be installed on one of your systems.
  • To install Ansible, run “pip3 install ansible”.
  • There should be SSH connectivity between all the nodes.
  • SSH keys should be copied to all the nodes for password-free login.
  • To copy an SSH key, use “ssh-copy-id -i <key> <IP>” (the default key is id_rsa.pub), as shown below.
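For example, generating a key pair and copying it to one node looks like this (the user and IP are placeholders; repeat the copy for every node):

ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.1.10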

Step 1: Download the given repo to your Ansible node using git.
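Assuming the repository name matches the directory used in the next step (the URL placeholder stands in for the repo link from the original post):

git clone <repository-url> playtosetupHadoopCluster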

Step 2: Move into the downloaded repo using the change-directory (cd) command.

cd playtosetupHadoopCluster

Step 3: Edit the inventory.txt file inside it and enter the IPs of the nodes on which you want to set up the Hadoop cluster.


Enter all the IPs under the HadoopCluster group, the IP of the node on which you want to set up the NameNode under the namenode group, and all the remaining IPs under the datanode group.
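A filled-in inventory.txt might look like this (the IPs are placeholders; the group names follow the description above and should match what the repo's playbook expects):

[HadoopCluster]
192.168.1.10
192.168.1.11
192.168.1.12

[namenode]
192.168.1.10

[datanode]
192.168.1.11
192.168.1.12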

Step 4: Run the ansible playbook “hadoop.yml” using the following command.

ansible-playbook hadoop.yml
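If the run hangs or fails to reach a host, an ad-hoc ping helps narrow it down (passing the inventory with -i is an assumption; the repo may configure it elsewhere, e.g. in ansible.cfg):

ansible all -i inventory.txt -m ping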


Once launched, you can watch the playbook work through its tasks and configure the Hadoop cluster.


NOTE: If you are running the playbook on cloud instances, enter 0.0.0.0 as the IP for the NameNode, since a cloud instance by default doesn’t know its own public IP. When prompted on a DataNode, enter the public IP of your NameNode’s cloud instance.
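To illustrate why: HDFS daemons bind to the address configured in core-site.xml, so on a cloud NameNode the relevant property might end up looking like this (the property name and port are assumptions based on a Hadoop 1.x setup; the playbook's prompts fill in the actual values):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>

On each DataNode, the same property would instead hold the NameNode’s public IP, e.g. hdfs://<namenode-public-IP>:9001.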


After the playbook finishes successfully, check your cluster with the following command, or by opening the NameNode’s IP in your browser on port 50070.

hadoop dfsadmin -report
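As an additional sanity check (assuming the JDK’s jps tool is on the PATH), you can list the running Hadoop daemons on each node; expect a NameNode process on the master and a DataNode process on each worker:

jps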


Thanks for reading this blog. I hope you found it helpful!

For any queries, I am available on LinkedIn.


Abhishek Sahu

Hey readers, I am a tech enthusiast and a Computer Science student. Here I share various industry use cases and their solutions.