Integrating LVM volumes with Hadoop and AWS to provide elasticity.

Abhishek Sahu
4 min read · Jan 1, 2021


Hadoop: Hadoop is an open source distributed processing framework that manages data processing and storage for big data applications in scalable clusters of computer servers. It’s at the center of an ecosystem of big data technologies that are primarily used to support advanced analytics initiatives, including predictive analytics, data mining and machine learning.


Prerequisites: Read this blog first for an understanding of LVM (Logical Volume Management).
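If you just need the gist, here is a minimal recap sketch of preparing an LVM-backed directory, assuming a spare disk /dev/xvdf and the names myvg, mylv, and /datanode (all hypothetical; adapt them to your machine):

pvcreate /dev/xvdf                    # register the disk as a physical volume
vgcreate myvg /dev/xvdf               # pool it into a volume group
lvcreate --size 10G --name mylv myvg  # carve out a 10 GiB logical volume
mkfs.ext4 /dev/myvg/mylv              # format the volume with ext4
mkdir -p /datanode                    # directory the DataNode will later use
mount /dev/myvg/mylv /datanode        # mount the volume there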

Integrating LVM with Hadoop

Now, to use Hadoop we need a minimum of two VMs: one for the NameNode and another for the DataNode. So I am launching one more VM to set up the NameNode. As my current setup cannot run two VMs at once, I am launching one of the instances on AWS.

Now I have access to two instances. To set up Hadoop we need the Java and Hadoop packages, so download them from the links below onto your instances. As I am using specific versions of Java and Hadoop, they are hard to install from a default package manager like yum or apt.

Hadoop: https://softwareforarth.s3.ap-south-1.amazonaws.com/hadoop-1.2.1-1.x86_64.rpm

Java: https://softwareforarth.s3.ap-south-1.amazonaws.com/jdk-8u171-linux-x64.rpm

Install the above software with the “rpm -i <package-name> --force” command.
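For example, on each instance (run as root; the filenames match the downloads above):

wget https://softwareforarth.s3.ap-south-1.amazonaws.com/jdk-8u171-linux-x64.rpm
wget https://softwareforarth.s3.ap-south-1.amazonaws.com/hadoop-1.2.1-1.x86_64.rpm
rpm -i jdk-8u171-linux-x64.rpm             # install Java (JDK 8u171) first
rpm -i hadoop-1.2.1-1.x86_64.rpm --force   # install Hadoop 1.2.1, forcing past dependency checks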

Similarly, install both packages on both virtual machines. After installing them, it’s time to set up the NameNode and the DataNode.

Setting Up the NameNode.

To set up the NameNode, edit the /etc/hadoop/hdfs-site.xml and /etc/hadoop/core-site.xml files.

Use the following configuration for the hdfs-site.xml and core-site.xml files.

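A minimal sketch of both files, written here as shell here-documents; dfs.name.dir and fs.default.name are the Hadoop 1.x property names, while port 9001 is an assumption (use whichever port you configured):

# hdfs-site.xml: where the NameNode keeps its metadata
mkdir -p /namenode
cat > /etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/namenode</value>
  </property>
</configuration>
EOF

# core-site.xml: the address clients and DataNodes connect to
cat > /etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>
EOF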

We use 0.0.0.0 as the NameNode IP here because, by default, an AWS instance does not know its own public IP address; binding to 0.0.0.0 makes the NameNode listen on all network interfaces. We use the /namenode folder as the NameNode’s metadata directory.

Now, to start the Hadoop NameNode, first format it with the “hadoop namenode -format” command, then run “hadoop-daemon.sh start namenode” to start the service.

Use the “hadoop dfsadmin -report” command to see detailed information about the Hadoop cluster.
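Put together, the commands run on the NameNode:

hadoop namenode -format           # one-time: initialize the metadata directory
hadoop-daemon.sh start namenode   # start the NameNode daemon
hadoop dfsadmin -report           # detailed report of the cluster's capacity and nodes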

As you can see, no DataNode is configured yet, so the report shows 0 capacity.

Setting Up the DataNode.

To set up the DataNode, edit the core-site.xml and hdfs-site.xml files.

hdfs-site.xml
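A sketch of the DataNode’s hdfs-site.xml, assuming the LVM volume is mounted at /datanode (the hypothetical mount point from the recap above); dfs.data.dir is the Hadoop 1.x property naming the block-storage directory:

cat > /etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/datanode</value>
  </property>
</configuration>
EOF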

Here we use the same directory we mounted the LVM volume on, so we can increase or decrease the DataNode’s storage easily.

core-site.xml
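A sketch of the DataNode’s core-site.xml; substitute your NameNode instance’s public IP for the placeholder and keep the port consistent with the NameNode’s core-site.xml:

cat > /etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- placeholder: replace with the real public IP of the NameNode instance -->
    <value>hdfs://NAMENODE_PUBLIC_IP:9001</value>
  </property>
</configuration>
EOF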

Here we are using the public IP of the AWS instance on which the NameNode is set up.

Start the DataNode with the “hadoop-daemon.sh start datanode” command. Our Hadoop setup is now complete; we can check it through the web UI or by running “hadoop dfsadmin -report”.

Hadoop cluster report: web UI
Hadoop cluster report: CLI

You can see that currently only 12.24 GB is configured. Now we have a requirement to increase the size of the DataNode without shutting down the cluster. Can we achieve this?

As we configured our DataNode on an LVM volume, yes we can, without it ever pausing its contribution to the cluster.

Let’s add 3 GB to the current capacity.
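A sketch of the online resize, reusing the hypothetical names from the recap (volume group myvg, logical volume mylv, ext4, mounted at /datanode); both commands work while the volume stays mounted:

lvextend --size +3G /dev/myvg/mylv   # grow the logical volume by 3 GiB
resize2fs /dev/myvg/mylv             # grow the ext4 filesystem into the new space, online
df -h /datanode                      # confirm the larger filesystem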

Now refresh the web UI or check the report from the CLI.

You can see that the storage capacity increased without shutting down the cluster or the DataNode: the DataNode reports whatever free space the filesystem backing its data directory has, so the extra space shows up automatically.

This is how you can integrate Hadoop with LVM.

I would like to thank Vimal Sir for asking me to perform this task.

Thanks for reading this blog.

