
Tutorial: Hadoop multi-node installation
In this tutorial we will see how to set up a Hadoop multi-node cluster by following these steps:
Step 1: Ensure that Hadoop is installed in single-node mode on all machines (master + slaves). If not, use this link to do it.
Step 2: Now let's configure the machines to work as a multi-node cluster. As an example we will use one master and two slaves:
Add all hostnames to the /etc/hosts file on all machines (master and slave nodes):
sudo gedit /etc/hosts
# Add the following hostnames and their IP addresses to the hosts table
Suppose that your slaves' addresses are 192.168.0.151 and 192.168.0.152, and the master's address is 192.168.0.150:
192.168.0.150 HadoopMaster
192.168.0.151 HadoopSlave1
192.168.0.152 HadoopSlave2
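To confirm that the names resolve, you can ping each node from every machine (a quick sanity check, not part of the original steps):
ping -c 1 HadoopMaster
ping -c 1 HadoopSlave1
ping -c 1 HadoopSlave2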
Log in as hduser, then install rsync and reboot the machine:
sudo apt-get install rsync
sudo reboot
Now let's apply the common configuration on all nodes (master + slaves):
1- Edit core-site.xml:
cd /usr/local/hadoop/etc/hadoop
sudo gedit core-site.xml
Paste these lines inside the <configuration> tag, or just update the existing entry by replacing localhost with HadoopMaster:
<property>
  <name>fs.default.name</name>
  <value>hdfs://HadoopMaster:9000</value>
</property>
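Note: in Hadoop 2.x the fs.default.name key is deprecated in favour of fs.defaultFS; both are still accepted, but if you prefer the current name the entry looks like this:
<!-- equivalent modern spelling of the same setting -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://HadoopMaster:9000</value>
</property>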
2- Update hdfs-site.xml
Update this file by changing the replication factor from 1 to 3.
cd /usr/local/hadoop/etc/hadoop
sudo gedit hdfs-site.xml
Then paste/update these lines inside the <configuration> tag:
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
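A side note, not part of the original steps: HDFS can never keep more copies of a block than there are DataNodes, so with only the two slaves in this example a factor of 3 will leave blocks under-replicated. If you stay with two slaves, a value of 2 is sufficient:
<!-- alternative: match the replication factor to the two DataNodes in this example -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>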
3- Update yarn-site.xml
Update this file by changing the hostname from localhost to HadoopMaster in the following three properties.
cd /usr/local/hadoop/etc/hadoop
sudo gedit yarn-site.xml
Then paste/update these lines inside the <configuration> tag:
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>HadoopMaster:8025</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>HadoopMaster:8035</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>HadoopMaster:8050</value>
</property>
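MapReduce on YARN also needs the shuffle auxiliary service on every NodeManager. Your single-node setup has most likely configured it already; if not, add this to yarn-site.xml as well:
<!-- required for MapReduce jobs; skip if already present from the single-node setup -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>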
4- Update mapred-site.xml
Update this file by adding/updating the following properties.
cd /usr/local/hadoop/etc/hadoop
sudo gedit mapred-site.xml
Then paste/update these lines inside the <configuration> tag (note: the framework property must be spelled mapreduce.framework.name; the older spelling mapred.framework.name is not recognized by Hadoop 2):
<property>
  <name>mapreduce.job.tracker</name>
  <value>HadoopMaster:5431</value>
</property>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
5- Update the masters file
cd /usr/local/hadoop/etc/hadoop
sudo gedit masters
Then add the name of the master node (in Hadoop 2 this file tells start-dfs.sh where to run the SecondaryNameNode):
HadoopMaster
6- Update the slaves file
cd /usr/local/hadoop/etc/hadoop
sudo gedit slaves
Then add the names of the slave nodes:
HadoopSlave1
HadoopSlave2
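Since the same configuration applies to every node, you can push the edited files from the master instead of repeating the edits by hand. A minimal sketch using the rsync installed earlier, assuming Hadoop lives in /usr/local/hadoop on all machines (you will be prompted for the hduser password until SSH keys are set up below):
# copy the whole config directory from the master to each slave
rsync -avz /usr/local/hadoop/etc/hadoop/ hduser@HadoopSlave1:/usr/local/hadoop/etc/hadoop/
rsync -avz /usr/local/hadoop/etc/hadoop/ hduser@HadoopSlave2:/usr/local/hadoop/etc/hadoop/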
Applying master-node-specific Hadoop configuration (only on the master node):
1- Remove the existing Hadoop data folder (created during the single-node Hadoop setup):
sudo rm -rf /usr/local/hadoop_tmp/
2- Recreate the /usr/local/hadoop_tmp/ directory and create the NameNode directory (/usr/local/hadoop_tmp/hdfs/namenode) inside it:
sudo mkdir -p /usr/local/hadoop_tmp/
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
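This path only takes effect if hdfs-site.xml points the NameNode at it; assuming your single-node setup already did so, the corresponding entry looks like this:
<!-- assumed to be carried over from the single-node setup -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
</property>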
3- Make hduser the owner of that directory:
sudo chown -R hduser:hadoop /usr/local/hadoop_tmp/
Applying slave-node-specific Hadoop configuration (only on the slave nodes):
1- Remove the existing Hadoop data folder (created during the single-node Hadoop setup):
sudo rm -rf /usr/local/hadoop_tmp/hdfs/
2- Recreate the /usr/local/hadoop_tmp/ directory and create the DataNode directory (/usr/local/hadoop_tmp/hdfs/datanode) inside it:
sudo mkdir -p /usr/local/hadoop_tmp/
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode
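As on the master, this path should match what hdfs-site.xml declares; assuming the single-node setup configured it, the DataNode entry is:
<!-- assumed to be carried over from the single-node setup -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
</property>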
3- Make hduser the owner of that directory:
sudo chown -R hduser:hadoop /usr/local/hadoop_tmp/
SSH Configuration:
On the master node, run the following commands to copy the public SSH key (the ~/.ssh/id_rsa.pub file of the HadoopMaster node) into the authorized_keys files of hduser@HadoopSlave1 and hduser@HadoopSlave2 (in $HOME/.ssh/authorized_keys):
cd
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@HadoopSlave1
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@HadoopSlave2
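If hduser on HadoopMaster does not yet have a key pair, generate one first, then confirm that passwordless login works (a quick sanity check, not part of the original steps):
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa   # only needed if ~/.ssh/id_rsa does not exist yet
ssh hduser@HadoopSlave1                    # should now log in without asking for a password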
Now let's format the NameNode (run on the master node):
# Run this command on the master node
hdfs namenode -format
Starting up the Hadoop cluster daemons (run on the master node):
Start HDFS daemons:
start-dfs.sh
Start Yarn daemons:
start-yarn.sh
Track/monitor/verify the Hadoop cluster (run on any node):
Verify the Hadoop daemons on the master:
jps
You should see only:
Jps
NameNode
SecondaryNameNode
ResourceManager
Verify the Hadoop daemons on all slave nodes:
jps
Jps
NodeManager
DataNode
To view on the web:
For the ResourceManager: http://HadoopMaster:8088
For the NameNode: http://HadoopMaster:50070
Execute the word count example:
On the master node (clients should always talk only to the master node):
hdfs dfs -mkdir /test/
hdfs dfs -copyFromLocal wordCountinput.txt /test/
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /test/wordCountinput.txt /test/output/
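When the job finishes, you can inspect the result; part-r-00000 is the default name of the first reducer's output file:
hdfs dfs -ls /test/output/
hdfs dfs -cat /test/output/part-r-00000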