Tutorial Hadoop multi node installation

In this tutorial we will see how to install a multi-node Hadoop cluster by following these steps:

Step 1: Ensure that Hadoop is installed (single-node mode) on all machines (master + slaves). If not, use this link to do it.

Step 2: Now let's configure the machines to work as a multi-node cluster. As an example we will use two slaves and one master:

Add all host names to the /etc/hosts file on all machines (master and slave nodes):

Suppose that your slaves' addresses are 192.168.0.151 and 192.168.0.152, and that the master's address is 192.168.0.150. Add the following hostnames and their IPs to the host table.
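A minimal /etc/hosts fragment, using the hostnames HadoopMaster, HadoopSlave1 and HadoopSlave2 that appear later in this tutorial:

```
192.168.0.150   HadoopMaster
192.168.0.151   HadoopSlave1
192.168.0.152   HadoopSlave2
```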

Log in as hduser, then install rsync and reboot the machine.
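For example, on a Debian/Ubuntu system (the apt package manager is an assumption; use your distribution's equivalent):

```shell
sudo apt-get update
sudo apt-get install -y rsync   # used by Hadoop's start-up scripts
sudo reboot
```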

Now let's apply the common configuration on all nodes (slaves + master):

1- Edit core-site.xml

Paste these lines into the <configuration> tag, or just update it by replacing localhost with the master's hostname.
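A sketch of the relevant property, assuming Hadoop 2.x (on older releases the key is fs.default.name) and the commonly used port 9000:

```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://HadoopMaster:9000</value>
</property>
```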

2- Update hdfs-site.xml
Update this file by changing the replication factor from 1 to 3.

Then paste/update these lines inside the <configuration> tag.
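A sketch, assuming the directory layout created in the node-specific steps below (dfs.namenode.name.dir matters on the master, dfs.datanode.data.dir on the slaves):

```xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
</property>
```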

3- Update yarn-site.xml
Update this file by changing the hostname from localhost to HadoopMaster in the following three properties.

Then paste/update these lines inside the <configuration> tag.
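These are typically the three ResourceManager endpoints; the port numbers below are common choices in setups like this one and may differ in yours:

```xml
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>HadoopMaster:8025</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>HadoopMaster:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>HadoopMaster:8050</value>
</property>
```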

4- Update mapred-site.xml
Update this file by updating and adding the following properties.

Then paste/update these lines inside the <configuration> tag.
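A sketch, assuming YARN is used as the MapReduce framework (the job-history address is an assumption, not something this tutorial specifies):

```xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>HadoopMaster:10020</value>
</property>
```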

5- Update the masters file

Then add the names of the master nodes.
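With the single master from our example, the masters file contains one line (this file tells start-dfs.sh where to run the SecondaryNameNode):

```
HadoopMaster
```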

6- Update the slaves file

Then add the names of the slave nodes.
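With the two slaves from our example:

```
HadoopSlave1
HadoopSlave2
```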

Applying master-node-specific Hadoop configuration (only on master nodes):

1- Remove the existing Hadoop data folder (which was created during the single-node Hadoop setup).

2- Recreate the same directory (/usr/local/hadoop_tmp/hdfs) and create the NameNode directory inside it (/usr/local/hadoop_tmp/hdfs/namenode).

3- Make hduser the owner of that directory.
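The three steps above can be sketched as (the hadoop group name is an assumption carried over from typical single-node setups):

```shell
sudo rm -rf /usr/local/hadoop_tmp               # old single-node data
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
sudo chown -R hduser:hadoop /usr/local/hadoop_tmp
```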


Applying slave-node-specific Hadoop configuration (only on slave nodes):

1- Remove the existing Hadoop data folder (which was created during the single-node Hadoop setup).

2- Recreate the same directory (/usr/local/hadoop_tmp/) and inside it create the DataNode directory (/usr/local/hadoop_tmp/hdfs/datanode).

3- Make hduser the owner of that directory.
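On each slave the equivalent commands are (same hadoop group assumption as on the master):

```shell
sudo rm -rf /usr/local/hadoop_tmp
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode
sudo chown -R hduser:hadoop /usr/local/hadoop_tmp
```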

SSH Configuration:

On the master node, run the following commands to append the public SSH key of HadoopMaster (~/.ssh/id_rsa.pub) to the $HOME/.ssh/authorized_keys file of hduser on HadoopSlave1 and on HadoopSlave2.
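One way to do this is with ssh-copy-id, also copying the key to the master itself so that start-dfs.sh can SSH locally:

```shell
ssh-copy-id hduser@HadoopMaster
ssh-copy-id hduser@HadoopSlave1
ssh-copy-id hduser@HadoopSlave2
```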

Now let's format the NameNode (run this command on the master node only):
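Assuming the hdfs binary is on hduser's PATH:

```shell
# run as hduser on HadoopMaster; formatting erases any existing HDFS metadata
hdfs namenode -format
```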

Starting up the Hadoop cluster daemons (run on the master node):
Start the HDFS daemons:
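Assuming Hadoop's sbin directory is on the PATH:

```shell
start-dfs.sh   # NameNode + SecondaryNameNode on the master, DataNodes on the slaves
```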

Start the YARN daemons:
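```shell
start-yarn.sh  # ResourceManager on the master, NodeManagers on the slaves
```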

Instead of both of the commands above you can also use start-all.sh, but it is now deprecated, so it is not recommended for Hadoop operations.

Track/monitor/verify the Hadoop cluster (run on any node):
Verify the Hadoop daemons on the master:

you should see only the following:
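Run jps as hduser on the master; with the configuration above it typically lists NameNode, SecondaryNameNode, ResourceManager, and Jps itself (PIDs will differ):

```shell
hduser@HadoopMaster:~$ jps
```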

Verify the Hadoop daemons on all slave nodes:
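Again with jps; on each slave the output typically lists DataNode, NodeManager, and Jps:

```shell
hduser@HadoopSlave1:~$ jps
```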

 

To view on the web:

For ResourceManager – http://HadoopMaster:8088

For NameNode – http://HadoopMaster:50070

Execute the word count example:

On the master node (clients should always talk only to the master node):
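A sketch of a run, assuming $HADOOP_HOME is set and using the examples jar bundled with Hadoop (the exact paths and jar location are assumptions; adjust them to your install):

```shell
hdfs dfs -mkdir -p /user/hduser/input
hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /user/hduser/input
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /user/hduser/input /user/hduser/output
hdfs dfs -cat /user/hduser/output/part-r-00000
```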

Author: Nizar Ellouze
