
Tutorial: Hadoop multi-node installation
In this tutorial we will see how to set up a Hadoop multi-node cluster by following these steps:
Step 1: Ensure that Hadoop is installed in single-node mode on all machines (master + slaves). If not, use this link to do it.
Step 2: Now let's configure the machines to work as a multi-node cluster. As an example we will use one master and two slaves:
Add all hostnames to the /etc/hosts file on all machines (master and slave nodes):
sudo gedit /etc/hosts
# Add the following hostnames and their IP addresses to the hosts table
Suppose that your slaves' addresses are 192.168.0.151 and 192.168.0.152, and the master's address is 192.168.0.150:
192.168.0.150 HadoopMaster
192.168.0.151 HadoopSlave1
192.168.0.152 HadoopSlave2
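To confirm that the names resolve, you can ping each node from every machine (a quick sanity check, not part of the original steps):
ping -c 1 HadoopMaster
ping -c 1 HadoopSlave1
ping -c 1 HadoopSlave2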
Log in as hduser, then install rsync and reboot the machine:
sudo apt-get install rsync
sudo reboot
Now let's apply the common configuration on all nodes (master + slaves):
1- Edit core-site.xml:
cd /usr/local/hadoop/etc/hadoop
sudo gedit core-site.xml
Paste these lines inside the <configuration> tag, or just update the existing entry by replacing localhost with HadoopMaster:
<property>
  <name>fs.default.name</name>
  <value>hdfs://HadoopMaster:9000</value>
</property>
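Note: in Hadoop 2.x the fs.default.name key is deprecated in favour of fs.defaultFS; both are still accepted, but if you prefer the current name the entry looks like this:
<!-- equivalent modern spelling of the same setting -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://HadoopMaster:9000</value>
</property>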
2- Update hdfs-site.xml
Update this file by changing the replication factor from 1 to 3.
cd /usr/local/hadoop/etc/hadoop
sudo gedit hdfs-site.xml
Then paste/update these lines inside the <configuration> tag:
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
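A side note, not part of the original steps: HDFS can never keep more copies of a block than there are DataNodes, so with only the two slaves in this example a factor of 3 will leave blocks under-replicated. If you stay with two slaves, a value of 2 is sufficient:
<!-- alternative: match the replication factor to the two DataNodes in this example -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>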
3- Update yarn-site.xml
Update this file by changing the hostname from localhost to HadoopMaster in the following three properties.
cd /usr/local/hadoop/etc/hadoop
sudo gedit yarn-site.xml
Then paste/update these lines inside the <configuration> tag:
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>HadoopMaster:8025</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>HadoopMaster:8035</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>HadoopMaster:8050</value>
</property>
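MapReduce on YARN also needs the shuffle auxiliary service on every NodeManager. Your single-node setup has most likely configured it already; if not, add this to yarn-site.xml as well:
<!-- required for MapReduce jobs; skip if already present from the single-node setup -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>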
4- Update mapred-site.xml
Update this file by adding/updating the following properties.
cd /usr/local/hadoop/etc/hadoop
sudo gedit mapred-site.xml
Then paste/update these lines inside the <configuration> tag (note: the framework property must be spelled mapreduce.framework.name; the older spelling mapred.framework.name is not recognized by Hadoop 2):
<property>
  <name>mapreduce.job.tracker</name>
  <value>HadoopMaster:5431</value>
</property>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
5- Update the masters file
cd /usr/local/hadoop/etc/hadoop
sudo gedit masters
Then add the name of the master node (in Hadoop 2 this file tells start-dfs.sh where to run the SecondaryNameNode):
HadoopMaster
6- Update the slaves file
cd /usr/local/hadoop/etc/hadoop
sudo gedit slaves
Then add the names of the slave nodes:
HadoopSlave1
HadoopSlave2
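Since the same configuration applies to every node, you can push the edited files from the master instead of repeating the edits by hand. A minimal sketch using the rsync installed earlier, assuming Hadoop lives in /usr/local/hadoop on all machines (you will be prompted for the hduser password until SSH keys are set up below):
# copy the whole config directory from the master to each slave
rsync -avz /usr/local/hadoop/etc/hadoop/ hduser@HadoopSlave1:/usr/local/hadoop/etc/hadoop/
rsync -avz /usr/local/hadoop/etc/hadoop/ hduser@HadoopSlave2:/usr/local/hadoop/etc/hadoop/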
Applying master-node-specific Hadoop configuration (only on the master node):
1- Remove the existing Hadoop data folder (created during the single-node Hadoop setup):
sudo rm -rf /usr/local/hadoop_tmp/
2- Recreate the /usr/local/hadoop_tmp/ directory and create the NameNode directory (/usr/local/hadoop_tmp/hdfs/namenode) inside it:
sudo mkdir -p /usr/local/hadoop_tmp/
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
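This path only takes effect if hdfs-site.xml points the NameNode at it; assuming your single-node setup already did so, the corresponding entry looks like this:
<!-- assumed to be carried over from the single-node setup -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
</property>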
3- Make hduser the owner of that directory:
sudo chown -R hduser:hadoop /usr/local/hadoop_tmp/
Applying slave-node-specific Hadoop configuration (only on the slave nodes):
1- Remove the existing Hadoop data folder (created during the single-node Hadoop setup):
sudo rm -rf /usr/local/hadoop_tmp/hdfs/
2- Recreate the /usr/local/hadoop_tmp/ directory and create the DataNode directory (/usr/local/hadoop_tmp/hdfs/datanode) inside it:
sudo mkdir -p /usr/local/hadoop_tmp/
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode
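As on the master, this path should match what hdfs-site.xml declares; assuming the single-node setup configured it, the DataNode entry is:
<!-- assumed to be carried over from the single-node setup -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
</property>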
3- Make hduser the owner of that directory:
sudo chown -R hduser:hadoop /usr/local/hadoop_tmp/
SSH Configuration:
On the master node, run the following commands to copy the public SSH key (the ~/.ssh/id_rsa.pub file of the HadoopMaster node) into the authorized_keys files of hduser@HadoopSlave1 and hduser@HadoopSlave2 (in $HOME/.ssh/authorized_keys):
cd
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@HadoopSlave1
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@HadoopSlave2
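If hduser on HadoopMaster does not yet have a key pair, generate one first, then confirm that passwordless login works (a quick sanity check, not part of the original steps):
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa   # only needed if ~/.ssh/id_rsa does not exist yet
ssh hduser@HadoopSlave1                    # should now log in without asking for a password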
Now let's format the NameNode (run on the master node):
# Run this command on the master node
hdfs namenode -format
Starting up the Hadoop cluster daemons (run on the master node):
Start HDFS daemons:
start-dfs.sh
Start Yarn daemons:
start-yarn.sh
Track/monitor/verify the Hadoop cluster (run on any node):
Verify the Hadoop daemons on the master:
jps
You should see only:
Jps
NameNode
SecondaryNameNode
ResourceManager
Verify the Hadoop daemons on all slave nodes:
jps
Jps
NodeManager
DataNode
To view on the web:
For the ResourceManager: http://HadoopMaster:8088
For the NameNode: http://HadoopMaster:50070
Execute the word count example:
On the master node (clients should always talk only to the master node):
hdfs dfs -mkdir /test/
hdfs dfs -copyFromLocal wordCountinput.txt /test/
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /test/wordCountinput.txt /test/output/
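When the job finishes, you can inspect the result; part-r-00000 is the default name of the first reducer's output file:
hdfs dfs -ls /test/output/
hdfs dfs -cat /test/output/part-r-00000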