
Tutorial: Hadoop Single-Node Installation
In this tutorial we will see how to install Hadoop in single-node mode by following these steps.
Step 1: Before installing Hadoop, you first need to ensure that Java 8 is installed:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
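Note that the WebUpd8 PPA and its oracle-java8-installer package have since been discontinued. If the commands above fail, installing OpenJDK 8 from the standard Ubuntu repositories is a workable substitute; in that case, use /usr/lib/jvm/java-8-openjdk-amd64 wherever this tutorial refers to /usr/lib/jvm/java-8-oracle:

sudo apt-get install openjdk-8-jdk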
Verify that Java is correctly installed:
java -version
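If the installation succeeded, this prints output along these lines (the exact version and build numbers depend on your installation):

java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)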
Configure the Java environment:
sudo apt-get install oracle-java8-set-default
Step 2: Install Hadoop in single-node mode
First, add a dedicated hadoop user (hduser) with admin access:
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo usermod -a -G sudo hduser
Then log in as that user.
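For example, you can switch to the new account from your current shell with:

su - hduser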
Install SSH:
sudo apt-get install openssh-server
Generate SSH keys so you don’t need to type a password each time the Hadoop processes start:
# create a passphrase-less key pair, then authorize it for logins to localhost
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
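You can confirm that passwordless SSH to localhost works before continuing (accept the host-key prompt the first time):

ssh localhost
exit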
Since Hadoop does not support IPv6, we should disable it:
sudo gedit /etc/sysctl.conf
And add these lines at the end:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
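To apply the change without rebooting, reload sysctl and check the flag (a value of 1 means IPv6 is disabled):

sudo sysctl -p
cat /proc/sys/net/ipv6/conf/all/disable_ipv6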
Download Apache Hadoop 2.6.0:
cd ~/Downloads
wget https://archive.apache.org/dist/hadoop/core/hadoop-2.6.0/hadoop-2.6.0.tar.gz
sudo tar -xzvf hadoop-2.6.0.tar.gz
sudo mv hadoop-2.6.0 /usr/local/hadoop
sudo chown hduser:hadoop -R /usr/local/hadoop
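As a quick sanity check, the install directory should now contain the usual Hadoop layout (bin, sbin, etc, share, and so on):

ls /usr/local/hadoop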
Create Hadoop temp directories for the NameNode and DataNode:
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode
sudo chown hduser:hadoop -R /usr/local/hadoop_tmp/
Update .bashrc:
cd
gedit .bashrc
And add these lines at the end:
# -- HADOOP ENVIRONMENT VARIABLES START -- #
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
# -- HADOOP ENVIRONMENT VARIABLES END -- #
Then reload the shell configuration:
cd
. ~/.bashrc
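To check that the variables took effect, print one of them and make sure the hadoop binary is found on the PATH:

echo $HADOOP_HOME
which hadoop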
Now let’s configure Hadoop:
cd /usr/local/hadoop/etc/hadoop
sudo gedit hadoop-env.sh
## Update the JAVA_HOME variable:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
sudo gedit core-site.xml
## Paste these lines inside the <configuration> tag:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
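Note that fs.default.name is the deprecated name for this setting; on Hadoop 2.x you can use the current key instead:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>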
sudo gedit hdfs-site.xml
## Paste these lines inside the <configuration> tag (dfs.replication is set to 1 because a single node can hold only one replica of each block):
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
</property>
sudo gedit yarn-site.xml
## Paste these lines inside the <configuration> tag:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
sudo cp mapred-site.xml.template mapred-site.xml
sudo gedit mapred-site.xml
## Paste these lines inside the <configuration> tag:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
Now let’s format the NameNode:
cd
hdfs namenode -format
Now let’s start the Hadoop processes:
start-dfs.sh
start-yarn.sh
Instead of these two commands you can also use start-all.sh, but it is now deprecated, so it is not recommended.
Check that the Hadoop processes are running: type jps and verify that all the Hadoop services appear.
cd
jps
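If everything started correctly, jps lists the five Hadoop daemons plus jps itself, along these lines (the process IDs shown here are illustrative and will differ on your machine):

2340 NameNode
2512 DataNode
2746 SecondaryNameNode
2901 ResourceManager
3031 NodeManager
3350 Jps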
You can also check the ResourceManager web UI by navigating to http://localhost:8088/.
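In Hadoop 2.x the NameNode also serves a web UI, at http://localhost:50070/ by default, where you can check HDFS status and browse the file system.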
To stop all Hadoop processes, run this command (like start-all.sh it is deprecated; stop-dfs.sh followed by stop-yarn.sh is the non-deprecated equivalent):
stop-all.sh
Tafang Joshua
Thanks very much for the tutorial, but I get this error when I type the command hdfs namenode -format:
Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode