
Hadoop Installation Using a Single-Node Cluster

Overview:

 

Hadoop is a technology that enables distributed processing of large data sets across clusters ranging from a single server to thousands of servers, while ensuring a high degree of fault tolerance.

 

Hadoop is a framework that consists of the following basic modules:

 

1) Hadoop Common - contains the libraries and utilities needed by the other Hadoop modules.
2) Hadoop Distributed File System (HDFS) - a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
3) Hadoop YARN - a resource-management platform responsible for managing compute resources in clusters and using them to schedule users' applications.
4) Hadoop MapReduce - a programming model for large-scale data processing.


In this tutorial we will install and configure Hadoop on Ubuntu (12.10/13.04/13.10). Follow the steps below:

 

Step 1: Update your machine

root@hadoop1:~# apt-get update

 

Install the python-software-properties package

root@hadoop1:~# apt-get install python-software-properties

 

Add the Oracle (Sun) Java repository

root@hadoop1:~# add-apt-repository ppa:webupd8team/java
root@hadoop1:~# apt-get update && apt-get upgrade
root@hadoop1:~# apt-get install oracle-java6-installer
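Note: the oracle-java6-installer package asks you to accept Oracle's license interactively. If you are scripting this setup, you can pre-seed the acceptance with debconf before the install step; the debconf key below is taken from the webupd8team packaging and should be treated as an assumption (verify it with debconf-get-selections from the debconf-utils package):

root@hadoop1:~# echo oracle-java6-installer shared/accepted-oracle-license-v1-1 select true | debconf-set-selections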

 

Check the installed Java version:

root@hadoop1:~# java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

 

If there are two or more versions of Java installed, you can see them with the update-java-alternatives command.
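The -l flag of update-java-alternatives (part of Ubuntu's java-common package) prints one line per registered Java environment:

root@hadoop1:~# update-java-alternatives -l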

 

Then run the following command to select the version you want as the system default:

root@hadoop1:~# update-alternatives --config java

 

There is only one alternative in link group java: /usr/lib/jvm/java-6-oracle/jre/bin/java
Nothing to configure.

 

Step 2: Add a group to the system

root@hadoop1:~# addgroup hadoopgroup

 

Add a hadoop user to the group created earlier:

root@hadoop1:~# adduser --ingroup hadoopgroup hadoopuser
Adding user `hadoopuser' ...
Adding new user `hadoopuser' (1001) with group `hadoopgroup' ...
Creating home directory `/home/hadoopuser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hadoopuser
Enter the new value, or press ENTER for the default
Full Name []: HADOOP USER
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] Y

 

Step 3: Create passwordless SSH authentication

root@hadoop1:~# su - hadoopuser
Create keys for passwordless access:
hadoopuser@hadoop1:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoopuser/.ssh/id_rsa):
Created directory '/home/hadoopuser/.ssh'.
Your identification has been saved in /home/hadoopuser/.ssh/id_rsa.
Your public key has been saved in /home/hadoopuser/.ssh/id_rsa.pub.
The key fingerprint is:
82:a0:cb:f4:fa:1f:ac:f5:29:54:34:e7:56:ee:b0:9f hadoopuser@hadoop1.example.com
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|       o . .     |
|  .   . + o      |
| . . . . + .     |
|..  . o S +      |
|o.. .. . . .     |
|.. ..+    . .    |
|  . o.o .  E     |
| ..o...o         |
+-----------------+
hadoopuser@hadoop1:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
hadoopuser@hadoop1:~$ ssh localhost
You will see the following message on the first login:
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is ee:be:18:ef:e6:3d:e3:8d:8a:17:ca:d1:a3:d6:d6:49.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 12.04.2 LTS (GNU/Linux 3.5.0-23-generic x86_64)
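If ssh localhost still asks for a password, sshd commonly requires strict permissions on the key files; tightening them is the usual fix:

hadoopuser@hadoop1:~$ chmod 700 $HOME/.ssh
hadoopuser@hadoop1:~$ chmod 600 $HOME/.ssh/authorized_keys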

 

Step 4: Disable IPv6

 

As root, append the following lines to /etc/sysctl.conf:

 

root@hadoop1:~# vi /etc/sysctl.conf

# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Save and exit, then reload the settings:
root@hadoop1:~# sysctl -p
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

 

To verify that IPv6 has been disabled (a value of 1 means disabled):

root@hadoop1:~# cat /proc/sys/net/ipv6/conf/all/disable_ipv6
1

 

Step 5: Add the Hadoop repository

root@hadoop1:~# add-apt-repository ppa:hadoop-ubuntu/stable
You are about to add the following PPA to your system:
Hadoop Stable packages
These packages are based on Apache Bigtop with appropriate patches to enable native integration on Ubuntu Oneiric onwards and for ARM based archictectures.
Please report bugs here - https://bugs.launchpad.net/hadoop-ubuntu-packages/+filebug
More info: https://launchpad.net/~hadoop-ubuntu/+archive/stable
Press [ENTER] to continue or ctrl-c to cancel adding it
gpg: keyring `/tmp/tmprAbnTB/secring.gpg' created
gpg: keyring `/tmp/tmprAbnTB/pubring.gpg' created
gpg: requesting key 84FBAFF0 from hkp server keyserver.ubuntu.com
gpg: /tmp/tmprAbnTB/trustdb.gpg: trustdb created
gpg: key 84FBAFF0: public key "Launchpad PPA for Hadoop Ubuntu Packagers" imported
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)
OK

 

Update and upgrade:

root@hadoop1:~# apt-get update && apt-get upgrade 

 

Step 6: Install Hadoop

root@hadoop1:~# apt-get install hadoop
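Once the package is installed, you can confirm which build you got; the hadoop version subcommand prints the version and build details:

root@hadoop1:~# hadoop version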

 

Verify the hadoopuser details:

root@hadoop1:~# id hadoopuser
uid=1001(hadoopuser) gid=1002(hadoopgroup) groups=1002(hadoopgroup)

 

Add the hadoopuser to the sudoers file so that it has root-level permissions:

root@hadoop1:~# visudo
and add the following line:
hadoopuser ALL=(ALL:ALL) ALL
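To confirm the entry took effect, you can list hadoopuser's sudo privileges (sudo -l prints the commands a user may run; it will prompt for the hadoopuser password):

hadoopuser@hadoop1:~$ sudo -l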

 

Set up the environment in the hadoopuser's .bashrc file as follows:

root@hadoop1:~# vi /home/hadoopuser/.bashrc
and add the following lines:
# Set Hadoop-related environment variables
export HADOOP_HOME=/home/hadoopuser/hadoop
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-6-oracle/
# Add the Hadoop bin/ directories to PATH
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:/usr/lib/hadoop/bin/

 

# Set some aliases
unalias fs &> /dev/null  
alias fs="hadoop fs"   
unalias hls &> /dev/null 
alias hls="fs -ls"
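Reload the file in the hadoopuser shell so the new variables and aliases take effect without logging out:

hadoopuser@hadoop1:~$ source ~/.bashrc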

 

Step 7: Configure Hadoop

root@hadoop1:~# chown -R hadoopuser:hadoopgroup /var/log/hadoop/
root@hadoop1:~# chmod -R 755 /var/log/hadoop/
root@hadoop1:~# cd /usr/lib/hadoop/conf/
root@hadoop1:/usr/lib/hadoop/conf# ls -ltr
total 76
-rw-r--r-- 1 root hadoop  382 Mar 24  2012 taskcontroller.cfg
-rw-r--r-- 1 root hadoop 1195 Mar 24  2012 ssl-server.xml.example
-rw-r--r-- 1 root hadoop 1243 Mar 24  2012 ssl-client.xml.example
-rw-r--r-- 1 root hadoop   10 Mar 24  2012 slaves
-rw-r--r-- 1 root hadoop   10 Mar 24  2012 masters
-rw-r--r-- 1 root hadoop  178 Mar 24  2012 mapred-site.xml
-rw-r--r-- 1 root hadoop 2033 Mar 24  2012 mapred-queue-acls.xml
-rw-r--r-- 1 root hadoop 4441 Mar 24  2012 log4j.properties
-rw-r--r-- 1 root hadoop  178 Mar 24  2012 hdfs-site.xml
-rw-r--r-- 1 root hadoop 4644 Mar 24  2012 hadoop-policy.xml
-rw-r--r-- 1 root hadoop 1488 Mar 24  2012 hadoop-metrics2.properties
-rw-r--r-- 1 root hadoop 2237 Mar 24  2012 hadoop-env.sh
-rw-r--r-- 1 root hadoop  327 Mar 24  2012 fair-scheduler.xml
-rw-r--r-- 1 root hadoop  178 Mar 24  2012 core-site.xml
-rw-r--r-- 1 root hadoop  535 Mar 24  2012 configuration.xsl
-rw-r--r-- 1 root hadoop 7457 Mar 24  2012 capacity-scheduler.xml
root@hadoop1:/usr/lib/hadoop/conf#

 

Before we can start Hadoop, we need to modify several of the files in this conf directory.

hadoop-env.sh
Replace the JAVA_HOME line with the line below (it must match the Java installed in Step 1):
export JAVA_HOME=/usr/lib/jvm/java-6-oracle

core-site.xml
First create a base directory for temporary files:
root@hadoop1:/usr/lib/hadoop/conf# mkdir /home/hadoopuser/tmp
root@hadoop1:/usr/lib/hadoop/conf# chown hadoopuser:hadoopgroup /home/hadoopuser/tmp/
root@hadoop1:/usr/lib/hadoop/conf# chmod 755 /home/hadoopuser/tmp/
root@hadoop1:/usr/lib/hadoop/conf#

 

Replace the original contents with:

root@hadoop1:/usr/lib/hadoop/conf# cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoopuser/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose scheme and
    authority determine the FileSystem implementation. The URI's scheme
    determines the config property (fs.SCHEME.impl) naming the FileSystem
    implementation class. The URI's authority is used to determine the host,
    port, etc. for a filesystem.</description>
  </property>
</configuration>

root@hadoop1:/usr/lib/hadoop/conf#
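Optionally, confirm the edited file is still well-formed XML; xmllint (from the libxml2-utils package, which may need to be installed first) exits silently when the file parses cleanly:

root@hadoop1:/usr/lib/hadoop/conf# xmllint --noout core-site.xml
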
mapred-site.xml

root@hadoop1:/usr/lib/hadoop/conf# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs at.
    If "local", then jobs are run in-process as a single map and reduce
    task.</description>
  </property>
</configuration>

root@hadoop1:/usr/lib/hadoop/conf#
hdfs-site.xml

root@hadoop1:/usr/lib/hadoop/conf# cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications
    can be specified when the file is created. The default is used if
    replication is not specified in create time.</description>
  </property>
</configuration>
Now give hadoopuser ownership of the Hadoop installation:

root@hadoop1:/usr/lib# chown -R hadoopuser:hadoopgroup /usr/lib/hadoop/
root@hadoop1:/usr/lib# ls -ltr hadoop/
total 16
lrwxrwxrwx 1 hadoopuser hadoopgroup   15 Apr 24  2012 pids -> /var/run/hadoop
lrwxrwxrwx 1 hadoopuser hadoopgroup   15 Apr 24  2012 logs -> /var/log/hadoop
lrwxrwxrwx 1 hadoopuser hadoopgroup   41 Apr 24  2012 hadoop-tools-1.0.2.jar -> ../../share/hadoop/hadoop-tools-1.0.2.jar
lrwxrwxrwx 1 hadoopuser hadoopgroup   40 Apr 24  2012 hadoop-test-1.0.2.jar -> ../../share/hadoop/hadoop-test-1.0.2.jar
lrwxrwxrwx 1 hadoopuser hadoopgroup   44 Apr 24  2012 hadoop-examples-1.0.2.jar -> ../../share/hadoop/hadoop-examples-1.0.2.jar
lrwxrwxrwx 1 hadoopuser hadoopgroup   40 Apr 24  2012 hadoop-core.jar -> ../../share/hadoop/hadoop-core-1.0.2.jar
lrwxrwxrwx 1 hadoopuser hadoopgroup   40 Apr 24  2012 hadoop-core-1.0.2.jar -> ../../share/hadoop/hadoop-core-1.0.2.jar
lrwxrwxrwx 1 hadoopuser hadoopgroup   39 Apr 24  2012 hadoop-ant-1.0.2.jar -> ../../share/hadoop/hadoop-ant-1.0.2.jar
lrwxrwxrwx 1 hadoopuser hadoopgroup   26 Apr 24  2012 contrib -> ../../share/hadoop/contrib
lrwxrwxrwx 1 hadoopuser hadoopgroup   16 Apr 24  2012 conf -> /etc/hadoop/conf
drwxr-xr-x 9 hadoopuser hadoopgroup 4096 Dec 15 05:16 webapps
drwxr-xr-x 2 hadoopuser hadoopgroup 4096 Dec 15 05:16 libexec
drwxr-xr-x 2 hadoopuser hadoopgroup 4096 Dec 15 05:16 bin
drwxr-xr-x 3 hadoopuser hadoopgroup 4096 Dec 15 05:16 lib
root@hadoop1:/usr/lib#
root@hadoop1:/etc/hadoop/conf# chown -R hadoopuser:hadoopgroup /etc/hadoop/
root@hadoop1:/etc/hadoop/conf#
root@hadoop1:~# su - hadoopuser
hadoopuser@hadoop1:~$ mkdir hadoop

 

Commands To Manage Hadoop Services:

 

start-dfs.sh - Starts the Hadoop DFS daemons: the namenode and datanodes. Use this before start-mapred.sh.
stop-dfs.sh - Stops the Hadoop DFS daemons.
start-mapred.sh - Starts the Hadoop Map/Reduce daemons: the jobtracker and tasktrackers.
stop-mapred.sh - Stops the Hadoop Map/Reduce daemons.
start-all.sh - Starts all Hadoop daemons: the namenode, datanodes, jobtracker and tasktrackers. Deprecated; use start-dfs.sh then start-mapred.sh.
stop-all.sh - Stops all Hadoop daemons. Deprecated; use stop-mapred.sh then stop-dfs.sh.

 

Format the Hadoop file system (run this as hadoopuser before starting the daemons):

 

root@hadoop1:/usr/lib/hadoop/conf# su - hadoopuser
hadoopuser@hadoop1:~$ hadoop namenode -format
13/12/15 19:53:16 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = java.net.UnknownHostException: hadoop1.example.com: hadoop1.example.com
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
************************************************************/
13/12/15 19:53:17 INFO util.GSet: VM type       = 64-bit
13/12/15 19:53:17 INFO util.GSet: 2% max memory = 19.33375 MB
13/12/15 19:53:17 INFO util.GSet: capacity      = 2^21 = 2097152 entries
13/12/15 19:53:17 INFO util.GSet: recommended=2097152, actual=2097152
13/12/15 19:53:37 INFO namenode.FSNamesystem: fsOwner=hadoopuser
13/12/15 19:53:37 INFO namenode.FSNamesystem: supergroup=supergroup
13/12/15 19:53:37 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/12/15 19:53:37 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/12/15 19:53:37 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/12/15 19:53:37 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/12/15 19:53:58 INFO common.Storage: Image file of size 116 saved in 0 seconds.
13/12/15 19:53:58 INFO common.Storage: Storage directory /home/hadoopuser/tmp/dfs/name has been successfully formatted.
13/12/15 19:53:58 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException: hadoop1.example.com: hadoop1.example.com
************************************************************/
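Note the java.net.UnknownHostException in the banner above: the machine's hostname does not resolve. The format still succeeds, but you can avoid the warning by mapping the hostname to the loopback address in /etc/hosts (the entry below mirrors this tutorial's hostname; adjust it to match your machine):

root@hadoop1:~# vi /etc/hosts
127.0.0.1    localhost hadoop1.example.com hadoop1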

 

Start the Hadoop cluster using start-all.sh:

hadoopuser@hadoop1:~$ start-all.sh
Warning: $HADOOP_HOME is deprecated.
starting namenode, logging to /usr/lib/hadoop/libexec/../logs/hadoop-hadoopuser-namenode-hadoop1.example.com.out
localhost: starting datanode, logging to /usr/lib/hadoop/libexec/../logs/hadoop-hadoopuser-datanode-hadoop1.example.com.out
localhost: starting secondarynamenode, logging to /usr/lib/hadoop/libexec/../logs/hadoop-hadoopuser-secondarynamenode-hadoop1.example.com.out
starting jobtracker, logging to /usr/lib/hadoop/libexec/../logs/hadoop-hadoopuser-jobtracker-hadoop1.example.com.out
localhost: starting tasktracker, logging to /usr/lib/hadoop/libexec/../logs/hadoop-hadoopuser-tasktracker-hadoop1.example.com.out
hadoopuser@hadoop1:~$
Alternatively, run:
start-dfs.sh 
start-mapred.sh

 

To check whether Hadoop is running, use the jps command:

hadoopuser@hadoop1:~$ jps
35401 NameNode
35710 JobTracker
35627 SecondaryNameNode
35928 Jps
hadoopuser@hadoop1:~$
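On a healthy single-node setup you should also see DataNode and TaskTracker entries in the jps output; if a daemon is missing, check its log under /var/log/hadoop/. As a quick smoke test you can run the pi example from the examples jar that ships with this package (the jar path matches the /usr/lib/hadoop listing shown earlier):

hadoopuser@hadoop1:~$ hadoop jar /usr/lib/hadoop/hadoop-examples-1.0.2.jar pi 2 10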
