Monday 20 May 2013

Hadoop Fundamentals......

I started to learn Hadoop by following Michael Noll's excellent articles at

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

This guide describes efficient ways to solve the setup problems related to Hadoop, along with hacks for the most commonly encountered issues.

a) The guide says to use two VMs running Ubuntu Linux. So, I decided to run two Ubuntu Linux (10.04 LTS) VMs on Oracle VirtualBox. I decided to use Hadoop 1.2.0.

I needed the following four broad network rules:
        - The master should be able to SSH and ping the slave
        - Both the master and the slave should be able to ping each other
        - The master and the slave should have different IP addresses
        - All the IPs should be "static". Hence, DHCP is not an option, because of the large maintenance work needed to keep the /etc/hosts files in sync.

I spent a considerable amount of time with the various adapter types, and the right one to use is "Host-only Adapter". This gives each VM a unique IP address ( 192.168.56.101 for the first VM, 192.168.56.102 for the second VM, and so on ).

If you had used "NAT", Virtual box would have assigned "the same default unique IP( 10.0.2.15 ) address" for each and every VMs. Hence, this would violated the third condition above.

If you had used "Bridged Adapter", it would have used DHCP to allocate the IP address dynamically, violating the fourth condition. You could have disabled DHCP, but, then you would have to request your IT administrator to assign you static IPs. This is not possible in many of the organization.

b) When you bring up your Hadoop cluster, you might get the following error

java.io.IOException: File /user/ubuntu/pies could only be replicated to 0 nodes, instead of 1

This happens when you have not configured your "$HADOOP_HOME/conf/slaves" file properly. You should add your hostnames to the conf/slaves file; you will get this error if it still points to "localhost".

hduser@ubu0:/usr/local/hadoop/conf$ cat slaves
ubu0
ubu1

c) Many times, the DataNode on the slave is unable to reach the NameNode. This happens because the DataNode still refers to the IP address 127.0.0.1 and hence is unable to reach the master. Ensure that you disable the local loopback entries in the /etc/hosts file:
hduser@ubu0:/usr/local/hadoop/conf$ cat /etc/hosts
#127.0.0.1 localhost
#127.0.1.1 ubu0
192.168.56.101    master
192.168.56.102    slave
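To catch this misconfiguration before starting the cluster, the check can be scripted: every hostname in conf/slaves should resolve to a real address, not a loopback one. A small sketch (not part of Hadoop; the conf path is the one used in this post):

```python
# check_slaves.py -- warn when a slave hostname resolves to a loopback
# address, which makes DataNodes register as 127.0.0.1 and become
# unreachable from the NameNode.
import os
import socket

def resolves_to_loopback(hostname):
    """Return True if `hostname` resolves to a 127.x.x.x address."""
    try:
        ip = socket.gethostbyname(hostname)
    except socket.gaierror:
        return False  # unresolvable -- a different problem entirely
    return ip.startswith("127.")

if __name__ == "__main__":
    path = "/usr/local/hadoop/conf/slaves"
    if os.path.exists(path):  # skip quietly on machines without Hadoop
        with open(path) as f:
            for host in (line.strip() for line in f):
                if host and resolves_to_loopback(host):
                    print("WARNING: %s resolves to a loopback address" % host)
```

With the commented-out /etc/hosts entries above, "master" and "slave" resolve to the 192.168.56.x addresses and the script prints nothing; if the loopback lines were still active, "ubu0" would be flagged.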


