Olinuxino goes to Hadoop

One thing I would like to test is how the Olinuxino behaves when used in a Hadoop environment.

At first sight it might not be a very good fit: the board does not have much storage, and you cannot really use an SD card as a normal filesystem.

The setup I propose to test uses a laptop (or another machine) as the master (and HDFS filesystem). I would like to know whether the filesystem or the network is the bottleneck.
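As a rough first probe of the disk side, sequential write speed can be eyeballed with dd (a hedged sketch; the path and size are arbitrary placeholders):

```shell
# Rough sequential-write probe of the board's storage.
# /tmp/ddtest and the 64 MB size are placeholders, not part of any Hadoop setup.
dd if=/dev/zero of=/tmp/ddtest bs=1M count=64 conv=fsync
rm /tmp/ddtest
```

dd prints the throughput on stderr when it finishes; comparing that number with the raw network throughput between the board and the master gives a first hint of which side will limit Hadoop.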

Step 1: Install a Hadoop server

In my case it is my laptop, and I use a 30 GB partition on its disk. The IP I will use for my server is 

This step will install a « normal » Hadoop distribution on a PC, which will be used as the master node.

To simplify things, the best approach is to add aliases in /etc/hosts.
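For example (hypothetical addresses and names; use your own machines'):

```
192.168.1.10    master
192.168.1.20    olinuxino
```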

Step 2: Setting up the board

Download the standard image for the Olinuxino. It can be found here: https://www.olimex.com/wiki/images/2/29/Debian_FS_34_90_camera_A20-olimex.torrent, taken from the official Olinuxino GitHub (https://github.com/OLIMEX/OLINUXINO/tree/master/SOFTWARE/A20/A20-build).

The first problem is that networking is not enabled by default. Edit the file /etc/network/interfaces and add:

auto eth0
allow-hotplug eth0
iface eth0 inet dhcp

Then type:

sudo dhclient eth0
/etc/init.d/networking restart

Get the board's IP address by typing ifconfig:

    eth0      Link encap:Ethernet  HWaddr 02:cf:07:01:5a:b7

    inet addr:  Bcast:  Mask:
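If you want to grab that address programmatically, a small sketch (assuming the classic net-tools « inet addr: » output format shown above):

```shell
# Print only the IPv4 address of eth0.
# Assumes the old net-tools ifconfig output format ("inet addr:<ip>").
ifconfig eth0 | sed -n 's/.*inet addr:\([0-9.][0-9.]*\).*/\1/p'
```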
Then update the system and install ssh:

    sudo apt-get update
    sudo apt-get upgrade
    sudo apt-get install ssh

Edit /etc/ssh/sshd_config and make sure it contains the lines « PermitRootLogin yes » and « StrictModes no ». Then restart ssh:

/etc/init.d/ssh restart

To test that everything is set up correctly, go to your server computer and type (the default password for root is olimex):

ssh -l root

Also try to connect to your server from the board (in my case the address is and I created an account named local):

ssh -l local

Step 3: Adding a user

sudo addgroup hadoop_group
sudo adduser --ingroup hadoop_group hduser1
sudo adduser hduser1 sudo
su - hduser1
vi ~/.bashrc
# add the following lines
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-armhf/jre/
export HADOOP_HOME=/home/hduser1/hadoop
export MAHOUT_HOME=/home/hduser1/hadoop/mahout

ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Step 4: Installing Hadoop


sudo aptitude install openjdk-7-jre

wget http://apache.mirrors.ovh.net/ftp.apache.org/dist/hadoop/common/hadoop-2.7.0/hadoop-2.7.0.tar.gz

tar zxvf hadoop-2.7.0.tar.gz

mv hadoop-2.7.0 hadoop

Now we have to modify the configuration files so the board can reach the master node and HDFS.

Edit hadoop-env.sh

set line export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-armhf/jre

Edit the file core-site.xml and add the two standard properties below. The property names are the stock ones matching these descriptions; the values (the temp directory and the master URI) are examples to adapt, using the alias you put in /etc/hosts:

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/home/hduser1/tmp</value>
      <description>A base for other temporary directories.</description>
    </property>

    <property>
      <name>fs.default.name</name>
      <value>hdfs://master:54310</value>
      <description>The name of the default file system. A URI whose
      scheme and authority determine the FileSystem implementation. The
      uri's scheme determines the config property (fs.SCHEME.impl) naming
      the FileSystem implementation class. The uri's authority is used to
      determine the host, port, etc. for a filesystem.</description>
    </property>

Now you can test if everything works:
hadoop fs -ls
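A slightly stronger smoke test (file names here are hypothetical) is to round-trip a small file through HDFS:

```shell
# Round-trip a file through HDFS: put it, list it, read it back, clean up.
echo "hello hdfs" > /tmp/hello.txt
hadoop fs -put /tmp/hello.txt /tmp/hello.txt
hadoop fs -ls /tmp
hadoop fs -cat /tmp/hello.txt
hadoop fs -rm /tmp/hello.txt
```

If the -cat prints the file contents back, the board is really talking to the HDFS running on the master, not just running a local command.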

Step 5: Installing Mahout

One easy way to test Hadoop is to install Mahout. This project includes several Hadoop jobs for classifying data, so we will use it for testing.

Download it from apache:

cd hadoop
wget http://apache.mirrors.ovh.net/ftp.apache.org/dist/mahout/0.10.0/mahout-distribution-0.10.0.tar.gz
tar zxvf mahout-distribution-0.10.0.tar.gz
mv mahout-distribution-0.10.0 mahout

Now everything should be correctly set up.

Step 6: Benchmarking

Go into hadoop/mahout and install curl, which the example script needs:

sudo apt-get install curl

Then run the example once to check that it works:

examples/bin/classify-wikipedia.sh

Now, to benchmark, run:

time examples/bin/classify-wikipedia.sh
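Single runs are noisy, so a small sketch to repeat the benchmark and log wall-clock seconds per run (the run count of 3 is arbitrary):

```shell
# Run the benchmark a few times and print the wall-clock time of each run.
for i in 1 2 3; do
  start=$(date +%s)
  examples/bin/classify-wikipedia.sh > /dev/null
  end=$(date +%s)
  echo "run $i: $((end - start)) s"
done
```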
