Sanjay Sharma’s Weblog

May 29, 2009

Setting up a Hadoop 0.20.0 cluster with a Windows slave

Filed under: Advanced computing, Java world, Tech — indoos @ 8:46 am

Here are the steps for setting up a Hadoop cluster and plugging in a Windows machine as a slave-

a. First set up a pseudo-distributed Hadoop on a Linux machine as explained in

I was able to use this excellent tutorial with minor changes to get a pseudo-distributed Hadoop cluster running on CentOS/Ubuntu and a Windows machine.

I used a common user, hadoop, created on all machines.

b. The next step was to get all the pseudo-distributed machines working together as a real cluster. Again, it was an easy reckoner to get it working.

Some easy tips to get Hadoop working in cluster mode:

  1. Use machine names everywhere instead of IP addresses, and change /etc/hosts on all machines accordingly
  2. Configure the setup on the master machine first, i.e. the conf XML files (including the masters and slaves files) as well as /etc/hosts, and then copy all these conf files and /etc/hosts entries to the slave nodes
  3. The same copying trick helps for the authorized_keys file: append each slave's public key to the master machine's authorized_keys, then copy that authorized_keys file back to all the slaves
  4. Set JAVA_HOME in each installation's file. I had some issues setting it in .profile and still got JAVA_HOME problems
  5. Another easy option is to create a gzipped tarball of your master Hadoop install and copy it to the slave nodes for setup
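The copy-everything tips above can be sketched as a dry run: the scp commands are printed rather than executed (drop the echos to run them for real). The slave names and install path here are assumptions - substitute your own.

```shell
# Dry run: print the commands that push the master's configuration to each
# slave. SLAVES and the install path are assumptions for illustration.
SLAVES="slave1 slave2"
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop-0.20.0}"
for host in $SLAVES; do
  echo "scp $HADOOP_HOME/conf/*.xml $HADOOP_HOME/conf/masters $HADOOP_HOME/conf/slaves hadoop@$host:$HADOOP_HOME/conf/"
  echo "scp /etc/hosts hadoop@$host:/tmp/hosts   # then move it into place with sudo on $host"
done
```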

c. So now for the windows bit-

  1. Install Cygwin if you have not already done so
  2. Check whether the sshd server is installed in your Cygwin setup; if not, install it
  3. Double-check that a service named CYGWIN sshd is running under Windows services
  4. Create a hadoop user by-
cygwin> net user hadoop password /add /yes    # create the Windows account
cygwin> mkpasswd -l -u hadoop >> /etc/passwd  # add it to Cygwin's /etc/passwd
cygwin> chown hadoop -R /home/hadoop          # give it ownership of its home directory
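A few quick diagnostic checks inside Cygwin can confirm steps 2-4 worked; this is a sketch, and the fallback hints are assumptions about the likely fix (`cygrunsrv -Q` queries a Windows service from Cygwin).

```shell
# Sanity-check the Cygwin prerequisites; each check falls back to a hint.
cygrunsrv -Q sshd 2>/dev/null || echo "CYGWIN sshd service not found - run ssh-host-config"
grep '^hadoop:' /etc/passwd || echo "hadoop user missing from /etc/passwd - re-run mkpasswd"
ls -ld /home/hadoop 2>/dev/null || echo "/home/hadoop does not exist yet"
```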

d. Treat the Windows machine as *nix-

  1. Now use PuTTY to log in to your local Windows machine as the newly created hadoop user
  2. Set up Hadoop as you would for any Linux machine- the easy option is to copy over the master Hadoop installation
  3. Do not forget to set up the .ssh files: copy this machine's public key into the master's authorized_keys file and copy that authorized_keys file back to this Windows machine. Also add JAVA_HOME to, which should be a /cygdrive/<path to java6> entry

e. Assuming that the master server is already running, start this slave using "bin/hadoop-daemon.sh start datanode" or "bin/hadoop-daemon.sh start tasktracker" to run the datanode or tasktracker instance.
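For reference, here is a dry-run sketch of those start commands - they are printed rather than executed (drop the echo to run them on the slave), and the install path is an assumption.

```shell
# Print the daemon-start commands for this slave node.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop-0.20.0}"   # assumed install location
for daemon in datanode tasktracker; do
  echo "$HADOOP_HOME/bin/hadoop-daemon.sh start $daemon"
done
# afterwards, running jps on the slave should list DataNode and TaskTracker
```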

Next, I will write about how I managed to get the Hive 0.3.0 release working with Hadoop 0.20.0 on my small Hadoop cluster of 3 Linux machines and 1 Windows machine.

