Nov 28

How to install Hadoop on Windows 7

    While installing Hadoop on my Windows 7 laptop, I ran into some problems that cost me a couple of hours, so I decided to write the process down.

    The standard process is:

  1. Install Cygwin (the OpenSSH package must be enabled during installation).
  2. Download the Hadoop package from the official site and extract it. (PS: on my laptop I had to extract it under the directory where Cygwin is located, or the Hadoop namenode would not start. A sketch for steps 1-2 is given after step 9.)
  3. Set the environment variables JAVA_HOME and HADOOP_INSTALL, and add %HADOOP_INSTALL%/bin to PATH.
  4. Open %HADOOP_INSTALL%/conf/hadoop-env.sh and add "export JAVA_HOME=***" and "export HADOOP_INSTALL=**" (a sketch for steps 3-4 is given after step 9).
  5. Then open a Cygwin terminal and type "hadoop version". If no error is reported, it's OK; if there is an error, check your HADOOP_INSTALL and PATH settings. (PS: be careful with JAVA_HOME; it is recommended that this path contain no " " (space).)
  6. Modify core-site.xml, hdfs-site.xml, and mapred-site.xml with the content below:
    <?xml version="1.0"?>
    <!-- core-site.xml -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost/</value>
      </property>
    </configuration>

    <?xml version="1.0"?>
    <!-- hdfs-site.xml -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

    <?xml version="1.0"?>
    <!-- mapred-site.xml -->
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:8021</value>
      </property>
    </configuration>
  7. Configure SSH with the command ssh-host-config; when prompted, choose not to create a separate privileged account (the wording may not be exactly that, but something like it).
    Then generate a new SSH key so that we can log in without a password, which Hadoop relies on when it runs:
    % ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    % cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    Test with: ssh localhost (before we can log in, the Cygwin sshd service must be started first; a sketch for step 7 is given after step 9).
  8. Initialize the Hadoop namenode with: hadoop namenode -format. If you can then find "tmp/hadoop-*/dfs/name" under the directory where Hadoop is located, congratulations.
  9. Run start-all.sh (it will start the namenode, datanode, secondary namenode, jobtracker, and tasktracker). Use the jps command to verify that these daemon processes started successfully; you can also check http://localhost:50030/ for the jobtracker and http://localhost:50070/ for the namenode (a sketch for steps 8-9 follows).
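
    A rough sketch for steps 1-2. The installer file name, the Hadoop version and the target directory below are only examples; -P simply tells the Cygwin installer which packages to preselect (you can also just tick OpenSSH in the package chooser).
    > setup.exe -P openssh
    Then, from a Cygwin terminal, extract the Hadoop tarball somewhere under the Cygwin tree:
    % tar xzf hadoop-x.y.z.tar.gz -C /usr/local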
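
    A rough sketch for steps 3-4. All paths here are examples (the JDK location, the Cygwin root and the Hadoop directory depend on your machine); the main point is that JAVA_HOME should contain no spaces. From a Windows command prompt:
    > setx JAVA_HOME "C:\Java\jdk1.6.0"
    > setx HADOOP_INSTALL "C:\cygwin\usr\local\hadoop-x.y.z"
    Append %HADOOP_INSTALL%\bin to PATH (Control Panel -> System -> Advanced -> Environment Variables), then add the same two variables to %HADOOP_INSTALL%/conf/hadoop-env.sh:
    export JAVA_HOME=/cygdrive/c/Java/jdk1.6.0
    export HADOOP_INSTALL=/usr/local/hadoop-x.y.z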
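
    A rough sketch for step 7, assuming ssh-host-config registered the service under the name sshd (display name "CYGWIN sshd"); adjust the name if yours differs.
    % cygrunsrv -S sshd        (or "net start sshd" from an elevated Windows prompt)
    % ssh localhost            (should now log in without asking for a password)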
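
    A rough sketch for steps 8-9; the daemon names are what jps normally reports for a pseudo-distributed setup.
    % hadoop namenode -format
    % start-all.sh
    % jps
    jps should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker (plus Jps itself); then open http://localhost:50030/ (jobtracker) and http://localhost:50070/ (namenode) in a browser.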

    On my laptop I also had to change the Cygwin sshd service configuration: the service logon account had to be changed to an account in the Administrators group (one way to do this is sketched below).
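
    One way to do that, assuming the service is named sshd and using a placeholder account name (the same change can also be made in services.msc, service properties, "Log On" tab). From an elevated Windows command prompt:
    > sc config sshd obj= ".\YourAdminUser" password= "YourPassword"
    > net stop sshd
    > net start sshd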