Prerequisite

  • Java JDK 1.8.0

You can confirm that the Java Development Kit is installed on your machine by running the command javac -version.

Hadoop 3.3.0 supports only Java 8 and Java 11.
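As a complement to javac -version, the version rule above can be checked programmatically. The following is a minimal sketch, not part of Hadoop: the class name JavaVersionCheck and its methods are hypothetical names chosen for illustration, and the parsing assumes the usual java.version formats ("1.8.0_281" for Java 8, "11.0.10" for Java 11).

```java
class JavaVersionCheck {
    /** Extracts the major version from a java.version string such as "1.8.0_281" or "11.0.10". */
    static int majorVersion(String version) {
        String[] parts = version.split("[._-]");
        // Java 8 and earlier report "1.x"; Java 9 and later report the major version first.
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    /** Applies the rule stated in this guide: only Java 8 and Java 11 are accepted. */
    static boolean isSupportedByHadoop(int major) {
        return major == 8 || major == 11;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        int major = majorVersion(version);
        System.out.println("Java " + version + " (major " + major + "): "
                + (isSupportedByHadoop(major) ? "supported" : "not supported by this guide"));
    }
}
```

Compile it with javac JavaVersionCheck.java and run it with java JavaVersionCheck; it reports the version of the JVM that actually runs it, which is the one Hadoop will pick up if JAVA_HOME points to the same installation.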

Setup Hadoop

Download Hadoop

Download the Hadoop 3.3.0 binary release, hadoop-3.3.0.tar.gz, and extract it with WinRAR (Windows) or Keka (Mac). On Windows, extract it to C:\ so that the Hadoop root directory is C:\hadoop-3.3.0.

Download Hadoop from Apache

Setup Environment Variables

Open the System Properties window from Control Panel and select the Environment Variables button.

  • User Variables
Variable       Value
HADOOP_HOME    C:\hadoop-3.3.0
  • System Variables
Variable       Value
PATH           C:\hadoop-3.3.0\bin (add this entry to the existing PATH value)
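After changing the variables, open a new cmd window so the new values are picked up. The sketch below is one way to confirm that the Hadoop bin folder really is on PATH. It is an illustration only: HadoopEnvCheck and pathContains are hypothetical names, and the default folder C:\hadoop-3.3.0\bin is an assumption taken from the table above (pass a different folder as the first argument if yours differs).

```java
import java.io.File;
import java.util.Locale;
import java.util.regex.Pattern;

class HadoopEnvCheck {
    /** Returns true if dir appears as one entry of the PATH-style list pathValue. */
    static boolean pathContains(String pathValue, String dir, String separator) {
        if (pathValue == null || dir == null) {
            return false;
        }
        String wanted = normalize(dir);
        for (String entry : pathValue.split(Pattern.quote(separator))) {
            if (normalize(entry).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    // Windows paths are case-insensitive and may carry a trailing slash or backslash.
    private static String normalize(String path) {
        String p = path.trim().toLowerCase(Locale.ROOT);
        while (p.endsWith("\\") || p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }

    public static void main(String[] args) {
        // Assumed default; override with the first command-line argument.
        String bin = args.length > 0 ? args[0] : "C:\\hadoop-3.3.0\\bin";
        System.out.println("HADOOP_HOME = " + System.getenv("HADOOP_HOME"));
        System.out.println(bin + " on PATH: "
                + pathContains(System.getenv("PATH"), bin, File.pathSeparator));
    }
}
```

The comparison ignores case and trailing separators because Windows treats C:\Hadoop-3.3.0\bin\ and c:\hadoop-3.3.0\bin as the same folder.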

Configuration Modification

Edit each of the following files, paste in the XML shown below it, and save the file.

  • hadoop-3.3.0/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
  • hadoop-3.3.0/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
  • hadoop-3.3.0/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/c:/software/hadoop-3.3.0/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/c:/software/hadoop-3.3.0/data/datanode</value>
  </property>
</configuration>

Remember to replace /c:/software/hadoop-3.3.0 with your Hadoop root directory (for example, /c:/hadoop-3.3.0 if you extracted Hadoop to C:\).

  • hadoop-3.3.0/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
  • hadoop-3.3.0/etc/hadoop/hadoop-env.cmd
set JAVA_HOME=C:\Java

Remember to replace C:\Java with your Java root directory.
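A typo in one of the site files is a common reason for the daemons failing to start, so it can help to list what each file actually defines after editing. The sketch below reads a Hadoop-style configuration file with the JDK's built-in XML parser and prints each name/value pair. It does not use Hadoop's own Configuration class, and HadoopSiteXml and parseProperties are hypothetical names introduced for this example.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class HadoopSiteXml {
    /** Parses <property><name>..</name><value>..</value></property> entries into a map. */
    static Map<String, String> parseProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = childText(property, "name");
                String value = childText(property, "value");
                if (name != null) {
                    props.put(name, value == null ? "" : value);
                }
            }
            return props;
        } catch (Exception e) {
            // Malformed XML (for example an unclosed tag) ends up here.
            throw new IllegalArgumentException("Not a well-formed configuration file", e);
        }
    }

    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // With no argument, fall back to the core-site.xml content from this guide.
        String xml = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
                : "<configuration><property><name>fs.defaultFS</name>"
                  + "<value>hdfs://localhost:9000</value></property></configuration>";
        parseProperties(xml).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Run it once per file, for example java HadoopSiteXml C:\hadoop-3.3.0\etc\hadoop\hdfs-site.xml. An exception means the file is not well-formed XML; a missing line in the output means the property was not pasted correctly.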

Update bin folder

Delete the bin folder at C:\hadoop-3.3.0\bin and replace it with the bin folder from one of the downloads below.

Download bin from Github
Download bin from Drive

Run Hadoop

  1. In the bin directory, run hdfs namenode -format. The output should include a line reporting that the storage directory has been successfully formatted.

  2. In the sbin directory, run start-all.cmd. Several cmd windows will open. Then run jps, which should list NameNode, DataNode, ResourceManager and NodeManager (plus Jps itself).

     Sometimes the DataNode fails to start. If that happens, delete the data/datanode folder in the Hadoop root directory and run start-all.cmd again.

  3. Make sure all four cmd windows are still running.

  4. Open localhost:9870 in a browser; you should see the NameNode web page. (Hadoop 3.x serves this page on port 9870; port 50070 was the Hadoop 2.x default.)

  5. Open localhost:8088 in a browser; you should see the YARN ResourceManager web page.

  6. When you are finished, run stop-all.cmd in the sbin directory.
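The checks in the steps above (the jps listing and the two web pages) can also be scripted. The sketch below is an illustration under stated assumptions rather than an official tool: HadoopHealthCheck and its methods are hypothetical names, the four daemon names are the ones a single-node start-all.cmd is expected to launch, and the ports are 9870 for the NameNode page (the Hadoop 3.x default; Hadoop 2.x used 50070) and 8088 for the ResourceManager page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HadoopHealthCheck {
    // Daemons expected after start-all.cmd on a single-node setup.
    static final List<String> EXPECTED =
            Arrays.asList("NameNode", "DataNode", "ResourceManager", "NodeManager");

    /** Returns the expected daemons that do not appear in the given jps output. */
    static List<String> missingDaemons(String jpsOutput) {
        List<String> running = new ArrayList<>();
        for (String line : jpsOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2) {
                running.add(parts[1]); // each jps line is "<pid> <main class>"
            }
        }
        List<String> missing = new ArrayList<>(EXPECTED);
        missing.removeAll(running);
        return missing;
    }

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isPortOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        StringBuilder output = new StringBuilder();
        try {
            Process jps = new ProcessBuilder("jps").redirectErrorStream(true).start();
            try (BufferedReader reader =
                    new BufferedReader(new InputStreamReader(jps.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    output.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            System.out.println("Could not run jps (is the JDK bin folder on PATH?): " + e.getMessage());
        }
        System.out.println("Missing daemons: " + missingDaemons(output.toString()));
        System.out.println("NameNode page (9870) reachable: " + isPortOpen("localhost", 9870, 1000));
        System.out.println("ResourceManager page (8088) reachable: " + isPortOpen("localhost", 8088, 1000));
    }
}
```

If DataNode shows up in the missing list, the data/datanode cleanup described above is the first thing to try.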

WordCount & MapReduce

If you want to experiment with Hadoop's MapReduce, try the WordCount.java example.

Download from Apache
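Before running the Apache example, it may help to see the idea behind it in isolation. The following is a plain-Java sketch of the three phases of a word count (map, group by key, reduce) that runs locally without a cluster. It is not the Apache WordCount.java and does not use the Hadoop API; LocalWordCount and its methods are hypothetical names chosen for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LocalWordCount {
    /** Map phase: emit a (word, 1) pair for every whitespace-separated token in a line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    /** Reduce phase: sum all counts that were emitted for one word. */
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    /** Runs map on every line, groups the pairs by word (the shuffle), then reduces each group. */
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(wordCount(Arrays.asList("hello hadoop", "hello world")));
    }
}
```

In a real Hadoop job the map and reduce steps run as separate tasks and the framework performs the grouping between them; here everything happens in one process, which is enough to show what each phase contributes.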