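CentOS Hadoop pseudo-distributed mode and Hadoop standalone mode

How to debug Hadoop 2.2.0 programs in Eclipse on Windows 7

In the previous post I covered deploying Hadoop in single-node pseudo-distributed mode. This post shows how to debug Hadoop 2.2.0 programs in Eclipse. If you are still on Hadoop 1.x, that is fine too: I wrote about debugging 1.x programs in Eclipse in an earlier post. The main difference between the two is the Eclipse plugin. The Hadoop 2.x and 1.x APIs are not quite the same, so the plugins differ as well, and you only need the plugin that matches your version.

Let's get started: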

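No.  Name                        Description
1    Eclipse                     Juno Service Release, version 4.2
2    Operating system            Windows 7
3    Hadoop Eclipse plugin       hadoop-eclipse-plugin-2.2.0.jar
4    Hadoop cluster environment  CentOS 6.5 Linux in a virtual machine, single-node pseudo-distributed
5    Test program                Hello World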

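These are the problems I ran into.

First exception:

Java code

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

Solution: in the checkHadoopHome() method of the org.apache.hadoop.util.Shell class, hard-code the local Hadoop path as the return value. I changed it as follows: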

Java code

private static String checkHadoopHome() {
    // first check the Dflag hadoop.home.dir with JVM scope
    // System.setProperty("hadoop.home.dir", "...");
    String home = System.getProperty("hadoop.home.dir");

    // fall back to the system/user-global env variable
    if (home == null) {
        home = System.getenv("HADOOP_HOME");
    }

    try {
        // couldn't find either setting for hadoop's home directory
        if (home == null) {
            throw new IOException("HADOOP_HOME or hadoop.home.dir are not set.");
        }

        // strip surrounding double quotes, if any
        if (home.startsWith("\"") && home.endsWith("\"")) {
            home = home.substring(1, home.length() - 1);
        }

        // check that the home setting is actually a directory that exists
        File homedir = new File(home);
        if (!homedir.isAbsolute() || !homedir.exists() || !homedir.isDirectory()) {
            throw new IOException("Hadoop home directory " + homedir
                + " does not exist, is not a directory, or is not an absolute path.");
        }

        home = homedir.getCanonicalPath();
    } catch (IOException ioe) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Failed to detect a valid hadoop home directory", ioe);
        }
        home = null;
    }

    // hard-code the local Hadoop path; this overrides whatever was detected above
    home = "D:\\hadoop-2.2.0";
    return home;
}

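Second exception: Could not locate executable D:\Hadoop\tar\hadoop-2.2.0\hadoop-2.2.0\bin\winutils.exe in the Hadoop binaries. Hadoop cannot find the Windows executable. Download a bin package that includes winutils.exe and use it to overwrite the bin directory under the local Hadoop root directory.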
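Both winutils.exe errors come down to where Hadoop thinks its home directory is. As a less invasive alternative to patching Shell, the commented-out line in checkHadoopHome() suggests setting the hadoop.home.dir system property yourself. The following is only a sketch under that assumption: the class name HadoopHomeCheck and its helper methods are made up for illustration, and the property has to be set before the first Hadoop class is loaded for it to have any effect.

```java
import java.io.File;

class HadoopHomeCheck {

    /** Builds the location where Hadoop on Windows expects winutils.exe. */
    static File winutilsPath(String hadoopHome) {
        return new File(new File(hadoopHome, "bin"), "winutils.exe");
    }

    /** Same lookup order as checkHadoopHome(): system property first, then environment. */
    static String resolveHome() {
        String home = System.getProperty("hadoop.home.dir");
        if (home == null) {
            home = System.getenv("HADOOP_HOME");
        }
        return home;
    }

    public static void main(String[] args) {
        // Assumption: D:\hadoop-2.2.0 is the local Hadoop directory used in this post.
        if (resolveHome() == null) {
            System.setProperty("hadoop.home.dir", "D:\\hadoop-2.2.0");
        }
        File winutils = winutilsPath(resolveHome());
        System.out.println(winutils + (winutils.isFile() ? " found" : " is missing"));
    }
}
```

If the program reports the file as missing, the second exception above is the one to expect, and the bin directory still needs the Windows binaries.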

Third exception:

Java code

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://192.168.130.54:19000/user/hmail/output/part-00000, expected: file:///
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
    at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
    at com.netease.hadoop.HDFSCatWithAPI.main(HDFSCatWithAPI.java:23)

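This exception usually means the HDFS path is not being resolved against the cluster: the message shows an hdfs:// path handed to a client that still expects the local file system (file:///). The fix is to copy core-site.xml and hdfs-site.xml from the cluster into the src root of the Eclipse project, so that they end up on the classpath.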
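To make the "Wrong FS" message less mysterious, here is a minimal plain-Java sketch of the kind of comparison behind it. It is a simplification and not Hadoop's code: the real FileSystem.checkPath also compares the authority (host and port), and the class and method names below are hypothetical.

```java
import java.net.URI;

class WrongFsDemo {

    /** Returns true when the path can belong to the given default file system. */
    static boolean sameScheme(URI defaultFs, URI path) {
        String pathScheme = path.getScheme();
        // A path without a scheme is resolved against the default file system.
        if (pathScheme == null) {
            return true;
        }
        return pathScheme.equalsIgnoreCase(defaultFs.getScheme());
    }

    public static void main(String[] args) {
        URI path = URI.create("hdfs://192.168.130.54:19000/user/hmail/output/part-00000");
        URI localDefault = URI.create("file:///");                       // no core-site.xml on the classpath
        URI clusterDefault = URI.create("hdfs://192.168.130.54:19000");  // value taken from core-site.xml

        System.out.println(sameScheme(localDefault, path));   // false
        System.out.println(sameScheme(clusterDefault, path)); // true
    }
}
```

The first call models a client without the cluster configuration; the second models the same client after core-site.xml has been picked up.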

Fourth exception:

Java code

Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

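This exception is usually caused by a misconfigured HADOOP_HOME environment variable. To be explicit: to debug Hadoop 2.2 successfully in Eclipse on Windows, add the following environment variables on the local machine:

(1) Under System variables, create a HADOOP_HOME variable with the value D:\hadoop-2.2.0, that is, the local Hadoop directory.

(2) Append %HADOOP_HOME%\bin to Path under System variables.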
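Eclipse only sees environment variables that existed when it was started, so it can help to verify the two settings from inside the JVM. This is a rough sketch using only the JDK; the class PathCheck and its onPath method are names I made up for this check.

```java
import java.io.File;
import java.util.regex.Pattern;

class PathCheck {

    /** Returns true if dir is one of the entries of a PATH-style string. */
    static boolean onPath(String pathValue, String separator, String dir) {
        File wanted = new File(dir);
        for (String entry : pathValue.split(Pattern.quote(separator))) {
            // File.equals ignores a trailing separator and follows the platform's case rules.
            if (!entry.isEmpty() && new File(entry).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        String path = System.getenv("PATH");
        if (home == null || path == null) {
            System.out.println("HADOOP_HOME or PATH is not visible to this JVM");
            return;
        }
        String bin = new File(home, "bin").getPath();
        System.out.println(bin + (onPath(path, File.pathSeparator, bin) ? " is on PATH" : " is NOT on PATH"));
    }
}
```

Run it as a Java application from the same Eclipse instance; if it reports the variables as missing, restart Eclipse after changing them.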

Those are the problems I hit during testing. With each one fixed, Eclipse can finally debug MR programs. My Hello World source code is as follows:

Java code

package com.qin.wordcount;

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/**
 * Hadoop 2.2.0 test:
 * a WordCount-style example.
 *
 * @author qindongliang
 */
public class MyWordCount {

    /** Mapper: splits each input line on '#' and emits (field 0, field 1 as int). */
    private static class WMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private IntWritable count = new IntWritable(1);
        private Text text = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String values[] = value.toString().split("#");
            // System.out.println(values[0] + "========" + values[1]);
            count.set(Integer.parseInt(values[1]));
            text.set(values[0]);
            context.write(text, count);
        }
    }

    /** Reducer: sums the counts for each key and writes the total as text. */
    private static class WReducer extends Reducer<Text, IntWritable, Text, Text> {

        private Text t = new Text();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> value, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable i : value) {
                count += i.get();
            }
            t.set(count + "");
            context.write(key, t);
        }
    }

    /**
     * Change 1:
     * (1) hard-code the path in checkHadoopHome in the Shell source
     * (2) line 974, in FileUtils
     */
    public static void main(String[] args) throws Exception {
        // String path1 = System.getenv("HADOOP_HOME");
        // System.out.println(path1);
        // System.exit(0);

        JobConf conf = new JobConf(MyWordCount.class);
        // Configuration conf = new Configuration();
        // conf.set("mapred.job.tracker", "192.168.75.130:9001");
        // read the data fields in person
        // conf.setJar("tt.jar");
        // Note: this line must come first so that initialization happens, otherwise an error is reported

        /** The job. Job.getInstance replaces the deprecated new Job(conf, name). **/
        Job job = Job.getInstance(conf, "testwordcount");
        job.setJarByClass(MyWordCount.class);

        System.out.println("Mode: " + conf.get("mapred.job.tracker"));
        // job.setCombinerClass(PCombine.class);
        // job.setNumReduceTasks(3); // set to 3

        job.setMapperClass(WMapper.class);
        job.setReducerClass(WReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        String path = "hdfs://192.168.46.28:9000/qin/output";
        // Relies on core-site.xml being on the classpath so that the default file system is HDFS.
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path(path);
        if (fs.exists(p)) {
            fs.delete(p, true);
            System.out.println("Output path exists, deleted!");
        }

        FileInputFormat.setInputPaths(job, "hdfs://192.168.46.28:9000/qin/input");
        FileOutputFormat.setOutputPath(job, p);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The console prints the following log:

Java code

INFO - Configuration.warnOnceIfDeprecated(840) | mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
Mode: local
Output path exists, deleted!
INFO - Configuration.warnOnceIfDeprecated(840) | session.id is deprecated. Instead, use dfs.metrics.session-id
INFO - JvmMetrics.init(76) | Initializing JVM Metrics with processName=JobTracker, sessionId=
WARN - JobSubmitter.copyAndConfigureFiles(149) | Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
WARN - JobSubmitter.copyAndConfigureFiles(258) | No job jar file set. User classes may not be found. See Job or Job#setJar(String).
INFO - FileInputFormat.listStatus(287) | Total input paths to process : 1
INFO - JobSubmitter.submitJobInternal(394) | number of splits:1
INFO - Configuration.warnOnceIfDeprecated(840) | user.name is deprecated. Instead, use mapreduce.job.user.name
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
INFO - C
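Before submitting the job to a cluster, it can be handy to check the map and reduce logic on its own. The sketch below reproduces what the mapper and reducer in the example do (split each line on "#", then sum the numbers per key) in plain Java without any Hadoop classes. The class name WordCountLocal and the three sample lines are my own assumptions; the post does not show the real input file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountLocal {

    /** Applies the mapper's split and the reducer's sum to a list of "key#number" lines. */
    static Map<String, Integer> sum(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            String[] values = line.split("#");             // same split as the mapper
            String word = values[0];
            int count = Integer.parseInt(values[1]);
            totals.merge(word, count, Integer::sum);       // same accumulation as the reducer loop
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical sample input in the "key#number" shape the mapper expects.
        List<String> input = Arrays.asList("a#1", "b#2", "a#3");
        System.out.println(sum(input)); // {a=4, b=2}
    }
}
```

A line without "#" or with a non-numeric second field fails here just as it would inside the mapper, so this is also a cheap way to find bad input records.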

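How to set up Hadoop 2.7.3 on CentOS 6.5

Overall approach: prepare the master and slave servers, configure the master so it can log in to the slaves over SSH without a password, unpack and install the JDK, unpack and install Hadoop, then configure the master/slave relationships for HDFS, MapReduce, and so on.

1. Environment: three 64-bit CentOS 6.5 machines. Hadoop 2.7.3 requires 64-bit Linux. Installing the operating system takes only ten minutes or so.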

Master 192.168.0.182

Slave1 192.168.0.183

Slave2 192.168.0.184

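2. Passwordless SSH login. Hadoop logs in to each node over SSH to operate on it. I use the root user; every server generates a key pair, and the public keys are then merged into authorized_keys.

(1) CentOS does not enable passwordless SSH login by default. In /etc/ssh/sshd_config, uncomment the following two lines (remove the leading #) on every server:

#RSAAuthentication yes
#PubkeyAuthentication yes

(2) Run ssh-keygen -t rsa to generate a key. Do not enter a passphrase, just press Enter at every prompt. This creates the .ssh folder under /root. Do this on every server.

(3) Merge the public keys into the authorized_keys file. On the Master server, change to the /root/.ssh directory and merge them over SSH:

cat id_rsa.pub >> authorized_keys
ssh root@192.168.0.183 cat ~/.ssh/id_rsa.pub >> authorized_keys
ssh root@192.168.0.184 cat ~/.ssh/id_rsa.pub >> authorized_keys

(4) Copy the Master server's authorized_keys and known_hosts files to the /root/.ssh directory of each Slave server.

(5) Done. ssh root@192.168.0.183 and ssh root@192.168.0.184 no longer ask for a password.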

3. Install the JDK. Hadoop 2.7 requires JDK 7. My CentOS is a minimal install, so there is no OpenJDK; just unpack the downloaded JDK and set the variables.

(1) Download "jdk-7u79-linux-x64.gz" and put it in the /home/java directory.

(2) Unpack it: tar -zxvf jdk-7u79-linux-x64.gz

(3) Edit /etc/profile and add:

export JAVA_HOME=/home/java/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin

(4) Apply the configuration: source /etc/profile

(5) Run java -version to confirm. Done.

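4. Install Hadoop 2.7. Unpack it on the Master server only, then copy it to the Slave servers.

(1) Download "hadoop-2.7.0.tar.gz" and put it in the /home/hadoop directory.

(2) Unpack it: tar -xzvf hadoop-2.7.0.tar.gz

(3) Under /home/hadoop, create the data directories tmp, hdfs, hdfs/data, and hdfs/name. Note that the hdfs-site.xml in step 6 points at /home/hadoop/dfs/name and /home/hadoop/dfs/data, so use the same directory names in both places.

5. Configure core-site.xml in the /home/hadoop/hadoop-2.7.0/etc/hadoop directory: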

<configuration>

<property>

<name>fs.defaultFS</name>

<value>hdfs://192.168.0.182:9000</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>file:/home/hadoop/tmp</value>

</property>

<property>

<name>io.file.buffer.size</name>

<value>131072</value>

</property>

</configuration>
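A typo in one of these XML files is easy to miss and tends to surface only when a daemon fails to start. As a quick sanity check, the following sketch parses a Hadoop-style configuration with the JDK's built-in DOM parser and lists the name/value pairs. It is not Hadoop's own Configuration class; SiteXmlReader is a made-up name, and the sketch assumes every property has both a name and a value element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class SiteXmlReader {

    /** Parses <property><name>..</name><value>..</value></property> entries in file order. */
    static Map<String, String> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            // Malformed XML or a property without name/value ends up here.
            throw new IllegalArgumentException("Not a readable configuration file", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Pass a file path such as etc/hadoop/core-site.xml, or run without arguments for the sample.
        String xml = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
                : "<configuration><property><name>fs.defaultFS</name>"
                  + "<value>hdfs://192.168.0.182:9000</value></property></configuration>";
        for (Map.Entry<String, String> e : read(xml).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Running it against each of the site files before copying them to the slaves catches unclosed tags and misspelled values early.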

6. Configure hdfs-site.xml in the /home/hadoop/hadoop-2.7.0/etc/hadoop directory:

<configuration>

<property>

<name>dfs.namenode.name.dir</name>

<value>file:/home/hadoop/dfs/name</value>

</property>

<property>

<name>dfs.datanode.data.dir</name>

<value>file:/home/hadoop/dfs/data</value>

</property>

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

<property>

<name>dfs.namenode.secondary.http-address</name>

<value>192.168.0.182:9001</value>

</property>

<property>

<name>dfs.webhdfs.enabled</name>

<value>true</value>

</property>

</configuration>

7. Configure mapred-site.xml in the /home/hadoop/hadoop-2.7.0/etc/hadoop directory:

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

<property>

<name>mapreduce.jobhistory.address</name>

<value>192.168.0.182:10020</value>

</property>

<property>

<name>mapreduce.jobhistory.webapp.address</name>

<value>192.168.0.182:19888</value>

</property>

</configuration>

8. Configure yarn-site.xml in the /home/hadoop/hadoop-2.7.0/etc/hadoop directory:

<configuration>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<property>

<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>

<value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

<property>

<name>yarn.resourcemanager.address</name>

<value>192.168.0.182:8032</value>

</property>

<property>

<name>yarn.resourcemanager.scheduler.address</name>

<value>192.168.0.182:8030</value>

</property>

<property>

<name>yarn.resourcemanager.resource-tracker.address</name>

<value>192.168.0.182:8031</value>

</property>

<property>

<name>yarn.resourcemanager.admin.address</name>

<value>192.168.0.182:8033</value>

</property>

<property>

<name>yarn.resourcemanager.webapp.address</name>

<value>192.168.0.182:8088</value>

</property>

<property>

<name>yarn.nodemanager.resource.memory-mb</name>

<value>768</value>

</property>

</configuration>

9. Set JAVA_HOME in hadoop-env.sh and yarn-env.sh in the /home/hadoop/hadoop-2.7.0/etc/hadoop directory. Hadoop will not start without it:

export JAVA_HOME=/home/java/jdk1.7.0_79

10. Edit the slaves file in the /home/hadoop/hadoop-2.7.0/etc/hadoop directory: delete the default localhost and add the two slave nodes:

192.168.0.183

192.168.0.184

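11. Copy the configured Hadoop to the same location on each node using scp:

scp -r /home/hadoop 192.168.0.183:/home/
scp -r /home/hadoop 192.168.0.184:/home/

12. Start Hadoop on the Master server; the slave nodes are started automatically. Change to the /home/hadoop/hadoop-2.7.0 directory.

(1) Initialize by running bin/hdfs namenode -format

Note: this step may report the following error: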

java.net.UnknownHostException: tiancunPC: tiancunPC: unknown error

at java.net.InetAddress.getLocalHost(InetAddress.java:1505)

at org.apache.hadoop.net.DNS.resolveLocalHostname(DNS.java:264)

at org.apache.hadoop.net.DNS.<clinit>(DNS.java:57)

at org.apache.hadoop.hdfs.server.namenode.NNStorage.newBlockPoolID(NNStorage.java:982)

at org.apache.hadoop.hdfs.server.namenode.NNStorage.newNamespaceInfo(NNStorage.java:591)

at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:157)

at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:992)

at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1434)

at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)

Caused by: java.net.UnknownHostException: tiancunPC: unknown error

at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)

at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)

at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)

at java.net.InetAddress.getLocalHost(InetAddress.java:1500)

... 8 more

16/11/11 19:15:23 WARN net.DNS: Unable to determine address of the host-falling back to "localhost" address

java.net.UnknownHostException: tiancunPC: tiancunPC: unknown error

at java.net.InetAddress.getLocalHost(InetAddress.java:1505)

at org.apache.hadoop.net.DNS.resolveLocalHostIPAddress(DNS.java:287)

at org.apache.hadoop.net.DNS.<clinit>(DNS.java:58)

at org.apache.hadoop.hdfs.server.namenode.NNStorage.newBlockPoolID(NNStorage.java:982)

at org.apache.hadoop.hdfs.server.namenode.NNStorage.newNamespaceInfo(NNStorage.java:591)

at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:157)

at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:992)

at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1434)

at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)

Caused by: java.net.UnknownHostException: tiancunPC: unknown error

at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)

at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)

at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)

at java.net.InetAddress.getLocalHost(InetAddress.java:1500)

... 8 more

Running hostname on Linux shows:

[root@tiancunPC hadoop-2.7.3]# hostname
tiancunPC

The contents of /etc/hosts are:

[root@tiancunPC hadoop-2.7.3]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

No wonder the name cannot be resolved: tiancunPC does not appear in the file. After editing /etc/hosts:

[root@tiancunPC hadoop-2.7.3]# cat /etc/hosts
127.0.0.1 tiancunPC localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

Make the corresponding change for the hostnames of the other two machines, then run the command again and it works.
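The stack trace shows the failure comes from java.net.InetAddress.getLocalHost(), so the /etc/hosts fix can be verified on each machine without touching Hadoop at all. A minimal sketch, with LocalHostCheck being a name I picked for this check:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class LocalHostCheck {

    /** Performs the same lookup that failed in the NameNode format step. */
    static String describe() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            return "OK: " + local.getHostName() + " -> " + local.getHostAddress();
        } catch (UnknownHostException e) {
            // Same exception type as in the trace above: the hostname has no mapping.
            return "FAILED: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If this prints a FAILED line on a node, formatting or starting Hadoop there is likely to hit the same UnknownHostException.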

(2) Start everything with sbin/start-all.sh, or start the parts separately with sbin/start-dfs.sh and sbin/start-yarn.sh.

Running sbin/start-all.sh may print this error:

maps to localhost(IP), but this does not map back to the address

Solution: edit /etc/ssh/ssh_config

vim /etc/ssh/ssh_config

and set:

GSSAPIAuthentication no

At this point another error may appear:

namenode running as process 18472. Stop it first.

Solution: restart Hadoop, that is, stop it first and then start it again.

(3) To stop Hadoop, run sbin/stop-all.sh

(4) Run jps to see the running Hadoop processes.
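Besides jps, a simple way to confirm from another machine that the master is up is to test whether its ports accept connections. The sketch below uses only the JDK. The host and the two ports (9000 from fs.defaultFS, 8088 from yarn.resourcemanager.webapp.address) come from the configuration above; the class name PortCheck and the two-second timeout are my own choices.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        String master = "192.168.0.182";
        int[] ports = {9000, 8088}; // NameNode RPC, ResourceManager web UI
        for (int port : ports) {
            System.out.println(master + ":" + port + (isOpen(master, port, 2000) ? " open" : " closed"));
        }
    }
}
```

An open port only shows that something is listening there; jps on the node remains the way to see which daemons are actually running.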
