CentOS Hadoop pseudo-distributed mode / Hadoop standalone mode
How to debug Hadoop 2.2.0 programs in Eclipse on Windows 7
In the previous post I covered the standalone pseudo-distributed deployment of Hadoop. In this post I will explain how to debug Hadoop 2.2.0 from Eclipse. If you are still using a Hadoop 1.x release, that is fine too; I wrote about debugging 1.x Hadoop programs from Eclipse in an earlier post. The biggest difference between the two is the Eclipse plugin: the Hadoop 2.x and 1.x APIs are not quite compatible, so the plugins differ, and you simply need to use the one that matches your version.
Now, on to the main topic. My environment:
No.  Item                         Description
1    Eclipse                      Juno Service Release 4.2
2    Operating system             Windows 7
3    Hadoop Eclipse plugin        hadoop-eclipse-plugin-2.2.0.jar
4    Hadoop cluster environment   CentOS 6.5 in a Linux VM, standalone pseudo-distributed
5    Test program                 Hello World (WordCount)
The problems I ran into are as follows:
Java code
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
Solution:
Hard-code the local Hadoop path into the return value of the checkHadoopHome() method of the org.apache.hadoop.util.Shell class. My change is as follows:
Java code
private static String checkHadoopHome() {
    // first check the Dflag hadoop.home.dir with JVM scope
    // System.setProperty("hadoop.home.dir", "...");
    String home = System.getProperty("hadoop.home.dir");

    // fall back to the system/user-global env variable
    if (home == null) {
        home = System.getenv("HADOOP_HOME");
    }

    try {
        // couldn't find either setting for hadoop's home directory
        if (home == null) {
            throw new IOException("HADOOP_HOME or hadoop.home.dir are not set.");
        }

        if (home.startsWith("\"") && home.endsWith("\"")) {
            home = home.substring(1, home.length() - 1);
        }

        // check that the home setting is actually a directory that exists
        File homedir = new File(home);
        if (!homedir.isAbsolute() || !homedir.exists() || !homedir.isDirectory()) {
            throw new IOException("Hadoop home directory " + homedir
                + " does not exist, is not a directory, or is not an absolute path.");
        }

        home = homedir.getCanonicalPath();

    } catch (IOException ioe) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Failed to detect a valid hadoop home directory", ioe);
        }
        home = null;
    }
    // hard-code the local Hadoop path
    home = "D:\\hadoop-2.2.0";
    return home;
}
The second exception: Could not locate executable D:\Hadoop\tar\hadoop-2.2.0\hadoop-2.2.0\bin\winutils.exe in the Hadoop binaries. The Windows executable cannot be found. Download a winutils bin package and overwrite the bin directory under your local Hadoop root directory with it.
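Alternatively, if you prefer not to patch Shell.java at all, the same effect can usually be achieved by setting hadoop.home.dir in the driver before any Hadoop class is touched. This is only a minimal sketch (the class name is illustrative, and D:\hadoop-2.2.0 must be a directory that actually contains bin\winutils.exe); it relies on the fact that checkHadoopHome() above reads this system property first:
Java code
public class SetHadoopHomeFirst {
    public static void main(String[] args) throws Exception {
        // must run before the first Hadoop class is loaded, so Shell picks it up
        System.setProperty("hadoop.home.dir", "D:\\hadoop-2.2.0");
        // ... then continue with the normal JobConf/Job setup,
        //     as in the MyWordCount example later in this post ...
    }
}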
The third exception:
Java code
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://192.168.130.54:19000/user/hmail/output/part-00000, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
at com.netease.hadoop.HDFSCatWithAPI.main(HDFSCatWithAPI.java:23)
This exception usually means the HDFS path is specified incorrectly. Solution: copy the core-site.xml and hdfs-site.xml files from the cluster into the src root directory of your Eclipse project.
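If you would rather not copy the XML files into src, a minimal alternative sketch is to set the default file system on the client Configuration yourself (the NameNode address 192.168.130.54:19000 below is taken from the stack trace above; replace it with your own, and the class name is illustrative):
Java code
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPathCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // point the client at HDFS instead of the local file:/// default
        conf.set("fs.defaultFS", "hdfs://192.168.130.54:19000");
        FileSystem fs = FileSystem.get(conf);
        // the path from the "Wrong FS" exception can now be resolved against HDFS
        System.out.println(fs.exists(new Path("/user/hmail/output/part-00000")));
        fs.close();
    }
}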
The fourth exception:
Java code
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
This exception is usually caused by a misconfigured HADOOP_HOME environment variable. Let me stress this point: to debug Hadoop 2.2 successfully from Eclipse on Windows, you need to add the following environment variables on the local machine:
(1) In the system variables, create a new HADOOP_HOME variable with the value D:\hadoop-2.2.0, i.e. the local Hadoop directory
(2) Append %HADOOP_HOME%\bin to the system Path variable
The problems above are the ones I ran into while testing. After treating each one, my Eclipse can finally debug MR programs successfully. My Hello World source code is as follows:
Java code
package com.qin.wordcount;

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/***
 *
 * Hadoop 2.2.0 test
 * A WordCount-style example
 *
 * @author qindongliang
 *
 * Hadoop technical discussion QQ group: 376932160
 *
 *
 **/
public class MyWordCount {

    /**
     * Mapper
     *
     ***/
    private static class WMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private IntWritable count = new IntWritable(1);
        private Text text = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // each input line is expected to look like "word#count"
            String values[] = value.toString().split("#");
            // System.out.println(values[0] + "========" + values[1]);
            count.set(Integer.parseInt(values[1]));
            text.set(values[0]);
            context.write(text, count);
        }
    }

    /**
     * Reducer
     *
     ***/
    private static class WReducer extends Reducer<Text, IntWritable, Text, Text> {

        private Text t = new Text();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> value, Context context)
                throws IOException, InterruptedException {
            // sum up the counts for each word
            int count = 0;
            for (IntWritable i : value) {
                count += i.get();
            }
            t.set(count + "");
            context.write(key, t);
        }
    }

    /**
     * Changes made:
     * (1) added the fixed checkHadoopHome path in the Shell source
     * (2) line 974, in FileUtils
     ***/
    public static void main(String[] args) throws Exception {

        // String path1 = System.getenv("HADOOP_HOME");
        // System.out.println(path1);
        // System.exit(0);

        JobConf conf = new JobConf(MyWordCount.class);
        // Configuration conf = new Configuration();
        // conf.set("mapred.job.tracker", "192.168.75.130:9001");
        // read the data fields from person
        // conf.setJar("tt.jar");
        // note: the line above must come first, for initialization, otherwise an error is thrown

        /** Job task **/
        Job job = new Job(conf, "testwordcount");
        job.setJarByClass(MyWordCount.class);
        System.out.println("Mode: " + conf.get("mapred.job.tracker"));
        // job.setCombinerClass(PCombine.class);
        // job.setNumReduceTasks(3); // set to 3
        job.setMapperClass(WMapper.class);
        job.setReducerClass(WReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        String path = "hdfs://192.168.46.28:9000/qin/output";
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path(path);
        if (fs.exists(p)) {
            fs.delete(p, true);
            System.out.println("Output path already exists, deleted!");
        }
        FileInputFormat.setInputPaths(job, "hdfs://192.168.46.28:9000/qin/input");
        FileOutputFormat.setOutputPath(job, p);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
The console prints the following log:
Java code
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
Mode: local
Output path already exists, deleted!
INFO - Configuration.warnOnceIfDeprecated(840) | session.id is deprecated. Instead, use dfs.metrics.session-id
INFO - JvmMetrics.init(76) | Initializing JVM Metrics with processName=JobTracker, sessionId=
WARN - JobSubmitter.copyAndConfigureFiles(149) | Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
WARN - JobSubmitter.copyAndConfigureFiles(258) | No job jar file set. User classes may not be found. See Job or Job#setJar(String).
INFO - FileInputFormat.listStatus(287) | Total input paths to process : 1
INFO - JobSubmitter.submitJobInternal(394) | number of splits:1
INFO - Configuration.warnOnceIfDeprecated(840) | user.name is deprecated. Instead, use mapreduce.job.user.name
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
INFO - C
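As an aside, the WARN line above about command-line option parsing can be silenced by wrapping the driver in the Tool interface, as the message itself suggests. A minimal sketch (the class name is illustrative; the actual job setup would be the same as in MyWordCount.main() above):
Java code
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyWordCountDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        // build and submit the Job here, exactly as in MyWordCount.main() above
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses the generic options (-D, -fs, -jt, ...) before calling run()
        System.exit(ToolRunner.run(new Configuration(), new MyWordCountDriver(), args));
    }
}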
How to set up Hadoop 2.7.3 on CentOS 6.5
The overall approach: prepare the master and slave servers, configure the master so it can SSH into the slaves without a password, unpack and install the JDK, unpack and install Hadoop, then configure the HDFS and MapReduce master/slave relationships.
1. Environment: 3 machines running 64-bit CentOS 6.5 (Hadoop 2.7.3 requires 64-bit Linux). Installing the operating system only takes ten-odd minutes.
Master 192.168.0.182
Slave1 192.168.0.183
Slave2 192.168.0.184
2. Passwordless SSH login. Hadoop needs to SSH into each node to operate on it. I use the root user: generate a public key on each server, then merge them all into authorized_keys.
(1) CentOS does not enable passwordless SSH login by default. Uncomment the following 2 lines in /etc/ssh/sshd_config. This must be done on every server:
#RSAAuthentication yes
#PubkeyAuthentication yes
(2) Run the command ssh-keygen -t rsa to generate a key. Do not enter a passphrase, just press Enter all the way through; a .ssh folder will be created under /root. This must be done on every server.
(3) Merge the public keys into the authorized_keys file. On the Master server, enter the /root/.ssh directory and merge them via SSH:
cat id_rsa.pub >> authorized_keys
ssh root@192.168.0.183 cat ~/.ssh/id_rsa.pub >> authorized_keys
ssh root@192.168.0.184 cat ~/.ssh/id_rsa.pub >> authorized_keys
(4) Copy authorized_keys and known_hosts from the Master server to the /root/.ssh directory of each Slave server
(5) Done. ssh root@192.168.0.183 and ssh root@192.168.0.184 no longer require a password
3. Install the JDK. Hadoop 2.7 requires JDK 7. Since my CentOS is a minimal install, there is no OpenJDK; simply unpack the downloaded JDK and configure the environment variables.
(1) Download "jdk-7u79-linux-x64.gz" and put it in the /home/java directory
(2) Unpack it with the command: tar -zxvf jdk-7u79-linux-x64.gz
(3) Edit /etc/profile and add:
export JAVA_HOME=/home/java/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
(4) Apply the configuration with the command: source /etc/profile
(5) Run java -version to verify. Done.
4. Install Hadoop 2.7. Unpack it only on the Master server; it will be copied to the Slave servers later.
(1) Download "hadoop-2.7.0.tar.gz" and put it in the /home/hadoop directory
(2) Unpack it with the command: tar -xzvf hadoop-2.7.0.tar.gz
(3) Under the /home/hadoop directory, create the folders used for data storage: tmp, hdfs, hdfs/data, hdfs/name
5. Configure core-site.xml in the /home/hadoop/hadoop-2.7.0/etc/hadoop directory:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.0.182:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/tmp</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131702</value>
</property>
</configuration>
6. Configure hdfs-site.xml in the /home/hadoop/hadoop-2.7.0/etc/hadoop directory:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>192.168.0.182:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
7. Configure mapred-site.xml in the /home/hadoop/hadoop-2.7.0/etc/hadoop directory:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>192.168.0.182:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>192.168.0.182:19888</value>
</property>
</configuration>
8. Configure yarn-site.xml in the /home/hadoop/hadoop-2.7.0/etc/hadoop directory:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>192.168.0.182:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>192.168.0.182:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>192.168.0.182:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>192.168.0.182:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>192.168.0.182:8088</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>768</value>
</property>
</configuration>
9. Set JAVA_HOME in hadoop-env.sh and yarn-env.sh under /home/hadoop/hadoop-2.7.0/etc/hadoop. If it is not set, Hadoop will not start:
export JAVA_HOME=/home/java/jdk1.7.0_79
10. Configure the slaves file in the /home/hadoop/hadoop-2.7.0/etc/hadoop directory: delete the default localhost and add the 2 slave nodes:
192.168.0.183
192.168.0.184
11. Copy the configured Hadoop to the corresponding location on each node via scp:
scp -r /home/hadoop 192.168.0.183:/home/
scp -r /home/hadoop 192.168.0.184:/home/
12. Start Hadoop on the Master server; the slave nodes will be started automatically. Enter the /home/hadoop/hadoop-2.7.0 directory.
(1) Initialize with the command: bin/hdfs namenode -format
Note: this step may throw an error:
java.net.UnknownHostException: tiancunPC: tiancunPC: unknown error
at java.net.InetAddress.getLocalHost(InetAddress.java:1505)
at org.apache.hadoop.net.DNS.resolveLocalHostname(DNS.java:264)
at org.apache.hadoop.net.DNS.<clinit>(DNS.java:57)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.newBlockPoolID(NNStorage.java:982)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.newNamespaceInfo(NNStorage.java:591)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:157)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:992)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1434)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
Caused by: java.net.UnknownHostException: tiancunPC: unknown error
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getLocalHost(InetAddress.java:1500)
... 8 more
16/11/11 19:15:23 WARN net.DNS: Unable to determine address of the host - falling back to "localhost" address
java.net.UnknownHostException: tiancunPC: tiancunPC: unknown error
at java.net.InetAddress.getLocalHost(InetAddress.java:1505)
at org.apache.hadoop.net.DNS.resolveLocalHostIPAddress(DNS.java:287)
at org.apache.hadoop.net.DNS.<clinit>(DNS.java:58)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.newBlockPoolID(NNStorage.java:982)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.newNamespaceInfo(NNStorage.java:591)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:157)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:992)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1434)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
Caused by: java.net.UnknownHostException: tiancunPC: unknown error
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getLocalHost(InetAddress.java:1500)
... 8 more
On Linux, checking the hostname gives:
[root@tiancunPC hadoop-2.7.3]# hostname
tiancunPC
And /etc/hosts contains:
[root@tiancunPC hadoop-2.7.3]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
No wonder the hostname cannot be resolved. Modify /etc/hosts:
[root@tiancunPC hadoop-2.7.3]# cat /etc/hosts
127.0.0.1 tiancunPC localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
Make the corresponding hostname changes on the other two machines, then re-run that command and it will work.
(2) Start everything with sbin/start-all.sh, or start the pieces separately with sbin/start-dfs.sh and sbin/start-yarn.sh.
Running sbin/start-all.sh may produce an error:
maps to localhost(IP), but this does not map back to the address
Solution:
Modify /etc/ssh/ssh_config:
vim /etc/ssh/ssh_config
GSSAPIAuthentication no
At this point, another error may appear:
namenode running as process 18472. Stop it first.
Solution: restart Hadoop (stop it first, then start it again).
(3) To stop, run the command: sbin/stop-all.sh
(4) Run the command jps to see the relevant process information.
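As an additional sanity check (a minimal sketch, assuming the fs.defaultFS of hdfs://192.168.0.182:9000 configured in core-site.xml above and the Hadoop client jars on the classpath; the class name is illustrative), a small Java client can list the HDFS root directory to confirm the NameNode is reachable:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode address taken from this cluster's core-site.xml
        conf.set("fs.defaultFS", "hdfs://192.168.0.182:9000");
        FileSystem fs = FileSystem.get(conf);
        // list the HDFS root directory to confirm the NameNode answers
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}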