Hostnames used in Hadoop configuration files must not contain underscores; use hyphens to separate words instead, e.g. change hadoop_master to hadoop-master. An underscore triggers errors such as: dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured / Does not contain a valid host:port authority.
Check whether the JDK bundled with CentOS is already installed:
yum list installed | grep java
Install or update Java:
yum -y install java-1.7.0-openjdk*
Set JAVA_HOME to /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.99-2.6.5.0.el7_2.x86_64:
vi ~/.bash_profile
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.99-2.6.5.0.el7_2.x86_64
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_PREFIX=/home/hadoop/hadoop-2.7.2
export HADOOP_HOME=/home/hadoop/hadoop-2.7.2
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
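Reload the profile so the variables take effect in the current shell, and sanity-check them (assuming the paths above):
source ~/.bash_profile
java -version        # should report OpenJDK 1.7.0
echo $HADOOP_HOME    # should print /home/hadoop/hadoop-2.7.2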
yum install ssh
yum install rsync
useradd hadoop
passwd hadoop
Change the hostname
Step 1:
Edit the hostname in /etc/sysconfig/network:
vi /etc/sysconfig/network
HOSTNAME=localhost.localdomain  # change localhost.localdomain to the new hostname, e.g. orcl1
Edit the HOSTNAME entry in the network file. The part before the first dot is the host name and everything after it is the domain name; if there is no dot, the whole value is the host name.
On CentOS 7, edit /etc/hostname instead: vi /etc/hostname
This change is permanent and takes effect after a reboot.
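On CentOS 7 the same can be done in one step, without editing the file by hand (hadoop-master here stands for whichever name the machine should get):
hostnamectl set-hostname hadoop-master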
Step 2:
Edit the /etc/hosts file:
vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1          localhost localhost.localdomain localhost6 localhost6.localdomain6
10.20.77.172 hadoop-master
10.20.77.173 hadoop-slave1
10.20.77.174 hadoop-slave2
10.20.77.175 hadoop-slave3
shutdown -r now  # finally, reboot the server
Configure passwordless SSH login. This is best done as the dedicated hadoop user created above: cd into its home directory and run the following commands (ssh must already be installed):
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
To explain: the first command generates the SSH key pair. The -t flag selects the algorithm (rsa or dsa), and -P sets the passphrase; the empty string '' means no passphrase.
The second command appends the generated public key to the authorized_keys file.
Now run ssh localhost; after accepting the host-key prompt, you can log in to the local machine without a password. Likewise, copying the authorized_keys file with scp into the same directory on the other hosts enables passwordless login to them.
scp /home/hadoop/.ssh/authorized_keys hadoop@hadoop-slave1:/home/hadoop/
# then, on hadoop-slave1:
cd /home/hadoop/
cat authorized_keys >> .ssh/authorized_keys
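The same two steps repeat for every slave; a small loop sketch (assumes each host already has a /home/hadoop/.ssh directory and will prompt for the hadoop password until the keys are in place):
for host in hadoop-slave1 hadoop-slave2 hadoop-slave3; do
  scp /home/hadoop/.ssh/authorized_keys hadoop@$host:/home/hadoop/
  ssh hadoop@$host 'cat /home/hadoop/authorized_keys >> ~/.ssh/authorized_keys && chmod 0600 ~/.ssh/authorized_keys'
done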
su - hadoop
cd /home/hadoop
mkdir tmp
mkdir hdfs
mkdir hdfs/data
mkdir hdfs/name
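Equivalently, as a single command:
mkdir -p /home/hadoop/{tmp,hdfs/data,hdfs/name}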
wget http://apache.fayea.com/hadoop/common/stable/hadoop-2.7.2.tar.gz
Extract it to /home/hadoop/hadoop-2.7.2/:
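For example (assuming the tarball was downloaded to /home/hadoop):
tar -zxf hadoop-2.7.2.tar.gz -C /home/hadoop/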
Edit /home/hadoop/hadoop-2.7.2/etc/hadoop/core-site.xml.
This is Hadoop's core configuration file, and only two properties need to be set here. fs.default.name names the HDFS filesystem, located at port 9000 on the master host; hadoop.tmp.dir sets the root of Hadoop's tmp directory. The latter points at a location that does not exist by default, which is why it was created with mkdir above. The complete file is given below.
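A minimal core-site.xml matching this description (the tmp path is assumed to be the /home/hadoop/tmp directory created earlier):
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://hadoop-master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/tmp</value>
    </property>
</configuration>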
Edit hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>hadoop-master:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop-master:9001</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
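Note that dfs.namenode.name.dir and dfs.datanode.data.dir above point at /home/hadoop/name and /home/hadoop/data, not the hdfs/name and hdfs/data directories created earlier; the configured values must agree with the directories you actually intend to use, so either adjust them to file:/home/hadoop/hdfs/name and file:/home/hadoop/hdfs/data or create the paths as written here.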
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop-master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop-master:19888</value>
    </property>
</configuration>
Edit yarn-site.xml:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop-master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop-master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop-master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>hadoop-master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop-master:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>768</value>
    </property>
</configuration>
vi slaves
hadoop-slave1
hadoop-slave2
hadoop-slave3
Configuring the other machines
yum install ssh
yum install rsync
vi /etc/hosts
10.20.77.172 hadoop-master
10.20.77.173 hadoop-slave1
10.20.77.174 hadoop-slave2
10.20.77.175 hadoop-slave3
useradd hadoop
passwd hadoop
Then copy the Hadoop tree over from the master:
scp -r /home/hadoop/* hadoop@hadoop-slave1:/home/hadoop/
Format the NameNode (run this once, on the master only; re-formatting assigns a new cluster ID, after which existing DataNodes will fail to register until their data directories are cleared):
./bin/hdfs namenode -format
Start HDFS: ./sbin/start-dfs.sh
At this point the processes running on the master are: NameNode, SecondaryNameNode.
On the slaves: DataNode.
Start YARN: ./sbin/start-yarn.sh
Now the master runs: NameNode, SecondaryNameNode, ResourceManager.
The slaves run: DataNode, NodeManager.
Start the JobHistory server:
mr-jobhistory-daemon.sh start historyserver
This adds one more process, the JobHistoryServer.
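A quick way to confirm is jps (it ships with the JDK); run it on each node:
jps   # on hadoop-master: expect NameNode, SecondaryNameNode, ResourceManager, JobHistoryServer
jps   # on each slave: expect DataNode, NodeManager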
Check the startup results
Check cluster status: ./bin/hdfs dfsadmin -report
Check file-block composition: ./bin/hdfs fsck / -files -blocks
View HDFS: http://10.20.77.172:50070
View the ResourceManager: http://10.20.77.172:8088
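As an end-to-end smoke test you can run one of the MapReduce examples bundled with the release (a sketch; the jar name matches 2.7.2, run from /home/hadoop/hadoop-2.7.2 as the hadoop user):
./bin/hdfs dfs -mkdir -p /user/hadoop/input
./bin/hdfs dfs -put etc/hadoop/*.xml /user/hadoop/input
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep /user/hadoop/input output 'dfs[a-z.]+'
./bin/hdfs dfs -cat output/*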
Management ports
1. HDFS web UI: 50070
2. YARN web UI: 8088
3. HistoryServer web UI: 19888
4. ZooKeeper service port: 2181
5. MySQL service port: 3306
6. Hive server1: 10000
7. Kafka service port: 9092
8. Azkaban web UI: 8443
9. HBase web UI: 16010, 60010
10. Spark web UI: 8080
11. Spark master URL: 7077
Daemon                        Web Interface           Notes
NameNode                      http://nn_host:port/    Default HTTP port is 50070.
ResourceManager               http://rm_host:port/    Default HTTP port is 8088.
MapReduce JobHistory Server   http://jhs_host:port/   Default HTTP port is 19888.
References:
http://jingyan.baidu.com/article/27fa73269c02fe46f9271f45.html
http://www.powerxing.com/install-hadoop/
http://www.powerxing.com/install-hadoop-cluster/
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html