
    [Original] Setting up a Hadoop 2.0.4 test environment

    Posted by book_mmicky on 2014-05-13 15:20:51
    1: Plan
    Set up a Hadoop 2.0.4 environment on Oracle Linux 6.4:
    192.168.100.171 linux1 (namenode)
    192.168.100.172 linux2 (reserved as a second namenode)
    192.168.100.173 linux3 (datanode)
    192.168.100.174 linux4 (datanode)
    192.168.100.175 linux5 (datanode)

    2: Create a VMware Workstation template VM
    a: Install the Oracle Linux 6.4 virtual machine linux1, enable the ssh service, and disable the iptables service:
    [root@linux1 ~]# chkconfig sshd on
    [root@linux1 ~]# chkconfig iptables off
    [root@linux1 ~]# chkconfig ip6tables off
    [root@linux1 ~]# chkconfig postfix off
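
    chkconfig only changes what starts at boot; to apply the same state to the running system without a reboot, the standard RHEL 6 service commands can be used (a small convenience step, not part of the original procedure):
    [root@linux1 ~]# service sshd start
    [root@linux1 ~]# service iptables stop
    [root@linux1 ~]# service ip6tables stop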

    b: Shut down linux1 and add a new hard disk, stored in a shared directory, to serve as a shared disk (on the SCSI1:0 interface).
    Edit linux1.vmx and add or modify these parameters:
    disk.locking="FALSE"    
    diskLib.dataCacheMaxSize = "0"
    disk.EnableUUID = "TRUE"
    scsi1.present = "TRUE"
    scsi1.sharedBus = "Virtual"
    scsi1.virtualDev = "lsilogic"
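
    The new disk still needs a partition, a filesystem, and a mount point before anything can be copied onto it. A minimal sketch, assuming the disk appears as /dev/sdb and is mounted at /mnt/mysoft (the path used later for the hadoop tarball):
    [root@linux1 ~]# fdisk /dev/sdb        # create a single primary partition /dev/sdb1
    [root@linux1 ~]# mkfs -t ext4 /dev/sdb1
    [root@linux1 ~]# mkdir -p /mnt/mysoft
    [root@linux1 ~]# mount /dev/sdb1 /mnt/mysoft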

    c: Restart linux1, download the JDK to the shared disk, install Java, and append the following to the end of /etc/profile:
    JAVA_HOME=/usr/java/jdk1.7.0_21; export JAVA_HOME
    JRE_HOME=/usr/java/jdk1.7.0_21/jre; export JRE_HOME
    CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar; export CLASSPATH
    PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH; export PATH
    ************************************************************************
    For convenience, also add the hadoop environment variables to /etc/profile or to the hadoop user's ~/.bashrc:
    export HADOOP_PREFIX=/app/hadoop204
    export PATH=$PATH:$HADOOP_PREFIX/bin
    export PATH=$PATH:$HADOOP_PREFIX/sbin
    export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
    export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
    export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
    export YARN_HOME=${HADOOP_PREFIX}
    ************************************************************************
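
    A sketch of the JDK install and a quick check that the profile changes are picked up (the rpm file name is an assumption; whatever package is used just needs to end up under /usr/java/jdk1.7.0_21 to match the variables above):
    [root@linux1 ~]# rpm -ivh /mnt/mysoft/LinuxSoft/jdk-7u21-linux-x64.rpm    # hypothetical file name
    [root@linux1 ~]# source /etc/profile
    [root@linux1 ~]# java -version
    [root@linux1 ~]# echo $JAVA_HOME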

    d: Edit /etc/hosts and add:
    192.168.100.171 linux1
    192.168.100.172 linux2
    192.168.100.173 linux3
    192.168.100.174 linux4
    192.168.100.175 linux5

    e: Edit /etc/sysconfig/selinux:
    SELINUX=disabled
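
    This setting only takes effect after a reboot; to stop SELinux enforcement in the running system as well:
    [root@linux1 ~]# setenforce 0
    [root@linux1 ~]# getenforce     # should now report Permissive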

    f: Add the hadoop user and install the hadoop files:
    [root@linux1 ~]# useradd hadoop -g root
    [root@linux1 ~]# passwd hadoop
    [root@linux1 ~]# cd /
    [root@linux1 /]# mkdir /app
    [root@linux1 /]# cd /app
    [root@linux1 app]# tar -zxf /mnt/mysoft/LinuxSoft/hadoop-2.0.4-alpha.tar.gz
    [root@linux1 app]# mv hadoop-2.0.4-alpha hadoop204
    [root@linux1 app]# chown hadoop:root -R /app/hadoop204
    [root@linux1 hadoop204]# su - hadoop
    [hadoop@linux1 ~]$ cd /app/hadoop204
    [hadoop@linux1 hadoop204]$ mkdir tmp

    g: Edit the hadoop configuration files:

    [hadoop@linux1 hadoop204]$ cd etc/hadoop
    [hadoop@linux1 hadoop]$ vi core-site.xml
    ******************************************************************************
    <configuration>
    <property>
    <name>io.native.lib.available</name>
    <value>true</value>
    </property>
    <property>
    <name>fs.default.name</name>
    <value>hdfs://linux1:9000</value>
    <description>The name of the default file system. Either the literal string "local" or a host:port for NDFS. </description>
    <final>true</final>
    </property>
    </configuration>
    ******************************************************************************

    [hadoop@linux1 hadoop]$ vi hdfs-site.xml
    ******************************************************************************
    <configuration>
    <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/app/hadoop204/dfs/name</value>
    <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy. </description>
    <final>true</final>
    </property>

    <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/app/hadoop204/dfs/data</value>
    <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
    <final>true</final>
    </property>

    <property>
    <name>dfs.replication</name>
    <value>1</value>
    </property>

    <property>
    <name>dfs.permissions</name>
    <value>false</value>
    </property>
    </configuration>
    ******************************************************************************

    [hadoop@linux1 hadoop]$ vi mapred-site.xml
    ******************************************************************************
    <configuration>
    <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    </property>

    <property>
    <name>mapreduce.job.tracker</name>
    <value>hdfs://linux1:9001</value>
    <final>true</final>
    </property>

    <property>
    <name>mapred.system.dir</name>
    <value>file:/app/hadoop204/mapred/system</value>
    <final>true</final>
    </property>

    <property>
    <name>mapred.local.dir</name>
    <value>file:/app/hadoop204/mapred/local</value>
    <final>true</final>
    </property>
    </configuration>
    ******************************************************************************

    [hadoop@linux1 hadoop]$ vi yarn-site.xml
    ******************************************************************************
    <configuration>
    <property>
    <name>yarn.resourcemanager.address</name>
    <value>linux1:8080</value>
    </property>

    <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>linux1:8081</value>
    </property>

    <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>linux1:8082</value>
    </property>

    <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
    </property>

    <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    </configuration>
    ******************************************************************************

    [hadoop@linux1 hadoop]$ vi hadoop-env.sh
    ******************************************************************************
    export JAVA_HOME=/usr/java/jdk1.7.0_21
    export HADOOP_PREFIX=/app/hadoop204
    export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
    export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
    export PATH=$PATH:$HADOOP_PREFIX/bin
    export PATH=$PATH:$HADOOP_PREFIX/sbin
    export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
    export YARN_HOME=${HADOOP_PREFIX}
    export HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
    export YARN_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
    ******************************************************************************

    [hadoop@linux1 hadoop]$ vi yarn-env.sh
    ******************************************************************************
    export JAVA_HOME=/usr/java/jdk1.7.0_21
    export HADOOP_PREFIX=/app/hadoop204
    export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
    export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
    export PATH=$PATH:$HADOOP_PREFIX/bin
    export PATH=$PATH:$HADOOP_PREFIX/sbin
    export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
    export YARN_HOME=${HADOOP_PREFIX}
    export HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
    export YARN_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
    ******************************************************************************
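
    hdfs-site.xml and mapred-site.xml above point at local directories under /app/hadoop204 that do not exist yet. The namenode format and daemon startup will normally create them, but pre-creating them as the hadoop user is a harmless way to rule out permission problems (an extra step, not in the original procedure):
    [hadoop@linux1 hadoop204]$ mkdir -p /app/hadoop204/dfs/name /app/hadoop204/dfs/data
    [hadoop@linux1 hadoop204]$ mkdir -p /app/hadoop204/mapred/system /app/hadoop204/mapred/local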

    h: Configure ssh to use key-based authentication; in /etc/ssh/sshd_config, uncomment:
    RSAAuthentication yes
    PubkeyAuthentication yes
    AuthorizedKeysFile .ssh/authorized_keys
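
    Restart sshd so the change takes effect:
    [root@linux1 ~]# service sshd restart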


    3: Clone the VMs and configure ssh
    a: Shut down the template VM and copy it to create linux2, linux3, linux4, and linux5:
    change the displayName in each clone's VMware Workstation configuration file;
    update the host-specific information in the following files inside each clone (see the example after the list):
    /etc/udev/rules.d/70-persistent-net.rules
    /etc/sysconfig/network
    /etc/sysconfig/network-scripts/ifcfg-eth0
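
    Using linux2 as an example, the edits look roughly like this (the exact ifcfg keys and MAC handling depend on how the clone was produced, so treat this as a sketch): remove or update the stale eth0 entry in /etc/udev/rules.d/70-persistent-net.rules so it matches the clone's new MAC address, then set the hostname and IP:
    In /etc/sysconfig/network:
    HOSTNAME=linux2
    In /etc/sysconfig/network-scripts/ifcfg-eth0:
    DEVICE=eth0
    ONBOOT=yes
    BOOTPROTO=static
    IPADDR=192.168.100.172
    NETMASK=255.255.255.0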

    b: Start linux1, linux2, linux3, linux4, and linux5, and make sure they can all ping each other.

    c: Configure ssh so that linux1 can log in to the other nodes without a password:
    [root@linux1 tmp]# su - hadoop
    [hadoop@linux1 ~]$ ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
    Created directory '/home/hadoop/.ssh'.
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
    Your identification has been saved in /home/hadoop/.ssh/id_rsa.
    Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
    The key fingerprint is:
    17:37:98:fa:7e:5c:e4:8b:b4:7e:bb:59:28:8f:45:bd hadoop@linux1
    The key's randomart image is:
    +--[ RSA 2048]----+
    |                 |
    |           o     |
    |          + o    |
    |         . o ... |
    |        S .  o. .|
    |         o  ..o..|
    |          .o.+oE.|
    |         .  ==oo |
    |          .oo.=o |
    +-----------------+
    [hadoop@linux1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@linux1
    [hadoop@linux1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@linux2
    [hadoop@linux1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@linux3
    [hadoop@linux1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@linux4
    [hadoop@linux1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@linux5

    Verify passwordless access:
    [hadoop@linux1 ~]$ ssh linux1 date
    [hadoop@linux1 ~]$ ssh linux2 date
    [hadoop@linux1 ~]$ ssh linux3 date
    [hadoop@linux1 ~]$ ssh linux4 date
    [hadoop@linux1 ~]$ ssh linux5 date


    4: Initialize hadoop (format the namenode)
    [hadoop@linux1 hadoop204]$ /app/hadoop204/bin/hdfs namenode -format

    5: Configure the slaves file on linux1
    [hadoop@linux1 hadoop204]$ vi etc/hadoop/slaves
    192.168.100.173
    192.168.100.174
    192.168.100.175

    6: Start hadoop
    [hadoop@linux1 hadoop204]$ /app/hadoop204/sbin/start-all.sh
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    13/06/11 10:08:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting namenodes on [linux1]
    linux1: starting namenode, logging to /app/hadoop204/logs/hadoop-hadoop-namenode-linux1.out
    192.168.100.174: starting datanode, logging to /app/hadoop204/logs/hadoop-hadoop-datanode-linux4.out
    192.168.100.175: starting datanode, logging to /app/hadoop204/logs/hadoop-hadoop-datanode-linux5.out
    192.168.100.173: starting datanode, logging to /app/hadoop204/logs/hadoop-hadoop-datanode-linux3.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /app/hadoop204/logs/hadoop-hadoop-secondarynamenode-linux1.out
    13/06/11 10:08:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    starting yarn daemons
    starting resourcemanager, logging to /app/hadoop204/logs/yarn-hadoop-resourcemanager-linux1.out
    192.168.100.174: starting nodemanager, logging to /app/hadoop204/logs/yarn-hadoop-nodemanager-linux4.out
    192.168.100.175: starting nodemanager, logging to /app/hadoop204/logs/yarn-hadoop-nodemanager-linux5.out
    192.168.100.173: starting nodemanager, logging to /app/hadoop204/logs/yarn-hadoop-nodemanager-linux3.out
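
    A quick way to confirm the daemons actually came up is jps on the master and on one of the datanodes (jps ships with the JDK; the full path is used over ssh because a non-interactive shell may not have it on the PATH):
    [hadoop@linux1 hadoop204]$ jps
    # expect NameNode, SecondaryNameNode and ResourceManager (plus Jps itself)
    [hadoop@linux1 hadoop204]$ ssh linux3 /usr/java/jdk1.7.0_21/bin/jps
    # expect DataNode and NodeManager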

    7: Change Linux to boot into console mode (see the sketch at the end of this section). The above is only a first pass; although http://192.168.100.171:8088 is reachable, a number of issues remain:
    For the "Unable to load native-hadoop library for your platform" warning during startup, see:
    "Solution to errors caused by a mismatch between the Hadoop native library and the system version"
    "An introduction to Hadoop native libraries"
    For configuring 192.168.100.172 as a namenode, see:
    "A first look at Hadoop 0.23.0"
    Testing also showed that JAVA_HOME in /app/hadoop204/etc/hadoop/hadoop-env.sh and /app/hadoop204/etc/hadoop/yarn-env.sh must be an absolute path; otherwise hadoop reports an error even when JAVA_HOME is already configured in /etc/profile.
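
    For the console-boot change mentioned in step 7, on Oracle Linux 6 the default runlevel is set in /etc/inittab; changing it from 5 to 3 boots to a text console:
    In /etc/inittab:
    id:3:initdefault: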

    There is still a long way to go...

