Hadoop Distributed Deployment

Introduction

This article walks through a distributed Hadoop deployment in which each role runs under its own dedicated user:

  • HDFS deployment
  • HttpFS deployment
  • YARN deployment
  • Hive deployment
  • HBase deployment

Environment Preparation

Plan

hostname  roles
node1     namenode, datanode, JournalNode, zookeeper
node2     namenode, datanode, JournalNode, zookeeper
node3     datanode, JournalNode, zookeeper, httpfs

Installation Steps

Configure ZooKeeper

# Add the zookeeper user
useradd zookeeper


# Extract and install the zookeeper package
tar -xvf zookeeper-3.4.12.tar.gz
mv zookeeper-3.4.12 /opt/
chown root:root /opt/zookeeper-3.4.12/ -R
ln -vs /opt/zookeeper-3.4.12/ /opt/zookeeper

# Create the configuration file
cp /opt/zookeeper/conf/zoo_sample.cfg /opt/zookeeper/conf/zoo.cfg

# Create the log and data directories
mkdir /var/log/zookeeper
chown zookeeper:zookeeper /var/log/zookeeper/

mkdir /var/lib/zookeeper
chown zookeeper:zookeeper /var/lib/zookeeper/

On node1, run: echo 1 > /var/lib/zookeeper/myid
On node2, run: echo 2 > /var/lib/zookeeper/myid
On node3, run: echo 3 > /var/lib/zookeeper/myid

Run vim /opt/zookeeper/conf/zoo.cfg and append the following at the bottom:

dataDir=/var/lib/zookeeper
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888

Run vim /opt/zookeeper/bin/zkEnv.sh and modify it as follows:

if [ "x${ZOO_LOG_DIR}" = "x" ]
then
ZOO_LOG_DIR="/var/log/zookeeper"
fi

Start ZooKeeper on node1, node2, and node3:

# Start
su zookeeper -c "/opt/zookeeper/bin/zkServer.sh start"
# Stop
su zookeeper -c "/opt/zookeeper/bin/zkServer.sh stop"
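With all three nodes started, you can check that the ensemble has formed (a quick sketch; exact output wording varies by ZooKeeper version):

su zookeeper -c "/opt/zookeeper/bin/zkServer.sh status"
# One node should report "Mode: leader" and the other two "Mode: follower"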

SSH Configuration

# Create the hadoop group
groupadd hadoop

# Create the hdfs user with hadoop as its group
useradd hdfs -g hadoop

# Set the hdfs user's password
passwd hdfs

su hdfs

# Press Enter through all prompts
ssh-keygen

# Copy this host's public key to node1
ssh-copy-id node1

# Use ssh-copy-id to distribute:
# the public keys of node1 and node2 to node1, node2, and node3
# the public key of node3 to node3
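Written out, the key distribution above amounts to the following sketch (assuming the hostnames resolve, e.g. via /etc/hosts, and the hdfs user exists on every node):

# Run as the hdfs user on node1, and again on node2
for h in node1 node2 node3; do ssh-copy-id "$h"; done

# Run as the hdfs user on node3
ssh-copy-id node3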

Configure HDFS

Switch to the root user.

  1. Install and configure
    # Extract the hadoop package
    tar -xvf hadoop-2.6.5.tar.gz

    mv hadoop-2.6.5 /opt/

    chown root:root /opt/hadoop-2.6.5 -R

    ln -vs /opt/hadoop-2.6.5/ /opt/hadoop

    mkdir /var/log/hadoop-hdfs
    chown hdfs:hadoop /var/log/hadoop-hdfs

    mkdir /var/lib/hadoop-hdfs
    chown hdfs:hadoop /var/lib/hadoop-hdfs

Next, edit the HDFS configuration files, located in /opt/hadoop/etc/hadoop.

Run vim hadoop-env.sh and update the corresponding settings:

export HADOOP_LOG_DIR=/var/log/hadoop-hdfs
# Adjust this to the JAVA_HOME path of your environment
export JAVA_HOME=/usr/java/jdk1.8.0_171
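A quick sanity check that the paths are correct (a sketch; hadoop version loads hadoop-env.sh and should fail fast if JAVA_HOME does not resolve):

su hdfs -c "/opt/hadoop/bin/hadoop version"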

Run vim core-site.xml and replace the <configuration> element with the following:

<configuration>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/var/lib/hadoop-hdfs</value>
  </property>

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>

  <property>
    <name>ha.zookeeper.quorum</name>
    <value>node1:2181,node2:2181,node3:2181</value>
  </property>

</configuration>

Run vim hdfs-site.xml and replace the <configuration> element with the following:

<configuration>

  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>

  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>node1,node2</value>
  </property>

  <property>
    <name>dfs.namenode.rpc-address.mycluster.node1</name>
    <value>node1:8020</value>
  </property>

  <property>
    <name>dfs.namenode.http-address.mycluster.node1</name>
    <value>node1:50070</value>
  </property>

  <property>
    <name>dfs.namenode.rpc-address.mycluster.node2</name>
    <value>node2:8020</value>
  </property>

  <property>
    <name>dfs.namenode.http-address.mycluster.node2</name>
    <value>node2:50070</value>
  </property>

  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://node1:8485;node2:8485;node3:8485/journal</value>
  </property>

  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/var/lib/hadoop-hdfs</value>
  </property>

  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>

  <!-- Must be the private key of the user running the NameNode/ZKFC (hdfs here) -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hdfs/.ssh/id_rsa</value>
  </property>

  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/var/lib/hadoop-hdfs/nn</value>
  </property>

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/var/lib/hadoop-hdfs/dn</value>
  </property>

</configuration>
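As a quick check that the configuration is picked up, hdfs getconf can echo a key back (a sketch):

su hdfs -c "/opt/hadoop/bin/hdfs getconf -confKey dfs.nameservices"
# Expected output: mycluster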

Run vim slaves and replace its contents with:

node1
node2
node3

Run cp mapred-site.xml.template mapred-site.xml to create mapred-site.xml from the template.
Then run vim mapred-site.xml and replace the <configuration> element with the following:

<configuration>

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

</configuration>

  2. Start the JournalNodes
    su hdfs -c "/opt/hadoop/sbin/hadoop-daemon.sh start journalnode"
    # The corresponding stop command:
    # su hdfs -c "/opt/hadoop/sbin/hadoop-daemon.sh stop journalnode"

jps should now show a JournalNode process on each node.
If jps is not found, load the Java environment variables first, e.g. with source /etc/profile.
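For example (a minimal check; assumes /etc/profile exports JAVA_HOME and PATH, as in the Dockerfile at the end of this article):

source /etc/profile
jps | grep JournalNode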

  3. Initialize the NameNodes

    # Run on node1
    su hdfs -c "/opt/hadoop/bin/hdfs namenode -format"
    su hdfs -c "scp -r /var/lib/hadoop-hdfs/nn hdfs@node2:/var/lib/hadoop-hdfs"
    su hdfs -c "/opt/hadoop/bin/hdfs zkfc -formatZK"
    su hdfs -c "/opt/hadoop/sbin/start-dfs.sh"

    Open http://node1:50070 and http://node2:50070 to view HDFS status.
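    To confirm that automatic failover is wired up, the NameNode HA states can be queried (a sketch; one node should report active and the other standby):

    su hdfs -c "/opt/hadoop/bin/hdfs haadmin -getServiceState node1"
    su hdfs -c "/opt/hadoop/bin/hdfs haadmin -getServiceState node2"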

Configure HttpFS

# Create the user
useradd httpfs

# Create directories
mkdir /var/log/hadoop-httpfs
chown httpfs:httpfs /var/log/hadoop-httpfs/

mkdir /var/tmp/hadoop-httpfs
chown httpfs:httpfs /var/tmp/hadoop-httpfs/

# Set permissions
chmod +r /opt/hadoop/share/hadoop/httpfs/tomcat/conf -R

Run vim httpfs-site.xml:

<configuration>

  <property>
    <name>httpfs.proxyuser.hue.hosts</name>
    <value>*</value>
  </property>

  <property>
    <name>httpfs.proxyuser.hue.groups</name>
    <value>*</value>
  </property>

</configuration>

Run vim core-site.xml and add:

<property>
  <name>hadoop.proxyuser.httpfs.hosts</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.httpfs.groups</name>
  <value>*</value>
</property>

Run vim httpfs-env.sh:

export HTTPFS_LOG=/var/log/hadoop-httpfs
export HTTPFS_TEMP=/var/tmp/hadoop-httpfs

Start or stop:

su httpfs -c "/opt/hadoop/sbin/httpfs.sh start"
su httpfs -c "/opt/hadoop/sbin/httpfs.sh stop"
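Once running, HttpFS listens on port 14000 by default; a quick smoke test against its WebHDFS-compatible REST API (a sketch, runnable from any node):

curl "http://node3:14000/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs"
# Should return a JSON listing of the HDFS root directory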

The following covers the Docker image build and orchestration files used for this article. If you want to deploy Hadoop in containers, you can use them as a reference.

Dockerfile

The files used to build the centos7-base image are as follows.

Directory structure

├── Dockerfile
└── files
    └── jdk-8u171-linux-x64.tar.gz

JDK download: jdk-8u171-linux-x64.tar.gz

Dockerfile contents

FROM centos:7.2.1511

MAINTAINER yancai
ENV TZ Asia/Shanghai

# Set the yum mirror and install common tools
RUN curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo \
    && yum makecache \
    && yum install -y vim mlocate telnet zlib-devel openssh-server openssh-clients net-tools \
    && mkdir /usr/java

WORKDIR /root

# Install the JDK
ADD files/jdk-8u171-linux-x64.tar.gz /usr/java
RUN echo "export JAVA_HOME=/usr/java/jdk1.8.0_171" >> /etc/profile \
    && echo "export PATH=\$JAVA_HOME/bin:\$PATH" >> /etc/profile \
    && echo "export CLASSPATH=.:\$JAVA_HOME/lib/dt.jar:\$JAVA_HOME/lib/tools.jar" >> /etc/profile \
    && ln -vs /usr/java/jdk1.8.0_171/bin/java /usr/bin/java

In the directory containing the Dockerfile, build the image with sudo docker build --tag="centos7-base" .

docker-compose.yml

The contents of docker-compose.yml:

version: '2.2'
services:
  node1:
    image: centos7-base
    container_name: node1
    hostname: node1
    environment:
      - TZ=Asia/Shanghai
    command: tail -f /dev/null
    volumes:
      - ~/share:/root/share

  node2:
    image: centos7-base
    container_name: node2
    hostname: node2
    environment:
      - TZ=Asia/Shanghai
    command: tail -f /dev/null
    volumes:
      - ~/share:/root/share

  node3:
    image: centos7-base
    container_name: node3
    hostname: node3
    environment:
      - TZ=Asia/Shanghai
    command: tail -f /dev/null
    volumes:
      - ~/share:/root/share

In the directory containing docker-compose.yml, start the containers with sudo docker-compose up -d.
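To run the deployment steps above inside a container, attach a shell to it, for example:

sudo docker exec -it node1 bash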

When deploying inside Docker containers, the SSH service must be running; to start it manually:

# Run inside the container
# -N "" gives the host key an empty passphrase so generation is non-interactive
ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key -N ""
/usr/sbin/sshd
