Hadoop HA is split into HDFS HA and YARN HA. The role plan for the four nodes:
|         | NN1 | NN2 | DN | ZK | ZKFC | JNN |
| ------- | --- | --- | -- | -- | ---- | --- |
| node101 | √   |     |    |    | √    | √   |
| node102 |     | √   | √  | √  | √    | √   |
| node103 |     |     | √  | √  |      | √   |
| node104 |     |     | √  | √  |      |     |
core-site.xml:
<configuration>
  <!-- Assemble the addresses of the two NameNodes into one nameservice, mycluster -->
  <property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>
  <!-- Directory for files generated by Hadoop at runtime -->
  <property><name>hadoop.tmp.dir</name><value>/opt/ha/hadoop-2.7.2/data/tmp</value></property>
</configuration>
hdfs-site.xml:
<configuration>
  <property><name>dfs.replication</name><value>2</value></property>
  <!-- Nameservice of the fully distributed cluster -->
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <!-- NameNodes that make up the cluster -->
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
  <!-- RPC address of nn1 -->
  <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>node101:9000</value></property>
  <!-- RPC address of nn2 -->
  <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>node102:9000</value></property>
  <!-- HTTP address of nn1 -->
  <property><name>dfs.namenode.http-address.mycluster.nn1</name><value>node101:50070</value></property>
  <!-- HTTP address of nn2 -->
  <property><name>dfs.namenode.http-address.mycluster.nn2</name><value>node102:50070</value></property>
  <!-- Where the NameNode edits are stored on the JournalNodes -->
  <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://node101:8485;node102:8485;node103:8485/mycluster</value></property>
  <!-- Local storage directory of the JournalNode -->
  <property><name>dfs.journalnode.edits.dir</name><value>/opt/module/hadoop-2.7.2/data/jn</value></property>
  <!-- Fencing method, so only one NameNode serves clients at a time -->
  <property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
  <!-- sshfence requires passwordless SSH -->
  <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/root/.ssh/id_rsa</value></property>
  <!-- Disable permission checking -->
  <property><name>dfs.permissions.enabled</name><value>false</value></property>
  <!-- Proxy provider the client uses to find the active NameNode -->
  <property><name>dfs.client.failover.proxy.provider.mycluster</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
</configuration>
(1) This configuration file is read by the management scripts; after reading it they start the JN daemons and wire the JNs up with the two NNs.
(2) The /mycluster suffix at the end of the qjournal:// value is the journal ID under which this cluster's edits are stored on the JournalNodes. A JournalNode set is not necessarily dedicated to a single cluster, so each cluster needs a unique name to keep different clusters' data from overwriting each other.
(3) JN stands for JournalNode.
(1) Passwordless SSH keys are needed in two scenarios (see the sketch after this list):
1st: the management node sets up passwordless SSH to every managed node.
2nd: the two NNs set it up for each other, so that the failover controllers (FCs) can call the demotion/fencing method on the peer NameNode.
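A minimal sketch of setting up those keys. Hostnames follow the plan above, and the user is assumed to be root (matching dfs.ha.fencing.ssh.private-key-files):
# on node101 (management node and nn1): generate a key pair and push it to every node
ssh-keygen -t rsa
ssh-copy-id root@node101
ssh-copy-id root@node102
ssh-copy-id root@node103
ssh-copy-id root@node104
# on node102 (nn2): same again, so the two NNs can reach each other without a password
ssh-keygen -t rsa
ssh-copy-id root@node101
ssh-copy-id root@node102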
(1) Formatting means rebuilding the cluster from scratch: a new clusterID is generated, the existing metadata (the data and logs directories) is wiped, and an empty fsimage, edits log and VERSION file (which records the clusterID) are created.
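One way to confirm the new clusterID after formatting; the path is an assumption based on the hadoop.tmp.dir above, since the NameNode keeps its VERSION file under <hadoop.tmp.dir>/dfs/name/current by default:
cat /opt/ha/hadoop-2.7.2/data/tmp/dfs/name/current/VERSION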
(1) The shared edits log directory is configured and the edits log lives in the JN cluster, so the JNs must be started first.
(2) Start one NN first, then sync the other NN from it through the JNs: the second NN is not formatted, yet it still obtains the cluster information via the bootstrap.
(3) When a DN starts, it obtains the cluster ID from the active NN (ANN).
# 1. start a JournalNode on every JN host (node101/102/103)
sbin/hadoop-daemon.sh start journalnode
# 2. format nn1 and start it (on node101)
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
# 3. sync nn1's metadata to nn2 and start it (on node102)
bin/hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode
# 4. start all DataNodes
sbin/hadoop-daemons.sh start datanode
# 5. manually switch nn1 to active and check its state
bin/hdfs haadmin -transitionToActive nn1
bin/hdfs haadmin -getServiceState nn1
Usage: haadmin
[-transitionToActive <serviceId>]
[-transitionToStandby <serviceId>]
[-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
[-getServiceState <serviceId>]
[-checkHealth <serviceId>]
[-help <command>]
export JAVA_HOME=/opt/module/jdk
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export ZOOKEEPER_HOME=/opt/module/zookeeper-3.4.10
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$PATH
(1) zoo.cfg needs the dataDir, a unique ID for each server, each server's host, and the election/quorum ports:
dataDir=/var/bd/hadoop/zk
server.1=node102:2888:3888
server.2=node103:2888:3888
server.3=node104:2888:3888
node102, node103 and node104 act as the ZK nodes here, which is why the numbering starts at node102.
(2) logDir
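Each ZK server also needs a myid file in dataDir whose number matches its server.N line; a sketch assuming the dataDir above and a ZooKeeper install on node102/103/104:
# on node102
echo 1 > /var/bd/hadoop/zk/myid
# on node103
echo 2 > /var/bd/hadoop/zk/myid
# on node104
echo 3 > /var/bd/hadoop/zk/myid
# then start and verify the quorum on every ZK node
zkServer.sh start
zkServer.sh status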
core-site.xml:
<configuration>
  <!-- NameNode address for HDFS (the nameservice) -->
  <property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>
  <!-- Directory for files generated by Hadoop at runtime -->
  <property><name>hadoop.tmp.dir</name><value>/opt/module/hadoop-2.7.2/data/tmp</value></property>
  <!-- ZooKeeper quorum used for automatic failover -->
  <property><name>ha.zookeeper.quorum</name><value>node102:2181,node103:2181,node104:2181</value></property>
</configuration>
hdfs-site.xml:
<configuration>
  <property><name>dfs.replication</name><value>2</value></property>
  <!-- Nameservice of the fully distributed cluster -->
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <!-- NameNodes that make up the cluster -->
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
  <!-- RPC address of nn1 -->
  <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>node101:9000</value></property>
  <!-- RPC address of nn2 -->
  <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>node102:9000</value></property>
  <!-- HTTP address of nn1 -->
  <property><name>dfs.namenode.http-address.mycluster.nn1</name><value>node101:50070</value></property>
  <!-- HTTP address of nn2 -->
  <property><name>dfs.namenode.http-address.mycluster.nn2</name><value>node102:50070</value></property>
  <!-- Where the NameNode edits are stored on the JournalNodes -->
  <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://node101:8485;node102:8485;node103:8485/mycluster</value></property>
  <!-- Local storage directory of the JournalNode -->
  <property><name>dfs.journalnode.edits.dir</name><value>/opt/module/hadoop-2.7.2/data/jn</value></property>
  <!-- Enable automatic failover -->
  <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
  <!-- Fencing method, so only one NameNode serves clients at a time -->
  <property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
  <!-- sshfence requires passwordless SSH -->
  <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/root/.ssh/id_rsa</value></property>
  <!-- Disable permission checking -->
  <property><name>dfs.permissions.enabled</name><value>false</value></property>
  <!-- Proxy provider the client uses to find the active NameNode -->
  <property><name>dfs.client.failover.proxy.provider.mycluster</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
</configuration>
# Format the cluster's NameNode (run on nn1 / node101)
$ hdfs namenode -format
# Start the freshly formatted NameNode (run on nn1 / node101)
$ hadoop-daemon.sh start namenode
# Sync NameNode1's metadata to NameNode2 (run on nn2 / node102)
$ hdfs namenode -bootstrapStandby
# Start NameNode2 (run on nn2 / node102)
$ hadoop-daemon.sh start namenode
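With dfs.ha.automatic-failover.enabled set to true, the HA state in ZooKeeper also has to be initialized before the ZKFCs can manage the NameNodes. A sketch, assuming the ZooKeeper quorum from core-site.xml is already running:
# initialize the HA znode in ZooKeeper (one time, on either NameNode)
$ hdfs zkfc -formatZK
# start a ZKFC on each NameNode host (node101 and node102);
# start-dfs.sh would start them automatically instead
$ hadoop-daemon.sh start zkfc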
http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
| hadoop102       | hadoop103       | hadoop104   |
| --------------- | --------------- | ----------- |
| NameNode        | NameNode        |             |
| JournalNode     | JournalNode     | JournalNode |
| DataNode        | DataNode        | DataNode    |
| ZK              | ZK              | ZK          |
| ResourceManager | ResourceManager |             |
| NodeManager     | NodeManager     | NodeManager |
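The matching yarn-site.xml is not in these notes; a minimal sketch of the ResourceManager HA part, following the ResourceManagerHA page linked above (the cluster-id value and the rm1/rm2 placement on hadoop102/hadoop103 are assumptions):
<configuration>
  <!-- enable ResourceManager HA -->
  <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
  <!-- logical name of this YARN cluster (assumed value) -->
  <property><name>yarn.resourcemanager.cluster-id</name><value>cluster-yarn1</value></property>
  <!-- the two ResourceManagers and their hosts -->
  <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
  <property><name>yarn.resourcemanager.hostname.rm1</name><value>hadoop102</value></property>
  <property><name>yarn.resourcemanager.hostname.rm2</name><value>hadoop103</value></property>
  <!-- ZooKeeper quorum used for leader election and RM state -->
  <property><name>yarn.resourcemanager.zk-address</name><value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value></property>
  <!-- keep running applications across an RM failover -->
  <property><name>yarn.resourcemanager.recovery.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.store.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value></property>
</configuration>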
(1) On every JournalNode host, start the journalnode service:
sbin/hadoop-daemon.sh start journalnode
(2) On [nn1], format it and start it:
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
(3) On [nn2], sync the metadata from nn1:
bin/hdfs namenode -bootstrapStandby
(4) Start [nn2]:
sbin/hadoop-daemon.sh start namenode
(5) Start all DataNodes:
sbin/hadoop-daemons.sh start datanode
(6) Switch [nn1] to Active:
bin/hdfs haadmin -transitionToActive nn1
(1) On hadoop102, run:
sbin/start-yarn.sh
(2) On hadoop103, run:
sbin/yarn-daemon.sh start resourcemanager
(3) Check the ResourceManager service state:
bin/yarn rmadmin -getServiceState rm1