Why a Region Server Exits Unexpectedly

[Author: Hadoop实战专家] [2013-12-2]

Nagios reported that one of our regionservers had gone down. The log contained this message:

    2011-04-08 04:02:22,083 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expired

After that, the regionserver exited without the slightest hesitation.

So I looked at the source and found the following code in org.apache.hadoop.hbase.regionserver.HRegionServer.java:

    /**
     * We register ourselves as a watcher on the master address ZNode. This is
     * called by ZooKeeper when we get an event on that ZNode. When this method
     * is called it means either our master has died, or a new one has come up.
     * Either way we need to update our knowledge of the master.
     * @param event WatchedEvent from ZooKeeper.
     */
    public void process(WatchedEvent event) {
        EventType type = event.getType();
        KeeperState state = event.getState();
        LOG.info("Got ZooKeeper event, state: " + state + ", type: " +
            type + ", path: " + event.getPath());
        // Ignore events if we're shutting down.
        if (stopRequested.get()) {
            LOG.debug("Ignoring ZooKeeper event while shutting down");
            return;
        }
        if (state == KeeperState.Expired) {
            LOG.error("ZooKeeper session expired");
            boolean restart =
                this.conf.getBoolean("hbase.regionserver.restart.on.zk.expire", false);
            if (restart) {
                restart();
            } else {
                abort();
            }
        } else if (type == EventType.NodeDeleted) {
            watchMasterAddress();
        } else if (type == EventType.NodeCreated) {
            getMaster();
            // ZooKeeper watches are one time only, so we need to re-register our watch.
            watchMasterAddress();
        }
    }

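The comment spells it out. A regionserver registers itself as a watcher on the master address znode in ZooKeeper, so that it is notified promptly when the master dies or a new master comes up. Keeping its connection to ZooKeeper alive is therefore essential. The regionserver has to send heartbeats to the master server periodically, and if it cannot learn about a change of master in time, it loses contact with the master and is left behind as a zombie process.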

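So by default, a regionserver that runs into this situation chooses to exit.
Why would the session between a regionserver and ZooKeeper expire? Possible causes:
1. A poor network.
2. A Java full GC, which blocks every thread. If the pause outlasts the session timeout, the session expires as well.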
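The second cause can be made visible with a small watchdog thread. The sketch below is only an illustration and is not part of HBase: the class name PauseDetector, the 500 ms sleep, and the 0.5 warning fraction are all assumptions. It sleeps for a fixed interval and measures how much longer the sleep really took. A large overshoot suggests the whole JVM was stalled, for example by a full GC.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not an HBase class): flags stalls that threaten a ZooKeeper session. */
final class PauseDetector {

    /** Extra time in ms that a sleep took beyond what was requested; never negative. */
    static long overshootMillis(long requestedSleepMs, long elapsedMs) {
        return Math.max(0L, elapsedMs - requestedSleepMs);
    }

    /** True when a single stall used up at least the given fraction of the session timeout. */
    static boolean threatensSession(long overshootMs, long sessionTimeoutMs, double fraction) {
        return overshootMs >= sessionTimeoutMs * fraction;
    }

    public static void main(String[] args) throws InterruptedException {
        final long sleepMs = 500;             // assumed polling interval
        final long sessionTimeoutMs = 90_000; // the timeout value used in this article
        for (int i = 0; i < 5; i++) {         // bounded loop, for demonstration only
            long start = System.nanoTime();
            Thread.sleep(sleepMs);
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            long overshoot = overshootMillis(sleepMs, elapsedMs);
            if (threatensSession(overshoot, sessionTimeoutMs, 0.5)) {
                System.err.println("JVM stalled for about " + overshoot
                    + " ms; the ZooKeeper session is at risk");
            }
        }
    }
}
```

Run inside the regionserver JVM, a loop like this would log a warning before the session is actually lost, which helps tell GC pauses apart from network problems.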
What can be done?
1. Increase the ZooKeeper session timeout.
2. Set "hbase.regionserver.restart.on.zk.expire" to true. With this setting, a regionserver that sees a ZooKeeper session expiry chooses restart rather than abort.

Concretely, add the following to hbase-site.xml:

    <property>
        <name>zookeeper.session.timeout</name>
        <value>90000</value>
        <description>ZooKeeper session timeout.
            HBase passes this to the zk quorum as suggested maximum time for a
            session.  See http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions
            "The client sends a requested timeout, the server responds with the
            timeout that it can give the client. The current implementation
            requires that the timeout be a minimum of 2 times the tickTime
            (as set in the server configuration) and a maximum of 20 times
            the tickTime." Set the zk ticktime with hbase.zookeeper.property.tickTime.
            In milliseconds.
        </description>
    </property>
    <property>
        <name>hbase.regionserver.restart.on.zk.expire</name>
        <value>true</value>
        <description>
            Zookeeper session expired will force regionserver exit.
            Enable this will make the regionserver restart.
        </description>
    </property>
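The description quoted in the configuration has a practical consequence: the requested timeout is only a suggestion, and the server bounds it by the tickTime. The sketch below works through that arithmetic. It assumes the server keeps the bounds stated in the description (2 and 20 times tickTime). The class name SessionTimeoutSketch and the tickTime values 3000 and 6000 are illustrative assumptions, not values taken from this cluster.

```java
/** Hypothetical helper: models the timeout bounds quoted in the property description. */
final class SessionTimeoutSketch {

    /** Clamps the requested timeout into [2 * tickTime, 20 * tickTime], all in milliseconds. */
    static int negotiate(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    public static void main(String[] args) {
        // With an assumed tickTime of 3000 ms the ceiling is 60000 ms,
        // so a request for 90000 ms is cut down to 60000.
        System.out.println(negotiate(90_000, 3_000)); // prints 60000
        // With an assumed tickTime of 6000 ms the ceiling is 120000 ms,
        // so the full 90000 ms is granted.
        System.out.println(negotiate(90_000, 6_000)); // prints 90000
    }
}
```

If the granted timeout comes out lower than 90000, raising hbase.zookeeper.property.tickTime, as the description suggests, is the setting to look at.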
To keep the thread suspension of a Java full GC from disrupting the ZooKeeper heartbeat, hbase-env.sh needs to be adjusted as well.

Change

    export HBASE_OPTS="$HBASE_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"

to

    export HBASE_OPTS="$HBASE_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseParNewGC -Xmn256m"
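One way to check which collectors a JVM is actually running, and how much time they have consumed, is the standard java.lang.management API. The sketch below is a generic illustration; the class name GcReport and the output format are assumptions. On HotSpot JVMs that still ship CMS, the flags above usually show up as beans named ParNew and ConcurrentMarkSweep, but treat those names as something to verify on your own JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: prints the active garbage collectors and their cumulative cost. */
final class GcReport {

    /** Formats one collector's statistics as a single line. */
    static String describe(String name, long count, long timeMs) {
        return name + ": " + count + " collections, " + timeMs + " ms total";
    }

    public static void main(String[] args) {
        // Each bean corresponds to one collector in the running JVM.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(
                describe(gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
    }
}
```

A total collection time that jumps by tens of seconds between two readings points at the long pauses that expire the session.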
