Fixing an HBase HMaster that shuts itself down right after starting: sharing what worked

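I recently upgraded the system with yum update. After the upgrade the jps command stopped working, and I eventually traced it to the links for jps and java having all become invalid. After changing the location by hand jps worked again, but HBase still called the old location, and fixing each reference one by one was too tedious. Rebuilding the link with ln jps /usr/lib/jvm/java/bin/jps solved the problem on that machine, but several other machines still failed. On those, reinstalling Java with yum remove java-1.8.0-openjdk* followed by yum install java-1.8.0-openjdk* resolved everything. A few other machines showed no such problem after yum update and worked normally. I do not know why, since all of them were upgraded from CentOS 6.5 to CentOS 6.7.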
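A quick way to spot this kind of breakage before touching anything is to test whether each link still resolves to a real file. The sketch below is only an illustration: the function name link_resolves is made up here, and the paths in the usage comment are the ones mentioned above, which may differ on other machines.

```shell
# Hypothetical helper: succeeds if the path (following symlinks) exists.
# A dangling symlink fails the -e test because its target is missing.
link_resolves() {
  [ -e "$1" ]
}

# Example use (paths assumed from the description above):
#   for p in /usr/bin/java /usr/bin/jps /usr/lib/jvm/java/bin/jps; do
#     if link_resolves "$p"; then
#       echo "ok      $p -> $(readlink -f "$p")"
#     else
#       echo "broken  $p"
#     fi
#   done
```

Any path reported as broken is a candidate for relinking or for reinstalling the JDK package.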

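Next I ran HBase and it reported a pile of errors, roughly saying that ZooKeeper could not connect to the host. Running jps on the master host showed no HMaster process. After starting it with xyhadoop/hbase-1.0.1.1/bin/hbase-daemon.sh start master, jps showed the HMaster process, but when I ran jps again a moment later the HMaster had already shut itself down.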

Check the log: cat /root/xyhadoop/hbase-1.0.1.1/bin/../logs/hbase-root-master-192-168-137-2.out

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/xyhadoop/hbase-1.0.1.1/lib/phoenix-4.7.0-HBase-1.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/xyhadoop/hbase-1.0.1.1/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/xyhadoop/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
OpenJDK 64-Bit Server VM warning: You have loaded library /root/xyhadoop/hadoop-2.7.1/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
<strong>0    [main] ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper  - ZooKeeper create failed after 4 attempts</strong>
<strong>3    [main] ERROR org.apache.hadoop.hbase.master.HMasterCommandLine  - Master exiting</strong>
<strong>java.lang.RuntimeException: Failed construction of Master</strong>: class org.apache.hadoop.hbase.master.HMaster
        at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1988)
        at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:203)
        at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:139)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2002)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:512)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:491)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:1256)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:1234)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:174)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:167)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:531)
        at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:333)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1983)
        ... 5 more

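Look at the lines I put in bold: creating the ZooKeeper node failed after 4 attempts, the master reports "Master exiting" (which I read as the HMaster already existing), and a Java runtime exception says the Master could not be constructed.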
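In a long .out file the few lines that matter are easy to miss among the stack frames. As a rough sketch, the helper below (key_errors is a name invented for this example) prints only the ERROR lines and the "Caused by:" line, with line numbers. The pattern is a guess that fits the log above and may need adjusting for other logs.

```shell
# Hypothetical helper: show ERROR lines and root-cause lines from a log file.
# "^Caused by:" is anchored so ordinary stack frames are not matched.
key_errors() {
  grep -nE 'ERROR|^Caused by:' "$1"
}

# Example use (log path taken from the cat command above):
#   key_errors /root/xyhadoop/hbase-1.0.1.1/logs/hbase-root-master-192-168-137-2.out
```

On the log above this would surface the two ERROR lines and the ConnectionLossException for /hbase.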

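That apparent "HMaster already exists" message puzzled me. jps showed the process had clearly shut down on its own, and netstat showed neither the process nor its ports in use. Searching online suggested reformatting the NameNode, or that the cause might be clocks or data out of sync across the nodes. Syncing the clocks with ntp made no difference. Backing up the data on HDFS, rebuilding HDFS, and copying the data back was too disruptive, so I did not try it.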

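Then it suddenly occurred to me: since I took the error to mean that the HMaster already existed, I might as well stop it first and then start it again: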

[root@192-168-137-2 ~]# xyhadoop/hbase-1.0.1.1/bin/hbase-daemon.sh stop master
no master to stop because kill -0 of pid 4383 failed with status 1
[root@192-168-137-2 ~]# xyhadoop/hbase-1.0.1.1/bin/hbase-daemon.sh stop master
no master to stop because no pid file /tmp/hbase-root-master.pid

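Note the two messages. The first says that kill -0 failed for pid 4383; the second says that no pid file was found. So although the first stop master did not stop anything, it did delete the corresponding pid file, and the second message also reveals where that file lives.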
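The same check can be made by hand before running stop master. The sketch below mirrors the kill -0 test that the message refers to: signal 0 delivers nothing and only reports whether the pid can be signalled. The function name pid_file_state is invented for this example, and the pid file path in the usage comment is the one printed above.

```shell
# Hypothetical helper: classify a daemon pid file as missing, running, or stale.
# Note: kill -0 also fails if the process belongs to another user, so run this
# as the user that owns the daemon (root in the session above).
pid_file_state() {
  pid_file=$1
  if [ ! -f "$pid_file" ]; then
    echo missing
    return
  fi
  pid=$(cat "$pid_file")
  if kill -0 "$pid" 2>/dev/null; then
    echo running
  else
    echo stale   # file exists but no such process: left over from a dead master
  fi
}

# Example use:
#   pid_file_state /tmp/hbase-root-master.pid
```

A result of stale corresponds to the first message above (the pid file named a process that no longer existed), and missing corresponds to the second.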

Now try starting the HMaster:

[root@192-168-137-2 ~]# xyhadoop/hbase-1.0.1.1/bin/hbase-daemon.sh start master
starting master, logging to /root/xyhadoop/hbase-1.0.1.1/bin/../logs/hbase-root-master-192-168-137-2.out
[root@192-168-137-2 ~]# jps
2048 AmbariServer
5124 HMaster
4309 HQuorumPeer
3595 SecondaryNameNode
5197 Jps
3758 ResourceManager
3374 NameNode
[root@192-168-137-2 ~]# jps
2048 AmbariServer
5124 HMaster
4309 HQuorumPeer
5433 Jps
3595 SecondaryNameNode
3758 ResourceManager
3374 NameNode

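It started successfully and jps shows the HMaster process. After waiting a while and running jps again, HMaster was still there. I entered hbase shell and everything worked; list showed all the tables. With that the problem was solved, and I am writing the process down to share the experience.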
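Since the original symptom was an HMaster that appeared in jps and then vanished, checking twice with a pause in between is the useful test. A minimal sketch of that check follows; the function names and the 30-second default are assumptions for illustration, not anything HBase provides.

```shell
# Hypothetical helper: read jps output on stdin, succeed if an HMaster is listed.
# jps prints "<pid> <main class>", so the second field is compared exactly.
has_hmaster() {
  awk '$2 == "HMaster" { found = 1 } END { exit !found }'
}

# Hypothetical helper: HMaster must be present now and still present after a
# pause (seconds, default 30 chosen arbitrarily).
hmaster_survives() {
  jps | has_hmaster || return 1
  sleep "${1:-30}"
  jps | has_hmaster
}

# Example use:
#   hmaster_survives 60 && echo "HMaster is still up"
```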

Reference links:
http://www.aboutyun.com/thread-5882-1-1.html
http://www.aboutyun.com/thread-5883-1-1.html
http://blog.chinaunix.net/xmlrpc.php?r=blog/article&id=4008535&uid=26275986
