Step 3 of the explanation of what happens after a Full GC may be vague. It may confuse readers by stating:

ServerManager on HMaster finds there must be something wrong with this RegionServer

How does HMaster detect the death of a RegionServer?

Let’s dive into the logs:

It is easy to conclude that HMaster gets the failure notification from ZooKeeper, which keeps track of which servers are alive and available.

2018-07-04 16:01:32,016 INFO [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [xs319,60020,1530601168001]

Heartbeats are exchanged not only between ZooKeeper and HMaster, but also between ZooKeeper and each RegionServer. Each RegionServer and the active HMaster maintain a session with ZooKeeper.

An ephemeral node is created in ZooKeeper for each RegionServer. HMaster monitors this node to determine whether the RegionServer is alive, and detects server failures by tracking it.

Each HMaster also tries to create an ephemeral node. ZooKeeper accepts the first one and uses it to make sure that only one master is active. The active HMaster sends heartbeats to ZooKeeper, and the inactive HMasters listen for notification of the active HMaster's failure.
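The election rule can be illustrated with a small in-memory model. This is only a sketch under simplifying assumptions: `ToyMasterElection` and its methods are hypothetical names, not HBase or ZooKeeper APIs, and a single field stands in for the master's ephemeral node.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of active-master election. All names are hypothetical;
// this is not HBase or ZooKeeper code.
class ToyMasterElection {
    // Owner of the "master" ephemeral node, or null if the node does not exist.
    private String activeMaster;
    // Masters that lost the race and wait for a failure notification.
    private final List<String> backups = new ArrayList<>();

    /** Try to create the master node; only the first caller succeeds. */
    boolean tryBecomeActive(String master) {
        if (activeMaster == null) {
            activeMaster = master;
            backups.remove(master);
            return true;
        }
        if (master.equals(activeMaster)) {
            return true; // already the active master
        }
        if (!backups.contains(master)) {
            backups.add(master);
        }
        return false;
    }

    /** The active master's session expired: its node is deleted and backups retry. */
    void activeMasterFailed() {
        activeMaster = null;
        // Notify waiting masters in registration order; the first retry wins.
        for (String backup : new ArrayList<>(backups)) {
            if (tryBecomeActive(backup)) {
                break;
            }
        }
    }

    String active() {
        return activeMaster;
    }
}
```

In this model, a second master calling `tryBecomeActive` while the first is alive simply becomes a backup, and takes over only after `activeMasterFailed()` is invoked.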

If a RegionServer fails to send heartbeats, its session expires and the corresponding ephemeral node is deleted. Listeners watching for updates are notified of the deleted node. The active HMaster listens for RegionServer nodes and starts recovery when a RegionServer fails.
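The chain from missed heartbeats to listener notification can be sketched as a toy in-memory model. Everything here is an assumption made for illustration: `ToyCoordinator`, its methods, the 90-second timeout used in the usage note, and the `/hbase/rs/...` path are hypothetical and do not reproduce the ZooKeeper API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Toy in-memory model of session expiry. NOT the ZooKeeper API:
// every name here is hypothetical.
class ToyCoordinator {
    /** Callback invoked when an ephemeral node disappears. */
    interface DeletionListener {
        void nodeDeleted(String path);
    }

    private final long sessionTimeoutMs;
    // Ephemeral node path -> time of the owner's last heartbeat.
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final List<DeletionListener> listeners = new ArrayList<>();

    ToyCoordinator(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    void addListener(DeletionListener listener) {
        listeners.add(listener);
    }

    /** The first heartbeat creates the node; later ones refresh the session. */
    void heartbeat(String path, long nowMs) {
        lastHeartbeat.put(path, nowMs);
    }

    boolean exists(String path) {
        return lastHeartbeat.containsKey(path);
    }

    /** Expire sessions silent for longer than the timeout, delete their nodes, notify. */
    void tick(long nowMs) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastHeartbeat.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() > sessionTimeoutMs) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        for (String path : expired) {
            for (DeletionListener listener : listeners) {
                listener.nodeDeleted(path);
            }
        }
    }
}
```

With `new ToyCoordinator(90_000)`, a server that last sent a heartbeat at time 0 (for example, because it is stuck in a long Full GC pause) loses its node when `tick(100_000)` runs, and every registered listener receives its path.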

RegionServerTracker tracks the online RegionServers via ZooKeeper. It listens for changes in the RegionServer node list and watches each node.

Based on the log message "RegionServer ephemeral node deleted, processing expiration", we can locate the method nodeDeleted, shown below.


public void nodeDeleted(String path) {
  // Only react to znodes under the RegionServer parent znode.
  if (path.startsWith(watcher.znodePaths.rsZNode)) {
    String serverName = ZKUtil.getNodeName(path);
    LOG.info("RegionServer ephemeral node deleted, processing expiration [" +
      serverName + "]");
    ServerName sn = ServerName.parseServerName(serverName);
    if (!serverManager.isServerOnline(sn)) {
      LOG.warn(serverName.toString() + " is not online or isn't known to the master."+
       "The latter could be caused by a DNS misconfiguration.");
      return;
    }
    // ... the server is then expired via ServerManager#expireServer(ServerName)
  }
}

If a RegionServer node gets deleted, this method automatically calls ServerManager#expireServer(ServerName).

nodeDeleted is invoked only from ZKWatcher, one instance of which is created for each HMaster, RegionServer, and client process. This class also holds and manages the connection to ZooKeeper, and it handles connection-related events and exceptions.


public void process(WatchedEvent event) {
    switch(event.getType()) { // ... other event types elided
      case NodeDeleted: {
        for(ZKListener listener : listeners) {
          listener.nodeDeleted(event.getPath());
        }
        break;
      }
    }
}

ZKWatcher receives valid events from ZooKeeper, including NodeDeleted, and passes them along to its listeners, one of which is RegionServerTracker.
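The dispatch pattern described above can be sketched with simplified stand-ins. `ToyWatcher`, `ToyRegionServerTracker`, and the `/hbase/rs` parent path are assumptions made for this example and are not the HBase classes themselves; the `expired` list merely stands in for the call to expireServer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for ZKWatcher, ZKListener and RegionServerTracker.
// This is not HBase code.
class ToyWatcher {
    enum EventType { NODE_CREATED, NODE_DELETED }

    interface Listener {
        default void nodeCreated(String path) {}
        default void nodeDeleted(String path) {}
    }

    private final List<Listener> listeners = new ArrayList<>();

    void register(Listener listener) {
        listeners.add(listener);
    }

    /** Fan one event out to every registered listener, by event type. */
    void process(EventType type, String path) {
        for (Listener listener : listeners) {
            switch (type) {
                case NODE_CREATED:
                    listener.nodeCreated(path);
                    break;
                case NODE_DELETED:
                    listener.nodeDeleted(path);
                    break;
            }
        }
    }
}

class ToyRegionServerTracker implements ToyWatcher.Listener {
    static final String RS_ZNODE = "/hbase/rs"; // assumed parent znode for this sketch
    final Set<String> online = new HashSet<>();
    final List<String> expired = new ArrayList<>();

    @Override
    public void nodeDeleted(String path) {
        if (!path.startsWith(RS_ZNODE + "/")) {
            return; // ignore znodes that do not belong to a RegionServer
        }
        String serverName = path.substring(path.lastIndexOf('/') + 1);
        if (!online.remove(serverName)) {
            return; // unknown or already offline
        }
        expired.add(serverName); // stands in for expiring the server
    }
}
```

Every listener sees every event, so each one filters by path prefix, which mirrors the `path.startsWith(...)` check in nodeDeleted above.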