The entry point of the DFSRouter process is as follows:
org.apache.hadoop.hdfs.server.federation.router.DFSRouter#main
public static void main(String[] argv) {
  if (DFSUtil.parseHelpArgument(argv, USAGE, System.out, true)) {
    System.exit(0);
  }
  try {
    StringUtils.startupShutdownMessage(Router.class, argv, LOG);

    Router router = new Router();

    ShutdownHookManager.get().addShutdownHook(
        new CompositeServiceShutdownHook(router), SHUTDOWN_HOOK_PRIORITY);

    Configuration conf = new HdfsConfiguration();
    router.init(conf);
    router.start();
  } catch (Throwable e) {
    LOG.error("Failed to start router", e);
    terminate(1, e);
  }
}
1. Router Initialization
org.apache.hadoop.hdfs.server.federation.router.Router#serviceInit
1) If routerHeartbeatService is not null, update the router's status in the State Store to INITIALIZING.
2) Create a StateStoreService, which connects to the State Store.
3) Create an ActiveNamenodeResolver to resolve the active NameNode of each nameservice.
4) Create a FileSubclusterResolver to map global paths to subcluster paths.
5) Create an RPC server to process requests from clients.
6) Create an AdminServer to maintain the mount table (dfsrouteradmin).
7) Create an HttpServer to serve the web UI and Java Management Extensions (JMX).
8) Create the NameNode heartbeat services, which periodically collect NameNode status information and report it to the State Store: one NamenodeHeartbeatService is created for each NameNode the current router monitors.
9) Create a RouterHeartbeatService, the router heartbeat service that periodically reports the router's status to the State Store.
10) Create a RouterMetricsService to expose the required metrics.
11) Create the quota-related services RouterQuotaManager and RouterQuotaUpdateService to process quota configuration in the mount table.
12) Create a RouterSafemodeService. While the router is in safe mode, all RPC requests are rejected.
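The init/start split above follows Hadoop's CompositeService pattern: child services are registered during serviceInit and later started in registration order. A minimal sketch of that pattern (the class and service names here are illustrative stand-ins, not the actual Hadoop classes):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the CompositeService pattern the Router uses:
// services registered during init are started later in the same order.
public class MiniCompositeService {
    public interface Service {
        void init();
        void start();
    }

    private final List<Service> services = new ArrayList<>();
    private final List<String> log = new ArrayList<>();

    public void addService(String name, Runnable onInit, Runnable onStart) {
        services.add(new Service() {
            public void init()  { onInit.run();  log.add(name + ":init"); }
            public void start() { onStart.run(); log.add(name + ":start"); }
        });
    }

    // serviceInit: initialize every registered child in order.
    public void serviceInit() {
        for (Service s : services) s.init();
    }

    // serviceStart: start children in the same order they were added.
    public void serviceStart() {
        for (Service s : services) s.start();
    }

    public List<String> getLog() { return log; }

    public static void main(String[] args) {
        MiniCompositeService router = new MiniCompositeService();
        router.addService("stateStore", () -> {}, () -> {});
        router.addService("rpcServer", () -> {}, () -> {});
        router.serviceInit();
        router.serviceStart();
        System.out.println(router.getLog());
        // [stateStore:init, rpcServer:init, stateStore:start, rpcServer:start]
    }
}
```

This is why "start services added during the router initialization" (section 2, step 3) needs no per-service code: starting the composite starts every registered child.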
2. Router Start-up
org.apache.hadoop.hdfs.server.federation.router.Router#serviceStart
1) Update the status of the router in the State Store to RUNNING.
2) Start the pauseMonitor.
3) Start services added during the router initialization.
3. StateStoreService
3.1. Related Parameters
dfs.federation.router.store.connection.test
This parameter sets the interval at which the router tests its connection to the State Store. The default value is 1 minute.
dfs.federation.router.store.membership.expiration
If a record in the MembershipStore is not updated within this interval, the NameNode's record expires and the NameNode is considered unavailable. The default is 5 minutes (five times the heartbeat interval).
dfs.federation.router.store.router.expiration
If a router's record in the RouterStore is not updated within this interval, the router is marked as being in safe mode and unavailable. The default is 5 minutes (five times the heartbeat interval).
dfs.federation.router.cache.ttl
This parameter sets the interval at which the router refreshes its caches. The default value is 1 minute.
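The four parameters above would appear in hdfs-site.xml roughly as follows. This is a sketch; the values shown are the defaults described above, written as milliseconds on the assumption that these keys accept plain millisecond values (Hadoop time-duration keys generally do):

```xml
<configuration>
  <property>
    <name>dfs.federation.router.store.connection.test</name>
    <value>60000</value> <!-- State Store connectivity check: 1 minute -->
  </property>
  <property>
    <name>dfs.federation.router.store.membership.expiration</name>
    <value>300000</value> <!-- NameNode record expiry: 5 minutes -->
  </property>
  <property>
    <name>dfs.federation.router.store.router.expiration</name>
    <value>300000</value> <!-- Router record expiry: 5 minutes -->
  </property>
  <property>
    <name>dfs.federation.router.cache.ttl</name>
    <value>60000</value> <!-- Router cache refresh: 1 minute -->
  </property>
</configuration>
```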
3.2. Main Functions of StateStoreService
1) Initialize StateStoreDriver and maintain its connection with the State Store. Currently, multiple StateStoreDrivers are provided.
- org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreFileImpl
- org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreFileSystemImpl
- org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreZooKeeperImpl
2) Dynamically register Record Stores. Currently, the following Record Stores are supported:
- MembershipStore: state of the NameNodes in the federation.
- MountTableStore: mount table mapping global paths to subclusters.
- RouterStore: router state in the federation.
- DisabledNameserviceStore: disabled nameservices.
3.3. StateStoreService Initialization
org.apache.hadoop.hdfs.server.federation.store.StateStoreService#serviceInit
1) Create a StateStoreDriver instance based on the class defined in dfs.federation.router.store.driver.class.
2) Add the Record Stores: MembershipStore, MountTableStore, RouterStore, and DisabledNameserviceStore.
3) Create a StateStoreConnectionMonitorService, which periodically connects to the State Store to check whether it is available.
dfs.federation.router.store.connection.test
This parameter sets the interval at which the router tests its connection to the State Store. The default value is 1 minute.
4) Configure the expiration times of MembershipState and RouterState.
dfs.federation.router.store.membership.expiration
If a record in the MembershipStore is not updated within this interval, the NameNode's record expires and the NameNode is considered unavailable. The default is 5 minutes (five times the heartbeat interval).
dfs.federation.router.store.router.expiration
If a router's record in the RouterStore is not updated within this interval, the router is marked as being in safe mode and unavailable. The default is 5 minutes (five times the heartbeat interval).
5) Create a StateStoreCacheUpdateService. The service periodically refreshes the router's cache of the data records in the four Record Stores, so that requests do not each have to query the State Store, reducing the load that the State Store places on ZooKeeper.
org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#loadCache
dfs.federation.router.cache.ttl
This parameter sets the interval at which the router refreshes its caches. The default value is 1 minute.
When the router's local cache is refreshed, the system checks whether records in the State Store have expired; expired records are overwritten.
All data in the State Store is cached on the router, and the number of cached records is published to the metrics.

The expiration check for cached records is implemented in the following methods:
org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords

org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#isUpdateTime
The relationship between MIN_UPDATE_MS and dfs.federation.router.cache.ttl is as follows:
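The relationship can be sketched as a throttle: loadCache refuses to refresh more often than MIN_UPDATE_MS, while the periodic service triggers a refresh every dfs.federation.router.cache.ttl, so the ttl must be larger than MIN_UPDATE_MS for periodic refreshes to take effect. A minimal sketch of such a throttle (the constant value and names here are illustrative, not the exact Hadoop values):

```java
// Sketch of a time-based refresh throttle like CachedRecordStore#isUpdateTime.
// MIN_UPDATE_MS here is an assumed illustrative floor between refreshes.
public class CacheRefreshThrottle {
    static final long MIN_UPDATE_MS = 1000;
    private long lastUpdateMs = 0;

    // Returns true (and records the refresh) only if enough time has passed.
    public boolean tryRefresh(long nowMs) {
        if (nowMs - lastUpdateMs < MIN_UPDATE_MS) {
            return false; // too soon: keep serving the cached records
        }
        lastUpdateMs = nowMs;
        return true;
    }

    public static void main(String[] args) {
        CacheRefreshThrottle t = new CacheRefreshThrottle();
        System.out.println(t.tryRefresh(1000)); // true: first refresh
        System.out.println(t.tryRefresh(1500)); // false: within MIN_UPDATE_MS
        System.out.println(t.tryRefresh(2500)); // true: past the floor
    }
}
```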

6) Create StateStoreMetrics and add a JMX interface.
3.4. StateStoreService Start-up
1) Load the State Store Driver. After the State Store driver is successfully loaded, update the cache data on the router.
2) Start other services created during the router initialization.
3.5. StateStoreCacheUpdateService
3.5.1. MembershipStoreCacheUpdate
Update the cached MembershipStore on the Router and update the values of the maps in the implementation class MembershipStoreImpl.
org.apache.hadoop.hdfs.server.federation.store.impl.MembershipStoreImpl#MembershipStoreImpl
public MembershipStoreImpl(StateStoreDriver driver) {
  super(driver);
  this.activeRegistrations = new HashMap<>();
  this.expiredRegistrations = new HashMap<>();
  this.activeNamespaces = new TreeSet<>();
}
activeRegistrations -> Map<String, MembershipState>
Stores the status of the active NameNodes in the current cluster. Key: NameNode ID; value: the latest reported NameNode status.
expiredRegistrations -> Map<String, MembershipState>
Stores the status of expired NameNodes. Key: NameNode ID-NameService ID-Router ID; value: the expired record.
activeNamespaces -> Set<FederationNamespaceInfo>
Stores the active namespaces in the current cluster together with their nameservice information.
org.apache.hadoop.hdfs.server.federation.store.impl.MembershipStoreImpl#loadCache
1) Invoke the parent class CachedRecordStore#loadCache to update the cache in MembershipStore on the router.
2) Clear activeRegistrations, expiredRegistrations, and activeNamespaces.
3) Traverse the cached records updated in step 1:
If a record's NameNode status is EXPIRED, add it to expiredRegistrations; otherwise, continue.
Add the NameNode ID and MembershipState to nnRegistrations, which maps each NameNode ID to all MembershipState records reported for it.
Obtain the record's NameService ID, cluster ID, and block pool ID, create a FederationNamespaceInfo, and add it to activeNamespaces.
4) Based on nnRegistrations from step 3, for each NameNode ID select the state (ACTIVE, STANDBY, or UNAVAILABLE) reported by more than half of that NameNode's records as the NameNode's state; then select the latest record carrying that state and add it to activeRegistrations.
org.apache.hadoop.hdfs.server.federation.store.impl.MembershipStoreImpl#getRepresentativeQuorum
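The majority selection in step 4 can be sketched as follows: among all records for one NameNode ID, pick the state reported by the largest number of routers, then return the most recent record carrying that state. The record class here is a simplified stand-in for MembershipState, not the actual getRepresentativeQuorum code:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for getRepresentativeQuorum: choose the state most
// routers agree on, then the newest record that reports that state.
public class QuorumPicker {
    public static class Record {
        public final String state;      // e.g. ACTIVE, STANDBY, UNAVAILABLE
        public final long dateModified; // report timestamp
        public Record(String state, long dateModified) {
            this.state = state;
            this.dateModified = dateModified;
        }
    }

    public static Record pick(List<Record> records) {
        // Count how many records report each state.
        Map<String, Integer> votes = new HashMap<>();
        for (Record r : records) {
            votes.merge(r.state, 1, Integer::sum);
        }
        // The state with the most votes wins.
        String quorumState = null;
        int best = -1;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > best) {
                best = e.getValue();
                quorumState = e.getKey();
            }
        }
        // Return the latest record that carries the winning state.
        Record latest = null;
        for (Record r : records) {
            if (r.state.equals(quorumState)
                && (latest == null || r.dateModified > latest.dateModified)) {
                latest = r;
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Record> reports = List.of(
            new Record("ACTIVE", 100),
            new Record("ACTIVE", 300),
            new Record("STANDBY", 200));
        Record chosen = pick(reports);
        System.out.println(chosen.state + "@" + chosen.dateModified); // ACTIVE@300
    }
}
```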
4. Active Namenode Resolver
This section describes the interface invoked by RouterRpcServer to resolve the current active NameNode based on the NameService ID or block pool ID.
The class that is used to implement this interface is as follows:
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver
The main functions are as follows:
4.1. Loading Cache for MembershipStore and DisabledNameserviceStore
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver#loadCache
4.2. Providing the Interface for RouterRpcServer to Query the NameNode List Based on the NameService ID or Blockpool ID
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver#getNamenodesForNameserviceId
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver#getNamenodesForBlockPoolId
The preceding two interfaces return a NameNode list, which is sorted in the following sequence: active NameNode, standby NameNode, and unavailable NameNode.
1) Check whether the queried nameservice or block pool exists in the cache cacheNS or cacheBP. If it does, return the cached result.
2) If it does not, query the router's cached MembershipStore; if the nameservice or block pool is found there, add it to cacheNS or cacheBP.

The query obtains the NameNode list for the nameservice or block pool from MembershipStoreImpl#activeRegistrations and sorts it by NameNode state.
3) Each time the MembershipStore cache is refreshed, cacheNS and cacheBP are cleared.
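Steps 1-3 amount to a memoized lookup: consult the small per-nameservice cache first, fall back to the membership records on a miss, and invalidate the cache whenever the MembershipStore is refreshed. A minimal sketch under those assumptions (names are simplified, not the actual resolver code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the resolver's per-nameservice cache: hit the small cache first,
// recompute from membership data on a miss, clear everything on refresh.
public class NamenodeLookupCache {
    private final Map<String, List<String>> cacheNS = new ConcurrentHashMap<>();
    private final Map<String, List<String>> membership = new HashMap<>();

    // Stand-in for the sorted NameNode list held by MembershipStoreImpl.
    public void register(String nsId, List<String> sortedNamenodes) {
        membership.put(nsId, sortedNamenodes);
    }

    // getNamenodesForNameserviceId-style lookup with memoization.
    public List<String> getNamenodes(String nsId) {
        return cacheNS.computeIfAbsent(nsId, membership::get);
    }

    // Called whenever the MembershipStore cache is reloaded.
    public void invalidate() {
        cacheNS.clear();
    }

    public static void main(String[] args) {
        NamenodeLookupCache c = new NamenodeLookupCache();
        c.register("ns1", List.of("nn-active", "nn-standby"));
        System.out.println(c.getNamenodes("ns1")); // [nn-active, nn-standby]
        c.invalidate();                            // forces recomputation next time
    }
}
```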
4.3. Providing the Interface for Updating the Active NameNode
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver#updateActiveNamenode
If RouterRpcServer fails to connect to the NameNode returned as active but succeeds with another NameNode, it updates the active NameNode for that NameService ID in the cached MembershipStore, so that the next access to the nameservice obtains the new active NameNode directly. This improves access efficiency.
5. FileSubclusterResolver
This section describes the interface for resolving global paths to subcluster-specific paths.
The interface is implemented using the following class:
org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver
org.apache.hadoop.hdfs.server.federation.resolver.MultipleDestinationMountTableResolver
5.1. Obtaining a Specific Path of a Global Path
1) org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver#getDestinationForPath
Finds the mount point whose path is the longest prefix of (overlaps most with) the query path and returns that mount point's list of remote paths.
2) org.apache.hadoop.hdfs.server.federation.resolver.MultipleDestinationMountTableResolver#getDestinationForPath
Sorts the remote paths in the list returned by MountTableResolver#getDestinationForPath.
- If there is only one remote path, the original list is returned unchanged.
- If there are multiple remote paths, select a nameservice according to the configured ordering policy, place it as the first remote path, and keep the other nameservices in their original order.
org.apache.hadoop.hdfs.server.federation.resolver.PathLocation#orderedNamespaces
3) Ordering policies
- HASH: select a nameservice by hashing the first-level directory of the query path.
- HASH_ALL: select a nameservice by hashing the full query path.
- RANDOM: select a nameservice at random.
- LOCAL: use the nameservice of the node where the client is located; if that node has no nameservice, null is returned.
- SPACE: prefer the nameservice with the most available space.
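The resolution path in 5.1 can be sketched end to end: find the mount entry whose source path is the longest prefix of the query path, then, for a multi-destination entry with HASH-style order, pick a destination by hashing the first path component under the mount point. This is a simplified model of the behavior described above, not the actual MountTableResolver code:

```java
import java.util.List;
import java.util.TreeMap;

// Simplified model of mount-table resolution: longest-prefix match on the
// mount point, then a HASH-style pick among multiple destinations.
public class MiniMountResolver {
    // Mount point path -> candidate nameservices, in configured order.
    private final TreeMap<String, List<String>> mounts = new TreeMap<>();

    public void addMount(String src, List<String> nameservices) {
        mounts.put(src, nameservices);
    }

    // Walk up the path until we hit a registered mount point.
    public String findMountPoint(String path) {
        String p = path;
        while (p != null) {
            if (mounts.containsKey(p)) {
                return p;
            }
            int slash = p.lastIndexOf('/');
            p = (slash > 0) ? p.substring(0, slash)
                            : (p.equals("/") ? null : "/");
        }
        return null;
    }

    // HASH-style order: hash the first component below the mount point.
    public String resolve(String path) {
        String mount = findMountPoint(path);
        if (mount == null) {
            return null;
        }
        List<String> dests = mounts.get(mount);
        if (dests.size() == 1) {
            return dests.get(0); // single destination: nothing to choose
        }
        String tail = path.substring(mount.length());
        String first = tail.isEmpty() ? "" : tail.split("/")[1];
        int idx = (first.hashCode() & Integer.MAX_VALUE) % dests.size();
        return dests.get(idx);
    }

    public static void main(String[] args) {
        MiniMountResolver r = new MiniMountResolver();
        r.addMount("/data", List.of("ns0", "ns1"));
        r.addMount("/data/logs", List.of("ns2"));
        System.out.println(r.findMountPoint("/data/logs/app")); // /data/logs
        System.out.println(r.resolve("/data/logs/app"));        // ns2
    }
}
```

Note the longest-prefix rule: /data/logs/app resolves through /data/logs, not /data, because the deeper mount point wins.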
5.2. Obtaining All Mount Points in a Global Path
org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver#getMountPoints
If the queried global path contains mount points, all sub-mount points under the query path are returned.
5.2.1. Obtaining the Default NameService of the Cluster
Obtain the default NameService configured in federation mode.
This NameService is configured using parameter dfs.federation.router.default.nameserviceId.
6. RouterRpcServer
RouterRpcServer processes client requests, implements the interface of the NameNode client, and forwards the requests to the active NameNode of the sub-cluster.
6.1. Related Parameters
- dfs.federation.router.handler.count
Default value: 10
- dfs.federation.router.handler.queue.size
Default value: 100
- dfs.federation.router.reader.count
Default value: 1
- dfs.federation.router.reader.queue.size (maps to ipc.server.read.connection-queue.size)
6.2. CREATE Operation
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer#create
1) Check whether the write operation is allowed.
- If rpcMonitor is not null, record the operation start time.
- Add the write operation to the thread queue for this operation type.
- Check whether the router is in safe mode. If it is, increment rpcMonitor.routerFailureSafemode and throw a StandbyException; write operations are not allowed.
2) If createParent and isPathAll are true, create the parent directory in all subclusters. The creation is considered successful if it succeeds in at least one subcluster.
- isPathAll is true when the mount entry's destination order is HASH_ALL, RANDOM, or SPACE:
/**
 * Check if a mount table spans all locations.
 * @return If the mount table spreads across all locations.
 */
public boolean isAll() {
  DestinationOrder order = getDestOrder();
  return order == DestinationOrder.HASH_ALL ||
      order == DestinationOrder.RANDOM ||
      order == DestinationOrder.SPACE;
}
3) Check whether the current file exists in the sub-cluster.
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer#getCreateLocation
- Invoke getBlockLocations on each subcluster in sequence; stop as soon as one subcluster returns successfully or all subclusters have been tried.
- If the file already exists in a subcluster, return its location information; the request is then handled by that subcluster's NameNode.
- If the file does not exist in any subcluster, traverse the file's resolved remote paths in sequence and perform the CREATE operation, stopping when it succeeds or all remote paths have been tried.
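The probe-then-create loop in step 3 can be sketched as a generic "first success wins" iteration over remote locations. The functional interface here is illustrative; the real code invokes getBlockLocations/create over RPC:

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Sketch of the getCreateLocation strategy: try each remote location in
// order and stop at the first one that answers successfully.
public class FirstSuccess {
    public static <T> Optional<T> tryEach(
            List<String> locations, Function<String, T> call) {
        for (String loc : locations) {
            try {
                T result = call.apply(loc);
                if (result != null) {
                    return Optional.of(result); // subcluster answered: stop probing
                }
            } catch (RuntimeException e) {
                // This subcluster failed; fall through to the next one.
            }
        }
        return Optional.empty(); // no subcluster has the file
    }

    public static void main(String[] args) {
        List<String> remotes = List.of("ns0:/tmp/f", "ns1:/tmp/f");
        // Pretend only ns1 knows the file.
        Optional<String> hit = tryEach(remotes,
            loc -> loc.startsWith("ns1") ? loc : null);
        System.out.println(hit.orElse("not found")); // ns1:/tmp/f
    }
}
```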
6.3. getListing Operation
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer#getListing
1) Obtain the list of remote paths (subcluster-specific paths) for the query path.
2) Perform the getListing operation on all subclusters, collect the results in the order of the remote paths from step 1, and store them in a TreeMap<RemoteLocation, DirectoryListing>.
3) Merge the DirectoryListing results returned by all subclusters into one map based on the results of step 2. The key is the file name and the value is the HdfsFileStatus.
Note that if files with the same name exist in the same directory on different nameservices, the later entry overwrites the earlier one; file names under a global path must therefore be unique.
4) Obtain the mount table entries under the path and add the sub-mount points to the final result map.
Note that if a sub-mount point has the same name as a file or subdirectory in the global directory, the mount point's status overwrites that of the file or subdirectory.
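The merge in steps 2-4 can be sketched as two overlay passes into one map keyed by file name: subcluster listings first (later subclusters overwrite duplicates), then mount points on top. The String values are simplified stand-ins for HdfsFileStatus:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of RouterRpcServer#getListing's merge: union the per-subcluster
// listings by file name, then let mount points shadow same-named entries.
public class ListingMerge {
    public static Map<String, String> merge(
            List<Map<String, String>> subclusterListings,
            Map<String, String> mountPoints) {
        Map<String, String> result = new TreeMap<>();
        // Pass 1: duplicates across subclusters collapse to one entry.
        for (Map<String, String> listing : subclusterListings) {
            result.putAll(listing);
        }
        // Pass 2: a mount point shadows any file/dir with the same name.
        result.putAll(mountPoints);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> ns0 = new LinkedHashMap<>(Map.of("a", "file@ns0"));
        Map<String, String> ns1 =
            new LinkedHashMap<>(Map.of("a", "file@ns1", "b", "file@ns1"));
        Map<String, String> mounts = Map.of("b", "mountpoint");
        System.out.println(merge(List.of(ns0, ns1), mounts));
        // {a=file@ns1, b=mountpoint}
    }
}
```

The example shows both caveats from the text: the duplicate "a" keeps only the later subcluster's entry, and the mount point "b" replaces the file with the same name.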
7. RouterAdminServer
Processes all admin requests, that is, requests related to dfsrouteradmin operations.
7.1. Related Parameters
- dfs.federation.router.admin.handler.count
Default value: 1
7.2. Precautions
Operations that modify the mount table are applied to the mount table stored in the State Store.
Read operations such as ls are served from the MountTableStore cached on the router.
Note that the router's cache is refreshed every minute (the interval is set by dfs.federation.router.cache.ttl), so after the mount table is modified, an ls may not yet reflect the latest change. See https://issues.apache.org/jira/browse/HDFS-13443.
If the router is in safe mode, mount table entries cannot be added or deleted.
7.3. Safemode Commands
Safemode commands forcibly put the router into, or take it out of, safe mode.
hdfs dfsrouteradmin -safemode enter | leave | get [-routers <router_list>]
1) If -routers is not specified, the command applies to the first router in dfs.federation.router.admin-address.list by default.


2) If -routers is specified, the command applies to the routers in the given list.

3) If the first router in the list is in safe mode, a mount table modification first connects to that router and finds it in safe mode; the other routers in the list are still available, so the operation is then performed normally through them.

8. NamenodeHeartbeatService
This section describes the NameNode heartbeat service. The router creates an independent NamenodeHeartbeatService for each NameNode that is currently monitored, periodically collects NameNode status information, and writes the information to the State Store.
org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService
8.1. Related Parameters
- dfs.federation.router.monitor.namenode
The list of NameNodes the router monitors, in the format nsid.nnid,nsid.nnid.
- dfs.federation.router.monitor.localnamenode.enable
Whether to monitor the local NameNode. If true, a NamenodeHeartbeatService is created for the NameNode on the same node. The default value is true.
- dfs.federation.router.heartbeat.interval
The interval at which the router collects NameNode status. The default value is 5s.
8.2. NameNode Heartbeat Messages
org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService#getNamenodeStatusReport
1) NamespaceInfo
The router connects to the NameNode through the service RPC address (dfs.namenode.servicerpc-address, port 25005 in this deployment) and obtains the NamespaceInfo.
/**
 * Version of @see #getNamespaceInfo() that is not protected by a
 * lock.
 */
NamespaceInfo unprotectedGetNamespaceInfo() {
  return new NamespaceInfo(getFSImage().getStorage().getNamespaceID(),
      getClusterId(), getBlockPoolId(),
      getFSImage().getStorage().getCTime(), getState());
}
2) Safemode
Use the service RPC address (dfs.namenode.servicerpc-address, port 25005 in this deployment) to check whether the NameNode is in safe mode.
3) FSNamesystem
Obtain the NameNode JMX information through the NameNode's web (HTTP) address.
org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService#updateJMXParameters
/**
 * Get the parameters for a Namenode from JMX and add them to the report.
 * @param address Web interface of the Namenode to monitor.
 * @param report Namenode status report to update with JMX data.
 */
private void updateJMXParameters(
    String address, NamenodeStatusReport report) {
  try {
    // TODO part of this should be moved to its own utility
    String query = "Hadoop:service=NameNode,name=FSNamesystem*";
    JSONArray aux = FederationUtil.getJmx(query, address);
    if (aux != null) {
      for (int i = 0; i < aux.length(); i++) {
        JSONObject jsonObject = aux.getJSONObject(i);
        String name = jsonObject.getString("name");
        if (name.equals("Hadoop:service=NameNode,name=FSNamesystemState")) {
          report.setDatanodeInfo(
              jsonObject.getInt("NumLiveDataNodes"),
              jsonObject.getInt("NumDeadDataNodes"),
              jsonObject.getInt("NumDecommissioningDataNodes"),
              jsonObject.getInt("NumDecomLiveDataNodes"),
              jsonObject.getInt("NumDecomDeadDataNodes"));
        } else if (name.equals(
            "Hadoop:service=NameNode,name=FSNamesystem")) {
          report.setNamesystemInfo(
              jsonObject.getLong("CapacityRemaining"),
              jsonObject.getLong("CapacityTotal"),
              jsonObject.getLong("FilesTotal"),
              jsonObject.getLong("BlocksTotal"),
              jsonObject.getLong("MissingBlocks"),
              jsonObject.getLong("PendingReplicationBlocks"),
              jsonObject.getLong("UnderReplicatedBlocks"),
              jsonObject.getLong("PendingDeletionBlocks"),
              jsonObject.getLong("ProvidedCapacityTotal"));
        }
      }
    }
  } catch (Exception e) {
    LOG.error("Cannot get stat from {} using JMX", getNamenodeDesc(), e);
  }
}
8.3. Heartbeat Data Format
The router writes the heartbeat data it collects for each NameNode into ZooKeeper. Each NameNode has one znode per reporting router, recording that NameNode's status as reported by that router.
The root path in ZooKeeper is /hdfs-federation/MembershipState.
A znode for a specific NameNode is named nnid-nsid-routerid.
[zk:] ls /hdfs-federation/MembershipState
[34-hacluster-8-5-150-12_25019, 34-hacluster-8-5-153-15_25019, 34-hacluster-8-5-164-10_25019, 35-hacluster-8-5-150-12_25019, 35-hacluster-8-5-153-15_25019, 35-hacluster-8-5-164-10_25019, 70-ns1-8-5-150-12_25019, 70-ns1-8-5-153-15_25019, 70-ns1-8-5-164-10_25019, 71-ns1-8-5-150-12_25019, 71-ns1-8-5-153-15_25019, 71-ns1-8-5-164-10_25019]
After collecting a NameNode's heartbeat information, the router writes it into ZooKeeper: if the znode for the NameNode already exists, the router updates its data; otherwise, the router creates the znode and then writes the heartbeat information into it.
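Under the naming scheme above, a membership znode name can be sketched as a simple key concatenation. The normalization of the router address (dots to dashes, colon to underscore) is an assumption inferred from the example listing, not taken from the Hadoop source:

```java
// Sketch of how a membership znode name is derived: NameNode id,
// nameservice id, and router id joined by '-', with '.' and ':' in the
// router address replaced to stay znode-safe (assumed normalization).
public class MembershipKey {
    public static String znodeName(String nnId, String nsId, String routerAddr) {
        // e.g. "8.5.150.12:25019" -> "8-5-150-12_25019"
        String routerId = routerAddr.replace('.', '-').replace(':', '_');
        return nnId + "-" + nsId + "-" + routerId;
    }

    public static void main(String[] args) {
        System.out.println(znodeName("34", "hacluster", "8.5.150.12:25019"));
        // 34-hacluster-8-5-150-12_25019
    }
}
```

The output matches the entries in the ls listing above, e.g. 34-hacluster-8-5-150-12_25019.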
9. RouterHeartbeatService
RouterHeartbeatService periodically updates the router status information to the State Store.
9.1. Related Parameters
dfs.federation.router.heartbeat-state.interval
Indicates the interval for reporting the router status. The default value is 5s.
9.2. Router Heartbeat Messages
A router heartbeat includes the router's address, start time, status, and compile/version information.
org.apache.hadoop.hdfs.server.federation.store.records.RouterState#newInstance(java.lang.String, long, org.apache.hadoop.hdfs.server.federation.router.RouterServiceState)
public static RouterState newInstance(String addr, long startTime,
    RouterServiceState status) {
  RouterState record = newInstance();
  record.setDateStarted(startTime);
  record.setAddress(addr);
  record.setStatus(status);
  record.setCompileInfo(FederationUtil.getCompileInfo());
  record.setVersion(FederationUtil.getVersion());
  return record;
}
9.3. Heartbeat Data Format
The router status information is written into ZooKeeper.
The root directory in ZooKeeper is as follows: /hdfs-federation/RouterState
Each znode is named after the router's address (the router ID).
The data in the znode is the router status record serialized to bytes.
[zk] ls /hdfs-federation/RouterState
[8-5-150-12_25019, 8-5-153-15_25019, 8-5-164-10_25019, 8-5-214-10_25019, 8-5-214-12_25019, 8-5-214-3_25019, 8-5-214-5_25019, 8-5-214-9_25019]
[zk] get /hdfs-federation/RouterState/8-5-150-12_25019
CNqisJ/1LBCarqLf+iwaEDgtNS0xNTAtMTI6MjUwMTkiB0VYUElSRUQqDgi0z9v29SwQtq2Bo/UsMgUzLjEuMTooMjAxOC0xMS0yMVQwNDo1NlogYnkgcm9vdCBmcm9tIEhhZG9vcC1GSUDFjZ329Sw=
[zk] stat /hdfs-federation/RouterState/8-5-150-12_25019
cZxid = 0x10000080b
ctime = Tue Nov 27 14:47:19 GMT+08:00 2018
mZxid = 0x100c6de91
mtime = Fri Dec 14 16:55:06 GMT+08:00 2018
pZxid = 0x10000080b
cversion = 0
dataVersion = 28573
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 152
numChildren = 0
10. Router Quota Update Service
RouterQuotaUpdateService periodically recomputes the quota usage of each quota-enabled global path and updates both the local cache and the State Store (ZooKeeper).
The local cache of the router quota is managed by RouterQuotaManager.
10.1. Related Parameters
dfs.federation.router.quota-cache.update.interval
Indicates the quota update interval. The default interval is 1 minute.
10.2. Quota Update Procedure
1) Connect to the State Store and obtain all mount table entries.
2) Filter the entries down to the mount points that have a quota configured.
3) Use the data obtained in step 1 to update the cached information in quotaManager.
4) Traverse the quota-enabled mount points:
Obtain the quota usage of each mount point.
Update the cached value in quotaManager.
If the new value differs from the old one, add the mount point to the to-be-updated list.
5) Connect to the State Store and update the mount points in the to-be-updated list.
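Steps 2-5 can be sketched as a diff pass: recompute each mount point's usage, refresh the local cache, and collect only the entries whose usage changed for the State Store write-back. The types are simplified; the real code works with RouterQuotaUsage records:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

// Sketch of RouterQuotaUpdateService's cycle: refresh cached usage and
// collect only changed mount points for the State Store update.
public class QuotaUpdateCycle {
    private final Map<String, Long> cachedUsage = new HashMap<>();

    public List<String> updateAndDiff(
            List<String> quotaMountPoints, ToLongFunction<String> fetchUsage) {
        List<String> changed = new ArrayList<>();
        for (String mount : quotaMountPoints) {
            long fresh = fetchUsage.applyAsLong(mount); // query the subclusters
            Long old = cachedUsage.put(mount, fresh);   // refresh local cache
            if (old == null || old != fresh) {
                changed.add(mount); // only these go back to the State Store
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        QuotaUpdateCycle cycle = new QuotaUpdateCycle();
        cycle.updateAndDiff(List.of("/a", "/b"), m -> 10L); // first run: all new
        List<String> changed = cycle.updateAndDiff(
            List.of("/a", "/b"), m -> m.equals("/a") ? 20L : 10L);
        System.out.println(changed); // [/a]
    }
}
```

Writing back only the changed entries keeps the periodic cycle from rewriting every mount point record in ZooKeeper each minute.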
11. JMX Monitoring
11.1. RPC-related Monitoring

11.1.1. RpcQueueTimeAvgTime
RPC queue time = Time when RPCServer starts to process a request from the client - Time when RPCServer receives the request from the client
Unit: Millisecond (ms)
org.apache.hadoop.ipc.ProtobufRpcEngine.Server.ProtoBufRpcInvoker#call

11.1.2 RpcProcessingTimeAvgTime
RPC processing time = Time when RPCServer finishes processing a request from the client - Time when RPCServer starts to process the request
Unit: Millisecond (ms)
org.apache.hadoop.ipc.ProtobufRpcEngine.Server.ProtoBufRpcInvoker#call

11.1.3 ProcessingAvgTime
Time used by the router to process a request = Time when the request is forwarded by the router to the NameNode – Time when the request is received from the client
Unit: Nanosecond (ns)
org.apache.hadoop.hdfs.server.federation.metrics.FederationRPCPerformanceMonitor#getProcessingTime

11.1.4 ProxyAvgTime
Router proxy time = Time when the NameNode returns the request result to the router - Time when the router starts to forward the request to the NameNode
Unit: Nanosecond (ns)
org.apache.hadoop.hdfs.server.federation.metrics.FederationRPCPerformanceMonitor#proxyOpComplete
org.apache.hadoop.hdfs.server.federation.metrics.FederationRPCPerformanceMonitor#getProxyTime
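The two router-side metrics above bracket the proxied call with nanosecond timestamps. A sketch of the measurement pattern, using a per-thread start time as the FederationRPCPerformanceMonitor description suggests (class and method names here are illustrative):

```java
// Sketch of nanosecond-based timing around a proxied call: proxy time
// spans from forwarding the request to receiving the NameNode's answer.
public class ProxyTimer {
    private final ThreadLocal<Long> proxyStart = new ThreadLocal<>();
    private long lastProxyNanos = -1;

    public void proxyOp() {                 // about to forward to the NameNode
        proxyStart.set(System.nanoTime());
    }

    public void proxyOpComplete() {         // NameNode answered
        Long start = proxyStart.get();
        if (start != null) {
            lastProxyNanos = System.nanoTime() - start;
            proxyStart.remove();
        }
    }

    public long getLastProxyNanos() {
        return lastProxyNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        ProxyTimer timer = new ProxyTimer();
        timer.proxyOp();
        Thread.sleep(5);                    // stand-in for the remote call
        timer.proxyOpComplete();
        System.out.println(timer.getLastProxyNanos() > 0); // true
    }
}
```

The ThreadLocal matters because many handler threads proxy calls concurrently; each thread must pair its own start and end timestamps.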

11.1.5 JMX Update Interval
The JMX values are refreshed every 30 seconds, but no parameter controlling this interval was found. Inspection of the logs confirms that the refresh is invoked once every 30 seconds.
