Applicable versions
V100R002C30SPC60*
V100R002C50SPC20*
V100R002C60SPC20*
V100R002C60U10, V100R002C60U10SPC00*
V100R002C60U20, V100R002C60U20SPC00*
V100R002C70SPC20*
Symptom
NameNode start fails because of metadata loss.
Fault locating
1. The NameNode run log (/var/log/Bigdata/hdfs/nn/hadoop-omm-namenode-XXX.log) contains the following error:
2018-08-22 05:23,550 ERROR [main] Exception in namenode join org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1259)
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /srv/BigData/hadoop/data2/namenode/data is in an inconsistent state: storage directory does not exist or is not accessible.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:310)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:221)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:568)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:447)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:411)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:393)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initializeNamesystem(NameNode.java:441)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:423)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:615)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:596)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1196)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1255)
2. The dfs.name.dir parameter on both NameNodes is set to /srv/BigData/hadoop/data2/namenode/data.
On the functional node (IHADOOP-9), the directory contains metadata files:
IHADOOP-9:/srv/BigData/hadoop/data2/namenode/data # ll
drwxr-xr-x 2 omm wheel 36864 Nov 17 21:34 current
-rwxr-xr-x 1 omm wheel     0 Nov 18 11:26 in_use.lock
On the malfunctioning node (IHADOOP-21), the directory is empty:
IHADOOP-21:/srv/BigData/hadoop/data2/namenode/data # ll
3. Therefore, the startup fails because the metadata of the malfunctioning NameNode has been lost.
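The directory check in step 2 can be sketched as a small shell helper. check_nn_metadata is a hypothetical name introduced here for illustration; the path is the one used in this example cluster:

```shell
# check_nn_metadata: report whether a NameNode metadata directory looks intact,
# i.e. whether it has a non-empty "current" subdirectory.
check_nn_metadata() {
  meta_dir=$1
  if [ -d "$meta_dir/current" ] && [ -n "$(ls -A "$meta_dir/current" 2>/dev/null)" ]; then
    echo "metadata present"
  else
    echo "metadata missing"
  fi
}

# Example: inspect the dfs.name.dir path from this case on each NameNode.
check_nn_metadata /srv/BigData/hadoop/data2/namenode/data
```

Run the helper on both NameNodes; the functional node should report "metadata present" and the malfunctioning node "metadata missing".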
Solution
1. Go to the /srv/BigData/hadoop/data2/namenode/data/ directory on the functional NameNode (IHADOOP-9 in this example) and copy the current directory and the in_use.lock file to the same directory on the malfunctioning NameNode (IHADOOP-21 in this example).
2. On IHADOOP-21, recursively change the ownership of the newly copied files to omm:wheel.
3. Restart the HDFS service.
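Steps 1 and 2 above can be sketched as follows. restore_nn_metadata is a hypothetical helper, not part of the product; between two hosts the copy would be done with scp rather than the local cp shown here:

```shell
# restore_nn_metadata: repopulate an empty NameNode metadata directory from a
# healthy copy, then reset ownership (step 2 of the solution).
restore_nn_metadata() {
  src=$1; dst=$2; owner=${3:-omm:wheel}
  mkdir -p "$dst"
  # Copy the current directory and lock file, preserving modes and timestamps.
  cp -a "$src/current" "$src/in_use.lock" "$dst/"
  chown -R "$owner" "$dst"   # needs root when changing ownership to omm:wheel
}

# Example usage with the paths from this case (run with scp across nodes):
# restore_nn_metadata /srv/BigData/hadoop/data2/namenode/data \
#                     /srv/BigData/hadoop/data2/namenode/data omm:wheel
```

After the copy and ownership change, restart the HDFS service as in step 3.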