
Failed to complete the instances start operation

Created: Apr 25, 2020 08:29:53
Latest reply: Apr 25, 2020 08:50:33
  Rewarded HiCoins: 0 (problem resolved)

Hello
[Command execution failure] The HDFS NameNode instance fails to start, and the error message is: Failed to complete the instances start operation. Current operation entities: [NameNode#ip@hostname]; Failure entities: [].

Featured Answers

Recommended answer

olive.zhao
Admin Created Apr 25, 2020 08:50:33

Hello, dear!

Have a nice day!

Log in to the NameNode node and check whether the NameNode process is running:

ps -ef | grep namenode

Check whether port 25000 is being listened on:

lsof -i :25000

If the process is running but the port is not being listened on, there may be a zombie process. In that case, kill the zombie process and restart the NameNode.
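The two checks above can be combined into one script. This is a minimal sketch, assuming a Linux host with procps `ps` and `lsof`, that the NameNode runs the standard Hadoop main class `org.apache.hadoop.hdfs.server.namenode.NameNode`, and that its port is 25000 as stated above. The function names and the `NN_PORT` variable are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: function names and NN_PORT are hypothetical.
# Assumes procps `ps`, `lsof`, and NameNode port 25000 from the answer above.

NN_PORT="${NN_PORT:-25000}"

# Read "PID STAT ARGS..." lines (from `ps -eo pid=,stat=,args=`) on stdin and
# print "PID STAT" for each process running the Hadoop NameNode main class.
namenode_procs() {
  awk '/org\.apache\.hadoop\.hdfs\.server\.namenode\.NameNode/ { print $1, $2 }'
}

# Succeed if some process is listening on TCP port $1.
port_listening() {
  lsof -nP -iTCP:"$1" -sTCP:LISTEN >/dev/null 2>&1
}

# Combine both checks, mirroring `ps -ef | grep namenode` and `lsof -i :25000`.
check_namenode() {
  local procs
  procs=$(ps -eo pid=,stat=,args= | namenode_procs)
  if [ -z "$procs" ]; then
    echo "NameNode process is not running"
    return 1
  fi
  echo "NameNode process (PID STAT): $procs"
  if port_listening "$NN_PORT"; then
    echo "Port $NN_PORT is being listened on"
  else
    echo "Process is running but port $NN_PORT is not listening; check for a zombie or hung process"
    return 2
  fi
}
```

Run `check_namenode` as root on the NameNode node. Exit status 2 corresponds to the "process running but port not listening" case described in the answer.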

For more detailed assistance, contact your local Technical Assistance Center (TAC):
https://e.huawei.com/en/service-hotline-query
Thanks!




All Answers
Hello, dear!
It's nice to meet you in the community.
We're working on your problem. Please be patient.

Hello,

Symptom

On the FusionInsight Manager HDFS service page, both NameNodes remain in the Standby state and cannot provide services.

Possible Causes

  • Active NameNode election fails because stale data that controls the active/standby status of the NameNodes remains in ZooKeeper.

  • The session ID held by the ZooKeeper client (ZKFC) is inconsistent with the session ID on the ZooKeeper server, resulting in a connection deadlock.

  • A JournalNode is abnormal and an active/standby switchover occurs on the NameNodes. While the NameNodes are recovering the editlog, editlog synchronization fails.
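The "quorum size 2/3" in the JournalNode error quoted under Troubleshooting Method can be read with simple majority arithmetic. Assuming the standard Hadoop Quorum Journal Manager rule that each editlog write must be acknowledged by a majority of the n JournalNodes (n / 2 + 1, integer division), this sketch shows why one abnormal JournalNode out of three is tolerated but two are not. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: majority quorum arithmetic for n JournalNodes (assumed rule: n/2 + 1).

# Print the number of JournalNodes that must acknowledge an editlog write.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Succeed if $1 healthy JournalNodes out of $2 can still form a write quorum.
can_write_edits() {
  [ "$1" -ge "$(quorum_size "$2")" ]
}

for healthy in 3 2 1; do
  if can_write_edits "$healthy" 3; then
    echo "$healthy/3 JournalNodes healthy: quorum of $(quorum_size 3) reached"
  else
    echo "$healthy/3 JournalNodes healthy: below quorum of $(quorum_size 3), editlog writes fail"
  fi
done
```

With three JournalNodes the quorum is 2, so the cluster survives one abnormal JournalNode; with only one healthy JournalNode, writes cannot reach quorum.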

Troubleshooting Method

  • If the ZKFC logs contain information similar to java.lang.IllegalArgumentException: Unable to determine service address for namenode '1781', stale data remains in ZooKeeper.

  • If the logs of the two ZKFCs show the NameNode status switchovers "Active > Standby" and "Standby > Active > Standby", the ZooKeeper client (ZKFC) and the ZooKeeper server are in a connection deadlock.

  • Editlog write and editlog recovery failures occur on the NameNodes:

  1. If the log of the NameNode that should be Active contains the following error:

    Got too many exceptions to achieve quorum size 2/3.

    an editlog write has failed, the NameNode has restarted, and an "Active > Standby" status switchover has occurred.

  2. If the log of the NameNode that should be Standby contains the following error:

    recoverUnfinalizedSegments failed for required journal

    an editlog recovery has failed, the NameNode has restarted, its attempt to become Active has failed, and it remains in the Standby state.
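The three log messages quoted above can be checked mechanically. This is a minimal sketch, assuming you feed the relevant ZKFC or NameNode log on stdin; the function name and output labels are hypothetical, and only the message fragments come from this answer.

```bash
#!/usr/bin/env bash
# Sketch: scan a ZKFC/NameNode log (stdin) for the fault signatures quoted above.
# The function name and output labels are hypothetical.

classify_nn_log() {
  local log found=0
  log=$(cat)
  # Stale active/standby data left in ZooKeeper
  if printf '%s\n' "$log" | grep -qF 'Unable to determine service address for namenode'; then
    echo "stale-zookeeper-data"; found=1
  fi
  # Editlog write failure on the NameNode that should be Active
  if printf '%s\n' "$log" | grep -qF 'Got too many exceptions to achieve quorum size'; then
    echo "editlog-write-failure"; found=1
  fi
  # Editlog recovery failure on the NameNode that should be Standby
  if printf '%s\n' "$log" | grep -qF 'recoverUnfinalizedSegments failed for required journal'; then
    echo "editlog-recovery-failure"; found=1
  fi
  [ "$found" -eq 1 ] || echo "no-known-signature"
}
```

For example, `classify_nn_log < zkfc.log`, where `zkfc.log` stands for whichever log file you copied from the node (the actual log location depends on your installation).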

Procedure

  • Stale data remains in ZooKeeper.

  1. Log in as user root to the node where the ZooKeeper client is installed.

    NOTE:

    If the ZooKeeper client is not installed, install it first. For details, see section "Installing a Client" in the Software Installation guide.

  2. Run the following commands to export the environment variables:

    cd <ZooKeeper client installation directory>

    source bigdata_env

  3. In security mode, run the following command to log in to Kerberos as user hdfs (skip this step in normal mode).

    kinit hdfs

    NOTE:

    Contact the cluster administrator to obtain the password of user hdfs. The default password is Hdfs@123.

  4. Start the ZooKeeper CLI:

    zkCli.sh -server <service IP address of any functioning ZooKeeper node>:24002

  5. Run the following command to delete the stale data:

    deleteall /hadoop-ha/hacluster/ActiveBreadCrumb

  6. Check whether the active/standby status of the NameNodes has recovered.
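Steps 2 to 5 can be wrapped in one helper. This is a sketch under stated assumptions: `ZK_CLIENT_HOME` (the client installation directory), `SECURITY_MODE`, and `DRY_RUN` are hypothetical variables, and it assumes `zkCli.sh` runs a single command given on its command line and then exits, as the stock ZooKeeper client does. Run it with `DRY_RUN=1` first to review the command.

```bash
#!/usr/bin/env bash
# Sketch only: ZK_CLIENT_HOME, SECURITY_MODE and DRY_RUN are hypothetical variables.

delete_active_breadcrumb() {
  if [ -z "$1" ]; then
    echo "usage: delete_active_breadcrumb ZOOKEEPER_SERVICE_IP" >&2
    return 1
  fi
  local zk_ip=$1
  local znode=/hadoop-ha/hacluster/ActiveBreadCrumb
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command only; nothing is deleted.
    echo "zkCli.sh -server ${zk_ip}:24002 deleteall ${znode}"
    return 0
  fi
  (
    cd "${ZK_CLIENT_HOME:?set ZK_CLIENT_HOME to the ZooKeeper client install dir}" || exit 1
    . ./bigdata_env                       # step 2: export environment variables
    if [ "${SECURITY_MODE:-0}" = 1 ]; then
      kinit hdfs || exit 1                # step 3: security mode only; prompts for the password
    fi
    zkCli.sh -server "${zk_ip}:24002" deleteall "$znode"   # steps 4 and 5
  )
}
```

The subshell keeps the `cd`, `bigdata_env`, and Kerberos settings from leaking into the calling shell.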

  • The ZooKeeper client and server are in a connection deadlock.

  1. Log in to FusionInsight Manager, choose Services > HDFS > NameNode, and select the NameNode that went through the "Standby > Active > Standby" switchover.

  2. Choose More Actions > Restart instance to restart the NameNode.
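Step 1 depends on spotting the "Standby > Active > Standby" sequence. A sketch, assuming you have already extracted one NameNode's HA states from its ZKFC log into a list with one state (`Active` or `Standby`) per line in chronological order; that extraction depends on your log format and is not shown, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: detect a Standby > Active > Standby sequence in a chronological
# list of HA states read from stdin (one state per line). Hypothetical name.

has_standby_active_standby() {
  awk '
    NF == 0 { next }                 # ignore blank lines
    $1 == prev { next }              # collapse repeated identical states
    { n++; s[n] = $1; prev = $1 }
    n >= 3 && s[n-2] == "Standby" && s[n-1] == "Active" && s[n] == "Standby" { hit = 1 }
    END { exit (hit ? 0 : 1) }
  '
}
```

Repeated states are collapsed so that `Standby, Active, Active, Standby` is still recognized as the same switchover.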

  • Editlog write and editlog recovery failures occur on the NameNodes.

Perform the following operations on both NameNodes to recover the editlog.

  1. On FusionInsight Manager, stop HDFS.

  2. Log in to the NameNode node as user root, and run the su - omm command to switch to user omm.

  3. Run the following commands to export the environment variables:

    EXECUTABLE_HOME="${CONTROLLER_HOME}/kerberos_user_specific_binay/kerberos"

    LD_LIBRARY_PATH=${EXECUTABLE_HOME}/lib:$LD_LIBRARY_PATH

    PATH=${EXECUTABLE_HOME}/bin:$PATH

  4. (Optional) In security mode, run the kinit command to obtain a Kerberos ticket.

    kinit hdfs

  5. Go to the ${BIGDATA_HOME}/FusionInsight/1_6_NameNode directory and copy the environment variable script into it. The number 1_6 in the directory name varies with the environment. The commands are the same in security mode and normal mode:

    cd ${BIGDATA_HOME}/FusionInsight/1_6_NameNode

    cp ${BIGDATA_HOME}/FusionInsight/install/FusionInsight-Hadoop-2.7.2/hadoop/sbin/exportENV_VARS.sh ./

  6. Run the following command to make the environment variables take effect:

    source exportENV_VARS.sh

  7. Run the following commands to recover the editlog:

    cd $HADOOP_HOME/bin

    ./hdfs namenode -recover

  8. Enter y as prompted.

  9. On FusionInsight Manager, start HDFS.
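Steps 2 to 8 can be sketched as a script run as user omm on each NameNode. Assumptions: the script picks the first `*_NameNode` directory under `${BIGDATA_HOME}/FusionInsight` (because the 1_6 prefix varies), the Hadoop install path contains `FusionInsight-Hadoop-2.7.2` as in the commands above, and `SECURITY_MODE` is a hypothetical switch for step 4. HDFS must already be stopped (step 1) and started again afterward (step 9) from FusionInsight Manager.

```bash
#!/usr/bin/env bash
# Sketch only: function names and SECURITY_MODE are hypothetical.

# Print NameNode instance directories; the "1_6" prefix differs per cluster.
find_nn_dir() {
  local d
  [ -n "$BIGDATA_HOME" ] || { echo "BIGDATA_HOME is not set" >&2; return 1; }
  for d in "$BIGDATA_HOME"/FusionInsight/*_NameNode; do
    if [ -d "$d" ]; then echo "$d"; fi
  done
}

recover_editlog() {
  local nn_dir
  nn_dir=$(find_nn_dir | head -n 1)
  if [ -z "$nn_dir" ]; then
    echo "NameNode instance directory not found" >&2
    return 1
  fi
  (
    # Step 3: Kerberos client binaries (paths as given in the procedure)
    EXECUTABLE_HOME="${CONTROLLER_HOME}/kerberos_user_specific_binay/kerberos"
    export LD_LIBRARY_PATH="${EXECUTABLE_HOME}/lib:$LD_LIBRARY_PATH"
    export PATH="${EXECUTABLE_HOME}/bin:$PATH"
    # Step 4: security mode only
    if [ "${SECURITY_MODE:-0}" = 1 ]; then kinit hdfs || exit 1; fi
    # Step 5: copy the environment script into the instance directory
    cd "$nn_dir" || exit 1
    cp "$BIGDATA_HOME/FusionInsight/install/FusionInsight-Hadoop-2.7.2/hadoop/sbin/exportENV_VARS.sh" ./ || exit 1
    # Step 6: load the environment variables
    . ./exportENV_VARS.sh
    # Steps 7 and 8: interactive; answer y when prompted
    cd "$HADOOP_HOME/bin" || exit 1
    ./hdfs namenode -recover
  )
}
```

Running the steps in a subshell leaves the caller's working directory and environment unchanged.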

Details: https://support.huawei.com/enterprise/en/doc/EDOC1100020137/6503bd77/common-hdfs-namenode-faults


Thanks
