Symptom
On the cluster management page of the master server's portal in a VCN3020 cluster network, the cluster status is displayed as invalid and the status of every cluster member is displayed as unknown.
Troubleshooting Guideline
1. Abnormal ZooKeeper service: The ZooKeeper service is not started or the ZooKeeper cluster is abnormal.
2. Abnormal center database: The center database service is abnormal or the MPU cannot connect to the center database.
Procedure
Step 1 Check whether the ZooKeeper service is running properly. Log in over SSH to the server that hosts ZooKeeper (generally the member whose type is Cluster Server in the cluster member list), and run the /home/ivs_scu/lib/zookeeper/bin/zkServer.sh status command to check the ZooKeeper status. If the command output shows follower or leader, the ZooKeeper service is running properly. If the command output shows not running, the ZooKeeper service is not started.
Figure 1-1 Normal process information

Figure 1-2 Process exception information
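The check in Step 1 can be scripted. The following is a minimal sketch, not part of the product: classify_zk_status is a hypothetical helper name, and the sketch assumes that the status output contains the word leader or follower when the service is healthy and the phrase not running when it is not, as described in Step 1.

```shell
#!/bin/sh
# Path taken from Step 1; override ZK_SERVER if the installation differs.
ZK_SERVER="${ZK_SERVER:-/home/ivs_scu/lib/zookeeper/bin/zkServer.sh}"

# classify_zk_status (hypothetical helper): reads the text printed by
# "zkServer.sh status" on stdin and prints one word:
#   stopped - the output says the service is not running
#   normal  - the output reports leader or follower
#   unknown - anything else; inspect the output manually
classify_zk_status() {
    out=$(cat)
    case "$out" in
        *"not running"*)      echo "stopped" ;;
        *leader*|*follower*)  echo "normal" ;;
        *)                    echo "unknown" ;;
    esac
}

# Usage on the ZooKeeper server:
#   "$ZK_SERVER" status 2>&1 | classify_zk_status
```

The not running pattern is tested first so that an error message is never mistaken for a healthy state.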

Step 2 If the ZooKeeper service is not started or is abnormal, run the /home/ivs_scu/lib/zookeeper/bin/zkServer.sh start command to start the ZooKeeper service, and then check the ZooKeeper service status again. If the status is normal, check whether the cluster has recovered.
Step 3 If the ZooKeeper service is still abnormal (that is, the cluster has not recovered), check the ZooKeeper log file /home/ivs_scu/lib/zookeeper/log/zookeeper.out. In this case, the log shows that the ZooKeeper port is already occupied.
Figure 1-3 ZooKeeper error log information

Step 4 The default port number of the ZooKeeper service is 2181. Run the netstat -anp | grep 2181 command to find the process that occupies port 2181.
Figure 1-4 ZooKeeper port usage
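To pick the process ID out of the netstat output without reading it by eye, a small filter can help. This is a sketch under assumptions: port_owner_pids is a hypothetical helper name, and it assumes the usual Linux netstat -anp column layout, in which column 4 is the local address, column 6 is the state, and column 7 is PID/program name. Unlike a plain grep 2181, it reports only processes that are listening on the port.

```shell
#!/bin/sh
# port_owner_pids (hypothetical helper): reads "netstat -anp" output on stdin
# and prints the PID of each process listening on the given TCP port.
port_owner_pids() {
    port="$1"
    awk -v port="$port" '
        # Keep TCP sockets in LISTEN state whose local address ends in :<port>
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") {
            split($7, owner, "/")              # column 7 looks like 12345/java
            if (owner[1] ~ /^[0-9]+$/) print owner[1]
        }' | sort -u
}

# Usage (run as root so that netstat can show the PID column):
#   netstat -anp | port_owner_pids 2181
```

If the PID column shows a hyphen because of insufficient privileges, the helper prints nothing for that line.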

Step 5 Run the kill -9 PID command, where PID is the process ID found in Step 4, to stop the process that occupies port 2181.
Step 6 Start the ZooKeeper service manually again (see Step 2) and confirm that the cluster status returns to normal.
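Steps 5 and 6 can also be sketched as a script. The variant below is a gentler alternative to the documented kill -9, not the documented procedure itself: it first sends the default termination signal and falls back to kill -9 only if the process is still present after a short wait. stop_process is a hypothetical helper name, and the five-second default wait is an arbitrary assumption.

```shell
#!/bin/sh
# stop_process (hypothetical helper): stop the process with the given PID.
#   $1 - process ID, for example the one found in Step 4
#   $2 - seconds to wait before forcing the stop (assumed default: 5)
stop_process() {
    pid="$1"
    wait_s="${2:-5}"
    # Ask the process to exit; if the signal cannot be sent, there is nothing to do.
    kill "$pid" 2>/dev/null || return 0
    while [ "$wait_s" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # process has exited
        sleep 1
        wait_s=$((wait_s - 1))
    done
    # Still present after the wait: force it, as in Step 5.
    kill -9 "$pid" 2>/dev/null || true
    return 0
}

# Usage on the ZooKeeper server, followed by the restart from Step 6:
#   stop_process 12345
#   /home/ivs_scu/lib/zookeeper/bin/zkServer.sh start
#   /home/ivs_scu/lib/zookeeper/bin/zkServer.sh status
```

A suspended process may ignore the first signal, in which case the fallback behaves exactly like the manual kill -9 command.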
Root Cause
A suspended ZooKeeper process was still occupying port 2181, so the ZooKeeper service could not be started again. You need to manually stop that process and then restart the ZooKeeper service.