Hello everyone,
This case shows how to handle the problem where the active/standby switchover fails to be disabled when the standby OMS is abnormal.
[Applicable Versions]
6.5.x
[Symptom]
When a management node is being replaced and the standby OMS is faulty or one of its processes is abnormal, the command for disabling the active/standby switchover fails. If an active/standby switchover then occurs after the OMS is installed on the new node, files and data are synchronized, which makes subsequent restoration difficult.
This section describes how to disable the active/standby switchover before installing the standby OMS for replacing the faulty management node.
[Impact and Severity]
The ha.bin process on the active OMS node needs to be restarted.
[Expected Restoration Time]
10 minutes
[Prerequisites]
This operation has a significant impact. Confirm it with the R&D engineers in advance, and perform it only after they approve.
The following items must be prepared before the restoration.
Table 1 Items to be prepared before the restoration
| No. | Item | Operation |
|---|---|---|
| 1 | Account and password of the omm or root user on the active OMS node | Apply for the password of the root or omm user for accessing the active OMS node. |
| 2 | SSH remote login tool | Prepare tools such as PuTTY and SecureCRT. |
[Fault Locating]
After the active/standby switchover is disabled, a flag file forbid.txt is generated in ${OMS_RUN_PATH}/workspace0/ha/local/haarb/conf. This file records the time when the active/standby switchover was disabled and the disabling duration.
Manually generate the forbid.txt file and restart the HA process to load it.
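To make the role of the flag file concrete, here is a minimal sketch that reads a forbid.txt and works out when the disabling window ends. It assumes the file holds two lines, an epoch timestamp followed by a duration in seconds, which is inferred from the procedure in this post and not from product documentation. The helper names forbid_expiry and forbid_status are hypothetical and are not part of the OMS tooling.

```shell
#!/bin/sh
# Assumed layout of forbid.txt (inferred from this post, not from product docs):
#   line 1: epoch seconds when the switchover was disabled
#   line 2: disabling duration in seconds

# Print the epoch second at which the disabling window ends.
forbid_expiry() {
    file=$1
    start=$(sed -n '1p' "$file")
    duration=$(sed -n '2p' "$file")
    echo $((start + duration))
}

# Print "active" if the window still covers the given time, else "expired".
# $2 is optional and defaults to the current time.
forbid_status() {
    file=$1
    now=${2:-$(date +%s)}
    if [ "$now" -lt "$(forbid_expiry "$file")" ]; then
        echo active
    else
        echo expired
    fi
}
```

For instance, forbid_status "${OMS_RUN_PATH}/workspace0/ha/local/haarb/conf/forbid.txt" would report whether the window recorded in the file has already passed. How the HA process itself treats the boundary is not covered by this sketch.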
[Procedure]
The following operations take the C70SPC200 version as an example. Pay attention to the paths that may be different for other versions.
Log in to the active OMS node as the omm user and create the flag file.
omm@hadoop02:~> touch ${OMS_RUN_PATH}/workspace0/ha/local/haarb/conf/forbid.txt
Obtain the current timestamp.
For example:
omm@hadoop02:~> date +%s
1532588636
Write the timestamp and the disabling duration to the flag file created in Step 1. The first value is the timestamp obtained in Step 2. The second value is the disabling duration (86400, which corresponds to 24 hours if the unit is seconds).
omm@hadoop02:~> echo 1532588636 > ${OMS_RUN_PATH}/workspace0/ha/local/haarb/conf/forbid.txt
omm@hadoop02:~> echo 86400 >> ${OMS_RUN_PATH}/workspace0/ha/local/haarb/conf/forbid.txt
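The file creation, timestamp, and write steps can be combined into one small function, which avoids copying the timestamp by hand. This is only a sketch: write_forbid_file is a hypothetical helper, and the two-line layout (timestamp, then duration) simply mirrors the echo commands in this post.

```shell
#!/bin/sh
# write_forbid_file is a hypothetical helper, not part of the OMS tooling.
#   $1: directory that holds forbid.txt
#   $2: disabling duration (defaults to 86400, the value used in this post)
write_forbid_file() {
    conf_dir=$1
    duration=${2:-86400}
    now=$(date +%s)
    # Line 1: start timestamp; line 2: duration. Overwrites any existing file.
    printf '%s\n%s\n' "$now" "$duration" > "$conf_dir/forbid.txt"
}

# Usage on the active OMS node (as omm):
#   write_forbid_file "${OMS_RUN_PATH}/workspace0/ha/local/haarb/conf" 86400
```

Writing both lines in a single printf also means the file is never left with only one of the two values.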
Restart the HA process so that it loads the file. First, find the process.
For example:
omm@hadoop02:~> ps -ef | grep ha.bin | grep OMSV100R001C00x8664
omm 51218 1 0 Jul23 ? 00:26:46 /opt/huawei/Bigdata/om-server_V100R002C70SPC200/OMSV100R001C00x8664/workspace0/ha/module/hacom/bin/ha.bin --logsyslog=0 --loglevel=INFO --logpath=/var/log/Bigdata/omm/oms/ha/runlog --logarchive=21600 --bboxpath=/var/log/Bigdata/omm/oms/../core/ha/core --module=HA
Kill the preceding process so that the HA process is restarted.
omm@hadoop02:~> kill -9 51218
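If you prefer not to read the PID off the screen, it can be extracted from the ps -ef output, where the PID is the second column. The following is a sketch only: ha_pid_from_ps is a hypothetical helper, and the directory name OMSV100R001C00x8664 is taken from the example in this post and may differ in other versions.

```shell
#!/bin/sh
# ha_pid_from_ps is a hypothetical helper. It reads `ps -ef` output on stdin
# and prints the PID (column 2) of the OMS ha.bin process. Lines that belong
# to a grep command are skipped.
ha_pid_from_ps() {
    awk '/ha\.bin/ && /OMSV100R001C00x8664/ && !/grep/ { print $2 }'
}

# Usage on the active OMS node (as omm):
#   pid=$(ps -ef | ha_pid_from_ps)
#   [ -n "$pid" ] && kill -9 "$pid"
```

Check that exactly one PID is printed before passing it to kill.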
[Result Verification]
Log in to the active OMS node as the omm user and check whether the HA process is started.
For example:
hadoop02:~ # ps -ef | grep ha.bin | grep OMSV100R001C00x8664
omm 8074 1 2 15:18 ? 00:00:01 /opt/huawei/Bigdata/om-server_V100R002C70SPC200/OMSV100R001C00x8664/workspace0/ha/module/hacom/bin/ha.bin --logsyslog=0 --loglevel=INFO --logpath=/var/log/Bigdata/omm/oms/ha/runlog --logarchive=21600 --bboxpath=/var/log/Bigdata/omm/oms/../core/ha/core --module=HA
Run the command for disabling the active/standby switchover.
If the command fails with return code 12, the active/standby switchover has been successfully disabled.
For example:
hadoop02:~ # ${OMS_RUN_PATH}/workspace0/ha/module/hacom/tools/ha_client_tool --forbidswitch --name=product --time=1440
ERR: execute command forbidswitch failed[12]. Exec Option failed.
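The check for code 12 can also be scripted by matching the error text shown in the example. This is a sketch under the assumption that the tool prints the message in exactly this form. It does not rely on the exit status of ha_client_tool, which is not documented in this post. The helper name switchover_disabled is hypothetical.

```shell
#!/bin/sh
# switchover_disabled is a hypothetical helper. It reads the output of
# `ha_client_tool --forbidswitch` on stdin and succeeds only if that output
# reports error code 12 for the forbidswitch command.
switchover_disabled() {
    grep -q 'forbidswitch failed\[12\]'
}

# Usage on the active OMS node:
#   ${OMS_RUN_PATH}/workspace0/ha/module/hacom/tools/ha_client_tool \
#       --forbidswitch --name=product --time=1440 2>&1 | switchover_disabled \
#       && echo "switchover is disabled"
```

Any other error code, or a successful run of the command, makes the helper return a non-zero status.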
Hope you can learn from it, thank you!