1.1.1 [Alarm] 3500 indicators prevent alarms from being reported and cleared
Problem Description (C30 tr5)
Alarms displayed on the UI cannot be cleared, and new alarms cannot be reported normally.
The root cause of the problem
A large number of 3500 indicators (intermediate indicators used to calculate class indicators) generate alarms. These alarms occupy the message queue, so alarm-clearing messages cannot reach FMS and the corresponding alarms are never cleared.
Confirmation steps
1. Check the UI for alarms that cannot be cleared or alarms that fail to be reported.
2. View the /var/log/Bigdata/OMM/OMS/PMS/pms.log file on the main node and check whether it contains 3500 alarm reports.
3. Run the ipcs -q command to query the message queue information, and check whether the message count for key 0x00001e61 exceeds 1800 (the value typically observed is 1889).
4. If all of the above conditions hold, the problem is the known issue in which 3500 indicators prevent alarms from being reported and cleared.
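
The queue check in step 3 can be scripted. The following is a minimal sketch, assuming the util-linux `ipcs -q` column layout (key, msqid, owner, perms, used-bytes, messages); the function names `queue_depth` and `queue_over_threshold` are illustrative and not part of the product.

```shell
#!/bin/sh
# Print the message count for a key, reading `ipcs -q` output on stdin.
# Assumption: the count is the 6th column, as in the util-linux layout.
queue_depth() {
    awk -v key="$1" '($1 "") == (key "") { print $6 }'
}

# Succeed (exit status 0) when the message count for the key exceeds the threshold.
# $1 = queue key, $2 = threshold; `ipcs -q` output is read on stdin.
queue_over_threshold() {
    key=$1
    threshold=$2
    depth=$(queue_depth "$key")
    [ -n "$depth" ] && [ "$depth" -gt "$threshold" ]
}
```

For example, `ipcs -q | queue_over_threshold 0x00001e61 1800 && echo "queue backlog matches the known issue"` prints the message only when the count is above 1800.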
Solutions
Solution 1: Install the C30SPC503 patch package. To verify the result, see "Scheme Verification" below.
Solution 2: Replace the controller package and regenerate the DB data so that the 3500 indicators no longer send alarms. The procedure is described under "Modification steps" below.
Modification steps
1. Back up the /opt/huawei/Bigdata/om-0.0.1/share/om/controller/controller-0.0.1.jar file.
2. Replace the controller-0.0.1.jar file on both the active and standby Manager nodes, and make sure the replaced jar file is owned by omm:wheel with permissions 700.
The path is: /opt/huawei/Bigdata/om-0.0.1/share/om/controller/controller-0.0.1.jar
3. Edit the /opt/huawei/Bigdata/om-0.0.1/sbin/controller.sh file and change -Dstack.conf.dir="" to -Dstack.conf.dir="/opt/huawei/Bigdata/om-0.0.1/etc/components/FusionInsight_V100R002C30SPC100"
Note: Use the version number of the version actually installed. To query it, run the following command as the omm user: /opt/huawei/Bigdata/om-0.0.1/sbin/queryVersion.sh
4. Run "su - omm" to switch to the omm user, then restart the controller with the following command:
sh /opt/huawei/Bigdata/om-0.0.1/sbin/restart-controller.sh
5. Edit the /opt/huawei/Bigdata/om-0.0.1/sbin/controller.sh file again and restore -Dstack.conf.dir="/opt/huawei/Bigdata/om-0.0.1/etc/components/FusionInsight_V100R002C30SPC100" to -Dstack.conf.dir=""
6. Run ipcs -q to query the message queue information, then run ipcrm -q X to clear the message queue, where X is the msqid value corresponding to key 0x00001e61.
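
Steps 3, 5 and 6 can be sketched as two small shell helpers. This is an illustration under stated assumptions, not the vendor's procedure: it assumes that controller.sh contains the option literally as -Dstack.conf.dir="..." in double quotes, and that `ipcs -q` prints the msqid in the second column (util-linux layout). The function names are hypothetical.

```shell
#!/bin/sh
# Rewrite the value of -Dstack.conf.dir in a controller.sh-style file.
# $1 = file to edit, $2 = new value (pass "" to restore the empty setting).
# Assumption: the option is written as -Dstack.conf.dir="..." and the
# value contains no '|' or '&' characters (true for ordinary paths).
set_stack_conf_dir() {
    file=$1
    value=$2
    tmp="$file.tmp.$$"
    sed "s|-Dstack\.conf\.dir=\"[^\"]*\"|-Dstack.conf.dir=\"$value\"|" "$file" > "$tmp" &&
        cat "$tmp" > "$file" &&   # write back in place so owner and mode are kept
        rm -f "$tmp"
}

# Print the msqid for a key, reading `ipcs -q` output on stdin.
# Assumption: the msqid is the 2nd column, as in the util-linux layout.
msqid_for_key() {
    awk -v key="$1" '($1 "") == (key "") { print $2 }'
}

# Possible use for step 6 (left commented out because it removes a queue):
# id=$(ipcs -q | msqid_for_key 0x00001e61)
# [ -n "$id" ] && ipcrm -q "$id"
```

Calling `set_stack_conf_dir` once with the components directory before the restart and once with an empty string afterwards mirrors steps 3 and 5. Review the file after each call, since the real option line may be formatted differently.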
Scheme Verification
Before amendment:
1. Log in to the main node, switch to the omm user, and connect to the database: gsql -p 20015 -W OMM Huawei@123 -U omm;
2. Set the search path: set search_path=omm_1;
3. Run the query: select * from TBL_PM_ALARMTHRESHOLD;
4. Check that the value of ALARMTHRCRITICAL in the ALARMTHRMAJOR rows corresponding to 3500 is YES.
5. View the /var/log/Bigdata/OMM/OMS/PMS/pms.log file on the main node; it contains 3500 alarm reports.
After amendment:
1. Query the TBL_PM_ALARMTHRESHOLD table.
2. Check that the value of ALARMTHRCRITICAL in the ALARMTHRMAJOR rows corresponding to 3500 is NO.
3. View the /var/log/Bigdata/OMM/OMS/PMS/pms.log file on the main node; it contains no 3500 alarm reports.
4. Run the ipcs -q command every five minutes and check that the message count stays below 1800.
If all of the above expected results are met, the verification is successful.
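
The periodic queue check after the change can be automated. The sketch below is one possible approach, assuming the util-linux `ipcs -q` layout (message count in the sixth column) and treating a missing queue as empty; `watch_queue` and its parameters are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sample the queue depth for a key several times; fail as soon as a sample
# reaches the threshold.
# $1 = key, $2 = threshold, $3 = number of samples, $4 = seconds between samples,
# $5 = command that prints ipcs-style output (defaults to "ipcs -q").
watch_queue() {
    key=$1
    threshold=$2
    samples=$3
    interval=$4
    source_cmd=${5:-"ipcs -q"}
    i=0
    while [ "$i" -lt "$samples" ]; do
        depth=$($source_cmd | awk -v key="$key" '($1 "") == (key "") { print $6 }')
        depth=${depth:-0}   # assumption: a missing queue counts as empty
        echo "sample $((i + 1)): $depth messages"
        [ "$depth" -lt "$threshold" ] || return 1
        i=$((i + 1))
        # Sleep only between samples, not after the last one.
        [ "$i" -lt "$samples" ] && sleep "$interval"
    done
    return 0
}
```

For example, `watch_queue 0x00001e61 1800 6 300` takes six samples five minutes apart and returns a non-zero status if any sample is at or above 1800.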
