Symptom
An OptiX OSN 8800 NE installed at a station runs software version 5.51.05.23. One day, the SCC board in slot 28 repeatedly reports transient COMMUN_FAIL alarms, and multiple boards report the TEMP_OVER alarm.
Cause analysis
Possible causes of the COMMUN_FAIL alarm are as follows:
- Boards on the NE are being reset (cold or warm).
- The network cables that cascade the subracks do not meet the relevant requirements.
- Boards are faulty.
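Before the fault occurred, the OptiX OSN 8800 at the station had been running normally, and no operation had been performed on it. Operator-initiated resets and changes to the cascading cables are therefore unlikely causes.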
Because multiple boards report the TEMP_OVER alarm, the ambient temperature is suspected to be high, forcing the boards to operate continuously at high temperature. On the NMS, check the historical alarms and performance events of the OptiX OSN 8800 together with the times at which they were reported. During the periods when the SCC board repeatedly reports transient COMMUN_FAIL alarms, the equipment temperature reported by the SCC board ranges from 70°C to 75°C. After consulting the on-site maintenance engineers, Huawei engineers confirm that the air conditioner in the telecommunications room is faulty.
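The correlation step above (matching the times of the COMMUN_FAIL alarms against the temperature reported by the SCC board) can be sketched in Python. This is a minimal sketch, assuming the historical alarms and temperature performance data have already been exported from the NMS as timestamped records. The export format, the function names temp_at_alarms and flag_hot, the 15-minute window, and all sample values are hypothetical. The 70°C threshold only mirrors the readings observed in this case and is not a documented board limit.

```python
from datetime import datetime, timedelta

# Hypothetical sample data standing in for records exported from the NMS.
# Times at which the SCC board reported a transient COMMUN_FAIL alarm:
alarm_times = [
    datetime(2024, 1, 1, 14, 3),
    datetime(2024, 1, 1, 14, 40),
    datetime(2024, 1, 1, 8, 5),
]
# (sample time, equipment temperature in °C) reported by the SCC board:
temp_samples = [
    (datetime(2024, 1, 1, 8, 0), 45.0),
    (datetime(2024, 1, 1, 14, 0), 72.0),
    (datetime(2024, 1, 1, 14, 15), 74.5),
    (datetime(2024, 1, 1, 14, 30), 71.0),
    (datetime(2024, 1, 1, 14, 45), 73.0),
]


def temp_at_alarms(alarms, samples, window=timedelta(minutes=15)):
    """For each alarm, return (alarm time, highest temperature sampled
    within +/- window of it), or None if no sample falls in the window."""
    result = []
    for t in alarms:
        nearby = [celsius for ts, celsius in samples if abs(ts - t) <= window]
        result.append((t, max(nearby) if nearby else None))
    return result


def flag_hot(pairs, threshold=70.0):
    """Return the alarm times whose nearby temperature reached the
    (hypothetical) threshold."""
    return [t for t, celsius in pairs if celsius is not None and celsius >= threshold]


if __name__ == "__main__":
    pairs = temp_at_alarms(alarm_times, temp_samples)
    for t, celsius in pairs:
        print(t, celsius)
    print("Alarms at high temperature:", flag_hot(pairs))
```

If most COMMUN_FAIL alarms coincide with high readings while alarms at normal temperatures are rare, the temperature hypothesis is supported, although the correlation alone does not prove causation.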
Data analysis shows that at high ambient temperatures, CPU resources and memory timing margins may become insufficient, causing boards to reset abnormally. These abnormal resets (the first possible cause listed above) result in the transient COMMUN_FAIL alarms.