Symptom
In an office, five SSE5LWFS boards report POWER_FAIL alarms almost simultaneously, all with the alarm parameters 0x01 0x01 0x01 0xff 0xff.
Cause analysis
The fault is analyzed as follows:
- The alarm parameters indicate that the primary 3.3 V power supply has failed and the secondary power supply is now in use. Board operation is therefore not yet affected.
- The stability of the power modules is closely related to ambient temperature. Therefore, check the current and historical operating temperatures of the boards, as well as the ambient temperature.
Procedure
- The historical alarms contain a large number of TMP_OVER alarms, indicating that the boards have been running at high temperature for a long time.
- The 24-hour performance statistics show that the board temperature usually stays around 55°C. When the board temperature is 55°C, the power supply modules and optical modules are even hotter, above 55°C.
- The preceding analysis shows that the SSE5LWFS boards report POWER_FAIL alarms because of excessively high temperature. If the fault is not rectified promptly, the power supply modules will fail, and the boards will then fail as well.
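The two checks in the procedure (many historical TMP_OVER alarms, and a board temperature that usually sits around 55°C) can be sketched as a simple screening function. This is a minimal sketch, assuming the historical alarms and the 24-hour temperature samples have already been exported into plain Python lists. The record format, the board identifier "slot-3", and the cutoffs `min_tmp_over` and `min_fraction` are hypothetical and do not come from any NMS or board specification. Only the 55°C figure comes from this case.

```python
from dataclasses import dataclass


@dataclass
class AlarmRecord:
    """One historical alarm entry (hypothetical format, not a real NMS export)."""
    board: str  # hypothetical board identifier, e.g. "slot-3"
    name: str   # alarm name, e.g. "TMP_OVER" or "POWER_FAIL"


def count_alarms(history: list[AlarmRecord], board: str, name: str) -> int:
    """Count historical alarms with the given name on one board."""
    return sum(1 for a in history if a.board == board and a.name == name)


def fraction_at_or_above(samples_c: list[float], threshold_c: float) -> float:
    """Return the fraction of temperature samples (in °C) at or above threshold_c."""
    if not samples_c:
        return 0.0
    return sum(1 for t in samples_c if t >= threshold_c) / len(samples_c)


def likely_overheating(
    history: list[AlarmRecord],
    samples_c: list[float],
    board: str,
    threshold_c: float = 55.0,   # board temperature observed in this case
    min_tmp_over: int = 10,      # assumed cutoff for "a large number" of TMP_OVER alarms
    min_fraction: float = 0.5,   # assumed cutoff for "usually" at high temperature
) -> bool:
    """Flag a board whose history shows both frequent TMP_OVER alarms and
    sustained high temperature, mirroring the two checks in the procedure."""
    many_tmp_over = count_alarms(history, board, "TMP_OVER") >= min_tmp_over
    mostly_hot = fraction_at_or_above(samples_c, threshold_c) >= min_fraction
    return many_tmp_over and mostly_hot


if __name__ == "__main__":
    # Illustrative input only, not data from this case.
    history = [AlarmRecord("slot-3", "TMP_OVER")] * 12 + [AlarmRecord("slot-3", "POWER_FAIL")]
    samples = [54.0, 55.0, 56.0, 55.5]
    print(likely_overheating(history, samples, "slot-3"))  # prints True
```

Requiring both conditions, rather than either one, keeps a single transient TMP_OVER alarm or a short temperature spike from flagging a board. The cutoffs would need to be tuned against the equipment's documented temperature thresholds.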
Reference information
Conclusions and suggestions for this case are as follows:
- Lower the ambient temperature. Transient TMP_OVER alarms reported by the boards indicate that the ambient temperature exceeds the normal operating range of the WDM equipment.
- Increase the fan speed. When TMP_OVER alarms are reported, check the running state of the fans in the subrack. If the fans are running at low speed, increase their speed.
- The operating environment of WDM equipment must meet the relevant requirements. Environmental changes, especially changes in ambient temperature, may cause exceptions during system operation. Therefore, the responsible personnel must check the operating environment of the WDM equipment periodically.
- Clear abnormal alarms promptly.