Hello, everybody!
Bit error alarms are often reported by equipment during maintenance of a transmission network. Because bit errors can disrupt services, they must be cleared. This post describes the standard procedure for handling bit errors in MSTP networks and can serve as a reference when troubleshooting similar problems.
Handling Process
1. Checking for Alarms or Performance Events Related to Higher-Order Bit Errors
Check whether the B1_SD, B2_SD, B1_EXC, B2_EXC, B3_SD, or B3_EXC alarm is reported. If yes, go to Checking Whether the External Environment Is Abnormal.
Check whether the performance events include regenerator section, multiplex section, or higher-order path bit errors. If yes, go to Checking Whether the External Environment Is Abnormal.
If none of the preceding bit errors occurs and only the BIP_SD and BIP_EXC bit error alarms are reported, go to Checking Whether the Peer Board Is Faulty.
If none of the preceding bit errors occurs and only the lower-order path bit errors occur in the performance events, go to Checking Whether the Peer Board Is Faulty.
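The alarm-based part of this triage can be sketched in a few lines of Python. This is only an illustration: the alarm names come from the post, while the grouping into two sets and the function name are assumptions, and performance events are not modeled.

```python
# Alarm names are taken from the post; the set grouping and the
# function name are illustrative assumptions, not an NMS API.
HIGHER_ORDER_ALARMS = {"B1_SD", "B2_SD", "B1_EXC", "B2_EXC", "B3_SD", "B3_EXC"}
LOWER_ORDER_ALARMS = {"BIP_SD", "BIP_EXC"}


def next_step_after_alarm_check(active_alarms):
    """Return the title of the next step, or None if no bit error alarm is active."""
    alarms = set(active_alarms)
    # Any higher-order alarm takes priority over lower-order ones.
    if alarms & HIGHER_ORDER_ALARMS:
        return "Checking Whether the External Environment Is Abnormal"
    # Only lower-order (BIP) alarms are present.
    if alarms & LOWER_ORDER_ALARMS:
        return "Checking Whether the Peer Board Is Faulty"
    return None
```

For example, an NE that reports both B2_SD and BIP_SD is routed to the external environment check, because a higher-order alarm is present.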
2. Checking Whether the External Environment Is Abnormal
Query the NE alarms and check whether the NE that reports the bit error alarm also reports a temperature or fan alarm, such as TEMP_OVER, FAN_FAIL, or FAN_FAULT. If it does, take the following steps:
Check the ambient temperature of the equipment room, clean the air filter, and check the heat dissipation of the fans. Handle any abnormalities.
After the environmental abnormalities are handled and the TEMP_OVER, FAN_FAIL, and FAN_FAULT alarms are cleared, check whether the bit errors are also cleared.
If the bit errors persist, go to Checking Whether the Peer Board Is Faulty.
3. Checking Whether the Peer Board Is Faulty
Check whether the peer board reports an optical module abnormality alarm, such as TF, LSR_WILL_DIE, or OUT_PWR_ABN. If any of the preceding alarms is reported, replace the optical module. If the optical module cannot be removed or re-inserted, replace the peer board.
Query the transmit optical power of the optical module on the peer board. If the transmit optical power is outside the normal range, replace the optical module on the peer board. If the optical module cannot be removed or re-inserted, replace the peer board.
Perform a self-loop on the optical module of the peer board using a pigtail. If an alarm is reported, replace the optical module. If the optical module cannot be removed or re-inserted, replace the peer board.
4. Checking Whether the Local Board Is Faulty
Check whether the local board reports an optical module abnormality alarm, such as TF, LSR_WILL_DIE, or OUT_PWR_ABN. If any of the preceding alarms is reported, replace the optical module. If the optical module cannot be removed or re-inserted, replace the local board.
Test the receive optical power of the local board. If the receive optical power reported by the board is outside the normal range, use an optical power meter to measure the input optical power of the optical module. If the measured power is within the normal range, remove and re-insert the pigtail to ensure that it is in good contact with the board; if the fault persists, replace the optical module, or replace the local board if the optical module cannot be removed or re-inserted. If the measured power is outside the normal range, check for a line fault according to Checking Whether Line Performance Deteriorates.
Perform a self-loop on the local board using a pigtail. If an alarm is reported, replace the optical module. If the optical module cannot be removed or re-inserted, replace the local board.
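The receive power check in this step is a small decision tree, sketched below in Python. The function names are hypothetical, and the normal range must come from the specifications of the optical module actually in use; the values in the usage note are placeholders, not real thresholds.

```python
def in_range(power_dbm, low_dbm, high_dbm):
    """True if the power lies within the normal range (inclusive)."""
    return low_dbm <= power_dbm <= high_dbm


def local_receive_action(board_rx_dbm, meter_rx_dbm, low_dbm, high_dbm):
    """Map the two power readings to the action described in the step.

    board_rx_dbm: receive power reported by the local board.
    meter_rx_dbm: input power measured with an optical power meter.
    low_dbm, high_dbm: normal range of the module (caller-supplied).
    """
    if in_range(board_rx_dbm, low_dbm, high_dbm):
        return "receive power is normal; continue with the self-loop test"
    if in_range(meter_rx_dbm, low_dbm, high_dbm):
        # The fiber delivers enough power, so suspect the connection or module.
        return ("remove and re-insert the pigtail; if the fault persists, "
                "replace the optical module or the local board")
    # The incoming power itself is abnormal, so suspect the line.
    return "go to Checking Whether Line Performance Deteriorates"
```

With a placeholder range of -20 dBm to -5 dBm, a board reading of -30 dBm and a meter reading of -12 dBm point to the pigtail or the optical module rather than the line.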
5. Checking Whether Line Performance Deteriorates
If bidirectional bit errors occur, replace the cables with spare cables. If the alarms are cleared after the replacement, replace or repair the original cables.
If unidirectional bit errors occur, exchange the optical cables in the receive and transmit directions.
If the alarm direction changes with the cable, replace or repair the cable.
If the alarm direction does not change with the cable, the line is not faulty or the line performance does not deteriorate.
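The line check can be summarized as the following Python sketch. The function and parameter names are hypothetical. The post does not say what to do when bidirectional errors persist after spare cables are installed, so that branch is an assumption and is marked as such in the code.

```python
def diagnose_line(bidirectional, test_result):
    """Interpret the outcome of the cable test described in this step.

    bidirectional: True if bit errors occur in both directions.
    test_result:
      - bidirectional case: True if the alarms cleared after spare
        cables were swapped in.
      - unidirectional case: True if the alarm direction changed after
        the receive and transmit cables were exchanged.
    """
    if bidirectional:
        if test_result:
            return "replace or repair the original cables"
        # Assumption: the post does not cover this outcome explicitly.
        return "cables not confirmed as the cause; continue troubleshooting"
    if test_result:
        return "replace or repair the cable"
    return "line is not faulty; line performance has not deteriorated"
```

For a unidirectional fault, `diagnose_line(False, True)` means the alarm followed the cable, so the cable is the suspect.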
6. Checking Whether Clock Configuration Is Incorrect or Performance of the Cross-connect and Clock Unit Deteriorates
If the clock sources of the local NE and peer NE are not synchronized or are interlocked, bit errors or even service interruption can result. If the NEs also report performance events or alarms related to pointer justification, such as AUPJCHIGH, AUPJCLOW, AUPJCNEW, and SYN_BAD, first rectify the clock configuration fault with reference to the related performance events or alarms.
If the clock alarm persists, perform an active/standby switchover on the local NE. If the fault is rectified, replace the original active cross-connect and timing board at the local end.
If the alarm persists, perform an active/standby switchover on the peer NE. If the fault is rectified, replace the original active cross-connect and timing board at the peer end.
If the alarm persists, contact Huawei engineers for help.
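The escalation order in this step can be written down as a short Python sketch. The function name and the three boolean inputs are hypothetical; each input records whether the fault cleared after the corresponding action was tried, in the order given in the post.

```python
def clock_fault_action(cleared_by_config_fix,
                       cleared_by_local_switchover,
                       cleared_by_peer_switchover):
    """Return the follow-up action for the first measure that cleared the fault."""
    if cleared_by_config_fix:
        return "no replacement needed; the clock configuration was the cause"
    if cleared_by_local_switchover:
        return ("replace the original active cross-connect and timing board "
                "at the local end")
    if cleared_by_peer_switchover:
        return ("replace the original active cross-connect and timing board "
                "at the peer end")
    return "contact Huawei engineers for help"
```

The checks are ordered so that the least disruptive fix (configuration) is considered before any switchover or board replacement.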
7. Checking Whether the Local Board Is Faulty
Configure the cross-connect loopback on the local board. If the alarm is cleared, the fault is not caused by the local board.
If the alarm persists, the local board and cross-connect board may be faulty.
Query the service configuration and alarms of the NE. If bit error alarms are reported only on the local board, preferentially replace the local board. If bit error alarms are reported on multiple boards, preferentially replace the cross-connect board.
8. Checking Whether the Peer Board Is Faulty
Configure the cross-connect loopback on the peer board. If the alarm is cleared, the fault is not caused by the peer board.
If the alarm persists, the peer board and the cross-connect board may be faulty.
Query the service configuration and alarms of the NE. If bit error alarms are reported only on the peer board, preferentially replace the peer board. If bit error alarms are reported on multiple boards, preferentially replace the cross-connect board.
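The replacement rule shared by the two cross-connect loopback steps (local board and peer board) can be sketched as follows. The function name and the board labels are hypothetical; the logic only restates the rule in the post.

```python
def board_to_replace(loopback_cleared_alarm, boards_with_bit_errors, board_under_test):
    """Pick the board to replace first after a cross-connect loopback.

    loopback_cleared_alarm: True if the alarm cleared during the loopback.
    boards_with_bit_errors: labels of boards on the NE reporting bit error alarms.
    board_under_test: label of the local (or peer) board being checked.
    """
    if loopback_cleared_alarm:
        # The fault is not caused by the board under test.
        return None
    affected = set(boards_with_bit_errors)
    if affected == {board_under_test}:
        return board_under_test
    if len(affected) > 1:
        return "cross-connect board"
    # Cases the post does not describe are left undecided.
    return None
```

For example, if the alarm persists and only the board labeled "slot-7" reports bit errors, that board is replaced first; if "slot-7" and "slot-8" both report them, the cross-connect board is the first candidate.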
9. Checking Whether a Board on the Transmission Path Is Faulty
a. Perform loopbacks along the signal flow to locate the NE that first reports the alarm. See the following figure.

b. Perform a loopback test point by point. If the tributary board or BER tester reports a bit error alarm after the loopback, locate the faulty NE and then check the faulty NE.
NOTICE:
A loopback causes service interruption. If another service on the upstream NE uses the same VC-4 channel as the channel that reports the alarm on the local NE, do not perform a loopback on the upstream NE.
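The point-by-point search can be pictured as a walk along the signal flow that stops at the first loopback point where errors appear. The Python sketch below is an illustration under assumptions: the function name, the NE labels, and the `errors_seen_at` callback (which stands for "set the loopback at this point and read the tributary board or BER tester") are all hypothetical. Points that must not be looped back, for example because another service shares the VC-4, are passed in `skip`.

```python
def locate_faulty_segment(points, errors_seen_at, skip=()):
    """Return (last_clean_point, first_errored_point), or None if all tested points are clean.

    points: loopback points ordered along the signal flow, nearest first.
    errors_seen_at: callable taking a point and returning True if bit
        errors are observed when the loopback is set at that point.
    skip: points where a loopback is not allowed.
    """
    last_clean = None
    for point in points:
        if point in skip:
            # No loopback here; the suspect segment becomes wider.
            continue
        if errors_seen_at(point):
            return (last_clean, point)
        last_clean = point
    return None
```

With points NE1 to NE4 and errors first seen at NE3, the result is `("NE2", "NE3")`, so the fault lies between those two points. If NE2 has to be skipped, the result widens to `("NE1", "NE3")`.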
Extended Information
The causes of bit errors in the transmission network
How to handle the Bit Error in WDM Network
Thank you!
