Hello,
Today I would like to share a case of handling bit errors in the SiteA-SiteB direction on the XXXX backbone network.
Problem Description
Some optical cables in the XXXX network were laid a long time ago and are in poor condition. As a result, bit errors occur in the regenerator section and multiplex section, and errored blocks are reported on higher-order paths.
The OSN3500 equipment between SiteB and SiteA uses SL64 (V64.2b) + BPA + DCU, and dispersion compensation is normal. Some lines were found to have regenerator section errors even though the received optical power is normal.
In the XXXX network, the SiteA SL64 board facing SiteB reports regenerator section background block errors (RSBBE) and errored seconds.

Alarm information
No alarms are reported. Only regenerator section background block errors and errored seconds are present.
Process
1- Query alarms and performance events: check whether the SiteA and SiteB NEs report higher-level alarms such as RLOS or RLOF.
2- Query the regenerator section background block errors and errored seconds on the SiteA SL64 board facing SiteB.
3- Check the equipment operating temperature (fan operation, heat dissipation): confirm that the equipment temperature is normal.
4- Check the received and transmitted optical power at SiteA and SiteB: the received power of the SiteA SL64 board is -10 dBm, which lies within the SL64 receive range of -14 dBm to -1 dBm, so the optical power in both the SiteA and SiteB directions is normal.
5- With the line optical power confirmed normal, check the DCU configured on the SiteA-SiteB line and confirm that the line dispersion is properly compensated.
6- Check the clock tracking settings of SiteA and SiteB: the settings are correct, the synchronization source quality is normal, clock tracking across the whole network is normal, and there are no pointer justifications or other abnormal performance events.
7- Because RSBBE is reported at SiteB in the SiteA direction, FEBBE is reported on the line in the SiteB direction, and no BBE is found in the SiteB direction, we decided to swap the optical cables of the two directions to see whether the errors follow the cable. However, the swap required access to both of the customer's equipment rooms at the same time, which could not be arranged, so it was not performed. The next step was a loopback test.
8- After a hardware loopback on the line-side board at SiteA, the local RSBBE count is normal and the peer SiteB reports no FEBBE, so the line-side SL64 board at SiteA is judged to be normal. After a hardware loopback on the line-side board at SiteB, performance is also normal, confirming that the line-side SL64 board at SiteB is normal.
9- Based on these results, the fault is most likely in the optical cable. To rule out the fiber between the equipment and the ODF rack, perform an optical loopback on the ODF side. The equipment shows no bit errors, which confirms that the fiber from the equipment to the ODF rack is normal.
10- At this point, most possible causes of the bit errors had been ruled out. Further analysis showed that the customer mostly uses old optical cables and, because of the long transmission distance, a BA + DCU configuration. When the BA launches 14/17 dBm into the old fiber, nonlinear effects occur and produce a large number of RSBBEs. We therefore added a 2 dB/3 dB attenuator at the BA-OUT port on the transmit side to lower the BA output power and reduce the nonlinear effects caused by excessive optical power on the old cable. The line sides of SiteA and SiteB were then confirmed to be error-free, which established that the root cause of the errors was the excessively high BA output power.
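Steps 4 and 10 both come down to simple dB arithmetic. The Python sketch below, with hypothetical helper names, checks a received power against the SL64 receive range quoted in step 4 (-14 dBm to -1 dBm), shows how a fixed attenuator lowers the BA launch power, and converts dBm to mW so the size of the reduction is visible (3 dB roughly halves the power). Pairing 14 dBm with a 2 dB attenuator and 17 dBm with a 3 dB attenuator is an assumption for illustration, and the received-power check after attenuation assumes that nothing else in the span, such as preamplifier gain, changes.

```python
# Sketch only: thresholds and powers are the values quoted in this case;
# the function names are hypothetical and not part of any NMS API.

RX_MIN_DBM = -14.0  # SL64 lower receive limit quoted in step 4
RX_MAX_DBM = -1.0   # SL64 upper receive limit (overload) quoted in step 4


def rx_power_ok(rx_dbm: float) -> bool:
    """True if the received power lies inside the SL64 receive range."""
    return RX_MIN_DBM <= rx_dbm <= RX_MAX_DBM


def rx_margins(rx_dbm: float) -> tuple[float, float]:
    """Return (margin above the lower limit, margin below overload) in dB."""
    return rx_dbm - RX_MIN_DBM, RX_MAX_DBM - rx_dbm


def dbm_to_mw(p_dbm: float) -> float:
    """Convert absolute power from dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


def after_attenuator(p_dbm: float, atten_db: float) -> float:
    """A fixed attenuator subtracts its loss in dB from a power in dBm."""
    return p_dbm - atten_db


if __name__ == "__main__":
    rx = -10.0  # measured SiteA SL64 receive power from step 4
    print(rx_power_ok(rx), rx_margins(rx))  # True (4.0, 9.0)
    # Assumed pairing of BA output power and attenuator value.
    for ba_out, atten in [(14.0, 2.0), (17.0, 3.0)]:
        launch = after_attenuator(ba_out, atten)
        print(f"BA {ba_out} dBm ({dbm_to_mw(ba_out):.1f} mW) -> "
              f"{launch} dBm ({dbm_to_mw(launch):.1f} mW)")
        # Assumes the rest of the span (including any preamplifier gain)
        # is unchanged, so the received power drops by the same number of dB.
        print("  Rx still in range:", rx_power_ok(after_attenuator(rx, atten)))
```

On these assumptions, the -10 dBm measured at SiteA keeps 1 dB of margin above the -14 dBm limit even with a 3 dB attenuator, so lowering the launch power should not by itself push the receiver below sensitivity.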
Root cause
1- Degraded fiber performance or excessive fiber loss; dirty or incorrectly connected fiber connectors; poor equipment grounding; a strong interference source near the equipment; poor heat dissipation causing an excessive operating temperature; a transmission distance that is too short with no attenuator added, leading to receiver optical overload; poor EMC shielding.
2- Excessive attenuation of the signal at the receiving side of the board; a faulty transmit circuit at the opposite end or a faulty receive circuit at the local end; poor clock synchronization performance; poor matching between the cross-connect board, line boards, and tributary boards; a faulty tributary board; a faulty fan leading to poor heat dissipation.
3- Because higher-level errors cause lower-level errors, handle errors in order from the highest level to the lowest.
4- If the local end reports a BBE performance event, the local receiver has detected bit errors, and the problem lies in the path from the remote transmitter to the local receiver.
5- If the local end reports an FEBBE performance event, the remote receiver has detected bit errors, and the problem lies in the path from the local transmitter to the remote receiver.
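Rules 3 to 5 above are mechanical enough to express in code. The following Python sketch (hypothetical class and function names, made-up sample counters) maps a board's near-end BBE and far-end FEBBE counts to the direction that should be investigated, and sorts errored layers so that higher-level errors are handled first. The layer order RS, MS, HP, LP is the usual SDH hierarchy and is assumed here as the meaning of "high-level" in rule 3.

```python
from dataclasses import dataclass

# Assumed handling order, highest level first (rule 3): regenerator section,
# multiplex section, higher-order path, lower-order path.
LAYER_ORDER = ["RS", "MS", "HP", "LP"]


@dataclass(frozen=True)
class PerfReport:
    """Hypothetical container for one board's counters toward one peer."""
    station: str  # station where the counters are read
    peer: str     # station at the other end of the span
    bbe: int      # near-end BBE: the local receiver saw errors
    febbe: int    # far-end BBE: the peer receiver saw errors and reported back


def suspect_directions(r: PerfReport) -> list[str]:
    """Map BBE/FEBBE to the transmission direction that should be checked."""
    dirs = []
    if r.bbe > 0:    # rule 4: remote Tx -> local Rx
        dirs.append(f"{r.peer} Tx -> {r.station} Rx")
    if r.febbe > 0:  # rule 5: local Tx -> remote Rx
        dirs.append(f"{r.station} Tx -> {r.peer} Rx")
    return dirs


def handling_order(layers: list[str]) -> list[str]:
    """Sort errored layers so that higher-level ones are handled first."""
    return sorted(layers, key=LAYER_ORDER.index)


if __name__ == "__main__":
    print(suspect_directions(PerfReport("SiteA", "SiteB", bbe=120, febbe=0)))
    # ['SiteB Tx -> SiteA Rx']
    print(handling_order(["LP", "RS", "HP"]))  # ['RS', 'HP', 'LP']
```

A BBE at one end and an FEBBE at the other end of the same span point to the same direction, which is the kind of cross-check used in steps 7 and 8 of the process.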
Suggestions and conclusions
1- Break bit-error problems down case by case. Don't be distracted by errors on many channels; look for what the errored services have in common (for example, services passing through a certain station, services terminating at a certain station, services on the same tributary board, or services passing through a certain optical board), make an empirical judgment, and then trace one 2M service among them. Use our most effective tool, loopback, to narrow the fault down step by step. (Note: loopback interrupts services, so perform it when traffic is low. Loopback may also cause ECC failures, so analyze the ECC carefully and confirm that network management will not be affected before looping back.) Once the fault is located at a certain station, use board replacement to locate and resolve it. Note: bring all spare boards that may be related to the fault to avoid mistakes and wasted time.
2- A line fault is not necessarily a fiber quality problem. A poor flange, for example, may make the received power at the opposite end too low and cause line errors.
3- For line errors caused by poor fiber quality or a flange fault, first check the alarm and performance information: not only alarm and performance events, but also performance parameters such as received and transmitted optical power and device temperature. Optical power that is too low or too high, or a device temperature that is too high, will cause bit errors.
4- In practice, if the optical boards of the two stations adjacent to a site report a large number of errors, or the east and west optical boards of the site itself report a large number of errors, a cross-connect or clock unit failure at that site is relatively likely.
5- In 10G long-haul transmission with a BPA over old optical cable, a BA output power of 14/17 dBm generates a large number of errors through nonlinear effects. In this case, the BA launch power into the cable must be reduced.
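Suggestions 1 and 4 describe pattern-finding that can be sketched in a few lines. The Python example below is a minimal sketch with hypothetical station names and 2M service IDs: it intersects the paths of errored services to find the elements they all share, ranks elements by how many errored services touch them, and flags a site when both neighbors' optical boards facing it, or its own east and west optical boards, report many errors. What counts as "many errors" is left to the caller; the sketch only takes the set of boards already judged errored.

```python
from collections import Counter

# Sketch of the "find the commonality" step (suggestion 1) and the
# site-level heuristic (suggestion 4). All names and IDs are hypothetical.


def common_elements(paths: dict[str, list[str]]) -> set[str]:
    """Stations (or boards) traversed by every errored service."""
    sets = [set(p) for p in paths.values()]
    return set.intersection(*sets) if sets else set()


def ranked_elements(paths: dict[str, list[str]]) -> list[tuple[str, int]]:
    """How many errored services touch each element, most frequent first."""
    return Counter(e for p in paths.values() for e in set(p)).most_common()


def suspect_sites(neighbors: dict[str, tuple[str, str]],
                  errored: set[tuple[str, str]]) -> list[str]:
    """Flag sites whose cross-connect or clock unit is the likelier culprit.

    neighbors maps a site to its (west, east) neighbors; errored holds
    (station, facing_station) pairs whose optical board reports many errors.
    """
    flagged = []
    for site, (west, east) in neighbors.items():
        both_neighbors = (west, site) in errored and (east, site) in errored
        own_east_west = (site, west) in errored and (site, east) in errored
        if both_neighbors or own_east_west:
            flagged.append(site)
    return flagged


if __name__ == "__main__":
    errored_2m = {
        "2M-01": ["SiteA", "SiteC", "SiteD"],
        "2M-02": ["SiteB", "SiteC", "SiteD"],
        "2M-03": ["SiteC", "SiteE"],
    }
    print(common_elements(errored_2m))     # {'SiteC'}
    print(ranked_elements(errored_2m)[0])  # ('SiteC', 3)
    print(suspect_sites({"SiteC": ("SiteB", "SiteD")},
                        {("SiteB", "SiteC"), ("SiteD", "SiteC")}))  # ['SiteC']
```

Anything these functions return is only a candidate for the loopback and board-replacement steps, not a confirmed fault.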
You are welcome to leave comments and discuss in the comment area. Thank you!



