Hello,
Symptom
During a pull-out test of the main control board on an OSN 3500, the network element was disconnected. ECC across the entire network fluctuated, and a large number of network elements repeatedly became unreachable.
Cause Analysis
The root cause of the ECC storm:
The ECC network contains 285 network elements, well above the recommended maximum of 100. At this scale, any change in ECC routing can easily trigger an ECC storm.
The HWECC maximum distance defaults to 64. In a large network, when a route changes, stale data circulating in the network is discarded only after it has travelled 64 hops. This adds to the bandwidth load and prevents valid routing data from getting through. Other network elements then have to re-establish their MAC connections and recompute their routes, which consumes still more bandwidth and creates a vicious circle.
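As a rough illustration of why the hop limit matters, the sketch below counts how many network elements a flooded update can reach before its hop budget runs out. It is a toy model, not the HWECC implementation: the chain topology, the function names, and the breadth-first flooding are all assumptions made for illustration. Only the figures 285, 64 and 5 come from the case above.

```python
from collections import deque


def nes_reached(adjacency, origin, max_distance):
    """Return the NEs a flooded update reaches within max_distance hops."""
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        ne = queue.popleft()
        if hops[ne] == max_distance:
            continue  # hop budget used up: the update is not forwarded further
        for neighbor in adjacency[ne]:
            if neighbor not in hops:
                hops[neighbor] = hops[ne] + 1
                queue.append(neighbor)
    return set(hops) - {origin}


def chain(n):
    """Build a hypothetical linear chain of n NEs numbered 0..n-1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


topology = chain(285)
print(len(nes_reached(topology, 0, 64)))  # 64
print(len(nes_reached(topology, 0, 5)))   # 5
```

In this model, lowering the limit from 64 to 5 shrinks the area that has to carry each update, which is the effect the recovery steps rely on.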
Steps
1- Choose "NE Manager > Function Tree > Communication > DCC Management". In "DCC rate configuration", record the DCC enable status of each optical port on the gateway NE, then set all DCC enable states to "disabled".
2- Set the maximum ECC distance (the maximum link length from the gateway NE to the farthest NE in the network) to 5.
3- Using the record from step 1, re-enable DCC on one optical port.
4- After the network is stable, enable DCC on another optical port.
5- Repeat step 4, and gradually increase the maximum ECC distance. Confirm that ECC is stable after each change before making the next one, until all network elements communicate normally again.
Description:
When network elements are added to the ECC network one at a time, ECC routing stabilizes quickly. It should take no more than a few tens of seconds.
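The order of the steps above can be sketched as a small driver loop. This is only an outline under stated assumptions: the four callbacks (disable_dcc, enable_dcc, set_max_distance, is_stable) are hypothetical stand-ins for whatever the operator does in the NMS, the 60-second timeout and 5-second poll are guesses based on the "tens of seconds" remark, and enabling all ports before widening the distance is one possible reading of step 5.

```python
import time


def wait_until_stable(is_stable, timeout_s=60, poll_s=5, sleep=time.sleep):
    """Poll is_stable() until it returns True or timeout_s has elapsed."""
    waited = 0
    while True:
        if is_stable():
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s


def staged_recovery(ports, distance_steps, disable_dcc, enable_dcc,
                    set_max_distance, is_stable, sleep=time.sleep):
    """Run the recovery order; return False at the first step that does not settle."""
    for port in ports:                      # step 1: disable DCC on every recorded port
        disable_dcc(port)
    set_max_distance(distance_steps[0])     # step 2: start from the smallest distance
    for port in ports:                      # steps 3-4: one port at a time
        enable_dcc(port)
        if not wait_until_stable(is_stable, sleep=sleep):
            return False
    for distance in distance_steps[1:]:     # step 5: widen the distance gradually
        set_max_distance(distance)
        if not wait_until_stable(is_stable, sleep=sleep):
            return False
    return True


# Dry run with stand-in callbacks that only record what would be done.
events = []
ok = staged_recovery(
    ports=["port-1", "port-2"],             # made-up port names
    distance_steps=[5, 10, 20],             # 5 is from the steps; 10 and 20 are examples
    disable_dcc=lambda p: events.append(("disable", p)),
    enable_dcc=lambda p: events.append(("enable", p)),
    set_max_distance=lambda d: events.append(("distance", d)),
    is_stable=lambda: True,                 # pretend the network settles at once
    sleep=lambda seconds: None,             # skip real waiting in the dry run
)
print(ok)
for event in events:
    print(event)
```

Stopping at the first step that fails to settle keeps the change that caused the instability easy to identify and undo.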
Reference Information
Why an ECC storm occurs when the main control board is switched:
On a network element configured with active and standby main control boards, management information is carried only by the active board. The transmit and receive functions of the optical ports and network ports on the standby board are shut off. When an active/standby switchover occurs, the former active board closes its optical ports and deletes its routing table and MAC connection table. The former standby board opens its optical ports, and the HWECC protocol has to re-establish its MAC connections and routes. The process is similar to a network element going offline and coming back online: ECC information is flooded and then converges.
ECC storm processing steps:
Method 1: Set the maximum distance of ECC
The default maximum ECC distance of the device is 64. This is an upper bound that a given network often does not need, and it determines how far the ECC route search extends.
Lowering the maximum ECC distance narrows the scope of ECC route refreshes to some extent and so reduces the likelihood of an ECC storm. When an ECC storm occurs, set the maximum ECC distance to 5, then increase it step by step once the network has initially stabilized, so that the network settles gradually.
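To judge how much of the default 64 a particular network actually needs, one option is to compute hop counts from the gateway NE over the fiber topology. The sketch below does this with a breadth-first search. The topology, the NE names, and the assumption that ECC distance can be approximated by hop count from the gateway are illustrative, not taken from the device documentation.

```python
from collections import deque


def hops_from(adjacency, gateway):
    """Breadth-first search: hop count from the gateway to every reachable NE."""
    hops = {gateway: 0}
    queue = deque([gateway])
    while queue:
        ne = queue.popleft()
        for neighbor in adjacency[ne]:
            if neighbor not in hops:
                hops[neighbor] = hops[ne] + 1
                queue.append(neighbor)
    return hops


# Hypothetical topology: a ring GW-A-B-C-D-E-GW with a spur C-F.
adjacency = {
    "GW": ["A", "E"],
    "A": ["GW", "B"],
    "B": ["A", "C"],
    "C": ["B", "D", "F"],
    "D": ["C", "E"],
    "E": ["D", "GW"],
    "F": ["C"],
}

hops = hops_from(adjacency, "GW")
farthest = max(hops.values())
within_5 = sorted(ne for ne, h in hops.items() if 0 < h <= 5)
print(farthest)   # 4
print(within_5)   # ['A', 'B', 'C', 'D', 'E', 'F']
```

The result is a lower bound for normal operation only. A fiber cut lengthens the worst-case path (cutting GW-E in this example moves E from 1 hop to 5), so some margin above the computed value is advisable.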
Method 2: Close the ECC link around the backbone node
Before closing any ECC link, you must know how the network's fibers are connected. First close the access-layer rings so that some devices are completely isolated from the existing ECC network. Once ECC stops oscillating, re-enable the links gradually.
Note:
When closing a remote optical port, do not close the route toward the network management system. Make sure the network administrator can still log in to reopen the closed ECC links.
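The note above can be checked before making a change: given the fiber connections, confirm that the NE on which a port is about to be closed remains reachable from the gateway afterwards. The following is a minimal sketch with a made-up topology. The function names and the link list are assumptions, and real reachability also depends on which ports have DCC enabled.

```python
from collections import deque


def reachable_from(links, gateway, closed=()):
    """Return the NEs still reachable from the gateway when the closed links are removed."""
    closed_set = {frozenset(link) for link in closed}
    adjacency = {}
    for a, b in links:
        if frozenset((a, b)) in closed_set:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {gateway}
    queue = deque([gateway])
    while queue:
        ne = queue.popleft()
        for neighbor in adjacency.get(ne, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def safe_to_close(links, gateway, link, closing_ne):
    """True if the NE whose port is being closed stays reachable afterwards."""
    return closing_ne in reachable_from(links, gateway, closed=[link])


# Hypothetical topology: ring A-B-C-D hanging off the gateway, with a spur D-E.
links = [("GW", "A"), ("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("D", "E")]
print(safe_to_close(links, "GW", ("B", "C"), "C"))  # True: C is still reached via D
print(safe_to_close(links, "GW", ("D", "E"), "E"))  # False: E is cut off from GW
print(safe_to_close(links, "GW", ("D", "E"), "D"))  # True: close the port on D instead
```

The second and third calls show the practical rule: on a link with no alternative path, close the port on the NE nearer the gateway so that it can be reopened later.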





