Hello, everyone!
Today I will share with you a case about SNCP not working.
Problem Description
In the "C" network, one NE was disconnected and its packet services went down. The NE had two SCC boards and there was no power issue, so the customer could not find the cause of the failure and asked the Huawei team to perform a root cause analysis (RCA).
Software version of the OptiX OSN 7500: V200R013C30SPC100
Handling procedure
From the black box data of slot 24, we can see that on 2019-08-26 the slot 24 SCC board tried to send a message to the slot 25 SCC board, but the message timed out. So from 2019-08-26 onward, slot 24 was working as the master board and slot 25 already had a problem.
On 2019-12-28 at 07:30 local time, slot 24 detected that it had a problem itself. The slot 24 SCC therefore requested a switchover to the slot 25 SCC, but the switchover failed.
On 2019-12-28 at 17:37, a hard reset was performed on both GSCC boards. After the hard reset, slot 25 could not boot up, so slot 24 became the master board again. The NE was reconnected to the NMS and all related packet services were restored.
After slot 24 became the master, slot 25 and slot 11 were shown as offline. We inserted the slot 11 PEX1 board into slot 12 and the board came online normally, which confirmed that there was no abnormality in the slot 11 PEX1 board itself.
After the slot 25 board was removed, slot 24 reported a HARD_BAD alarm because the internal communication bus had a problem.
After that, we replaced the slot 24 SCC and slot 25 GSCC boards one by one with new spare boards. The BD_STATUS alarm on the slot 11 PEX1 board cleared and the board came online normally.
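To make the sequence of events easier to follow, the dated entries above can be put on a single timeline. The following is only a rough sketch in Python: the event descriptions are paraphrased from this post, the variable and function names are made up for illustration, and the 2019-08-26 entry has no time of day in the data quoted here, so midnight is used as a placeholder.

```python
from datetime import datetime

# Events paraphrased from the handling procedure above (not raw log lines).
# The 2019-08-26 entry has no time of day in the post; midnight is a placeholder.
events = [
    (datetime(2019, 12, 28, 17, 37),
     "Hard reset of both GSCC boards; slot 25 fails to boot, slot 24 is master again"),
    (datetime(2019, 8, 26, 0, 0),
     "Slot 24 SCC message to slot 25 SCC times out"),
    (datetime(2019, 12, 28, 7, 30),
     "Slot 24 detects its own problem; switchover to slot 25 fails"),
]

# Sort by timestamp so the entries read in the order they happened.
timeline = sorted(events)


def days_between(earlier, later):
    """Whole calendar days between two timestamps."""
    return (later.date() - earlier.date()).days


first_timeout = timeline[0][0]
switch_failure = timeline[1][0]

# Gap between the first sign of trouble on slot 25 and the failed switchover.
gap_days = days_between(first_timeout, switch_failure)

for when, what in timeline:
    print(when.strftime("%Y-%m-%d %H:%M"), "-", what)
print("Days between first timeout and failed switchover:", gap_days)
```

Laid out this way, it is easier to see that the first timeout toward slot 25 was logged roughly four months before the failed switchover, so the NE appears to have been running without a usable standby SCC for that whole period.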
Root cause
Both GSCC boards were faulty.
Solution
Replace both GSCC boards.
That's all! Hope you find my post useful. Please let me know in the comment section if you have any further concerns or remarks.
Thank you!