Hello everyone,
Today I will share how to deal with a stack device that restarts unexpectedly.
Issue Description
CE5855 switches running V100R006C00SPC600 are used as access switches at a site. Two CE5855 switches are deployed in a stack. According to the device logs, the master and standby CE5855 switches in the stack restarted, causing network service interruption.
Handling Process
1. After the fault occurs, check the logs of the switches. The logs show that the switches restarted at the time of the fault.


2. Collect diagnostic logs and system diagnostic logs using the following commands:
<HUAWEI>system-view
[~HUAWEI] diagnose
[~HUAWEI-diagnose] save logfile diagnose-log
[~HUAWEI-diagnose] collect diagnostic information
3. After analyzing the diagnostic logs, the R&D engineers found that the master switch had received protocol packets that it had sent itself. These protocol packets caused the master switch to fail the stack competition and restart.
Root Cause
According to the stack restart logs, the master CE5855 restarted to join the stack competition after receiving stack competition packets. According to the background logs, the MAC address of the switch that received these stack protocol packets is that of the master switch. In normal cases, stack protocol packets are sent only from the master switch to the standby switch. When the exception occurs, the standby switch sends the master switch's own stack protocol packets back to it. After receiving its own packets, the master switch restarts and starts stack competition. Before the restart, it notifies the standby switch of the stack competition, so the standby switch also joins the competition. As a result, the entire stack system restarts.
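The missing verification described above can be illustrated with a minimal Python sketch. It is not Huawei's implementation: the `StackPacket` type, its `src_mac` field, and the MAC addresses are hypothetical and serve only to show the idea of discarding a stack protocol packet whose originator is the receiving switch itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackPacket:
    """Hypothetical model of a stack protocol packet (not a real Huawei structure)."""
    src_mac: str  # MAC address of the switch that originated the packet
    kind: str     # for example "competition"


def normalize_mac(mac: str) -> str:
    """Strip common separators so that differently formatted MACs compare equal."""
    return mac.replace("-", "").replace(":", "").replace(".", "").lower()


def should_process(packet: StackPacket, local_mac: str) -> bool:
    """Return False when the packet was originated by the local switch itself."""
    return normalize_mac(packet.src_mac) != normalize_mac(local_mac)


# Hypothetical addresses, for illustration only.
master_mac = "00e0-fc12-3456"
looped = StackPacket(src_mac="00:E0:FC:12:34:56", kind="competition")
genuine = StackPacket(src_mac="00e0-fc65-4321", kind="competition")

print(should_process(looped, master_mac))   # looped-back packet is rejected
print(should_process(genuine, master_mac))  # packet from another switch is accepted
```

With a check of this kind, a packet looped back by the standby switch would be dropped instead of triggering a new stack competition.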
Solution
Solution 1:
Change the ring stack topology to a chain stack topology (by binding the two member ports of each switch into one logical port). Before performing the operation, ensure that traffic on the stack links does not exceed 50% of the total bandwidth (to prevent one stack link from being fully loaded). Shut down the stack member interfaces before the operation so that services are not affected.
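The 50% precondition can be checked with simple arithmetic before the change. The following Python sketch assumes you have already read the current traffic rate and the capacity of each stack link from the device; the function name and all figures are hypothetical.

```python
def within_stack_budget(link_rates_gbps, link_capacities_gbps, threshold=0.5):
    """Return True if total stack traffic is at most `threshold` of total capacity."""
    total_capacity = sum(link_capacities_gbps)
    if total_capacity <= 0:
        raise ValueError("total stack link capacity must be positive")
    return sum(link_rates_gbps) <= threshold * total_capacity


# Hypothetical example: two 40 Gbit/s stack links.
print(within_stack_budget([12.0, 6.0], [40.0, 40.0]))   # 18 of 80 Gbit/s in use
print(within_stack_budget([25.0, 20.0], [40.0, 40.0]))  # 45 of 80 Gbit/s in use
```

In the first case the traffic is below half of the total bandwidth, so a single link could carry it alone. In the second case it is not, and the topology change should wait until the load drops.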
Solution 2:
Upgrade the switches to V200R002C50SPC800 and install the latest patch. The switches must be restarted during the upgrade, and services are interrupted for as long as the restart takes.
Suggestions
Due to an exception on the standby switch, the standby switch sends the protocol packets from the master switch back to the master switch. In addition, the master switch does not perform further verification on the stack protocol packets it receives. The probability of this issue occurring is low. You can upgrade the stack switches to V200R002C50 and install the latest patch to address this issue.
That is all I want to share with you! Thank you!