Hey there!
This case covers a stack alarm caused by a change of NTP server.
Problem Description
Two CE12804S switches form a stack. Stack ports 1/1 and 2/1 alternate between the Up and Down states.
Alarm information
The log contains alarms indicating that the DAD port is Down.
Apr 26 2020 11:34:22+08:00 HUAWEI-SW-S12804SX2/4/hwDadPortProtocolAlarm_clear(l):CID=0x80a22713-alarmID=0x09a92003-clearType=service_resume;The protocol status of the dual-active port change to up, or the dual-active port does not exist. (hwDadDetectPort=Stack-Port2/1).
Apr 26 2020 11:34:22+08:00 HUAWEI-SW-S12804SX2/4/hwDadPortProtocolAlarm_clear(l):CID=0x80a22713-alarmID=0x09a92003-clearType=service_resume;The protocol status of the dual-active port change to up, or the dual-active port does not exist. (hwDadDetectPort=Stack-Port1/1).
Apr 26 2020 11:34:22+08:00 HUAWEI-SW-S12804SX2/4/hwDadPortProtocolAlarm_active(l):CID=0x80a22713-alarmID=0x09a92003;The protocol status of the dual-active port is down.(hwDadDetectPort=Stack-Port1/1).
Apr 26 2020 11:34:22+08:00 HUAWEI-SW-S12804SX2/4/hwDadPortProtocolAlarm_active(l):CID=0x80a22713-alarmID=0x09a92003;The protocol status of the dual-active port is down.(hwDadDetectPort=Stack-Port2/1).
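A quick way to confirm that the alarm is raised and cleared within the same second is to pair the active and clear entries per port. The sketch below is only an illustration: the regular expression is written against the log format shown above, the sample lines have their message text trimmed, and the names `ALARM_RE`, `SAMPLE` and `same_second_flaps` are made up for this example.

```python
import re
from collections import defaultdict

# Matches the leading timestamp, the alarm kind (active/clear) and the DAD port.
ALARM_RE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{4} \d{2}:\d{2}:\d{2}).*?"
    r"hwDadPortProtocolAlarm_(?P<kind>active|clear)\(l\).*?"
    r"hwDadDetectPort=(?P<port>Stack-Port\d+/\d+)"
)

# Sample lines in the format of the log above (message text trimmed for brevity).
SAMPLE = [
    "Apr 26 2020 11:34:22+08:00 HUAWEI-SW-S12804SX2/4/hwDadPortProtocolAlarm_active(l):CID=0x80a22713-alarmID=0x09a92003;(hwDadDetectPort=Stack-Port2/1).",
    "Apr 26 2020 11:34:22+08:00 HUAWEI-SW-S12804SX2/4/hwDadPortProtocolAlarm_active(l):CID=0x80a22713-alarmID=0x09a92003;(hwDadDetectPort=Stack-Port1/1).",
    "Apr 26 2020 11:34:22+08:00 HUAWEI-SW-S12804SX2/4/hwDadPortProtocolAlarm_clear(l):CID=0x80a22713-alarmID=0x09a92003-clearType=service_resume;(hwDadDetectPort=Stack-Port1/1).",
    "Apr 26 2020 11:34:22+08:00 HUAWEI-SW-S12804SX2/4/hwDadPortProtocolAlarm_clear(l):CID=0x80a22713-alarmID=0x09a92003-clearType=service_resume;(hwDadDetectPort=Stack-Port2/1).",
]


def same_second_flaps(lines):
    """Return the ports whose alarm was raised and cleared at the same timestamp."""
    seen = defaultdict(set)  # (port, timestamp) -> {"active", "clear"}
    for line in lines:
        m = ALARM_RE.search(line)
        if m:
            seen[(m["port"], m["ts"])].add(m["kind"])
    return sorted({port for (port, _ts), kinds in seen.items()
                   if kinds == {"active", "clear"}})


print(same_second_flaps(SAMPLE))
```

For the four sample lines this prints `['Stack-Port1/1', 'Stack-Port2/1']`. An alarm that appears and disappears within one second on both stack ports hints at a timing artifact rather than a real link problem.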
Handling Procedure
1. Run the display alarm history verbose command to check historical alarms. The command output shows that the hwDadPortProtocolAlarm alarm is intermittently reported and cleared at the corresponding time.
Sequence: 465
AlarmId: 0x9A92003 AlarmName: hwDadPortProtocolAlarm.
AlarmType : equipment Severity: Warning State: cleared.
StartTime : 2019-03-21 03:34+08:00.
Description: The protocol status of the dual-active port is down. (hwDadDetectPort=Stack-Port1/1).
ClearTime :2019-03-21 03:34+08:00.
ClearType: service_resume
ClearReason: The protocol status of the dual-active port change to up, or the dual-active port does not exist. (hwDadDetectPort=Stack-Port1/1).
2. Check the diagnostic log. The FCMA submodule reports that the LPU did not receive DAD detection packets from the MPU for 51 seconds.
Apr 26 2020 11:34:21.665+08:00 HUAWEI-SW-S12804SX2%STACKMNG/6/KEYEVENT(D):CID=0x80a20000;A key event of the stack module occurred.(Slot=1/4, SubModule=FCMA, Event=There is too much time that LPU cannot receive detect packet from MPU. TIME: 51s.).
Apr 26 2020 11:34:21.693+08:00 HUAWEI-SW-S12804SX2%STACKMNG/6/KEYEVENT(D):CID=0x80a20000;A key event of the stack module occurred. (Slot=2/4, SubModule=FCMA, Event=There is too much time that LPU cannot receive detect packet from MPU. TIME: 51s.).
Apr 26 2020 11:34:22.966+08:00 HUAWEI-SW-S12804SX2/6/C01DADKEYPROC(D):CID=0x80a22713;The key process of dual active detection occurred. (SrcModule=DAD, DstModule=PROTOCOL, OpType=process pkt, Result=ok, Reason=Dual-active port protocol state alarm.).
Apr 26 2020 11:34:22.967+08:00 HUAWEI-SW-S12804SX2%INFO/6/SUPPRESS_DIAGLOG(D):CID=0x80600401;Last diagnostic message repeated 1 time, InfoID=162074633..
Apr 26 2020 11:34:22.970+08:00 HUAWEI-SW-S12804SX2/6/C01DADKEYPROC(D):CID=0x80a22713;The key process of dual active detection occurred. (SrcModule=DAD, DstModule=PROTOCOL, OpType=process pkt, Result=ok, Reason=Dual-active port protocol state alarm resume.).
3. Check the user log. The NTP synchronization state changed at 11:33:30, when the device synchronized with the new NTP server. The system time was then updated to 11:34:21, 51 seconds later, which matches the interval reported by FCMA in the diagnostic log.
Apr 26 2020 11:33:30+08:00 HUAWEI-SW-S12804SX2%NTP/4/NTP_PEER_SELE(l):CID=0x802603fb;The peer selected by the system is 197.104.12.34..
Apr 26 2020 11:33:30+08:00 HUAWEI-SW-S12804SX2%NTP/2/hwNtpStateChangeTrap(t):CID=0x802603fb-OID=1.3.6.1.4.1.2011.6.80.2.1;NTP synchronization state changed.(hwNtpState=synchronized, hwNtpSource=197.104.12.34, hwNtpSourceVpnName=OOBM).
Apr 26 2020 11:34:21+08:00 HUAWEI-SW-S12804SX2%NTP/4/NTP_LEAP_CHANGE(l):CID=0x802603fb;System leap changes from 3 to 0 after clock update..
Apr 26 2020 11:34:21+08:00 HUAWEI-SW-S12804SX2%NTP/4/NTP_STRATUM_CHANGE(l):CID=0x802603fb;System stratum changes from 16 to 2 after clock update..
Apr 26 2020 11:34:22+08:00 HUAWEI-SW-S12804SX2/4/hwDadPortProtocolAlarm_active(l):CID=0x80a22713-alarmID=0x09a92003;The protocol status of the dual-active port is down.(hwDadDetectPort=Stack-Port2/1).
Apr 26 2020 11:34:22+08:00 HUAWEI-SW-S12804SX2/4/hwDadPortProtocolAlarm_active(l):CID=0x80a22713-alarmID=0x09a92003;The protocol status of the dual-active port is down.(hwDadDetectPort=Stack-Port1/1).
Apr 26 2020 11:34:22+08:00 HUAWEI-SW-S12804SX2/4/hwDadPortProtocolAlarm_clear(l):CID=0x80a22713-alarmID=0x09a92003-clearType=service_resume;The protocol status of the dual-active port change to up, or the dual-active port does not exist. (hwDadDetectPort=Stack-Port1/1).
Apr 26 2020 11:34:22+08:00 HUAWEI-SW-S12804SX2/4/hwDadPortProtocolAlarm_clear(l):CID=0x80a22713-alarmID=0x09a92003-clearType=service_resume;The protocol status of the dual-active port change to up, or the dual-active port does not exist. (hwDadDetectPort=Stack-Port2/1).
4. Check the command records in the user log. The customer removed NTP server 197.1.11.40 and configured 197.96.12.34 and 197.104.12.34 in its place, which caused the system time to change.
Apr 26 2020 11:33:22+08:00 HUAWEI-SW-S12804SX2%CLI/5/CMDRECORD(s):CID=0x80ca2713;Recorded command information. (Task=VTY0, Ip=197.96.16.21, VpnName=OOBM, User=Aotec.shenwenjun@htbank.net, AuthenticationMethod="Radius", Command="undo ntp unicast-server 197.1.11.40 vpn-instance OOBM".).
Apr 26 2020 11:33:22+08:00 HUAWEI-SW-S12804SX2%CLI/5/CMDRECORD(s):CID=0x80ca2713;Recorded command information. (Task=VTY0, Ip=197.96.16.21, VpnName=OOBM, User=Aotec.shenwenjun@htbank.net, AuthenticationMethod="Radius", Command="ntp unicast-server 197.96.12.34 vpn-instance OOBM source-interface MEth0/0/0/0".).
Apr 26 2020 11:33:23+08:00 HUAWEI-SW-S12804SX2%CLI/5/CMDRECORD(s):CID=0x80ca2713;Recorded command information. (Task=VTY0, Ip=197.96.16.21, VpnName=OOBM, User=Aotec.shenwenjun@htbank.net, AuthenticationMethod="Radius", Command="ntp unicast-server 197.104.12.34 vpn-instance OOBM source-interface MEth0/0/0/0".).
Apr 26 2020 11:33:27+08:00 HUAWEI-SW-S12804SX2%CLI/5/CMDRECORD(s):CID=0x80ca2713;Recorded command information. (Task=VTY0, Ip=197.96.16.21, VpnName=OOBM, User=Aotec.shenwenjun@htbank.net, AuthenticationMethod="Radius", Command="commit".).
Apr 26 2020 11:33:28+08:00 HUAWEI-SW-S12804SX2%CLI/5/CMDRECORD(s):CID=0x80ca2713;Recorded command information. (Task=VTY0, Ip=197.96.16.21, VpnName=OOBM, User=Aotec.shenwenjun@htbank.net, AuthenticationMethod="Radius", Command="display this".).
Apr 26 2020 11:33:30+08:00 HUAWEI-SW-S12804SX2%NTP/4/NTP_PEER_SELE(l):CID=0x802603fb;The peer selected by the system is 197.104.12.34..
Apr 26 2020 11:33:30+08:00 HUAWEI-SW-S12804SX2%NTP/2/hwNtpStateChangeTrap(t):CID=0x802603fb-OID=1.3.6.1.4.1.2011.6.80.2.1;NTP synchronization state changed. (hwNtpState=synchronized, hwNtpSource=197.104.12.34, hwNtpSourceVpnName=OOBM).
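The correlation made across the steps above can also be checked with a short script. This is a rough sketch, not a tool from the device vendor: it assumes the clock was stepped shortly after the new peer was selected, so the gap between the NTP_PEER_SELE entry and the first "after clock update" entry approximates the size of the time jump. The function and pattern names are hypothetical, and the sample lines are copied from the logs above.

```python
import re
from datetime import datetime

TS_RE = re.compile(r"^(\w{3} +\d{1,2} \d{4} \d{2}:\d{2}:\d{2})")
FCMA_RE = re.compile(r"SubModule=FCMA,.*?TIME: (\d+)s")

SAMPLE = [
    "Apr 26 2020 11:33:30+08:00 HUAWEI-SW-S12804SX2%NTP/4/NTP_PEER_SELE(l):CID=0x802603fb;The peer selected by the system is 197.104.12.34..",
    "Apr 26 2020 11:34:21+08:00 HUAWEI-SW-S12804SX2%NTP/4/NTP_STRATUM_CHANGE(l):CID=0x802603fb;System stratum changes from 16 to 2 after clock update..",
    "Apr 26 2020 11:34:21.665+08:00 HUAWEI-SW-S12804SX2%STACKMNG/6/KEYEVENT(D):CID=0x80a20000;A key event of the stack module occurred.(Slot=1/4, SubModule=FCMA, Event=There is too much time that LPU cannot receive detect packet from MPU. TIME: 51s.).",
]


def parse_ts(line):
    """Return the leading timestamp as a naive datetime (fraction and time zone ignored)."""
    m = TS_RE.match(line)
    return datetime.strptime(m.group(1), "%b %d %Y %H:%M:%S") if m else None


def clock_jump_vs_fcma(lines):
    """Return (seconds between peer selection and clock update, FCMA-reported gap)."""
    peer_ts = update_ts = fcma_gap = None
    for line in lines:
        if "NTP_PEER_SELE" in line and peer_ts is None:
            peer_ts = parse_ts(line)
        elif "after clock update" in line and update_ts is None:
            update_ts = parse_ts(line)
        m = FCMA_RE.search(line)
        if m and fcma_gap is None:
            fcma_gap = int(m.group(1))
    if peer_ts is None or update_ts is None or fcma_gap is None:
        raise ValueError("required NTP or FCMA entries not found")
    return int((update_ts - peer_ts).total_seconds()), fcma_gap


print(clock_jump_vs_fcma(SAMPLE))
```

With the sample lines the result is `(51, 51)`: the gap measured from the NTP entries equals the interval that FCMA reported, which supports the conclusion that the "missing" packets are an effect of the clock change.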
Root Cause
The customer changed the NTP server configuration, and the device synchronized its time with the new NTP server. The time difference between the device and the new NTP server exceeded 10 seconds (the actual difference was 51 seconds). When the system time jumped forward, the device incorrectly concluded that it had not received consecutive detection packets and reported the alarm. The alarm was cleared within the same second, once the device received further packets.
Impact on services: This problem is not caused by loss of detection packets. Therefore, it does not affect the DAD protocol or services.
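To make the mechanism concrete, here is a simplified model of a timeout check that measures packet gaps with wall-clock time. It is not the device's actual implementation: the 10-second threshold is taken from the description above, while the one-packet-per-second rate, the tick at which the clock is stepped, and all names are assumptions chosen for illustration. A second check based on a monotonic counter is included for comparison.

```python
THRESHOLD_S = 10  # threshold taken from the root cause description


def timed_out(last_rx, now, threshold=THRESHOLD_S):
    """True if the gap since the last received packet exceeds the threshold."""
    return (now - last_rx) > threshold


def simulate(step_at=3, step_s=51, ticks=6):
    """Packets arrive once per tick; at tick `step_at` the wall clock jumps by `step_s`.

    Returns a list of (tick, wall_clock_alarm, monotonic_alarm) tuples.
    """
    events = []
    offset = 0                      # wall clock = monotonic time + offset
    last_wall = last_mono = 0
    for mono in range(1, ticks + 1):
        if mono == step_at:
            offset += step_s        # NTP steps the clock forward
        wall = mono + offset
        events.append((mono,
                       timed_out(last_wall, wall),
                       timed_out(last_mono, mono)))
        last_wall, last_mono = wall, mono
    return events


for tick, wall_alarm, mono_alarm in simulate():
    print(tick, wall_alarm, mono_alarm)
```

In this model the wall-clock check fires only at tick 3, when the clock is stepped, and is back to normal at tick 4, while the monotonic check never fires. No packet was lost at any point, which is consistent with the statement that DAD and services are not affected.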
Solution
No corrective action is required. Because the alarm is not caused by loss of detection packets, it does not affect the DAD protocol or services.
Suggestion: When troubleshooting, review each type of log (user log, diagnostic log, and alarm history) and correlate the entries by the time at which the fault occurred.
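As a small aid for this kind of time-based correlation, the sketch below keeps only the log lines whose timestamp falls within a window around the fault time. It assumes the timestamp format used in the logs of this case; `lines_near` is a hypothetical helper, and the sample CLI line has its parameters trimmed.

```python
import re
from datetime import datetime, timedelta

TS_RE = re.compile(r"^(\w{3} +\d{1,2} \d{4} \d{2}:\d{2}:\d{2})")

# Sample lines from this case (the CLI record has its parameters trimmed).
SAMPLE = [
    'Apr 26 2020 11:33:22+08:00 HUAWEI-SW-S12804SX2%CLI/5/CMDRECORD(s):CID=0x80ca2713;Recorded command information. (Command="undo ntp unicast-server 197.1.11.40 vpn-instance OOBM".).',
    "Apr 26 2020 11:33:30+08:00 HUAWEI-SW-S12804SX2%NTP/4/NTP_PEER_SELE(l):CID=0x802603fb;The peer selected by the system is 197.104.12.34..",
    "Apr 26 2020 11:34:22+08:00 HUAWEI-SW-S12804SX2/4/hwDadPortProtocolAlarm_active(l):CID=0x80a22713-alarmID=0x09a92003;The protocol status of the dual-active port is down.(hwDadDetectPort=Stack-Port2/1).",
]


def lines_near(lines, fault_time, window_s=60):
    """Return the log lines whose timestamp is within window_s seconds of fault_time."""
    window = timedelta(seconds=window_s)
    hits = []
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # skip lines without a leading timestamp
        ts = datetime.strptime(m.group(1), "%b %d %Y %H:%M:%S")
        if abs(ts - fault_time) <= window:
            hits.append(line)
    return hits


fault = datetime(2020, 4, 26, 11, 34, 22)  # time of the DAD alarm
for line in lines_near(SAMPLE, fault, window_s=60):
    print(line[:60])
```

With a 60-second window around the alarm time, the NTP configuration change and the peer selection both show up next to the alarm, which is exactly the relationship this case relied on.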