CE12808 (V100R005C10SPC200): the NMS detected that servers and a CE switch could not be managed; service automatically recovered after 15 minutes

Latest reply: Sep 14, 2018 08:50:55


It was reported that the CAD service was affected, the NMS detected that servers and a CE switch could not be managed, and the CAD service automatically recovered after 15 minutes.

Version Information 


Networking overview

The following figure shows the traffic service model. The S12700 functioned as the gateway, and the CE12800 and CE6850 functioned as Layer 2 aggregation devices. The S12700 searched the ARP table to forward traffic, and the CE12800 and CE6850 searched the MAC address table to forward traffic.


Topology diagram of network

[Figure: network topology diagram]

Fault phenomenon

It was reported that the CAD service was affected, the NMS detected that servers and a CE switch could not be managed, and the CAD service automatically recovered after 15 minutes.


1. The following figure shows the traffic service model. The S12700 functioned as the gateway, and the CE12800 and CE6850 functioned as Layer 2 aggregation devices. The S12700 searched the ARP table to forward traffic, and the CE12800 and CE6850 searched the MAC address table to forward traffic.

[Figure: traffic service model]


2. When the fault occurred, the CAD service was affected and the NMS could not reach the CE6850 upstream of the CAD servers. Both the NMS management traffic and the CAD service traffic passed through the S12700, CE12804, and CE6850, so we suspected an exception on the traffic forwarding path.


3. We checked the logs on the S12700 and found only ARP CPCAR and auto-port-defend alarms. No other exception information was found.


4. The switch checks for CPCAR packet loss every 10 minutes and records a log if any loss occurred. On the day of the fault, the switch recorded a CPCAR packet loss alarm at 20:04, which indicates that ARP request packets were discarded between 19:54 and 20:04; no CPCAR packet loss occurred at other times. The servers became unmanageable at 19:49, before the earliest possible packet loss, so the CPCAR packet loss is unlikely to be the cause of the fault.
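The timing argument in step 4 can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the variable and function names are hypothetical, and the values come from the description above (alarm at 20:04, 10-minute check interval, servers unmanageable from 19:49).

```python
from datetime import datetime, timedelta

# Values taken from the case description; the names are illustrative.
CHECK_INTERVAL = timedelta(minutes=10)        # CPCAR loss is checked every 10 minutes
alarm_time = datetime(2017, 10, 7, 20, 4)     # CPCAR packet loss alarm recorded at 20:04
fault_start = datetime(2017, 10, 7, 19, 49)   # servers unmanageable from 19:49

# An alarm at 20:04 covers drops in the preceding check interval.
drop_window_start = alarm_time - CHECK_INTERVAL


def fault_precedes_drops(fault: datetime, window_start: datetime) -> bool:
    """Return True if the fault began before the earliest possible drop."""
    return fault < window_start


print(drop_window_start.strftime("%H:%M"))                    # 19:54
print(fault_precedes_drops(fault_start, drop_window_start))   # True
```

Because the fault began five minutes before the earliest possible ARP drop, the drop cannot explain the start of the outage.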


5. Monitoring logs on the NMS:


Log on the S12700:

There is only one CPCAR alarm for ARP:

Oct  7 2017 20:04:46+03:00 DC-TI5-TI6-S12708 %DEFD/6/CPCAR_DROP_LPU(l)[2856922]:Rate of packets to cpu exceeded the CPCAR limit on the LPU in slot 2/2. (Protocol=arp-request, CIR/CBS=128/24064, ExceededPacketCount=2)
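When many such records need to be reviewed, the fields can be extracted programmatically. The sketch below parses the CPCAR_DROP_LPU record shown above; the regular expression and the function name are assumptions written only against the fields visible in this one record, not an official log format definition.

```python
import re

LOG = (
    "Oct  7 2017 20:04:46+03:00 DC-TI5-TI6-S12708 "
    "%DEFD/6/CPCAR_DROP_LPU(l)[2856922]:Rate of packets to cpu exceeded "
    "the CPCAR limit on the LPU in slot 2/2. (Protocol=arp-request, "
    "CIR/CBS=128/24064, ExceededPacketCount=2)"
)

# Hypothetical pattern covering only the fields visible in the record above.
PATTERN = re.compile(
    r"slot (?P<slot>[\d/]+)\. \(Protocol=(?P<protocol>[\w-]+), "
    r"CIR/CBS=(?P<cir>\d+)/(?P<cbs>\d+), "
    r"ExceededPacketCount=(?P<dropped>\d+)\)"
)


def parse_cpcar_drop(line: str) -> dict:
    """Extract slot, protocol, CIR, CBS, and exceeded packet count."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a CPCAR_DROP_LPU record")
    fields = match.groupdict()
    for key in ("cir", "cbs", "dropped"):
        fields[key] = int(fields[key])
    return fields


record = parse_cpcar_drop(LOG)
print(record)
```

For this record the exceeded packet count is 2, which matches the value in the log line.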

 

6. ARP entries corresponding to the abnormal servers and abnormal CE6850 were not deleted or updated on the S12700.

No auto-port-defend alarm was recorded on Eth-Trunk2 or Eth-Trunk3. The auto-port-defend alarms recorded around the time of the fault were all on other interfaces:

Oct  7 2017 19:48:14+03:00 DC-TI5-TI6-S12708 %SECE/4/PORT_ATTACK_OCCUR(l)[2856897]:Auto port-defend started.(SourceAttackInterface=XGigabitEthernet2/2/0/28, AttackProtocol=ARP-REQUEST)

Oct  7 2017 19:48:25+03:00 DC-TI5-TI6-S12708 %SECE/4/PORT_ATTACK_OCCUR(l)[2856898]:Auto port-defend started.(SourceAttackInterface=XGigabitEthernet2/2/0/12, AttackProtocol=ARP-REQUEST)

Oct  7 2017 19:49:25+03:00 DC-TI5-TI6-S12708 %SECE/4/PORT_ATTACK_OCCUR(l)[2856899]:Auto port-defend started.(SourceAttackInterface=XGigabitEthernet2/2/0/31, AttackProtocol=ARP-REQUEST)

Oct  7 2017 19:49:25+03:00 DC-TI5-TI6-S12708 %SECE/4/PORT_ATTACK_OCCUR(l)[2856900]:Auto port-defend started.(SourceAttackInterface=XGigabitEthernet1/2/0/12, AttackProtocol=ARP-REQUEST)

Oct  7 2017 19:49:25+03:00 DC-TI5-TI6-S12708 %SECE/4/PORT_ATTACK_OCCUR(l)[2856901]:Auto port-defend started.(SourceAttackInterface=XGigabitEthernet2/2/0/22, AttackProtocol=ARP-REQUEST)

Oct  7 2017 19:57:25+03:00 DC-TI5-TI6-S12708 %SECE/4/PORT_ATTACK_OCCUR(l)[2856912]:Auto port-defend started.(SourceAttackInterface=XGigabitEthernet2/2/0/28, AttackProtocol=ARP-REQUEST)

Oct  7 2017 19:59:06+03:00 DC-TI5-TI6-S12708 %SECE/4/PORT_ATTACK_OCCUR(l)[2856914]:Auto port-defend started.(SourceAttackInterface=XGigabitEthernet2/2/0/24, AttackProtocol=ARP-REQUEST)

Oct  7 2017 19:59:56+03:00 DC-TI5-TI6-S12708 %SECE/4/PORT_ATTACK_OCCUR(l)[2856915]:Auto port-defend started.(SourceAttackInterface=XGigabitEthernet1/2/0/12, AttackProtocol=ARP-REQUEST)

Oct  7 2017 20:01:14+03:00 DC-TI5-TI6-S12708 %SECE/4/PORT_ATTACK_OCCUR(l)[2856916]:Auto port-defend started.(SourceAttackInterface=XGigabitEthernet1/2/0/34, AttackProtocol=ARP-REQUEST)
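The claim that no Eth-Trunk interface triggered auto port-defend can be verified by grouping the records by source interface. This is a minimal sketch: the records are abridged copies of the nine log lines above (time of day plus the parenthesized fields), and the helper name is hypothetical.

```python
import re
from collections import Counter

# Abridged copies of the nine PORT_ATTACK_OCCUR records above.
RECORDS = [
    "19:48:14 (SourceAttackInterface=XGigabitEthernet2/2/0/28, AttackProtocol=ARP-REQUEST)",
    "19:48:25 (SourceAttackInterface=XGigabitEthernet2/2/0/12, AttackProtocol=ARP-REQUEST)",
    "19:49:25 (SourceAttackInterface=XGigabitEthernet2/2/0/31, AttackProtocol=ARP-REQUEST)",
    "19:49:25 (SourceAttackInterface=XGigabitEthernet1/2/0/12, AttackProtocol=ARP-REQUEST)",
    "19:49:25 (SourceAttackInterface=XGigabitEthernet2/2/0/22, AttackProtocol=ARP-REQUEST)",
    "19:57:25 (SourceAttackInterface=XGigabitEthernet2/2/0/28, AttackProtocol=ARP-REQUEST)",
    "19:59:06 (SourceAttackInterface=XGigabitEthernet2/2/0/24, AttackProtocol=ARP-REQUEST)",
    "19:59:56 (SourceAttackInterface=XGigabitEthernet1/2/0/12, AttackProtocol=ARP-REQUEST)",
    "20:01:14 (SourceAttackInterface=XGigabitEthernet1/2/0/34, AttackProtocol=ARP-REQUEST)",
]

FIELD_RE = re.compile(r"SourceAttackInterface=([^,]+), AttackProtocol=([\w-]+)\)")


def count_by_interface(records):
    """Count auto port-defend events per source interface."""
    counts = Counter()
    for record in records:
        match = FIELD_RE.search(record)
        if match:
            counts[match.group(1)] += 1
    return counts


counts = count_by_interface(RECORDS)
trunk_hits = [name for name in counts if name.startswith("Eth-Trunk")]

for name, hits in counts.most_common():
    print(f"{name}: {hits}")
print("Eth-Trunk interfaces flagged:", trunk_hits)
```

The nine events fall on seven physical XGigabitEthernet ports, and the list of flagged Eth-Trunk interfaces is empty.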

No ARP entries changed on either the S12700 or the CE6850 at the time of the issue:

D  10.0.13.108     2c27-d7c8-0873 113  XGE1/2/0/24               10-07 17:24:51+03:00

D  10.0.13.108     2c27-d7c8-0873 113  XGE1/2/0/24               10-07 19:20:32+03:00

D  20.32.100.143   a0d3-c134-89d6 2    Eth-Trunk8                10-07 20:12:39+03:00

D  10.0.13.159     40e2-30d1-94cb 113  XGE1/2/0/24               10-07 21:20:03+03:00

D  10.0.13.213     34e6-ad1e-a258 113  XGE1/2/0/24               10-07 21:43:36+03:00

 

D  10.1.3.100      3464-a935-1399 1100 Eth-Trunk1                02-07 07:38:25

D  10.1.3.111      845b-124e-78cb 1100 Eth-Trunk1                03-30 01:15:46

D  10.1.3.69       d0d0-4baa-7e34 1100 Eth-Trunk1                04-22 20:06:06

M  192.168.11.107  000c-29b8-64d1 11   Eth-Trunk11  Eth-Trunk51  05-10 04:14:15
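The same conclusion can be reached mechanically by testing each entry's timestamp against the fault window. The sketch below makes several assumptions that are not stated in the records: the year is taken as 2017 for every entry, all times are treated as local time, and the fault window is taken as 19:49 to 20:04 (19:49 plus the 15 minutes reported before recovery). The function name is hypothetical.

```python
from datetime import datetime, timedelta

# (IP address, timestamp) pairs copied from the ARP entries above.
ARP_ENTRIES = [
    ("10.0.13.108", "10-07 17:24:51+03:00"),
    ("10.0.13.108", "10-07 19:20:32+03:00"),
    ("20.32.100.143", "10-07 20:12:39+03:00"),
    ("10.0.13.159", "10-07 21:20:03+03:00"),
    ("10.0.13.213", "10-07 21:43:36+03:00"),
    ("10.1.3.100", "02-07 07:38:25"),
    ("10.1.3.111", "03-30 01:15:46"),
    ("10.1.3.69", "04-22 20:06:06"),
    ("192.168.11.107", "05-10 04:14:15"),
]

# Assumed fault window: start at 19:49, recovery after 15 minutes.
FAULT_START = datetime(2017, 10, 7, 19, 49)
FAULT_END = FAULT_START + timedelta(minutes=15)


def entries_in_window(entries, start, end, year=2017):
    """Return the entries whose timestamp falls inside [start, end]."""
    hits = []
    for ip, stamp in entries:
        # Keep "MM-DD HH:MM:SS" and drop any trailing UTC offset.
        when = datetime.strptime(f"{year}-{stamp[:14]}", "%Y-%m-%d %H:%M:%S")
        if start <= when <= end:
            hits.append((ip, stamp))
    return hits


changed = entries_in_window(ARP_ENTRIES, FAULT_START, FAULT_END)
print(changed)  # []
```

None of the listed timestamps falls inside the assumed window, which is consistent with the statement that the ARP entries were not updated during the fault.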

 

7.  We analyzed the alarms and logs on the CE12800 and CE6850 and did not find any exception records.

8.  We analyzed memory, CPU, CPCAR, and other key information on the CE12800 and CE6850 and did not find any exception records.

9.  The CE12800 switches are connected to the S12700 and CE6850 through M-LAG. We checked M-LAG, VLAN, MAC address, and other key forwarding information on the CE12800 switches and did not find any exception records. The CE12800 switches only performed Layer 2 forwarding based on MAC addresses.


The M-LAG log records are as follows; no abnormal update was found.

[~DC-TI4-CE12804-1-diagnose]display fei m-lag log status slot 3

[~DC-TI4-CE12804-1-diagnose]display fei m-lag log status slot 4

[~DC-TI4-CE12804-1-diagnose]display fei m-lag log error slot 3

[04-05 01:35:10.094][DFS GateWay]Join vs, add heart acl, get udp port num failed.vsid:0,Ret:0.

[04-05 01:35:10.094][DFS GateWay]Join vs, get udp port num failed.vsid:0,Ret:0.

[04-05 01:35:09.549][DFS GateWay]Join vs, add heart acl, get udp port num failed.vsid:0,Ret:0.

[04-05 01:35:09.549][DFS GateWay]Join vs, get udp port num failed.vsid:0,Ret:0.

[~DC-TI4-CE12804-1-diagnose]display fei m-lag log error slot 4

[04-05 01:34:07.895][DFS GateWay]Join vs, add heart acl, get udp port num failed.vsid:0,Ret:0.

[04-05 01:34:07.895][DFS GateWay]Join vs, get udp port num failed.vsid:0,Ret:0.

[~DC-TI4-CE12804-1-diagnose]display fei m-lag log peer-link slot 3

[~DC-TI4-CE12804-1-diagnose]display fei m-lag log peer-link slot 4

[~DC-TI4-CE12804-1-diagnose]display fei m-lag log member slot 3

[~DC-TI4-CE12804-1-diagnose]display fei m-lag log member slot 4

[~DC-TI4-CE12804-1-diagnose]display fei vlan slot 3 local collect

 

We checked the VLAN logs on the CE6850 and CE12800 and did not find any abnormal updates.

[10-07 01:58:14.265]Add PNI data to TBLM. (TB:24, TP:2, Ret:0x0)

[10-07 01:58:14.265]Add PNI data to TBLM. (TB:24, TP:1, Ret:0x0)

[10-07 01:57:21.259]Add MAINIF data to TBLM. (TB:24, TP:12, Ret:0x0)

[10-07 01:57:21.259]Add MAINIF data to TBLM. (TB:24, TP:11, Ret:0x0)

[10-07 01:57:21.259]Add MAINIF data to TBLM. (TB:16, TP:13, Ret:0x0)

[10-07 01:57:21.258]Add MAINIF data to TBLM. (TB:24, TP:6, Ret:0x0)

[10-07 01:56:39.262]Add VLANCFG data to TBLM. (VS:0, PdtVS:0, VLAN:1100, Ret:0x0)

[10-07 01:56:39.261]Add VLANCFG data to TBLM. (VS:0, PdtVS:0, VLAN:500, Ret:0x0)

[10-07 01:56:39.261]Add VLANCFG data to TBLM. (VS:0, PdtVS:0, VLAN:250, Ret:0x0)

[10-07 01:56:39.261]Add VLANCFG data to TBLM. (VS:0, PdtVS:0, VLAN:113, Ret:0x0)

[10-07 01:56:39.261]Add VLANCFG data to TBLM. (VS:0, PdtVS:0, VLAN:110, Ret:0x0)

[10-07 01:56:39.261]Add VLANCFG data to TBLM. (VS:0, PdtVS:0, VLAN:100, Ret:0x0)

[10-07 01:56:39.261]Add VLANCFG data to TBLM. (VS:0, PdtVS:0, VLAN:54, Ret:0x0)

[10-07 01:56:39.261]Add VLANCFG data to TBLM. (VS:0, PdtVS:0, VLAN:53, Ret:0x0)

[10-07 01:56:39.260]Add VLANCFG data to TBLM. (VS:0, PdtVS:0, VLAN:52, Ret:0x0)

[10-07 01:56:39.260]Add VLANCFG data to TBLM. (VS:0, PdtVS:0, VLAN:51, Ret:0x0)

[10-07 01:56:39.260]Add VLANCFG data to TBLM. (VS:0, PdtVS:0, VLAN:31, Ret:0x0)

[10-07 01:56:39.260]Add VLANCFG data to TBLM. (VS:0, PdtVS:0, VLAN:11, Ret:0x0)

[10-07 01:56:39.260]Add VLANCFG data to TBLM. (VS:0, PdtVS:0, VLAN:10, Ret:0x0)

[10-07 01:56:39.260]Add VLANCFG data to TBLM. (VS:0, PdtVS:0, VLAN:1, Ret:0x0)

[10-07 01:56:36.263]Set PVID data to TBLM. (VS:0, TB:17, TP:21, Ret:0x0)

[10-07 01:56:36.263]Set PVID data to TBLM. (VS:0, TB:17, TP:17, Ret:0x0)

[10-07 01:56:36.262]Set PVID data to TBLM. (VS:0, TB:17, TP:13, Ret:0x0)

[10-07 01:56:36.262]Set PVID data to TBLM. (VS:0, TB:17, TP:9, Ret:0x0)

[10-07 01:56:36.262]Set PVID data to TBLM. (VS:0, TB:17, TP:5, Ret:0x0)

[10-07 01:56:36.262]Set PVID data to TBLM. (VS:0, TB:17, TP:1, Ret:0x0)

[10-07 01:56:36.262]Set PVID data to TBLM. (VS:0, TB:16, TP:21, Ret:0x0)

[10-07 01:56:36.262]Set PVID data to TBLM. (VS:0, TB:16, TP:17, Ret:0x0)

[10-07 01:56:36.262]Set PVID data to TBLM. (VS:0, TB:16, TP:9, Ret:0x0)

[10-07 01:56:36.262]Set PVID data to TBLM. (VS:0, TB:16, TP:5, Ret:0x0)

[10-07 01:56:36.261]Set PVID data to TBLM. (VS:0, TB:16, TP:1, Ret:0x0)

[10-07 01:56:36.259]Add PORT-VLAN bitmap data to TBLM. (TB:17, TP:21, Ret:0x0)

[10-07 01:56:36.259]Add source PORT-VLAN bitmap data to TBLM. (TB:17, TP:21, Ret:0x0)

[10-07 01:56:36.259]Add PORT-VLAN bitmap data to TBLM. (TB:17, TP:17, Ret:0x0)

[10-07 01:56:36.258]Add source PORT-VLAN bitmap data to TBLM. (TB:17, TP:17, Ret:0x0)

[10-07 01:56:36.258]Add PORT-VLAN bitmap data to TBLM. (TB:17, TP:13, Ret:0x0)

[10-07 01:56:36.258]Add source PORT-VLAN bitmap data to TBLM. (TB:17, TP:13, Ret:0x0)

[10-07 01:56:36.258]Add PORT-VLAN bitmap data to TBLM. (TB:17, TP:9, Ret:0x0)

[10-07 01:56:36.258]Add source PORT-VLAN bitmap data to TBLM. (TB:17, TP:9, Ret:0x0)

[10-07 01:56:36.257]Add PORT-VLAN bitmap data to TBLM. (TB:17, TP:5, Ret:0x0)

[10-07 01:56:36.257]Add source PORT-VLAN bitmap data to TBLM. (TB:17, TP:5, Ret:0x0)

[10-07 01:56:36.257]Add PORT-VLAN bitmap data to TBLM. (TB:17, TP:1, Ret:0x0)

[10-07 01:56:36.257]Add source PORT-VLAN bitmap data to TBLM. (TB:17, TP:1, Ret:0x0)

[10-07 01:56:36.257]Add PORT-VLAN bitmap data to TBLM. (TB:16, TP:21, Ret:0x0)

[10-07 01:56:36.256]Add source PORT-VLAN bitmap data to TBLM. (TB:16, TP:21, Ret:0x0)

[10-07 01:56:36.256]Add PORT-VLAN bitmap data to TBLM. (TB:16, TP:17, Ret:0x0)

[10-07 01:56:36.256]Add source PORT-VLAN bitmap data to TBLM. (TB:16, TP:17, Ret:0x0)

[10-07 01:56:36.256]Add PORT-VLAN bitmap data to TBLM. (TB:16, TP:9, Ret:0x0)

[10-07 01:56:36.256]Add source PORT-VLAN bitmap data to TBLM. (TB:16, TP:9, Ret:0x0)

[10-07 01:56:36.255]Add PORT-VLAN bitmap data to TBLM. (TB:16, TP:5, Ret:0x0)

[10-07 01:56:36.255]Add source PORT-VLAN bitmap data to TBLM. (TB:16, TP:5, Ret:0x0)

[10-07 01:56:36.255]Add PORT-VLAN bitmap data to TBLM. (TB:16, TP:1, Ret:0x0)

[10-07 01:56:36.255]Add source PORT-VLAN bitmap data to TBLM. (TB:16, TP:1, Ret:0x0)

[10-07 01:56:29.242]Locate INLIF. (PdtVS:0, LifType:2, InLifValue:32768, Ret:0x0
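A long VLAN log like the one above is easier to review with a small scan that flags suspicious records. The sketch below is illustrative only and rests on two assumptions that the log itself does not confirm: that `Ret:0x0` means the operation succeeded, and that only records on 10-07 at or after 19:49 are relevant to the fault. The sample lines are a subset of the log above, and the function name is hypothetical.

```python
import re

# A subset of the VLAN log records above, with wrapped lines joined.
SAMPLE = [
    "[10-07 01:58:14.265]Add PNI data to TBLM. (TB:24, TP:2, Ret:0x0)",
    "[10-07 01:57:21.259]Add MAINIF data to TBLM. (TB:24, TP:12, Ret:0x0)",
    "[10-07 01:56:39.261]Add VLANCFG data to TBLM. (VS:0, PdtVS:0, VLAN:500, Ret:0x0)",
    "[10-07 01:56:36.263]Set PVID data to TBLM. (VS:0, TB:17, TP:21, Ret:0x0)",
    "[10-07 01:56:36.259]Add PORT-VLAN bitmap data to TBLM. (TB:17, TP:21, Ret:0x0)",
]

LINE_RE = re.compile(
    r"^\[(?P<date>\d\d-\d\d) (?P<time>\d\d:\d\d:\d\d)\.\d+\]"
    r".*Ret:(?P<ret>0x[0-9a-fA-F]+)\)"
)


def scan_vlan_log(lines, fault_date="10-07", fault_start="19:49:00"):
    """Return (records with a non-zero Ret, records at or after the fault start)."""
    failures, near_fault = [], []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip truncated or unrelated lines
        if int(match["ret"], 16) != 0:  # assumption: 0x0 means success
            failures.append(line)
        # Zero-padded HH:MM:SS strings compare correctly as text.
        if match["date"] == fault_date and match["time"] >= fault_start:
            near_fault.append(line)
    return failures, near_fault


failures, near_fault = scan_vlan_log(SAMPLE)
print(failures, near_fault)  # [] []
```

All sampled records return 0x0 and were written between 01:56 and 01:58, many hours before the fault began at 19:49.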

 

Based on the preceding analysis, the forwarding entries on the switches along the traffic forwarding path showed no exceptions when the service fault occurred, and the service recovered automatically without any service change being made. We therefore suspected a fault of a kind that can recover automatically, such as a chip failure.

Based on the symptom and the recovery scenario (automatic recovery), we confirmed with the chip vendor that the fault occurred on the CE switch because a chip soft error was not corrected promptly. The problem has been solved by a patch. It is recommended that the latest patch be loaded on the switches in the live network. For details, see the patch release notes.


Root cause:

Soft errors are a common class of fault on chips from many vendors, including chips used in Cisco devices. Chip vendors provide automatic detection and recovery mechanisms to correct soft errors. On the live network, the fault was not corrected immediately because the recovery mechanism provided by the chip vendor was defective. The problem has been solved by a patch.


Solution:

It is recommended that the latest patch, V100R005SPH013, be loaded on both the CE6850 and the CE12800, because the problem may occur on either device.

Patch for the CE12800:

http://support.huawei.com/enterprise/en/switch/cloudengine-12800-pid-7542409/software/22621480/?idAbsPath=fixnode01|7919710|21782165|21782236|22318638|7542409

Patch for the CE6850:

http://support.huawei.com/enterprise/en/switch/cloudengine-6800-pid-7597815/software/22621482/?idAbsPath=fixnode01|7919710|21782165|21782239|22318540|7597815



HC_David
Created Sep 14, 2018 08:50:55

Thanks for sharing :)