S Switch High CPU Usage Troubleshooting ---- High CPU Usage Typical Cases

Created: Aug 27, 2016 10:25:58. Latest reply: Aug 27, 2016 17:48:16.

6 High CPU Usage Typical Cases

6.1 A Switch Suffers a Multicast Packet Attack

Symptom

The switch running multicast services has a high CPU usage, and many forwarding entries for multicast group 239.255.255.250 exist on the switch, consuming forwarding entry resources. However, this multicast group is not part of the actual multicast deployment.

Example: After the IPTV service is enabled on a subnet on a carrier's network, the switch on the subnet creates a large number of multicast routing entries in which the group address is 239.255.255.250 and source addresses are IP addresses of set top boxes (STBs) from a specific vendor. These multicast entries are spread to other user subnets, so devices on the network all have a large number of such entries.

Root Cause

Group address 239.255.255.250 is used by the Simple Service Discovery Protocol (SSDP). Therefore, when SSDP is enabled on any servers or PCs, the servers or PCs send multicast packets with group address 239.255.255.250 to the switch.

239.255.255.250 is not a permanent multicast group address. (A permanent multicast group address, also called a reserved address, identifies a group of network devices. It is used for routing protocols and topology discovery, but not for multicast forwarding.) The switch treats multicast group addresses outside the range 224.0.0.X as ordinary multicast groups. Therefore, the switch generates multicast forwarding entries for 239.255.255.250.
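The distinction between permanent and ordinary groups can be sketched in Python. This is only an illustration of the address ranges described above, not the switch's implementation; the function name classify_group is made up for this example, and 224.0.0.0/24 is used as the numeric form of 224.0.0.X.

```python
import ipaddress

# 224.0.0.X written as a prefix: the permanent (reserved) group range.
RESERVED_BLOCK = ipaddress.ip_network("224.0.0.0/24")


def classify_group(address: str) -> str:
    """Return 'reserved', 'ordinary', or 'not multicast' for an IPv4 address."""
    ip = ipaddress.IPv4Address(address)
    if not ip.is_multicast:
        return "not multicast"
    if ip in RESERVED_BLOCK:
        return "reserved"
    # Everything else in 224.0.0.0/4 is handled as an ordinary group,
    # so forwarding entries can be created for it.
    return "ordinary"


print(classify_group("239.255.255.250"))  # ordinary
print(classify_group("224.0.0.5"))        # reserved
```

Because the SSDP group falls in the ordinary range, nothing in the address itself stops forwarding entries from being created; filtering has to be configured explicitly.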

STBs from this vendor have the SSDP service enabled by default and send SSDP Discover messages to the source DR, which then creates multicast forwarding entries with group address 239.255.255.250. After the STBs register with the RP successfully, hosts on other subnets send Report messages with group address 239.255.255.250. Finally, switches on these subnets create a large number of multicast entries with this group address and different source IP addresses.

Identification Method

1. Run the display cpu-usage command to view the CPU usage of the switch. The CPU usage is above 80%. Check the top tasks. The tasks bcmRx/FTS/VPR/SOCK have high CPU usage.

2. Run the display cpu-defend statistics command to view statistics about the packets sent to the CPU and determine whether too many IGMP packets exist.

   a. Run the reset cpu-defend statistics command to clear statistics about the packets sent to the CPU.

   b. Run the display cpu-defend statistics packet-type igmp all command to view statistics about IGMP packets sent to the CPU.

20160827102047867001.png

3. Identify the multicast attack source.

Choose a method:

- Obtain packet information through port mirroring.

  Port mirroring collects packet information without increasing the CPU load. It is recommended that you configure port mirroring on the inbound interface of the packets sent to the CPU. For the configuration of port mirroring, see Mirroring Configuration in the Configuration Guide - Network Management and Monitoring.

- View multicast entries.

  - After Layer 2 multicast is configured, the display igmp-snooping port-info command output shows that multiple host ports have received Report messages with group address 239.255.255.250.

  - After Layer 3 multicast is configured, the display multicast forwarding-table command output shows that the switch has created many multicast forwarding entries with different source addresses and group address 239.255.255.250.

- Configure the local attack defense policy based on attack source tracing.

20160827102048528002.png

Run the display auto-defend attack-source and display auto-defend attack-source slot slot-id commands to view attack source information on MPUs and LPUs.

4. The preceding information confirms that the switch is under attack by packets of multicast group 239.255.255.250.
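The "many entries with different source addresses for one unplanned group" check can be sketched as follows. This is a minimal illustration that assumes the (source, group) pairs have already been collected from the forwarding table by some other means; the function names, the sample addresses, and the min_sources threshold are hypothetical.

```python
from collections import defaultdict


def sources_per_group(entries):
    """Count distinct source addresses per multicast group.

    entries: iterable of (source_ip, group_ip) string pairs.
    """
    groups = defaultdict(set)
    for source, group in entries:
        groups[group].add(source)
    return {group: len(sources) for group, sources in groups.items()}


def suspicious_groups(entries, planned_groups, min_sources=3):
    """Return unplanned groups that have at least min_sources distinct sources."""
    counts = sources_per_group(entries)
    return sorted(
        group
        for group, count in counts.items()
        if group not in planned_groups and count >= min_sources
    )


# Made-up sample entries for illustration only.
sample = [
    ("10.1.1.2", "239.255.255.250"),
    ("10.1.1.3", "239.255.255.250"),
    ("10.1.2.4", "239.255.255.250"),
    ("10.9.9.9", "232.1.1.1"),
]
print(suspicious_groups(sample, planned_groups={"232.1.1.1"}))
```

Comparing against the planned group list matters here: a legitimate IPTV group may also have many receivers, but it will appear in the deployment plan.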

Solution

There are two solutions: filter out the packets of this multicast group (recommended), or disable SSDP on the server or PC where the attack source resides.

- Filter out the packets from this multicast group.

  a. Filter out the IGMP packets from 239.255.255.250.

# Configure an advanced ACL that denies packets from multicast group 239.255.255.250 and permits the packets from other IP addresses.

20160827102049936003.png

# Configure an advanced ACL to filter out the packets from multicast group 239.255.255.250.

20160827102049882004.png

# Filter out Layer 3 multicast packets.

20160827102050893005.png

# Filter Layer 2 multicast packets.

20160827102051524006.png

# Configure a blacklist to filter IGMP packets from multicast group 239.255.255.250 to avoid multicast forwarding entries for this group.

20160827102052769007.png

  b. Filter out multicast data packets with group address 239.255.255.250.

# Configure a traffic classifier that matches destination IP address 239.255.255.250.

20160827102052806008.png

# Configure a traffic behavior.

20160827102053010009.png

# Configure a traffic policy.

20160827102053100010.png

# Apply the traffic policy to incoming packets on the interface.

20160827102054680011.png

- Disable the SSDP service on the server or PC.

  a. In Control Panel, click the Administrative Tools icon, and then click the Services icon.

  b. Find SSDP Discovery Service in the service list and disable the service.

Conclusion

Group address 239.255.255.250 is used by the SSDP service, which is enabled by default on Windows servers. Therefore, multicast devices will create forwarding entries for this group.

The switch treats this group as an ordinary multicast group. If the switch has a high CPU usage and the attack traffic uses group address 239.255.255.250, which is not a planned group address, configure packet filtering on the switch or disable the SSDP service on the PC to prevent the switch from generating a large number of multicast forwarding entries.

Relevant Information

The Simple Service Discovery Protocol (SSDP) is an application-layer protocol and one of the key protocols that implement Universal Plug and Play (UPnP). SSDP enables network clients to discover network services by sending multicast discovery messages.

SSDP uses multicast IPv4 address 239.255.255.250 with UDP port 1900 (239.255.255.250:1900) or multicast IPv6 address FF0x::C to transmit messages.

When a network client connects to a network, the client sends an SSDP Discovery message with a specific multicast group address and the SSDP port in M-Search mode. When an upstream device receives the Discovery message, it checks whether it provides the service required by the client. If so, the device sends a unicast response message to the client.
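For reference, an M-SEARCH discovery message is a short HTTP-style text request carried over UDP. The sketch below only builds the message bytes so that its layout can be compared with a packet capture; it does not send anything. The function name build_msearch and the default values for ST and MX are choices made for this example.

```python
SSDP_GROUP = "239.255.255.250"  # SSDP IPv4 multicast group
SSDP_PORT = 1900                # SSDP UDP port


def build_msearch(search_target: str = "ssdp:all", mx: int = 2) -> bytes:
    """Build the bytes of an SSDP M-SEARCH request (nothing is transmitted)."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_GROUP}:{SSDP_PORT}",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",            # maximum seconds a device may wait before replying
        f"ST: {search_target}",  # search target: which services are wanted
        "",
        "",                      # the request ends with an empty line
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_msearch().decode("ascii"))
```

Every client that runs discovery sends requests like this to the same group address, which is why a switch sees one group with many different source addresses.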

Figure 6-1 and Figure 6-2 show an SSDP UDP datagram and an IGMP Report message for the SSDP group, respectively.

Figure 6-1 SSDP UDP packets

20160827102055160012.png

 

Figure 6-2 SSDP IGMP Report packets

20160827102055668013.png

 

6.2 A Switch Suffers an ARP Packet Attack

Symptom

In Figure 6-3, Switch functions as the gateway. Switch_1 (a modular switch) frequently becomes unmanageable, and users connected to Switch_1 are frequently disconnected. Pings from Switch_1 to the gateway are delayed or fail. Services on Switch_2 are normal, and Switch_2 can successfully ping the gateway.

Figure 6-3 Networking diagram

20160827102056248014.png

 

Root Cause

Switch_1 receives a large number of ARP packets with a fixed source MAC address. As a result, ARP packets from user devices cannot be sent or received normally.

Identification Method

Perform the following operations on Switch_1:

Step 1 Check whether the CPU usage is high.

20160827102056968015.png

The CPU usage reaches 82%.

Step 2 View temporary ARP entries to check whether ARP learning is normal.

20160827102057017016.png

The MAC ADDRESS fields of two ARP entries are Incomplete, indicating temporary entries. Some ARP entries cannot be learned.

Step 3 Check whether the switch is suffering an ARP attack.

1. View statistics about ARP request packets sent to the CPU.

20160827102058841017.png

There are a large number of ARP request packets on the board in slot 4.

2. Configure attack source tracing to identify the attack source.

20160827102058115018.png

3. View attack source information.

20160827102059314019.png

The MAC address of the attack source is 0000-0000-00db, and the attack source is connected to GigabitEthernet2/0/22.

If the MAC address has a matching ARP entry, run the display arp | include 0000-0000-00db command to check its IP address.

----End
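When a mirrored capture is available instead of attack source tracing, the same "fixed source MAC address" pattern can be found offline. The sketch below assumes the capture has already been reduced to (source MAC, inbound interface) records; the function name and the sample records are hypothetical and are not taken from the switch output.

```python
from collections import Counter


def top_arp_sources(records, top_n=3):
    """Rank (source_mac, interface) pairs by number of ARP packets seen.

    records: iterable of (source_mac, interface) tuples, one per ARP packet.
    """
    return Counter(records).most_common(top_n)


# Made-up records for illustration only.
records = (
    [("0000-0000-00db", "GigabitEthernet2/0/22")] * 50
    + [("0000-0000-0a01", "GigabitEthernet2/0/3")] * 2
    + [("0000-0000-0a02", "GigabitEthernet2/0/4")]
)
for (mac, interface), count in top_arp_sources(records):
    print(mac, interface, count)
```

A single MAC address that accounts for most of the ARP packets on one interface is the candidate to blacklist.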

Solution

- Configure a blacklist.

20160827102059995020.png

- Configure the attack source tracing action.

20160827102100994021.png

 

6.3 STP Flapping Causes a High CPU Usage

Symptom

A fixed switch has a high CPU usage, and generates many logs about ARP packets that are discarded because their rate exceeds the CPCAR value. The interface information shows that the number of TC BPDUs received by STP-enabled interfaces keeps increasing.

Root Cause

An interface has received a large number of TC BPDUs, causing STP flapping. Many MAC entries are deleted and ARP entries are updated. Therefore, the switch needs to process many ARP Miss, ARP request, and ARP reply packets, causing a high CPU usage.

Identification Method

1. Logs indicating a high CPU usage are generated on the switch.

20160827102100282022.png

2. There are also logs indicating that a large number of ARP packets are discarded by CPCAR.

20160827102101119023.png

3. Collect statistics about transmitted and received TC BPDUs on interfaces.

   Run the display stp tc-bpdu statistics command at an interval of several seconds. Check the statistics about sent and received TC/TCN BPDUs. It is found that the number of TC BPDUs on all STP-enabled interfaces keeps increasing.
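The "keeps increasing" check compares counters across several samples. The sketch below assumes the received TC BPDU counters have already been extracted into one dictionary per sample (interface name to count); the function name and the sample values are hypothetical.

```python
def steadily_increasing(samples):
    """Return interfaces whose counter rises in every consecutive sample.

    samples: list of dicts mapping interface name -> received TC BPDU count,
             ordered from oldest to newest.
    """
    if len(samples) < 2:
        return []
    flagged = []
    for interface in samples[0]:
        values = [sample[interface] for sample in samples if interface in sample]
        present_everywhere = len(values) == len(samples)
        rising = all(later > earlier for earlier, later in zip(values, values[1:]))
        if present_everywhere and rising:
            flagged.append(interface)
    return sorted(flagged)


# Made-up counters taken a few seconds apart, for illustration only.
samples = [
    {"GE0/0/1": 10, "GE0/0/2": 5},
    {"GE0/0/1": 14, "GE0/0/2": 5},
    {"GE0/0/1": 21, "GE0/0/2": 5},
]
print(steadily_increasing(samples))
```

An interface whose counter stays flat between samples is not a source of the flapping, which helps narrow down where the TC BPDUs originate.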

Solution

1. Run the stp tc-protection command in the system view to enable TC protection trap.

   After TC protection trap is enabled, the switch updates entries at most once within 2 seconds if it frequently receives TC BPDUs. This reduces the number of tasks to be processed by the CPU in frequently updating MAC and ARP entries.

   The switch will trigger the MSTP_1.3.6.1.4.1.2011.5.25.42.4.2.15 hwMstpiTcGuarded and MSTP_1.3.6.1.4.1.2011.5.25.42.4.2.16 hwMstpProTcGuarded traps.

2. Run the arp topology-change disable command in the system view to disable the switch from responding to TC BPDUs.

   After receiving TC BPDUs, the switch ages out ARP entries by default. After this command is executed, the switch does not age out or delete ARP entries when receiving TC BPDUs. When the network topology changes frequently, this prevents the excessive ARP packets and high CPU usage caused by ARP relearning.

3. Run the mac-address update arp command in the system view to enable ARP entry update upon MAC address change.

   By default, the switch deletes the MAC address entries after receiving TC BPDUs. After this command is executed, the switch updates the outbound interfaces in ARP entries when the outbound interfaces in MAC entries are changed. This reduces the number of ARP entry updates.

Conclusion

When this problem occurs, check packet loss caused by CPCAR.

When deploying STP, you are advised to enable TC protection and configure all ports connected to terminals as edge ports. These measures prevent status change of an interface from causing flapping and re-convergence of the entire STP network.

 

6.4 OSPF Flapping Causes a High CPU Usage

Symptom

In Figure 6-4, OSPF is run on Switch_1, Switch_2, Switch_3, and Switch_4. Switch_1 has a high CPU usage. The CPU usage of the ROUT task is higher than the CPU usage of other tasks, and route flapping occurs.

Figure 6-4 Networking diagram

20160827102102769024.png 

Root Cause

IP address conflict on the network causes route flapping.

Identification Method

Step 1 Run the display ospf lsdb command on each switch at an interval of one second to check information about the OSPF link state database (LSDB) on the switches.

Step 2 Locate the fault based on the collected command output of each switch.

- If both the following situations occur, LSA aging is abnormal.

  - The Age value that indicates the aging time of a network LSA is 3600 on a switch or the switch does not have the network LSA, and the Sequence value increases quickly.

  - The Age value of the same network LSA on different switches frequently alternates between 3600 and smaller values, and the Sequence value increases quickly.

20160827102103418025.png

  a. Run the display ospf routing command on each switch every second. If routes flap but the OSPF neighbor relationship does not, an IP address conflict or router ID conflict exists. In this case, the display ospf lsdb command output shows that the IP address of the designated router (DR) conflicts with that of a non-DR.

  b. Locate one conflicting interface on a switch based on the AdvRouter value, and locate the other conflicting device based on the IP address plan. It is difficult to locate the other conflicting device based only on OSPF information.

In this example, first determine that the conflicting IP address is 112.1.1.2, and the router ID of a conflicting device is 1.1.1.1. However, the other conflicting device (3.3.3.3) cannot be located through OSPF information.

- If the LinkState ID values of two network LSAs are both 112.1.1.2 on a switch, the aging time of the two network LSAs is short, and the Sequence value increases quickly, an IP address conflict occurs on the DR and BDR.

20160827102103749026.png

----End
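The abnormal aging pattern described in Step 2 (Age alternating between 3600 and small values while Sequence climbs) can be expressed as a simple check over repeated observations of one network LSA. This is a sketch under stated assumptions: the (age, sequence) pairs are assumed to have been collected already, and the function name and the min_seq_growth threshold are hypothetical.

```python
MAX_AGE = 3600  # OSPF MaxAge in seconds; an LSA at this age is being flushed


def lsa_looks_abnormal(observations, min_seq_growth=3):
    """Check one LSA's (age, sequence) samples for the flapping pattern.

    observations: list of (age_seconds, sequence_number) tuples, oldest first.
    Returns True when the age reaches MaxAge, also drops below it, and the
    sequence number grows by at least min_seq_growth over the window.
    """
    if len(observations) < 2:
        return False
    ages = [age for age, _seq in observations]
    seqs = [seq for _age, seq in observations]
    hits_max_age = MAX_AGE in ages
    has_fresh_copy = any(age < MAX_AGE for age in ages)
    seq_growth = seqs[-1] - seqs[0]
    return hits_max_age and has_fresh_copy and seq_growth >= min_seq_growth


# Made-up samples taken one second apart, for illustration only.
flapping = [(3600, 0x80000010), (1, 0x80000012), (3600, 0x80000014), (2, 0x80000016)]
stable = [(100, 0x80000001), (101, 0x80000001), (102, 0x80000001)]
print(lsa_looks_abnormal(flapping), lsa_looks_abnormal(stable))
```

In a stable network the sequence number of a network LSA changes rarely, so rapid growth within a few seconds is the stronger of the two signals.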

Solution

Change the IP address of a conflicting device based on the IP address plan.

Conclusion

- The following problems may occur due to IP address conflicts on networks.

  - The CPU usage is high. To check the CPU usage, run the display cpu-usage command. The command output shows that the ROUT task consumes much more CPU resources than other tasks.

  - Route flapping occurs.

- On an OSPF network, IP address conflicts between interfaces may cause frequent aging and generation of LSAs. This results in network instability, route flapping, and high CPU usage.

Configure IP addresses for interfaces according to network plan, and do not modify planned network parameters.

 

6.5 Many Multicast Packets Are Sent to the CPU Due to a Loop

Symptom

A modular switch provides the High Speed Internet (HSI), VoIP, and IPTV services to connected users. The HSI and VoIP are PPPoE services, and IPTV is the IGMP snooping service of Layer 2 multicast.

The administrator detects that the inbound traffic volume exceeds 90% of the bandwidth, and the CPU usages of MPU and LPU exceed 80%.

Root Cause

STP is not enabled on the access device connected to the switch, and a loop occurs. Many IGMP packets are sent to the CPU and overwhelm it. As a result, EFM packet interaction times out, and the EFM service between the switch and other switches is interrupted. The ports recalculate MSTP, affecting services.

Identification Method

Perform the following operations on a modular switch:

1. Run the display cpu-usage command to check the CPU usage. The CPU usages on the MPUs of active/standby switches exceed 87%, and the CPU usages on the LPUs exceed 93%.

2. Display traps on the switch.

   a. The trap for bandwidth exceeding on interfaces is reported.

20160827102104787027.png

   b. EFM flapping causes a root bridge loss.

20160827102104857028.png

3. Run the display interface command. The interface processes too many multicast packets.

4. Run the display cpu-defend statistics all command. There are many IGMP packets, indicating that a large number of IGMP packets have been sent to the switch's CPU.

20160827102105159029.png

5. Mirror packets on the interfaces that process many multicast packets. It is found that the multicast packets are sent from a specific address.

Solution

1. Configure a blacklist-based local attack defense policy to filter out IGMP packets. This reduces the impact of multicast packets on the CPU after a loop occurs.

2. Remove the loop.

 

6.6 A Loop Causes a High CPU Usage

Symptom

After network restructuring and migration, the original core devices (Layer 3 devices) are re-deployed as access devices (Layer 2 devices). In Figure 6-5, after the network migration is complete, the core devices ping the management IP addresses at the access layer. The ping operations intermittently fail, and the core devices report traps for frequent VRRP status changes.

Figure 6-5 Networking diagram

20160827102106086030.png

 

Switch_1 displays the following messages:

20160827102106563031.png

Root Cause

A loop exists on the network.

Identification Method

Step 1 Display VRRP group status.

20160827102107007032.png

Switch_1 functions as the backup in the VRRP group and works normally.

Step 2 Check the statistics about VRRP packets sent to the CPU.

20160827102108017033.png

There are a large number of discarded packets in slot 4 of Switch_1.

Step 3 Check traffic statistics on interfaces.

20160827102108759034.png

The average bandwidth on GigabitEthernet4/0/19 exceeds 80% of the interface bandwidth. There is a high probability that a loop exists. In addition, incoming traffic volume on GigabitEthernet4/0/18 and GigabitEthernet4/0/19 also exceeds 80%. This may be caused by a loop on the devices connected to the two interfaces.

Step 4 Shut down GigabitEthernet4/0/18 and GigabitEthernet4/0/19, which carry the excessive traffic, and check statistics about VRRP packets sent to the CPU. The number of discarded VRRP packets no longer increases, and the management IP addresses at the access layer can be pinged.

Step 5 Interfaces GigabitEthernet4/0/18 and GigabitEthernet4/0/19 connect to access devices. Both access devices are non-Huawei Layer 3 devices on which STP is disabled. When the two devices were re-deployed as Layer 2 devices, STP was not enabled, resulting in the loop.

----End
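The 80% bandwidth check in Step 3 is a simple ratio of the measured rate to the interface bandwidth. The sketch below assumes both values are already known in bits per second; the function names and the sample rates are hypothetical and are not taken from the screenshots.

```python
def utilization_percent(rate_bps: float, bandwidth_bps: float) -> float:
    """Return the interface utilization as a percentage of its bandwidth."""
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    return 100.0 * rate_bps / bandwidth_bps


def overloaded(interfaces, threshold=80.0):
    """Return names of interfaces whose utilization exceeds the threshold.

    interfaces: dict mapping interface name -> (rate_bps, bandwidth_bps).
    """
    return sorted(
        name
        for name, (rate, bandwidth) in interfaces.items()
        if utilization_percent(rate, bandwidth) > threshold
    )


# Made-up input rates on 1 Gbit/s interfaces, for illustration only.
rates = {
    "GigabitEthernet4/0/18": (850_000_000, 1_000_000_000),
    "GigabitEthernet4/0/19": (910_000_000, 1_000_000_000),
    "GigabitEthernet4/0/1": (100_000_000, 1_000_000_000),
}
print(overloaded(rates))
```

Sustained utilization above the threshold on a pair of interfaces, rather than a short burst on one, is what points to a loop behind them.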

Solution

1. Enable STP on the access devices connected to GigabitEthernet4/0/18 and GigabitEthernet4/0/19.

2. Run undo shutdown on GigabitEthernet4/0/18 and GigabitEthernet4/0/19, and then check the STP status and interface traffic statistics. Services are recovered.

Conclusion

When the network traffic is unstable, check the traffic on interfaces to determine whether loops occur. If a loop occurs, locate the source based on the information about packets received and sent by the interfaces. Shut down related interfaces temporarily. Find out the root cause and resolve the problem accordingly.


7 How to Relieve CPU Load

1. Plan the network configurations, configure a loop prevention protocol, and enable loop detection to prevent loops.

   Run the loopback-detect untagged mac-address ffff-ffff-ffff command in the system view to send loop detection packets as broadcast packets and prevent them from being terminated by unexpected devices.

   Run the loopback-detect enable command in the interface view to enable loop detection.

2. Configure ARP security to protect the device against ARP or ARP Miss attacks.

   For details about ARP security, see ARP Security Solutions in section ARP Security Configuration in the Configuration Guide - Security.

3. On networks prone to DHCP and ARP attacks, such as campus networks, configure local attack defense policies for DHCP and ARP protocol packets.

   This section provides suggestions on local attack defense policies in general situations. The requirements on different protocol packets sent to the CPU may vary according to the model and version. In practice, configure CPU attack defense based on service requirements; otherwise, the configuration may fail or services may be affected.

   - Control board on modular switch

20160827102109947035.png

   - Interface card on modular switch

20160827102109576036.png

   - Fixed switches

20160827102110583037.png

4. Administrators access the switch through SSH, Telnet, or SNMP. Configure an ACL to allow only the administrator to log in.

   # In VTY 0-14, configure the ACL to allow only the user with source IP address 10.1.1.1/32 to log in to the switch.

20160827102110049038.png

5. When a port group has more than 40 member ports and you add these member ports to 4K VLANs at the same time, the CPU usage may jump to over 80% in a short period. Therefore, you are advised to add the member ports to no more than 500 VLANs at a time.

6. Changing the type of more than 20 ports together may cause a CPU usage of over 80% in a short period. Therefore, you are advised to change the type of ports one by one.

7. Frequent MAC address flapping may result in a high CPU usage. If MAC address flapping may occur frequently on an interface, run the mac-address flapping action error-down command on the interface to enable the system to set the interface to error-down state after detecting MAC address flapping.

8. When the total number of VLANs on the interfaces with loopback detection enabled exceeds 1024, run the loopback-detect action shutdown command on these interfaces to set the action for a detected loopback to shutdown. (The VLAN counter increases by 1 every time an interface is added to a VLAN, even when multiple interfaces are added to the same VLAN.)

9. Load and activate the patch files of the corresponding software version.

   Visit http://support.huawei.com/enterprise/ to obtain the corresponding patch file and documents (patch release notes and installation guide).

10. Periodically scan the PCs and servers connected to the switch for viruses.

11. The switch provides CPCAR values for each protocol. Generally, the default CPCAR values can meet requirements. If service traffic volume is too high, contact Huawei switch agents to adjust the CPCAR values.
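The sizing rules in items 5 and 8 above are simple arithmetic and can be sketched as two helpers. The function names are hypothetical, the 4K VLAN range is taken here as VLAN 1 to 4094, and the sample interface names are made up; the code only plans batches and counts VLANs, it does not configure anything.

```python
def vlan_batches(first_vlan: int, last_vlan: int, batch_size: int = 500):
    """Split an inclusive VLAN range into batches of at most batch_size VLANs."""
    batches = []
    start = first_vlan
    while start <= last_vlan:
        end = min(start + batch_size - 1, last_vlan)
        batches.append((start, end))
        start = end + 1
    return batches


def loopback_vlan_count(vlans_per_interface) -> int:
    """Count VLANs as described in item 8: once per interface per VLAN.

    vlans_per_interface: dict mapping interface name -> set of VLAN IDs,
    for interfaces with loopback detection enabled.
    """
    return sum(len(vlans) for vlans in vlans_per_interface.values())


batches = vlan_batches(1, 4094)
print(len(batches), batches[0], batches[-1])

# VLAN 10 is counted twice because two interfaces belong to it.
print(loopback_vlan_count({"GE0/0/1": {10, 20}, "GE0/0/2": {10}}))
```

Applying one batch at a time and waiting for the CPU usage to settle between batches keeps each burst of configuration work within the guidance of 500 VLANs.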

 
This post was last edited by 交换机在江湖 at 2016-08-27 11:18.

user_2790689  Created Aug 27, 2016 17:48:16

Thank you.