S Switch High CPU Usage Troubleshooting ---- CPU Usage Overview and CPU Usage Working Mechanism

Created: Aug 9, 2016 14:18:47 Latest reply: Aug 10, 2016 23:31:35

Knowledge About High CPU Usage 

This section describes high CPU usage on switches, including its impact, the reasons why CPU usage becomes high, fault locating methods, and methods of lowering and avoiding high CPU usage.

 

1 CPU and CPU Usage Overview

CPU - The Core of a Switch

A switch uses a distributed architecture consisting of a forwarding plane and a control plane. The forwarding plane implements Layer 2 and Layer 3 forwarding; the control plane controls forwarding.

As shown in Figure 1-1, the control plane runs on a general-purpose embedded CPU and the forwarding plane runs on a forwarding chip:

•   The forwarding chip implements Layer 2 and Layer 3 forwarding, for example, updating the MAC address table for Layer 2 forwarding and the Layer 3 forwarding table for IP forwarding. The forwarding chip forwards data at high throughput.

•   The CPU maintains software entries, such as routing and ARP entries, and configures the hardware Layer 3 forwarding table in the chip based on these software entries. The CPU can also perform software-based Layer 3 forwarding, but its processing capability is low compared with the forwarding chip.

Figure 1-1 Distributed architecture

 

Packets on a network are classified by function into control packets and data packets. If a switch has no hardware forwarding entry for a flow, the first packet reaching the switch is forwarded by the CPU, and a Layer 3 hardware forwarding entry is created. Subsequent packets enter the forwarding chip through the inbound interface and are processed as shown in Figure 1-2.

Figure 1-2 Processing non-initial packets

•   Flow 1 (data packets) is sent out by the forwarding chip and does not pass through the CPU.

•   Flow 2 (control packets and some data packets) is sent to the CPU through the forwarding chip. The CPU determines whether to send the flow out or terminate it. Flow 2 consumes CPU resources and cannot be forwarded at high speed.

The Layer 2 and Layer 3 hardware entries in the forwarding chip determine whether a switch can implement high-speed forwarding; however, the hardware entries in the forwarding chip are created based on the software entries maintained in the CPU. Therefore, the CPU is the core of a switch.
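The relationship between software entries, hardware entries, and the first packet of a flow can be sketched as a toy model. This is only an illustration under simplifying assumptions: the class `Switch`, the exact-match lookup on the destination address, and the port name `GE0/0/1` are hypothetical and do not describe the actual chip or software implementation.

```python
class Switch:
    """Toy model: software entries live in the CPU, hardware entries in the chip."""

    def __init__(self, software_routes):
        self.software_routes = dict(software_routes)  # maintained by the CPU
        self.hardware_entries = {}                    # installed in the forwarding chip
        self.cpu_packets = 0                          # packets that reached the CPU

    def forward(self, dst):
        # Fast path: the forwarding chip already holds an entry for this destination.
        if dst in self.hardware_entries:
            return ("chip", self.hardware_entries[dst])
        # Slow path: no hardware entry, so the packet is sent to the CPU.
        self.cpu_packets += 1
        out_port = self.software_routes.get(dst)
        if out_port is None:
            return ("cpu", None)  # no software entry either: nothing to install
        # The CPU installs a hardware entry based on its software entry.
        self.hardware_entries[dst] = out_port
        return ("cpu", out_port)


sw = Switch({"10.0.0.2": "GE0/0/1"})
print(sw.forward("10.0.0.2"))  # ('cpu', 'GE0/0/1')  first packet goes through the CPU
print(sw.forward("10.0.0.2"))  # ('chip', 'GE0/0/1') later packets stay in the chip
print(sw.cpu_packets)          # 1
```

Only the first lookup increments `cpu_packets`, which mirrors why a flow with a hardware entry no longer consumes CPU resources.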

CPU Usage

After a switch starts, the CPU runs more than 200 active tasks to manage the switch and monitor Layer 3 entry learning. The number of tasks may vary according to switch models. In addition, when more features are configured on a switch, more tasks run in the system.

CPU usage is the percentage of time within a statistical period that the CPU spends running non-idle tasks. It has the following characteristics:

•   Constantly changing: A switch's CPU usage keeps changing with system operations and changes in the environment.

•   Non-real-time: CPU usage data reflects usage over a statistical period, not at a single instant.

•   Entity-relevant: CPU usage is calculated per physical CPU. Generally, each service board on a switch has an independent physical CPU, so the CPU usage of each board is calculated separately.

CPU usage reflects how long tasks keep the CPU busy during a statistical period. In Figure 1-3, task A occupies the CPU for 10 ms, task B occupies it for 30 ms, and the CPU is then idle for 60 ms; the same pattern then repeats once. Over this 200 ms period the CPU is busy for 80 ms, so the CPU usage is 40%. A high CPU usage indicates that the tasks running on the switch are keeping the CPU busy.

Figure 1-3 Tasks occupy CPU resources
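The arithmetic behind the Figure 1-3 example can be checked with a few lines of Python. The function name `cpu_usage` is hypothetical; the numbers are the ones given in the text.

```python
def cpu_usage(busy_ms, period_ms):
    """Return CPU usage in percent: time spent on non-idle tasks / statistical period."""
    if period_ms <= 0:
        raise ValueError("period_ms must be positive")
    return 100.0 * sum(busy_ms) / period_ms


# Two cycles from Figure 1-3: task A runs 10 ms, task B runs 30 ms, then 60 ms idle.
busy = [10, 30, 10, 30]
period = 2 * (10 + 30 + 60)
print(cpu_usage(busy, period))  # 40.0
```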

 

The CPU usage is a key indicator of switch performance.

 

 

2 CPU and CPU Usage Working Mechanism

2.1 How Does a CPU Process Packets (Modular Switch)

Huawei switches forward data packets through the forwarding chip without involving the CPU. The following packets are sent to the CPU for processing:

•   Protocol packets to be terminated by the switch

    All packets destined for the switch, including:

    -   Control packets of protocols, such as STP, LLDP, LNP, LACP, VCMP, DLDP, EFM, GVRP, and VRRP

    -   Route update packets of routing protocols, such as RIP, OSPF, BGP, and IS-IS

    -   SNMP, Telnet, and SSH packets

    -   ARP and ND reply packets

•   Packets requiring special processing

    -   ICMP packets carrying options

    -   IPv6 packets with the hop-by-hop option

    -   IPv4/IPv6 packets with a TTL value smaller than or equal to 1

    -   Packets with the switch's local IP address as the destination address

    -   ARP/ND/FIB Miss packets

•   Packets forwarded to the CPU by matching an ACL

    -   Packets discarded by the deny action in ACL rules after the logging function is enabled

    -   Packets redirected to the CPU by traffic policies

•   Multicast-related packets

    -   PIM, IGMP, MLD, and MSDP protocol packets

    -   Unknown IP multicast packets

•   Packets related to other features

    -   DHCP packets

    -   ARP and ND broadcast request packets

    -   Layer 2 protocol packets forwarded through software by L2PT (Devices on the two ends of a tunnel forward Layer 2 protocol packets through software, and intermediate devices forward these packets through the chip.)
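A few of the categories above can be expressed as a simple decision function. This is a minimal sketch, not the switch's actual classification logic: the `Packet` fields, the `CONTROL_PROTOCOLS` subset, and the order of the checks are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dst_ip: str
    ttl: int
    protocol: str = "data"        # e.g. "ospf", "stp", "dhcp", or "data"
    has_hop_by_hop: bool = False  # IPv6 hop-by-hop option present
    hw_entry_found: bool = True   # False models an ARP/ND/FIB miss


# Hypothetical subset of the protocols listed above.
CONTROL_PROTOCOLS = {"stp", "lldp", "lacp", "vrrp", "rip", "ospf", "bgp", "isis",
                     "pim", "igmp", "dhcp"}


def punt_reason(pkt, local_ips):
    """Return why a packet would be sent to the CPU, or None if the chip forwards it."""
    if pkt.protocol in CONTROL_PROTOCOLS:
        return "protocol packet"
    if pkt.dst_ip in local_ips:
        return "destined for the switch"
    if pkt.ttl <= 1:
        return "TTL <= 1"
    if pkt.has_hop_by_hop:
        return "hop-by-hop option"
    if not pkt.hw_entry_found:
        return "ARP/ND/FIB miss"
    return None


local = {"192.0.2.1"}
print(punt_reason(Packet("198.51.100.7", ttl=64), local))  # None
print(punt_reason(Packet("198.51.100.7", ttl=1), local))   # TTL <= 1
```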

As shown in Figure 2-1, packets sent to the CPU of an MPU are rate limited at several points along the path, for example, on the forwarding chips and the SFU chips. This rate limiting protects the MPU CPU.

Figure 2-1 Rate limiting for packets on a modular switch

 

As shown in Figure 2-2, rate limiting on each chip or logic component includes protocol-based rate limiting, queue-based rate limiting, and port-based rate limiting. The following tables list the default CPU rate limiting configuration on non-X1E LPUs of the S9300 running V200R007. To check the default CPU rate limiting configuration on other switch models and versions, run the display cpu-defend configuration all command.

Figure 2-2 Rate limiting types for packets to be sent to the CPU

 

Table 2-1 Protocol-based rate limits on the S9300

| Packet Type | Rate Limit on LPU (kbps) | Rate Limit on MPU (kbps) |
|---|---|---|
| 802.1x, arp-miss, mpls-ping, nd, nd-miss, loopbacktest, nd-redirect | 64 | 64 |
| smart-link, lacp, lldp, dldp, ttl-expired, mpls-ttl-expired, ntp, hw-tacacs, fib-miss, hgmp-bc, smlk-rrpp, hotlimit, mpls-vccv-ping, arp-request, arp-reply, arp-mff, vpls-arp | 64 | 128 |
| eoam-3ah, mpls-one-label | 64 | 256 |
| vpls-igmp, mpls-rsvp, ipmc-invalid, bpdu | 64 | 512 |
| vrrp, bgp4plus, vrrp6, hvrp, ssh, ftp, snmp, gvrp, eoam-1ag-lblt, pppoe, hopbyhop, hgmp-mc, hgmp-uc, nac-nd, nd-snp-rs, nd-snp-rans, nd-snp-na, mad, nac-arp | 128 | 128 |
| mpls-oam, igmp, pim, rip, telnet, tcp, fib-hit, rrpp, udp-helper | 128 | 256 |
| stp, mld, unknown-multicast, bpdu-tunnel, ipmc-miss | 128 | 512 |
| fib6-hit, mpls-fib-hit | 128 | 1024 |
| icmp | 192 | 256 |
| http, pimv6, icmpv6, easy-operation, eoam-1ag, heart-packet | 256 | 256 |
| isis, ospf, ospf-hello, bgp, bfd, mpls-ldp, ripng, ospfv3, nac-dhcp, vpls-dhcp-request, vpls-dhcp-reply, nac-dhcpv6, ospfv3-uc | 256 | 512 |
| dhcp-client, dhcpv6-request, dhcpv6-reply, radius, y1731 | 512 | 512 |
| dhcp-server | 512 | 1024 |
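To make the effect of a per-protocol limit concrete, the following sketch models each limit with a token bucket. The source does not say which algorithm the chips use, so the token bucket, the 8000-byte burst size, and the interpretation of 1 kbps as 1000 bit/s are all assumptions; only the kbps values are taken from the LPU column of Table 2-1.

```python
class TokenBucket:
    """Token bucket that admits at most rate_kbps on average (1 kbps = 1000 bit/s assumed)."""

    def __init__(self, rate_kbps, burst_bytes):
        self.rate_bytes_per_s = rate_kbps * 1000 / 8
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0

    def allow(self, size_bytes, now):
        # Refill tokens for the elapsed time, capped at the bucket capacity.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_bytes_per_s)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True   # packet is passed on to the CPU
        return False      # packet is dropped before it reaches the CPU


# LPU limits from Table 2-1 (kbps); the burst size is hypothetical.
LPU_LIMITS_KBPS = {"arp-request": 64, "icmp": 192, "ospf": 256, "dhcp-server": 512}
buckets = {proto: TokenBucket(rate, burst_bytes=8000)
           for proto, rate in LPU_LIMITS_KBPS.items()}

# A burst of ten 1000-byte ARP requests arriving at the same instant.
results = [buckets["arp-request"].allow(1000, now=0.0) for _ in range(10)]
print(results.count(True))   # 8
print(results.count(False))  # 2
```

In this model a burst larger than the bucket is cut off immediately, while traffic that stays below the configured rate always reaches the CPU.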

 

Table 2-2 CPU queues for different packets on an LPU (a larger queue ID indicates a higher forwarding priority)

| Queue ID on LPU | Packet Type | Description |
|---|---|---|
| 7 | lacp | Fast protocol packets (Fast protocols have fast responses in interaction, for example, the response time of BFD is within 100 ms. The loss of a few packets will cause protocol flapping.) |
| 6 | vp | Packets sent from an LPU's CPU to the MPU's CPU |
| 5 | stp, smart-link, ldt, lldp, dldp, vrrp, mpls-oam, isis, pim, rip, ospf, ospf-hello, bgp, bfd, mpls-rsvp, mpls-ldp, mpls-ttl-expired, ntp, ripng, ospfv3, bgp4plus, pimv6, vrrp6, hvrp, telnet, ssh, mpls-ping, gvrp, bpdu-tunnel, rrpp, eoam-3ah, eoam-1ag, eoam-1ag-lblt, nd, y1731, mpls-one-label, loopbacktest, bpdu, nap, hgmp-mc, hgmp-uc, hgmp-bc, nd-redirect, nd-snp-rs, nd-snp-rans, nd-snp-na, mad, smlk-rrpp, ospfv3-uc | Important control plane protocol packets |
| 4 | other | - |
| 3 | arp-request, arp-reply, dhcp-client, dhcp-server, gmp, vpls-igmp, icmp, 8021x, http, dhcpv6-request, dhcpv6-reply, icmpv6, mld, ftp, snmp, radius, hw-tacacs, tcp, easy-operation, fib-hit, fib-miss, arp-miss, unknown-packet, udp-helper, arp-mff, pppoe, hopbyhop, mpls-vccv-ping, fib6-hit, nd-miss, nac-dhcp, vpls-arp, vpls-dhcp-request, vpls-dhcp-reply, nac-arp, icmp-ttl-expired, mpls-fib-hit, nac-nd, nac-dhcpv6, heart-packet | Important control plane protocol packets |
| 2 | ttl-expired, hotlimit | Secondary control plane protocol packets |
| 1 | unknown-multicast, ipmc-invalid, ipmc-miss | Secondary control plane protocol packets |
| 0 | other | - |

 

Table 2-3 CPU queues for different packets on an MPU (a larger queue ID indicates a higher forwarding priority)

| Queue ID on MPU | Packet Type | Description |
|---|---|---|
| 7 | lacp | Fast protocol packets (Fast protocols have fast responses in interaction, for example, the response time of BFD is within 100 ms. The loss of a few packets will cause protocol flapping.) |
| 6 | vp | Packets sent from an LPU's CPU to the MPU's CPU |
| 5 | stp, smart-link, ldt, lldp, dldp, vrrp, mpls-oam, isis, pim, rip, ospf, ospf-hello, bgp, bfd, mpls-rsvp, mpls-ldp, mpls-ttl-expired, ntp, ripng, ospfv3, bgp4plus, pimv6, vrrp6, hvrp, telnet, ssh, mpls-ping, gvrp, bpdu-tunnel, rrpp, eoam-3ah, eoam-1ag, eoam-1ag-lblt, nd, y1731, loopbacktest, bpdu, nap, hgmp-mc, hgmp-uc, hgmp-bc, nd-redirect, nd-snp-rs, nd-snp-rans, nd-snp-na, mad, smlk-rrpp, ospfv3-uc | Important control plane protocol packets |
| 4 | other | - |
| 3 | arp-request, arp-reply, dhcp-client, dhcp-server, gmp, vpls-igmp, icmp, 8021x, http, dhcpv6-request, dhcpv6-reply, icmpv6, mld, ftp, snmp, radius, hw-tacacs, tcp, easy-operation, fib-hit, fib-miss, arp-miss, unknown-packet, udp-helper, arp-mff, pppoe, hopbyhop, mpls-vccv-ping, fib6-hit, nd-miss, nac-dhcp, mpls-one-label, vpls-arp, vpls-dhcp-request, vpls-dhcp-reply, nac-arp, icmp-ttl-expired, mpls-fib-hit, nac-nd, nac-dhcpv6, heart-packet | Important control plane protocol packets |
| 2 | ttl-expired, hotlimit | Secondary control plane protocol packets |
| 1 | unknown-multicast, ipmc-invalid, ipmc-miss | Secondary control plane protocol packets |
| 0 | other | - |

 

A switch places packets into CPU queues based on the packets' importance and the plane they belong to (management, control, or forwarding plane). Each CPU queue has a priority. For example, when Telnet management packets and dhcp-client protocol packets are both sent to the CPU, the CPU first processes the Telnet management packets in queue 5, ahead of the dhcp-client packets in queue 3. This mechanism ensures device stability and manageability under a heavy CPU load. The CPU can also use a weighting mechanism to ensure that packets in low-priority queues are still processed.

On a stable network, the number of packets sent to the CPU stays within a limited range, so the CPU usage remains within a proper range. If a large number of packets are sent to the CPU within a short period, the CPU is busy processing these packets, which results in high CPU usage.
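The combination of queue priority and weighting can be illustrated with a weighted round-robin scheduler. The source only says that a weighting mechanism can be used, so the algorithm, the function name `weighted_round_robin`, and the weights below are assumptions; the queue IDs for telnet (5) and dhcp-client (3) come from Table 2-2.

```python
from collections import deque


def weighted_round_robin(queues, weights):
    """Drain queues in rounds and return the processing order.

    queues maps queue ID -> deque of packets; weights maps queue ID ->
    number of packets served per round. Higher queue IDs are served first.
    """
    if any(weights[qid] < 1 for qid in queues):
        raise ValueError("every queue needs a weight of at least 1")
    order = []
    while any(queues.values()):
        for qid in sorted(queues, reverse=True):
            for _ in range(weights[qid]):
                if not queues[qid]:
                    break
                order.append(queues[qid].popleft())
    return order


queues = {
    5: deque(["telnet-1", "telnet-2", "telnet-3"]),
    3: deque(["dhcp-1", "dhcp-2", "dhcp-3"]),
}
weights = {5: 2, 3: 1}  # hypothetical weights
print(weighted_round_robin(queues, weights))
# ['telnet-1', 'telnet-2', 'dhcp-1', 'telnet-3', 'dhcp-2', 'dhcp-3']
```

Telnet packets are served first and more often, yet the dhcp-client queue is never starved, which is the property the weighting mechanism is meant to provide.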

 

2.2 How Does a CPU Process Packets (Fixed Switch)

Huawei switches forward data packets through hardware without involving the CPU. The following packets are sent to the CPU for processing:

•   Protocol packets to be terminated by the switch

    All packets destined for the switch, including:

    -   Control packets of protocols, such as STP, LLDP, LNP, LACP, VCMP, DLDP, EFM, GVRP, and VRRP

    -   Route update packets of routing protocols, such as RIP, OSPF, BGP, and IS-IS

    -   SNMP, Telnet, and SSH packets

    -   ARP and ND reply packets

•   Packets requiring special processing

    -   ICMP packets carrying options

    -   IPv6 packets with the hop-by-hop option

    -   IPv4/IPv6 packets with a TTL value less than or equal to 1

    -   Packets with the switch's local IP address as the destination address

    -   ARP/ND/FIB Miss packets

•   Packets processed using ACLs

    -   Packets discarded by the deny action in ACL rules after the logging function is enabled

    -   Packets redirected to the CPU by traffic policies

•   Multicast

    -   PIM, IGMP, MLD, and MSDP protocol packets

    -   Unknown IP multicast packets

•   Other features

    -   DHCP packets

    -   ARP and ND broadcast request packets, as well as the ARP packets sent when dynamic ARP inspection is configured on a Layer 2 switch

    -   Layer 2 protocol packets forwarded through software by L2PT (Devices on the two ends of a tunnel forward Layer 2 protocol packets through software, and intermediate devices forward these packets through hardware.)

    -   In N:1 VLAN mapping, the first packet is sent to the CPU, and other packets are forwarded by hardware.

A switch uses QoS mechanisms to prioritize packets sent to the CPU and ensure that important packets are processed first. The switch groups the packets sent to the CPU into eight queues by priority. The types of packets sent to the CPU may vary between switch models. Table 2-4 and Figure 2-3 list typical packets that are sent to the CPU on the S5700LI. A larger queue ID indicates a higher priority.

Table 2-4 Queues for different packets sent to the CPU

| Queue ID | Packet Type | Description |
|---|---|---|
| 7 | IPC, RPC, LACP | Internal management packet |
| 6 | VP | Internally forwarded protocol packet |
| 5 | Telnet, SSH, LNP, DHCP | Management plane protocol packets |
| 4 | ARP Request | Important control plane protocol packet |
| 3 | STP, SMLK, EOAM, VCMP | Important control plane protocol packet |
| 2 | LBDT, LLDP, DLDP, IGMP, ICMP, NTP, 802.1x, GVRP, L2PT, ARP Miss, FTP, SNMP | Secondary control plane protocol packet |
| 1 | Other | - |
| 0 | Other | - |

 

Figure 2-3 Allocating packets of different types to CPU queues

 
 
 

A switch places packets into CPU queues based on the packets' importance and the plane they belong to (management, control, or forwarding plane). Each CPU queue has a priority. For example, when Telnet management packets and Layer 2 protocol packets transparently forwarded through L2PT are both sent to the CPU, the CPU first processes the Telnet management packets in queue 5, ahead of the L2PT packets in queue 2. This mechanism ensures device stability and manageability under a heavy CPU load. The CPU can also use a weighting mechanism to ensure that packets in low-priority queues are still processed.

On a stable network, the number of packets sent to the CPU stays within a limited range, so the CPU usage remains within a proper range. If a large number of packets are sent to the CPU within a short period, the CPU is busy processing these packets, which results in high CPU usage.
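The Telnet and L2PT example can be reproduced with a lookup built from Table 2-4. This is a sketch only: the names `QUEUE_OF` and `processing_order` are hypothetical, strict priority ordering is a simplification that ignores weighting, and placing unlisted types in queue 1 is an assumption, because the table lists "Other" for both queue 1 and queue 0.

```python
# Queue assignments copied from Table 2-4 (S5700LI); other models may differ.
QUEUE_OF = {
    "IPC": 7, "RPC": 7, "LACP": 7,
    "VP": 6,
    "Telnet": 5, "SSH": 5, "LNP": 5, "DHCP": 5,
    "ARP Request": 4,
    "STP": 3, "SMLK": 3, "EOAM": 3, "VCMP": 3,
    "LBDT": 2, "LLDP": 2, "DLDP": 2, "IGMP": 2, "ICMP": 2, "NTP": 2,
    "802.1x": 2, "GVRP": 2, "L2PT": 2, "ARP Miss": 2, "FTP": 2, "SNMP": 2,
}


def processing_order(packet_types):
    """Sort pending packets so that higher queue IDs come first (strict priority)."""
    # Types not listed in the table are treated as "Other" (queue 1 assumed here).
    return sorted(packet_types, key=lambda t: QUEUE_OF.get(t, 1), reverse=True)


print(processing_order(["L2PT", "Telnet", "STP"]))  # ['Telnet', 'STP', 'L2PT']
```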

2.3 Impact of High CPU Usage

The CPU on a switch will be overloaded if the forwarding plane sends packets to the CPU at high speeds (for example, the CPU receives a large number of packets within a short time due to a loop on the network) or a task consumes CPU resources for a long time. When this occurs, the CPU may be unable to process other tasks in a timely manner, which may cause exceptions in services.

High CPU usage adversely affects the system processing capability and may result in the following network problems:

•   No response to management requests

    -   Failure to set up a Telnet or SSH session with the switch, so that the switch cannot be managed, responds slowly, or executes commands with a delay

    -   SNMP timeout

    -   Long delay or even timeout of MAC/IP ping operations

•   DHCP or 802.1X service failures caused by the switch's failure to forward or respond to requests from clients

•   Changes in the STP topology or even loops

    A switch maintains root and alternate ports based on the BPDUs periodically received by its CPU. If the upstream device cannot send BPDUs in a timely manner because its CPU is busy, or the switch's own CPU is too busy to process received BPDUs, the switch considers the original path to the root bridge to have failed and selects a new root port, causing network reconvergence. If the switch also has an alternate port, the switch uses the alternate port as the new root port. In this situation, a loop may occur on the network.

•   Changes in the routing topology

    Hello packets of dynamic routing protocols are processed by the CPU. If the CPU is too busy to process received Hello packets or to send Hello packets, route flapping occurs. For example, OSPF flapping, BGP flapping, or VRRP flapping may occur in this situation.

•   Flapping of reliability detection protocols

    The CPU is responsible for the keepalive packets of detection protocols such as 802.3ah, 802.1ag, DLDP, BFD, and MPLS OAM. If a busy CPU cannot transmit or receive protocol packets promptly, protocol flapping occurs, which affects service traffic forwarding.

•   LACP Eth-Trunk link flapping

    LACP packets are processed by the CPU. If the CPU is too busy to receive and send LACP packets, the Eth-Trunk link flaps between the Up and Down states.

•   Dropping of software-forwarded packets or increased delay in forwarding such packets

•   Increased memory usage on the switch
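Several of the problems above (STP, routing protocol, detection protocol, and LACP flapping) share one cause: keepalive packets are processed too late. The following sketch counts how often a neighbor would be declared down. The function name `count_flaps`, the 1-second keepalive interval, and the 3-second hold time are hypothetical and do not correspond to the timers of any specific protocol.

```python
def count_flaps(arrival_times, hold_time):
    """Count how often a neighbor is declared down.

    arrival_times: sorted times (in seconds) at which keepalive packets were
    actually processed by the CPU. The neighbor is declared down whenever the
    gap between two processed keepalives exceeds hold_time.
    """
    flaps = 0
    for prev, cur in zip(arrival_times, arrival_times[1:]):
        if cur - prev > hold_time:
            flaps += 1
    return flaps


healthy = [0, 1, 2, 3, 4, 5, 6]          # keepalives processed every second
busy_cpu = [0, 1, 2, 6.5, 7, 8, 12.5]    # a busy CPU processes two keepalives late
print(count_flaps(healthy, hold_time=3))   # 0
print(count_flaps(busy_cpu, hold_time=3))  # 2
```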

2.4 Normal High CPU Usage Situations

High CPU usage can cause service faults, for example, Border Gateway Protocol (BGP) route flapping, frequent Virtual Router Redundancy Protocol (VRRP) switchovers, or even user login failures. In some situations, however, high CPU usage does not affect the network. For example, when a switch is reading optical transceiver information or traffic is bursting, the CPU usage may increase sharply. This is normal and acceptable, so high CPU usage is not necessarily caused by a fault. If a switch cannot process services for a long time, check whether a fault has occurred.
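One way to separate a short, acceptable spike from a condition worth investigating is to check whether usage stays high across several consecutive samples. The sketch below is illustrative only: the function name `is_sustained_high`, the 80% threshold, and the five-sample window are hypothetical values, not vendor recommendations.

```python
def is_sustained_high(samples, threshold=80.0, min_consecutive=5):
    """Return True if usage stays at or above threshold for min_consecutive samples in a row."""
    run = 0
    for usage in samples:
        run = run + 1 if usage >= threshold else 0
        if run >= min_consecutive:
            return True
    return False


burst = [20, 25, 95, 90, 30, 22, 21, 24]  # short spike that recovers on its own
stuck = [20, 85, 92, 95, 97, 96, 99, 98]  # usage stays high
print(is_sustained_high(burst))  # False
print(is_sustained_high(stuck))  # True
```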

High CPU usage resulting from the following events is normal. If the CPU usage returns to a normal range on its own, you do not need to perform any operations.

•   Traffic bursts occur.

•   A board starts.

•   The switch reads information about multiple optical transceivers simultaneously.

•   The switch is calculating the spanning tree.

    On a network running Multiple Spanning Tree Protocol (MSTP), the CPU usage is proportional to the number of instances and active ports. On a device running VLAN-based Spanning Tree (VBST), each VLAN runs an independent instance. Therefore, VBST uses more CPU resources than MSTP when they have the same number of VLANs and active ports.

•   The switch updates a large part of its routing table after receiving route update messages.

    When a switch receives a route update message, the switch updates its routing information and delivers it to the forwarding plane, which consumes CPU resources. In a cluster/stack system, the switch also needs to synchronize routing information to other member switches.

    During a routing table update, the following factors affect the CPU usage:

    -   Number of entries in the routing table

    -   Update frequency

    -   Number of routing processes receiving the update messages

    -   Number of member switches in a cluster/stack

•   The switch is running the copy cfcard:/ command or outputting a large amount of debugging information.

•   The NMS frequently operates the switch.

•   Other events

    -   Fast MAC address learning on a port running the sticky MAC function

    -   Many ports are added to many VLANs (For example, a user performs configuration in a port group to add many ports to many VLANs or change the link types of the ports.)

    -   The switch frequently receives a large number of IGMP request messages.

    -   The switch processes a large number of concurrent DHCP requests (For example, a switch that functions as a DHCP server restores connections with a large number of users.)

    -   An ARP broadcast storm occurs.

    -   An Ethernet broadcast storm occurs.

    -   Software forwarding of a large number of concurrent protocol packets (For example, L2PT transparently transmits a large number of BPDUs, or the DHCP relay/snooping module forwards a large number of DHCP packets within a short time.)

    -   A large number of data packets cannot be forwarded through the forwarding chip and are sent to the CPU (such as ARP Miss packets).

    -   Ports alternate between Up and Down.
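The MSTP and VBST comparison in the spanning tree item above can be made concrete with a rough calculation. This sketch takes the statement that load is proportional to the number of instances and active ports literally and models it as a simple product; the function name `relative_stp_load` and the figures (100 VLANs, 48 active ports, 4 MSTP instances) are hypothetical, and the result is a relative number, not a measured CPU usage.

```python
def relative_stp_load(instances, active_ports):
    """Relative spanning tree work, assuming load is proportional to instances x active ports."""
    return instances * active_ports


vlans, ports = 100, 48
mstp = relative_stp_load(instances=4, active_ports=ports)      # 100 VLANs mapped to 4 instances
vbst = relative_stp_load(instances=vlans, active_ports=ports)  # one instance per VLAN
print(vbst / mstp)  # 25.0
```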

 

 


user_2790689     Created Aug 10, 2016 23:31:35

Thank you for sharing.