Hi Guys,
I would like to talk about an interesting case I encountered on the S2700 series.
From time to time, CPU usage rises from 20-30% to 50-60% and stays there for 10-30 minutes. Spanning tree did not flap, and no other events were spotted inside the LAN. Services running through the network are not impacted, and traffic is forwarded at line speed. Let's try to find the root cause together.
Consider the following topology: the S2700 is located at the access layer, connecting hosts and servers. VLAN 1, the default VLAN, passes through the uplink ports as trunk and the downlink ports as access. It is also the management VLAN for all other switches in the network.
The first thing to do is to check which task is keeping the CPU busy. I found soft_learn and frag_add at a high level; both are used by the MAC address learning process.
soft_learn 15% 0/3a62e52a tS0d
frag_add 11% 0/2c156c2a tS0e
On the uplink there is a large number of MAC addresses in VLAN 1:
<LSW5>dis mac-address | include 0/0/1
-------------------------------------------------------------------------------
MAC Address VLAN/VSI Learned-From Type
-------------------------------------------------------------------------------
aaaa-aa0f-4e3e 1/- GE0/0/1 dynamic
bbbb-bb7a-4961 1/- GE0/0/1 dynamic
....................................................................................................
Total items displayed = 207
207 entries on a single port, all in one VLAN. This is quite a lot for the low-end S27 series.
Also, comparing the entries at two different times, we see that many MAC addresses age out and many are learned again. Because of the large amount of broadcast data coming in through the uplink interface, this port keeps the learning process constantly busy.
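The comparison of two table dumps can be done by hand, but a small script makes the churn easier to see. Below is a minimal Python sketch, assuming the dumps are saved as plain text in the same column layout as the output above. The function names (macs_on_port, churn) and the address cccc-cc00-0001 are made up for illustration and are not part of any Huawei tool.

```python
import re

# One table row: MAC, VLAN/VSI, Learned-From, Type (layout taken from the output above).
ROW = re.compile(r"^\s*([0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4})\s+(\S+)\s+(\S+)\s+(\S+)", re.I)

def macs_on_port(dump, port):
    """Return the MAC addresses that a 'dis mac-address' dump shows on one port."""
    found = set()
    for line in dump.splitlines():
        row = ROW.match(line)
        if row and row.group(3).lower() == port.lower():
            found.add(row.group(1).lower())
    return found

def churn(before, after):
    """Split two snapshots into aged-out, newly learned and unchanged entries."""
    return {
        "aged_out": before - after,
        "learned": after - before,
        "unchanged": before & after,
    }

# Hypothetical snapshots; cccc-cc00-0001 is a made-up address for illustration.
first = """aaaa-aa0f-4e3e 1/- GE0/0/1 dynamic
bbbb-bb7a-4961 1/- GE0/0/1 dynamic"""
second = """bbbb-bb7a-4961 1/- GE0/0/1 dynamic
cccc-cc00-0001 1/- GE0/0/1 dynamic"""

result = churn(macs_on_port(first, "GE0/0/1"), macs_on_port(second, "GE0/0/1"))
print(sorted(result["aged_out"]), sorted(result["learned"]))
```

If the aged_out and learned sets stay large from one snapshot to the next, the port is relearning addresses all the time, which matches the busy learning tasks.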
To avoid the CPU usage spikes caused by the learning process, we can disable MAC address learning on the uplink interface.
# Disable MAC address learning for Gi0/0/1.
<Quidway> system-view
[Quidway] interface GigabitEthernet 0/0/1
[Quidway-GigabitEthernet0/0/1] mac-address learning disable
But this generates some extra traffic in the network. Without MAC address learning on the uplink, the switch has no table entry for the addresses behind that port, so frames sent to them are flooded like broadcast. Unwanted traffic is therefore flooded inside the VLAN.
For instance, let's assume that in the normal situation the switch receives 30 Mbps on the uplink and forwards it to the two downlink hosts at 15 Mbps each:
Interface PHY Protocol InUti OutUti inErrors outErrors
Ethernet0/0/1 up up 10% 15% 0 0
Ethernet0/0/2 up up 10% 15% 0 0
GigabitEthernet0/0/1 up up 3.01% 2.01% 0 0
After we disable MAC address learning on the uplink, the traffic statistics look like this:
Interface PHY Protocol InUti OutUti inErrors outErrors
Ethernet0/0/1 up up 10% 25% 0 0
Ethernet0/0/2 up up 10% 25% 0 0
GigabitEthernet0/0/1 up up 3.01% 2.01% 0 0
The MAC address table now contains only the addresses learned from Eth0/0/1 and Eth0/0/2:
[Quidway]dis mac-address
-------------------------------------------------------------------------------
MAC Address VLAN/VSI Learned-From Type
-------------------------------------------------------------------------------
0000-0000-0001 1/- ETH0/0/1 dynamic
0000-0000-0002 1/- ETH0/0/2 dynamic
So Ethernet0/0/1 and Ethernet0/0/2 initially carry 15 Mbps of outbound traffic each. With learning disabled on the uplink, the 10 Mbps that each host sends upstream no longer matches any MAC table entry, and the S27 floods unicast packets when it cannot find an output port in the MAC table. Each downlink therefore also receives the other host's 10 Mbps, and its output traffic becomes 15M + 10M = 25M.
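The same arithmetic as a small Python sketch, in case you want to try other rates or more downlinks. It assumes the simplified model used in this case: all host traffic is addressed to MACs behind the uplink, and with learning disabled there it is flooded to every other downlink. The function name downlink_out_mbps is hypothetical.

```python
def downlink_out_mbps(from_uplink, upstream_in, uplink_learning):
    """Estimate the output rate of each downlink port in Mbps.

    from_uplink: port -> Mbps of known unicast arriving from the uplink.
    upstream_in: port -> Mbps that the host on this port sends upstream.
    uplink_learning: False means upstream traffic is flooded (assumed model).
    """
    out = {}
    for port, rate in from_uplink.items():
        flooded = 0
        if not uplink_learning:
            # Every other downlink's upstream traffic is copied to this port.
            flooded = sum(r for p, r in upstream_in.items() if p != port)
        out[port] = rate + flooded
    return out

from_uplink = {"Ethernet0/0/1": 15, "Ethernet0/0/2": 15}
upstream_in = {"Ethernet0/0/1": 10, "Ethernet0/0/2": 10}

print(downlink_out_mbps(from_uplink, upstream_in, uplink_learning=True))
print(downlink_out_mbps(from_uplink, upstream_in, uplink_learning=False))
```

With the values from this case, it gives 15 Mbps per downlink with learning enabled and 25 Mbps with learning disabled. Note that in this model the extra load grows with every additional downlink in the VLAN.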
That is a big price to pay for lowering the CPU load.
The flooded traffic on the downlinks can be contained with the port-isolation function. If we add all downlink interfaces to the same isolation group, they will not be able to communicate with each other; moreover, flooded traffic will not reach the other interfaces inside the isolation group.
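To see why the isolation group removes the extra 10 Mbps, here is a rough Python model of the forwarding decision. It is only a sketch of generic Layer 2 behaviour (known unicast goes to one port, unknown unicast is flooded, ports in the same isolation group never forward to each other), not the switch's actual implementation. The function egress_ports and the address dddd-dd00-0001 are made up for illustration.

```python
def egress_ports(ingress, dst_mac, mac_table, vlan_ports, isolated):
    """Return the set of ports a frame is sent out of (simplified L2 model)."""
    if dst_mac in mac_table:
        candidates = {mac_table[dst_mac]}   # known unicast: one port
    else:
        candidates = set(vlan_ports)        # unknown unicast: flood the VLAN
    candidates.discard(ingress)             # never send back out the ingress port
    if ingress in isolated:
        candidates -= set(isolated)         # isolation group members cannot talk
    return candidates

ports = ["Ethernet0/0/1", "Ethernet0/0/2", "GigabitEthernet0/0/1"]
# Learning is disabled on the uplink, so only the two hosts are in the table.
table = {"0000-0000-0001": "Ethernet0/0/1", "0000-0000-0002": "Ethernet0/0/2"}
group = {"Ethernet0/0/1", "Ethernet0/0/2"}

# Host 1 sends to a MAC behind the uplink (dddd-dd00-0001 is hypothetical).
print(egress_ports("Ethernet0/0/1", "dddd-dd00-0001", table, ports, set()))
print(egress_ports("Ethernet0/0/1", "dddd-dd00-0001", table, ports, group))
```

Without the isolation group, the frame goes out of both Ethernet0/0/2 and the uplink; with the group, it goes out of the uplink only. Traffic coming from the uplink towards the hosts is still known unicast, because learning stays enabled on the downlinks.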
Conclusion:
1. “Mac-address learning disable” on the uplink together with “port-isolation” on the downlinks does not change the traffic path or its characteristics, but it removes the CPU usage spikes generated by the MAC address learning process.
2. Layer 2 forwarding then consists of flooded traffic from the S2700 towards the core network, and unicast traffic from the core towards the S2700 based on the core switches' MAC address tables.
Hope you enjoyed reading this case.