Hi, everyone!
This post is about ping failure troubleshooting (2). More details can be found below.
1.1. Troubleshooting Cases
1.1.1 A Ping Fails Because the ICMP Packet Contains an Incorrect Checksum
Fault Symptom
A switch functions as a gateway that connects terminals such as access control systems and PCs. The switch fails to ping one of the terminals.
Cause Analysis
The ICMP Echo Reply packets returned by the peer device carry an incorrect checksum. They fail the protocol check on the switch and are discarded, so the ping fails.
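For reference, ICMP uses the standard Internet checksum (RFC 1071): the one's complement of the one's-complement sum of the message taken as 16-bit words, computed with the checksum field set to zero. The following Python sketch (the helper names are illustrative, not part of any switch software) builds an Echo Reply and shows how a receiver detects a corrupted one: re-summing a valid message, checksum field included, yields 0.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Return the RFC 1071 Internet checksum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a trailing zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def build_echo_reply(ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP Echo Reply (type 0, code 0) with a correct checksum."""
    header = struct.pack("!BBHHH", 0, 0, 0, ident, seq)  # checksum field = 0
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 0, 0, checksum, ident, seq) + payload


def checksum_ok(icmp_message: bytes) -> bool:
    """Checksumming a valid message, checksum field included, gives 0."""
    return internet_checksum(icmp_message) == 0


reply = build_echo_reply(ident=1, seq=1, payload=b"abcdefgh")
corrupted = reply[:-1] + bytes([reply[-1] ^ 0xFF])  # flip the bits of the last byte
print(checksum_ok(reply), checksum_ok(corrupted))  # True False
```

A packet capture tool performs the same verification when it marks a checksum as incorrect.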
Troubleshooting Procedure
1. Method 1:
If the switch has correctly learned the ARP entry of the terminal, collect traffic statistics to check whether the switch sends ICMP Echo Request packets and receives ICMP Echo Reply packets. Alternatively, capture packets:
Figure 1-1 Packet capturing
The captured packets show that the ICMP Echo Reply packets have an incorrect checksum. The word incorrect in Figure 1-1 marks the bad checksum.
2. Method 2:
Run the display icmp statistics command before and after the ping operation and check the bad checksum field, which counts checksum error packets at the ICMP protocol layer. Check whether this counter keeps increasing.
<HUAWEI> display icmp statistics
Input:  bad formats          0    bad checksum             3
        echo                 8    destination unreachable  0
        source quench        0    redirects                0
        echo reply           0    parameter problem        0
        timestamp request    0    information request      0
        mask requests        0    mask replies             0
        time exceeded        0    timestamp reply          0
        Mping request        0    Mping reply              0
Output: echo                 0    destination unreachable  0
        source quench        0    redirects                0
        echo reply           8    parameter problem        0
        timestamp request    0    information reply        0
        mask requests        0    mask replies             0
        time exceeded        0    timestamp reply          0
        Mping request        0    Mping reply              0
Comparing the bad checksum counter (3 in the output above) before and after the ping shows that the number of checksum error packets keeps increasing.
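If you repeat this check often, the comparison can be scripted. The following is a minimal Python sketch. It assumes you saved the display icmp statistics output as text before and after the ping, and that the counter appears as bad checksum followed by its value, as in the output above. The function names are hypothetical.

```python
import re


def bad_checksum_count(output: str) -> int:
    """Extract the 'bad checksum' counter from display icmp statistics output."""
    match = re.search(r"bad checksum\s+(\d+)", output)
    if match is None:
        raise ValueError("'bad checksum' counter not found")
    return int(match.group(1))


def checksum_errors_increasing(before: str, after: str) -> bool:
    """True if checksum error packets were counted between the two snapshots."""
    return bad_checksum_count(after) > bad_checksum_count(before)


# Hypothetical snapshots; only the value 3 comes from the output above.
before = "Input: bad formats 0 bad checksum 3\n echo 8"
after = "Input: bad formats 0 bad checksum 7\n echo 12"
print(checksum_errors_increasing(before, after))  # True
```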
Solution
Check the protocol stack of the peer device and make sure that the ICMP packets it returns are correctly formatted and carry valid checksums.
1.1.2 Directly Connected Devices Cannot Ping Each Other Because of an Incorrect Static ARP Entry
Fault Symptom
The customer replaces a device on the live network with switch A. Figure 1-2 shows the new network. Switch A and switch B cannot ping each other, and the OSPF neighbor on switch A is in the Exchange state. When the original device is put back in place of switch A, the fault disappears.
Figure 1-2 Directly connected devices cannot ping each other
Cause Analysis
1. The original device can ping switch B, indicating that the link between the two devices functions properly. Switch A and switch B are directly connected, so the fault is not caused by routing problems. The fault may be caused by errors in ARP learning.
2. Run the display arp all command on switch A to check whether switch A has learned the ARP entry of switch B.
<SwitchA> display arp all
IP ADDRESS      MAC ADDRESS     EXPIRE(M) TYPE   INTERFACE   VPN-INSTANCE
                                          VLAN/CEVLAN
------------------------------------------------------------------------------
1.1.1.1         0025-9e80-2494            I -    Vlanif20
1.1.1.2         0025-9e80-248e  18        D-0    GE1/0/1
                                          33
------------------------------------------------------------------------------
Total:2         Dynamic:1       Static:0     Interface:1
The preceding information shows that switch A has learned the ARP entry of switch B.
3. Run the display arp all command on switch B to check whether switch B has learned the ARP entry of switch A.
<SwitchB> display arp all
IP ADDRESS      MAC ADDRESS     EXPIRE(M) TYPE   INTERFACE   VPN-INSTANCE
                                          VLAN/CEVLAN
------------------------------------------------------------------------------
1.1.1.2         0025-9e80-248e            I -    Vlanif20
1.1.1.1         0016-ecb9-0eb2            S--    GE1/0/1
                                          33
------------------------------------------------------------------------------
Total:2         Dynamic:0       Static:1     Interface:1
In this ARP table, IP address 1.1.1.1 is mapped to MAC address 0016-ecb9-0eb2, and the entry type is S, indicating a static ARP entry. On switch A, however, the interface entry maps 1.1.1.1 to 0025-9e80-2494. The two switches therefore map IP address 1.1.1.1 to different MAC addresses.
The static ARP entry (IP address + MAC address + port) was configured on switch B before the network adjustment and was not updated afterward. Switch B therefore sends packets destined for 1.1.1.1 to the MAC address of the old device, so switch A and switch B cannot ping each other.
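The comparison that exposed the fault can be expressed as a small check. For each IP address a switch owns (type I entries), look for an entry on the peer that maps the same IP address to a different MAC address. The sketch below is a hypothetical Python helper. Its entries are typed in by hand from the two display arp all outputs, and it does not communicate with the switches.

```python
from typing import NamedTuple


class ArpEntry(NamedTuple):
    ip: str
    mac: str
    kind: str  # first letter of TYPE: "I" interface, "D" dynamic, "S" static


def mismatched_entries(owner: list, peer: list) -> list:
    """Return (peer_entry, correct_mac) for peer entries that disagree with the owner's interface MACs."""
    own_macs = {e.ip: e.mac for e in owner if e.kind == "I"}
    return [
        (e, own_macs[e.ip])
        for e in peer
        if e.ip in own_macs and e.mac != own_macs[e.ip]
    ]


switch_a = [ArpEntry("1.1.1.1", "0025-9e80-2494", "I"),
            ArpEntry("1.1.1.2", "0025-9e80-248e", "D")]
switch_b = [ArpEntry("1.1.1.2", "0025-9e80-248e", "I"),
            ArpEntry("1.1.1.1", "0016-ecb9-0eb2", "S")]

for entry, correct_mac in mismatched_entries(switch_a, switch_b):
    print(f"{entry.ip}: peer has {entry.mac} (type {entry.kind}), owner uses {correct_mac}")
```

Running the check in both directions is useful: here only switch B holds a stale entry.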
Troubleshooting Procedure
1. Run the system-view command on switch B to enter the system view.
2. Run the undo arp static ip-address command to delete the incorrect static ARP entry.
After the incorrect static ARP entry is deleted, switch A can ping switch B. Configure a new, correct static ARP entry to keep protecting switch B against ARP attacks.
3. Run the arp static ip-address mac-address vid vlan-id interface interface-type interface-number command to configure the correct static ARP entry.
After the preceding configurations, switch A can successfully ping switch B. Run the display ospf peer command to check the status of the OSPF neighbor. The OSPF neighbor is in Full state.
<SwitchA> display ospf peer
OSPF Process 1 with Router ID 11.11.11.105
Neighbors
Area 0.0.0.0 interface 1.1.1.1(Vlanif33)'s neighbors
Router ID: 2.1.1.1.168.10.2     Address: 1.1.1.2
State: Full  Mode:Nbr is Master  Priority: 1
DR: 1.1.1.2  BDR: 2.1.1.1  MTU: 0
Dead timer due in 34 sec
Retrans timer interval: 8
Neighbor is up for 00:28:17
Authentication Sequence: [ 0 ]
Summary
If a static ARP entry is configured on a device, update the entry whenever the peer's MAC address changes. If switch B is a non-Huawei device and you cannot log in to it to check the configuration, ping switch B from switch A and configure port mirroring to capture the packets exchanged between switch A and switch B. Then check whether the destination MAC addresses of the packets are correct.
1.1.3 A Switch Can Be Pinged But Cannot Be Remotely Accessed
Fault Symptom
In Figure 1-3, switch C can ping VLANIF 20 of switch A but cannot access switch A using Telnet.
Figure 1-3 A switch can be pinged but cannot be remotely accessed
Cause Analysis
1. The switch supports the fast ICMP reply function. This function enables the switch to quickly respond to the ICMP echo request packet destined for its own IP address.
This problem may be caused by the fast ICMP reply function on switch A. If fast ICMP reply is enabled, switch A responds to ICMP Echo Request packets even if it has no route to 2.1.1.1. A successful ping from switch C therefore shows only that the link between switch C and switch A is normal. The route may still be abnormal, so check whether there is a reachable route between switch C and switch A.
2. Run the tracert 1.1.1.1 command on switch C to check routes from switch C to switch A.
<SwitchC> tracert 1.1.1.1
traceroute to 1.1.1.1(1.1.1.1), max hops: 30 ,packet length: 40
1 2.1.1.2 10 ms 1 ms 1 ms
2 * * *
The output shows that the first hop, switch B (2.1.1.2), responds but switch A does not. The possible cause is that the route to 2.1.1.1 is not configured on switch A or is configured incorrectly.
3. Run the telnet 2.1.1.2 command on switch C to log in to switch B, and run the telnet 1.1.1.1 command on switch B to log in to switch A. The Telnet operations are successful, indicating that the Telnet configuration on switch A is correct.
4. Run the display ip routing-table 2.1.1.1 command on switch A to check the routing table. No route matches destination IP address 2.1.1.1: the longest-match lookup returns nothing. Then run the undo icmp-reply fast command on switch A to disable fast ICMP reply. Switch C can no longer ping switch A.
In conclusion, switch C can ping switch A only because fast ICMP reply is enabled on switch A. Switch C cannot Telnet to switch A because switch A has no route to 2.1.1.1.
Troubleshooting Procedure
1. Run the system-view command on switch A to enter the system view.
2. Run the ip route-static 2.1.1.0 255.255.255.0 1.1.1.2 command to configure a static route to 2.1.1.0/24 with next hop 1.1.1.2.
Switch C can then access switch A using Telnet.
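The reasoning in this case can be modeled in a few lines of Python. This is a simplified, hypothetical model, not how the switch software works internally. With fast ICMP reply, the Echo Reply is sent without a route lookup. A normal reply, like a Telnet session, needs a route back to the source, found by longest-prefix match. The 1.1.1.0/24 connected route on switch A is an assumption; the 2.1.1.0/24 route via 1.1.1.2 is the static route from the procedure above.

```python
import ipaddress


def longest_match(routes, destination):
    """Return the most specific (network, next_hop) containing destination, or None."""
    address = ipaddress.ip_address(destination)
    candidates = [route for route in routes if address in route[0]]
    return max(candidates, key=lambda route: route[0].prefixlen, default=None)


def can_answer(routes, source, fast_icmp_reply):
    """Fast reply answers without a lookup; otherwise a route back to the source is needed."""
    return fast_icmp_reply or longest_match(routes, source) is not None


# Switch A's routing table before the fix (assumed: connected route only).
before = [(ipaddress.ip_network("1.1.1.0/24"), "direct")]
# After adding ip route-static 2.1.1.0 255.255.255.0 1.1.1.2 on switch A.
after = before + [(ipaddress.ip_network("2.1.1.0/24"), "1.1.1.2")]

print(can_answer(before, "2.1.1.1", fast_icmp_reply=True))   # True: ping succeeds
print(can_answer(before, "2.1.1.1", fast_icmp_reply=False))  # False: no route back
print(longest_match(after, "2.1.1.1"))
```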
1.1.4 A Switch Undergoes an ARP Attack and Cannot Be Pinged
Fault Symptom
In Figure 1-4, Switch functions as the gateway. Switch_1 (a modular switch) frequently becomes unmanageable, and users on Switch_1 are frequently disconnected. When Switch_1 pings Switch, the replies are delayed or the ping fails. Services on Switch_2 are normal, and Switch_2 can ping the gateway successfully.
Figure 1-4 A switch undergoes an ARP attack and cannot be pinged
Cause Analysis
Switch_1 receives a large number of ARP packets with a fixed source MAC address. As a result, ARP packets of user devices cannot be sent or received normally.
Troubleshooting Procedure
Perform the following operations on Switch_1:
1. Check whether the CPU usage is high.
<Switch_1> display cpu-usage
CPU Usage Stat. Cycle: 10 (Second)
CPU Usage            : 82% Max: 99%
CPU Usage Stat. Time : 2010-12-18 15:35:56
CPU utilization for five seconds: 68%: one minute: 60%: five minutes: 55%.
The CPU usage is 82%, with a peak of 99%.
2. Check the ARP table for temporary entries to determine whether ARP learning is normal.
<Switch_1> display arp
IP ADDRESS      MAC ADDRESS     EXPIRE(M) TYPE   VPN-INSTANCE   INTERFACE
                                          VLAN/CEVLAN
------------------------------------------------------------------------------
10.137.222.139  00e0-fc01-4422            I -                   Eth0/0/0
10.137.222.1    0025-9e36-e8c1  20        D-0                   Eth0/0/0
10.137.222.100  0025-9e80-b278  6         D-0                   Eth0/0/0
10.137.222.99   00e0-4c77-b0e1  9         D-0                   Eth0/0/0
10.137.222.173  000f-3d80-cba4  18        D-0                   Eth0/0/0
10.137.222.34   0025-9e36-e8c1  1         D-0                   Eth0/0/0
10.137.222.172  0016-ec71-ea8c  7         D-0                   Eth0/0/0
10.137.222.35   0025-9e36-e8c1  18        D-0                   Eth0/0/0
10.137.222.179  0014-2ae2-3128  20        D-0                   Eth0/0/0
10.137.222.38   0025-9e36-e8c1  17        D-0                   Eth0/0/0
10.137.222.175  0014-2261-2b22  1         D-0                   Eth0/0/0
50.1.1.3        Incomplete      1         D-0                   GE5/0/0
                                          500/-
50.1.1.2        Incomplete      1         D-0                   GE5/0/0
                                          500/-
6.1.1.2         00e0-fc01-4422            I -                   Vlanif6
10.0.0.139      00e0-fc01-4422            I -                   Vlanif10
192.0.0.4       00e0-fc01-4422            I -                   Vlanif192
20.1.1.1        00e0-fc01-4422            I -                   Vlanif200
192.168.2.2     00e0-fc01-4422            I -                   Vlanif100
------------------------------------------------------------------------------
Total:16        Dynamic:10      Static:0     Interface:6
The MAC ADDRESS field of two entries (50.1.1.3 and 50.1.1.2) is Incomplete, indicating temporary entries. Some ARP entries cannot be learned.
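On a large ARP table, you can pick out the Incomplete entries with a short script. The following is a minimal Python sketch. It assumes that each entry's IP address and MAC ADDRESS field are the first two columns of a line, as in the output above. Continuation lines such as 500/- are skipped.

```python
def incomplete_entries(arp_output: str) -> list:
    """Return IP addresses whose MAC ADDRESS field reads 'Incomplete'."""
    ips = []
    for line in arp_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "Incomplete":
            ips.append(fields[0])
    return ips


sample = """10.137.222.1    0025-9e36-e8c1  20  D-0  Eth0/0/0
50.1.1.3        Incomplete      1   D-0  GE5/0/0
                                    500/-
50.1.1.2        Incomplete      1   D-0  GE5/0/0"""
print(incomplete_entries(sample))  # ['50.1.1.3', '50.1.1.2']
```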
3. Check whether the switch is suffering an ARP attack.
a. View statistics about ARP request packets sent to the CPU.
<Switch_1> display cpu-defend arp-request statistics all
Statistics on mainboard:
------------------------------------------------------------------------------
Packet Type     Pass(Bytes)   Drop(Bytes)   Pass(Packets)   Drop(Packets)
------------------------------------------------------------------------------
arp-request     67908288      0             1061067         0
------------------------------------------------------------------------------
Statistics on slot 4:
------------------------------------------------------------------------------
Packet Type     Pass(Bytes)   Drop(Bytes)   Pass(Packets)   Drop(Packets)
------------------------------------------------------------------------------
arp-request     80928         44380928      2301            693450
------------------------------------------------------------------------------
Statistics on slot 5:
------------------------------------------------------------------------------
Packet Type     Pass(Bytes)   Drop(Bytes)   Pass(Packets)   Drop(Packets)
------------------------------------------------------------------------------
arp-request     N/A           N/A           0               0
------------------------------------------------------------------------------
Statistics on slot 6:
------------------------------------------------------------------------------
Packet Type     Pass(Bytes)   Drop(Bytes)   Pass(Packets)   Drop(Packets)
------------------------------------------------------------------------------
arp-request     N/A           N/A           0               0
------------------------------------------------------------------------------
The board in slot 4 receives a large number of ARP request packets: 693450 packets were dropped and only 2301 passed.
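To quantify this, the drop ratio per board follows directly from the packet counters above: dropped / (passed + dropped). The following short Python calculation uses the values copied from the output:

```python
def drop_ratio(passed: int, dropped: int) -> float:
    """Fraction of ARP request packets dropped by CPU defense."""
    total = passed + dropped
    return dropped / total if total else 0.0


# (Pass(Packets), Drop(Packets)) from display cpu-defend arp-request statistics all
counters = {
    "mainboard": (1061067, 0),
    "slot 4": (2301, 693450),
    "slot 5": (0, 0),
    "slot 6": (0, 0),
}
for board, (passed, dropped) in counters.items():
    print(f"{board}: {drop_ratio(passed, dropped):.1%} dropped")
```

About 99.7% of the ARP requests arriving on slot 4 are dropped, which points to traffic well beyond normal ARP learning.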
b. Configure attack source tracing to identify the attack source.
<Switch_1> system-view
[Switch_1] cpu-defend policy policy1
[Switch_1-cpu-defend-policy-policy1] auto-defend enable
[Switch_1-cpu-defend-policy-policy1] auto-defend attack-packet sample 5
//One out of every five packets is sampled. A smaller sampling value samples more packets and consumes more CPU resources.
[Switch_1-cpu-defend-policy-policy1] auto-defend threshold 30
//Packets from a source whose rate reaches 30 pps are considered attack packets. If there are many attack sources, reduce this value.
[Switch_1-cpu-defend-policy-policy1] undo auto-defend trace-type source-ip source-portvlan
//Identify the attack source based on the source MAC address only.
[Switch_1-cpu-defend-policy-policy1] undo auto-defend protocol 8021x dhcp icmp igmp tcp telnet ttl-expired udp
//Exclude the other protocols so that attack sources of the ARP attack are identified.
[Switch_1-cpu-defend-policy-policy1] quit
[Switch_1] cpu-defend-policy policy1
[Switch_1] cpu-defend-policy policy1 global
c. View attack source information.
[Switch_1] display auto-defend attack-source
Attack Source User Table (MPU):
------------------------------------------------------------------------
MacAddress       InterfaceName            Vlan:Outer/Inner   TOTAL
------------------------------------------------------------------------
0000-0000-00db   GigabitEthernet2/0/22    193                416
------------------------------------------------------------------------
The attack source has MAC address 0000-0000-00db and is connected to GigabitEthernet2/0/22 (VLAN 193).
If the MAC address has a matching ARP entry, run the display arp | include 0000-0000-00db command to check its IP address.
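The following simplified Python model illustrates the idea behind sampling plus a rate threshold. It is a conceptual sketch only, not the algorithm the switch implements. It inspects one packet in every sample_every packets (as with auto-defend attack-packet sample 5) and scales the sampled count back up to estimate each source MAC's rate over the observation window. It then flags sources at or above threshold_pps (as with auto-defend threshold 30). The traffic mix below is made up for illustration.

```python
from collections import Counter


def find_attack_sources(source_macs, window_s, sample_every=5, threshold_pps=30):
    """Return {mac: estimated_pps} for sources whose estimated rate reaches the threshold.

    source_macs: source MAC of each ARP packet received during window_s seconds.
    """
    sampled = Counter(source_macs[::sample_every])  # keep every Nth packet only
    suspects = {}
    for mac, count in sampled.items():
        estimated_pps = count * sample_every / window_s  # scale the sample back up
        if estimated_pps >= threshold_pps:
            suspects[mac] = estimated_pps
    return suspects


# Hypothetical 10-second capture: 500 packets from one MAC, 20 from a normal host.
traffic = []
for i in range(500):
    traffic.append("0000-0000-00db")
    if i % 25 == 0:
        traffic.append("00e0-4c77-b0e1")

print(find_attack_sources(traffic, window_s=10))
```

The sampling trade-off in the configuration comments also shows here: a larger sample_every inspects fewer packets, which saves CPU but makes the rate estimate coarser.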
Solution
- Configure a blacklist.
#
acl number 4000
 rule 10 permit type 0806 ffff source-mac 0000-0000-00db ffff-ffff-ffff
#
cpu-defend policy 1
 blacklist 1 acl 4000
//Add users that match the ACL to the blacklist. The switch discards packets from blacklisted users.
#
cpu-defend-policy 1
cpu-defend-policy 1 global
#
- Configure the attack source tracing action.
#
cpu-defend policy policy1
 auto-defend enable
 auto-defend threshold 30
 undo auto-defend trace-type source-ip source-portvlan
 undo auto-defend protocol 8021x dhcp icmp igmp tcp telnet ttl-expired udp
 auto-defend action deny
//Set the attack source tracing action. The switch discards all attack packets within the default interval, 300s.
#
cpu-defend-policy policy1 global
cpu-defend-policy policy1
#
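The blacklist ACL rule matches two masked fields: the frame type 0806 (ARP) with mask ffff, and the source MAC address 0000-0000-00db with mask ffff-ffff-ffff. The following Python sketch models that match. It assumes that a 1 bit in a mask means the corresponding bit must be equal, so all-ones masks require exact matches. It illustrates the matching logic only and does not reproduce the switch's matching engine.

```python
def mac_to_int(mac: str) -> int:
    """Convert Huawei-style xxxx-xxxx-xxxx notation to an integer."""
    return int(mac.replace("-", ""), 16)


def masked_equal(value: int, rule_value: int, mask: int) -> bool:
    """Compare only the bits selected by mask."""
    return (value & mask) == (rule_value & mask)


def rule_10_matches(frame_type: int, source_mac: str) -> bool:
    """Model of: rule 10 permit type 0806 ffff source-mac 0000-0000-00db ffff-ffff-ffff."""
    return (masked_equal(frame_type, 0x0806, 0xFFFF)
            and masked_equal(mac_to_int(source_mac),
                             mac_to_int("0000-0000-00db"),
                             mac_to_int("ffff-ffff-ffff")))


print(rule_10_matches(0x0806, "0000-0000-00db"))  # True: ARP from the attacker
print(rule_10_matches(0x0800, "0000-0000-00db"))  # False: IPv4, not ARP
print(rule_10_matches(0x0806, "0025-9e36-e8c1"))  # False: ARP from another host
```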
1.1.5 PE VPNs Cannot Ping Each Other
Fault Symptom
In Figure 1-5, two loopback interfaces are created on each of the two PEs. Loopback1 interfaces on the PEs are public network interfaces with IP addresses 1.1.1.1/32 and 1.1.1.2/32. Loopback2 interfaces are bound to VPN instance test and have IP addresses 10.1.1.1/24 and 10.1.1.2/24. The PEs cannot exchange VPN routes and cannot ping each other.
Figure 1-5 PE VPNs cannot ping each other
Cause Analysis
When a device has two routes to the same destination, a direct route and a BGP route, the device preferentially uses the direct route to create a local VPN routing entry. The ping fails because no BGP route exists in the VPN routing table.
Troubleshooting Procedure
1. Run the display ip routing-table command on PE1 and PE2 to check the routes to the remote network segment. The routes to the remote Loopback1 address exist in the routing table.
2. Run the display ip routing-table vpn-instance vpn-instance-name command on PE1 and PE2 to check the VPN routing table. It contains only one route, 10.1.1.0/24 (Direct), which is the route to the local Loopback2. Note that the mask is 24 bits rather than 32 bits.
The Loopback2 addresses of the two PEs are on the same network segment. Each PE does receive the remote VPN route through BGP. However, because the remote Loopback2 is on the same network segment as the local Loopback2, the BGP route has the same prefix as the local direct route. The PE prefers the direct route when creating the local VPN routing entry. As a result, no BGP route exists in the VPN routing table, and the PEs fail to ping each other.
Solution
Run the following commands on PE1 and PE2:
1. Run the system-view command to enter the system view.
2. Run the interface loopback loopback-number command to enter the Loopback2 interface view.
3. Run the ip address ip-address { mask | mask-length } command to configure an IP address for Loopback2 and change the IP address mask length to 32 bits.
Conclusion
When two routes to the same network segment have the same prefix (here, a direct route and a BGP route), the device installs only one of them, the preferred direct route, in the VPN routing table.
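The following simplified Python model of route selection reproduces the effect of the mask length. It makes three assumptions: one route is kept per prefix, the route with the lower preference value wins (0 for direct and 255 for BGP are illustrative values), and forwarding uses longest-prefix match. With /24 masks, the BGP route collides with the direct route and loses. With /32 masks, the remote Loopback2 gets its own BGP route.

```python
import ipaddress

PREFERENCE = {"direct": 0, "bgp": 255}  # assumed values; lower is preferred


def build_vpn_table(candidates):
    """Keep one route per prefix: the candidate with the lowest preference value."""
    table = {}
    for prefix, protocol in candidates:
        network = ipaddress.ip_network(prefix)
        if network not in table or PREFERENCE[protocol] < PREFERENCE[table[network]]:
            table[network] = protocol
    return table


def lookup(table, destination):
    """Longest-prefix match: return (network, protocol) or None."""
    address = ipaddress.ip_address(destination)
    matches = [network for network in table if address in network]
    if not matches:
        return None
    best = max(matches, key=lambda network: network.prefixlen)
    return best, table[best]


# PE1 with /24 Loopback2: the local direct route and the BGP route from PE2 share a prefix.
table_24 = build_vpn_table([("10.1.1.0/24", "direct"), ("10.1.1.0/24", "bgp")])
# PE1 after both Loopback2 masks are changed to /32.
table_32 = build_vpn_table([("10.1.1.1/32", "direct"), ("10.1.1.2/32", "bgp")])

print(lookup(table_24, "10.1.1.2"))  # direct route wins; no BGP route installed
print(lookup(table_32, "10.1.1.2"))  # BGP route to PE2's Loopback2
```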
1.1.6 The Huawei Switch Fails to Ping the Non-Huawei C3750 Switch
Fault Symptom
As shown in Figure 1-6, Switch is directly connected to C3750. They establish an OSPF neighbor relationship through VLAN 200 and advertise the routes of VLAN 100 and VLAN 300 to each other. The monitor server (172.19.2.2) pings the server (172.19.3.2) to detect whether it is online.
Figure 1-6 Huawei switch fails to ping the directly connected C3750
The ping fails about every 18 hours and recovers after about 30 minutes, affecting the surveillance service.
Cause Analysis
Abnormal route aging on C3750 causes the route to 172.19.2.0, the network segment of the monitor server, to be lost, so the ping fails.
Troubleshooting Procedure
1. Check traffic statistics. The ICMP request packets from the monitor server can be correctly forwarded by Switch but Switch does not receive ICMP reply packets. The problem may occur on C3750.
2. View routing information on C3750. When the problem occurs, the route to the network segment where the monitor server is located disappears. As a result, the returned ICMP reply packets are discarded by C3750.
When the problem occurs, the following two Network LSAs exist in the LSDB on Switch but not on C3750:
Type      LinkState ID    AdvRouter       Age    Len   Sequence   Metric
Network   172.19.5.1      172.19.1.250    1256   32    80000208   0
Network   172.19.5.1      172.19.99.10    3600   32    800026C9   0
Switch floods the LSA advertised by 172.19.99.10 to all neighbors. When C3750 receives this LSA, it deletes the LSA advertised by 172.19.1.250, so routes are lost during route calculation. After 30 minutes, Switch (172.19.1.250) refreshes its LSAs and advertises its own Network LSA to C3750 again, and the routes on C3750 recover.
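Whether the two LSAs collide depends on how an LSDB keys its entries. In OSPF (RFC 2328), an LSA instance is identified by LS type, Link State ID, and Advertising Router together. The hypothetical Python sketch below stores the two Network LSAs from the table above under two keys. The first is the RFC key. The second omits the advertising router, which is one way to model the behavior suspected on C3750; this is not confirmed vendor behavior.

```python
from typing import NamedTuple


class Lsa(NamedTuple):
    ls_type: str
    link_state_id: str
    adv_router: str
    age: int


RFC_KEY = ("ls_type", "link_state_id", "adv_router")
SUSPECTED_KEY = ("ls_type", "link_state_id")  # hypothetical faulty key


def store(lsdb: dict, lsa: Lsa, key_fields: tuple) -> None:
    """Insert lsa, replacing any existing LSA with the same key."""
    key = tuple(getattr(lsa, field) for field in key_fields)
    lsdb[key] = lsa


lsas = [Lsa("Network", "172.19.5.1", "172.19.1.250", 1256),
        Lsa("Network", "172.19.5.1", "172.19.99.10", 3600)]
correct, faulty = {}, {}
for lsa in lsas:
    store(correct, lsa, RFC_KEY)
    store(faulty, lsa, SUSPECTED_KEY)

print(len(correct), len(faulty))  # 2 1
print(list(faulty.values()))  # only the MaxAge LSA from 172.19.99.10 survives
```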
Pay attention to the following points:
− OSPF identifies an LSA by three fields: Type, LinkStateID, and AdvRouter. Therefore, Switch considers the LSAs advertised by 172.19.99.10 and 172.19.1.250 to be different. C3750, however, may consider the two LSAs the same, so it overwrites the LSA advertised by 172.19.1.250 with the LSA advertised by 172.19.99.10. The age of the LSA advertised by 172.19.99.10 is 3600 seconds (MaxAge), so C3750 ages this LSA out, causing the route loss.
− The LSA advertised by 172.19.99.10 has a DC flag.
Type             : Network
Ls id            : 172.19.5.1
Adv rtr          : 172.19.99.10
Ls age           : 3600
Len              : 32
Options          : DC E
seq#             : 800026c9
chksum           : 0xd55
Net mask         : 255.255.255.0
Attached Router    172.19.99.10
Attached Router    172.19.8.1
According to RFC 1793, when the DoNotAge bit (the highest bit of the LS age field) is set to 1, the LSA is not aged normally and is not deleted immediately when its advertiser becomes unavailable.
When are these LSAs deleted?
They are deleted, and the problem occurs, when both of the following conditions are met:
- The LSA has existed in the LSDB for at least 3600s.
- There is no reachable route to the LSA advertiser.
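The following Python sketch writes down the age-field handling and the deletion conditions above. DoNotAge is the high-order bit (0x8000) of the 16-bit LS age field, and MaxAge is 3600 seconds. The deletion check simply encodes the two conditions listed above; its parameter names are hypothetical.

```python
DO_NOT_AGE = 0x8000  # high-order bit of the 16-bit LS age field (RFC 1793)
MAX_AGE = 3600       # OSPF MaxAge, in seconds


def do_not_age(ls_age_field: int) -> bool:
    """True if the DoNotAge bit is set."""
    return bool(ls_age_field & DO_NOT_AGE)


def age_seconds(ls_age_field: int) -> int:
    """Age value with the DoNotAge bit masked off."""
    return ls_age_field & 0x7FFF


def should_delete(seconds_in_lsdb: int, advertiser_reachable: bool) -> bool:
    """Both conditions from the list above must hold."""
    return seconds_in_lsdb >= MAX_AGE and not advertiser_reachable


field = DO_NOT_AGE | 120  # DoNotAge set, age 120 s
print(do_not_age(field), age_seconds(field))  # True 120
print(should_delete(3600, advertiser_reachable=False))  # True
print(should_delete(3600, advertiser_reachable=True))   # False
```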
Solution
- Change the OSPF network type of the interfaces between Switch and C3750 to P2P to avoid interference from incorrect LSAs.
- Change the IP addresses of the interfaces between Switch and C3750. This also avoids interference from incorrect LSAs.
This is my solution, how about yours? Go ahead and share it with us!