Ping failure troubleshooting (2)

Latest reply: Dec 5, 2016 01:37:06

This post is part 2 of ping failure troubleshooting. It walks through six troubleshooting cases.


1.1. Troubleshooting Cases

1.1.1 A Ping Fails Because the ICMP Packet Contains an Incorrect Checksum

Fault Symptom

A switch functions as a gateway that connects terminals such as access control systems and PCs. The switch fails to ping a terminal.

Cause Analysis

The ICMP Echo Reply packets returned by the peer device carry an incorrect checksum. The protocol check on the switch fails, so the ping fails.
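As a rough illustration of the protocol check that fails here, the following Python sketch computes the Internet checksum (RFC 1071) used by ICMP and verifies a received ICMP message. The function names and the sample identifier, sequence number, and payload are assumptions made for this example. It is not the switch's implementation.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words, complemented (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_reply(ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP Echo Reply (type 0, code 0) with a correct checksum."""
    header = struct.pack("!BBHHH", 0, 0, 0, ident, seq)  # checksum field zeroed
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 0, 0, checksum, ident, seq) + payload

def icmp_checksum_ok(icmp_message: bytes) -> bool:
    """A message with a valid checksum sums to zero when checksummed as a whole."""
    return internet_checksum(icmp_message) == 0

# Hypothetical values, used only to exercise the functions.
reply = build_echo_reply(ident=1, seq=1, payload=b"abcdefgh")
corrupted = bytearray(reply)
corrupted[2] ^= 0xFF  # damage the checksum field
print(icmp_checksum_ok(reply), icmp_checksum_ok(bytes(corrupted)))
```

A receiver that runs this kind of check drops the corrupted reply, which is what the sender then sees as a ping timeout.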

Troubleshooting Procedure

1.         Method 1:

After confirming that the switch has correctly learned the ARP entries, collect traffic statistics to check whether the switch sends ICMP Echo Request packets and receives ICMP Echo Reply packets. Alternatively, you can capture packets:

Figure 1-1 Packet capturing

The captured packet information shows that the ICMP Echo Reply packets have an incorrect checksum. The "incorrect" marker in Figure 1-1 indicates the checksum error.

2.         Method 2:

Run the display icmp statistics command before and after the ping operation to view the bad checksum field. Check whether the number of checksum error packets at the ICMP protocol layer keeps increasing.

<HUAWEI> display icmp statistics
  Input: bad formats         0          bad checksum           3   
         echo                8          destination unreachable 0   
         source quench       0          redirects               0   
         echo reply          0          parameter problem       0   
         timestamp request   0          information request     0   
         mask requests       0          mask replies            0   
         time exceeded       0          timestamp reply         0         
         Mping request       0          Mping reply             0   
  Output:echo                0          destination unreachable 0   
         source quench       0          redirects               0   
         echo reply          8          parameter problem       0   
         timestamp request   0          information reply       0   
         mask requests       0          mask replies            0   
         time exceeded       0          timestamp reply         0
         Mping request       0          Mping reply             0

The preceding information reflects an increasing number of checksum error packets.
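The before-and-after comparison in Method 2 can be sketched in Python as follows. The function names are hypothetical, and the value 7 in the second sample is invented purely for illustration. Only the value 3 comes from the output above.

```python
import re

def bad_checksum_count(display_output: str) -> int:
    """Extract the 'bad checksum' counter from 'display icmp statistics' output."""
    match = re.search(r"bad checksum\s+(\d+)", display_output)
    if match is None:
        raise ValueError("no 'bad checksum' field found")
    return int(match.group(1))

def checksum_errors_increased(before: str, after: str) -> bool:
    """True if the counter grew between the two captures."""
    return bad_checksum_count(after) > bad_checksum_count(before)

before_ping = "  Input: bad formats         0          bad checksum           3"
after_ping = "  Input: bad formats         0          bad checksum           7"  # invented value
print(checksum_errors_increased(before_ping, after_ping))
```

If the counter grows by roughly the number of echo requests sent, the replies are reaching the switch and being discarded at the ICMP layer.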

Solution

Check whether the ICMP packets returned by the protocol stack on the peer device have a correct format.

1.1.2 Directly Connected Devices Cannot Ping Each Other Because of an Incorrect Static ARP Entry

Fault Symptom

The customer replaces a device on the live network with switch A. Figure 1-2 shows the new network. Switch A and switch B cannot ping each other, and the OSPF neighbor status on switch A is Exchange. After switch A is replaced by the original device, the fault is rectified.

Figure 1-2 Directly connected devices cannot ping each other

Cause Analysis

1.         The original device can ping switch B, indicating that the link between the two devices functions properly. Switch A and switch B are directly connected, so the fault is not caused by routing problems. The fault may be caused by errors in ARP learning.

2.         Run the display arp all command on switch A to check whether switch A has learned the ARP entry of switch B.

<SwitchA> display arp all
IP ADDRESS      MAC ADDRESS  EXPIRE(M) TYPE INTERFACE      VPN-INSTANCE
                                       VLAN/CEVLAN
------------------------------------------------------------------------------
1.1.1.1        0025-9e80-2494         I -  Vlanif20
1.1.1.2         0025-9e80-248e  18     D-0  GE1/0/1
                                       33
------------------------------------------------------------------------------
Total:2        Dynamic:1       Static:0    Interface:1

The preceding information shows that switch A has learned the ARP entry of switch B.

3.         Run the display arp all command on switch B to check whether switch B has learned the ARP entry of switch A.

<SwitchB> display arp all
IP ADDRESS      MAC ADDRESS  EXPIRE(M) TYPE INTERFACE      VPN-INSTANCE
                                       VLAN/CEVLAN
------------------------------------------------------------------------------
1.1.1.2         0025-9e80-248e         I -  Vlanif20
1.1.1.1        0016-ecb9-0eb2         S--  GE1/0/1
                                       33
------------------------------------------------------------------------------
Total:2         Dynamic:0       Static:1    Interface:1

In the ARP table of switch B, IP address 1.1.1.1 maps to MAC address 0016-ecb9-0eb2. The ARP entry type is S, indicating a static ARP entry. The two switches therefore map IP address 1.1.1.1 to different MAC addresses: switch A reports 0025-9e80-2494 for its own interface, whereas switch B still uses 0016-ecb9-0eb2.

The static ARP entry (IP + MAC + port number) was configured on switch B before the network adjustment, and was not updated after the network adjustment; therefore, switch A cannot ping switch B.
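The comparison of the two ARP tables can be automated along these lines. This is a minimal Python sketch that assumes the column layout shown in the outputs above. The regular expression and function names are illustrative assumptions, not part of any Huawei tool.

```python
import re

ARP_LINE = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+"      # IP address
    r"([0-9a-f]{4}(?:-[0-9a-f]{4}){2})\s+"   # MAC address
    r"(?:\d+\s+)?([IDS])",                   # optional EXPIRE(M), then entry type
    re.IGNORECASE | re.MULTILINE,
)

def parse_arp(display_output: str) -> dict:
    """Map each IP address to (mac, type), where type is I, D, or S."""
    return {ip: (mac.lower(), kind.upper())
            for ip, mac, kind in ARP_LINE.findall(display_output)}

def conflicting_entries(table_a: dict, table_b: dict) -> list:
    """List (ip, mac_on_a, mac_on_b) for IPs known to both devices with different MACs."""
    return [(ip, table_a[ip][0], table_b[ip][0])
            for ip in sorted(table_a.keys() & table_b.keys())
            if table_a[ip][0] != table_b[ip][0]]

# Entry lines copied from the two outputs above.
SWITCH_A = """\
1.1.1.1        0025-9e80-2494         I -  Vlanif20
1.1.1.2         0025-9e80-248e  18     D-0  GE1/0/1
"""
SWITCH_B = """\
1.1.1.2         0025-9e80-248e         I -  Vlanif20
1.1.1.1        0016-ecb9-0eb2         S--  GE1/0/1
"""

print(conflicting_entries(parse_arp(SWITCH_A), parse_arp(SWITCH_B)))
```

An entry of type I on one device is that device's own interface address, so when it disagrees with a static entry on the peer, the static entry is the one to suspect.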

Troubleshooting Procedure

1.         Run the system-view command on switch B to enter the system view.

2.         Run the undo arp static ip-address command to delete the incorrect static ARP entry.

After the static ARP entry is deleted, switch A can ping switch B. A new static ARP entry needs to be configured to prevent ARP attacks.

3.         Run the arp static ip-address mac-address vid vlan-id interface interface-type interface-number command to configure the correct static ARP entry.

After the preceding configurations, switch A can successfully ping switch B. Run the display ospf peer command to check the status of the OSPF neighbor. The OSPF neighbor is in Full state.

<SwitchA> display ospf peer
         OSPF Process 1 with Router ID 11.11.11.105                             
                 Neighbors                                                       
                                                                                
 Area 0.0.0.0 interface 1.1.1.1(Vlanif33)'s neighbors                           
 Router ID: 2.1.1.1.168.10.2     Address: 1.1.1.2                                   
   State: Full  Mode:Nbr is  Master  Priority: 1                                
   DR: 1.1.1.2  BDR: 2.1.1.1  MTU: 0                                            
   Dead timer due in 34  sec                                                    
   Retrans timer interval: 8                                                    
   Neighbor is up for 00:28:17                                                  
   Authentication Sequence: [ 0 ]                                              

Summary

If a static ARP entry is configured on a device, modify the ARP entry after the MAC address changes. If switch B is a non-Huawei device and you cannot log in to switch B to check the configuration, ping switch B from switch A and configure the mirroring function to analyze packets transmitted between switch A and switch B. Check whether the destination MAC addresses of the packets are correct.

1.1.3 A Switch Can Be Pinged But Cannot Be Remotely Accessed

Fault Symptom

In Figure 1-3, switch C can ping VLANIF 20 of switch A, but switch C cannot access switch A using Telnet.

Figure 1-3 A switch can be pinged but cannot be remotely accessed

Cause Analysis

1.         The switch supports the fast ICMP reply function. This function enables the switch to quickly respond to the ICMP echo request packet destined for its own IP address.

This problem may be caused by the fast ICMP reply function on switch A. If fast ICMP reply is enabled on switch A, switch A can quickly respond to ICMP request packets even if switch A does not have a route destined for 2.1.1.1. Switch C can successfully ping switch A, indicating that the link between switch C and switch A is normal, but the route may be abnormal. Therefore, you need to check whether there is a reachable route from switch C to switch A.

2.         Run the tracert 1.1.1.1 command on switch C to check routes from switch C to switch A.

<SwitchC> tracert 1.1.1.1
traceroute to  1.1.1.1(1.1.1.1), max hops: 30 ,packet length: 40
 1 2.1.1.2 10 ms  1 ms  1 ms
 2  *  *  *

The preceding information shows that there is a reachable route from switch C to switch B, but no reachable route from switch C to switch A. The possible cause is that the route to 2.1.1.1 is not configured on switch A or is configured incorrectly.

3.         Run the telnet 2.1.1.2 command on switch C to log in to switch B, and run the telnet 1.1.1.1 command on switch B to log in to switch A. The Telnet operations are successful, indicating that the Telnet configuration on switch A is correct.

4.         Run the display ip routing-table 2.1.1.1 command on switch A to check the routing table. In the routing table, the longest match entry corresponding to destination IP address 2.1.1.1 is empty. Run the undo icmp-reply fast command on switch A to disable the fast ICMP reply function. Switch C fails to ping switch A.

In conclusion, switch C can ping switch A only because the fast ICMP reply function is enabled on switch A. Telnet from switch C to switch A fails because switch A does not have a route to 2.1.1.1 and cannot return the Telnet packets.
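The routing lookup that fails on switch A can be illustrated with a small longest-prefix-match sketch in Python. The direct route 1.1.1.0/24 is an assumed mask for switch A's interface (the post does not state it), and the function name is hypothetical.

```python
import ipaddress

def longest_match(routes, destination):
    """Return the most specific route covering the destination, or None."""
    dest = ipaddress.ip_address(destination)
    networks = [ipaddress.ip_network(route) for route in routes]
    candidates = [net for net in networks if dest in net]
    return max(candidates, key=lambda net: net.prefixlen, default=None)

# Assumed routing table of switch A before the fix: only its direct route.
before_fix = ["1.1.1.0/24"]
# After a static route to 2.1.1.0/24 is added.
after_fix = ["1.1.1.0/24", "2.1.1.0/24"]

print(longest_match(before_fix, "2.1.1.1"))  # no matching route
print(longest_match(after_fix, "2.1.1.1"))   # the static route matches
```

With no matching entry, normally forwarded return traffic such as Telnet has nowhere to go, while the fast ICMP reply path answers echo requests without consulting the routing table in this way.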

Troubleshooting Procedure

1.         Run the system-view command on switch A to enter the system view.

2.         Run the ip route-static 2.1.1.0 255.255.255.0 1.1.1.2 command to configure a static route to network segment 2.1.1.0/24 with next hop 1.1.1.2.

Then switch C can access switch A using Telnet.

1.1.4 A Switch Undergoes an ARP Attack and Cannot Be Pinged

Fault Symptom

In Figure 1-4, Switch functions as a gateway. Switch_1 (a modular switch) frequently becomes unmanageable, and users on Switch_1 are frequently disconnected. Pings from Switch_1 to Switch are delayed or fail. Services on Switch_2 are normal, and Switch_2 can successfully ping the gateway.

Figure 1-4 A switch undergoes an ARP attack and cannot be pinged

Cause Analysis

Switch_1 receives ARP packets with a fixed source MAC address. User devices cannot send or receive ARP packets.

Troubleshooting Procedure

Perform the following operations on Switch_1:

1.         Check whether the CPU usage is high.

<Switch_1> display cpu-usage
CPU Usage Stat. Cycle: 10 (Second)
CPU Usage         : 82% Max: 99%
CPU Usage Stat. Time : 2010-12-18  15:35:56
CPU utilization for five seconds: 68%: one minute: 60%: five minutes: 55%.

The CPU usage reaches 82%.

2.         View temporary ARP entries to check whether ARP learning is normal.

<Switch_1> display arp
IP ADDRESS  MAC ADDRESS EXPIRE(M) TYPE VPN-INSTANCE   INTERFACE
VLAN/CEVLAN
------------------------------------------------------------------------------------------------------
10.137.222.139  00e0-fc01-4422            I -         Eth0/0/0
10.137.222.1    0025-9e36-e8c1  20        D-0         Eth0/0/0
10.137.222.100  0025-9e80-b278  6         D-0         Eth0/0/0
10.137.222.99   00e0-4c77-b0e1  9         D-0         Eth0/0/0
10.137.222.173  000f-3d80-cba4  18        D-0         Eth0/0/0
10.137.222.34   0025-9e36-e8c1  1         D-0         Eth0/0/0
10.137.222.172  0016-ec71-ea8c  7         D-0         Eth0/0/0
10.137.222.35   0025-9e36-e8c1  18        D-0         Eth0/0/0
10.137.222.179  0014-2ae2-3128  20        D-0         Eth0/0/0
10.137.222.38   0025-9e36-e8c1  17        D-0         Eth0/0/0
10.137.222.175  0014-2261-2b22  1         D-0         Eth0/0/0
50.1.1.3        Incomplete      1         D-0         GE5/0/0
500/-
50.1.1.2        Incomplete      1         D-0         GE5/0/0
500/-
6.1.1.2         00e0-fc01-4422            I -         Vlanif6
10.0.0.139      00e0-fc01-4422            I -         Vlanif10
192.0.0.4       00e0-fc01-4422            I -         Vlanif192
20.1.1.1        00e0-fc01-4422            I -         Vlanif200
192.168.2.2     00e0-fc01-4422            I -         Vlanif100
------------------------------------------------------------------------------------------------------
Total:16        Dynamic:10      Static:0    Interface:6

The MAC ADDRESS fields of two ARP entries are Incomplete, indicating temporary entries. Some ARP entries cannot be learned.

3.         Check whether the switch is suffering an ARP attack.

a.         View statistics about ARP request packets sent to the CPU.

<Switch_1>display cpu-defend arp-request statistics all
Statistics on mainboard:
------------------------------------------------------------------------------------------------------------------
Packet Type         Pass(Bytes)       Drop(Bytes)   Pass(Packets)     Drop(Packets)
-----------------------------------------------------------------------------------------------------------------
arp-request            67908288            0         1061067               0
------------------------------------------------------------------------------------------------------------------
Statistics on slot 4:
------------------------------------------------------------------------------------------------------------------
Packet Type         Pass(Bytes)       Drop(Bytes)   Pass(Packets)     Drop(Packets)
------------------------------------------------------------------------------------------------------------------
arp-request            80928            44380928          2301         693450
------------------------------------------------------------------------------------------------------------------
Statistics on slot 5:
------------------------------------------------------------------------------------------------------------------
Packet Type         Pass(Bytes)       Drop(Bytes)   Pass(Packets)     Drop(Packets)
------------------------------------------------------------------------------------------------------------------
arp-request                 N/A          N/A               0               0
------------------------------------------------------------------------------------------------------------------
Statistics on slot 6:
------------------------------------------------------------------------------------------------------------------
Packet Type         Pass(Bytes)       Drop(Bytes)   Pass(Packets)     Drop(Packets)
------------------------------------------------------------------------------------------------------------------
arp-request                 N/A          N/A               0               0
------------------------------------------------------------------------------------------------------------------

There are a large number of ARP request packets on the board in slot 4.

b.         Configure attack source tracing to identify the attack source.

<Switch_1>system-view
[Switch_1]cpu-defend policy policy1
[Switch_1-cpu-defend-policy-policy1]auto-defend enable
[Switch_1-cpu-defend-policy-policy1]auto-defend attack-packet sample 5  
//One packet is sampled out of five sent packets. A small sampling rate will consume many CPU resources.
[Switch_1-cpu-defend-policy-policy1]auto-defend threshold 30  
//The packets of which the rate reaches 30 pps are considered attack packets. If there are many attack sources, reduce this value.
[Switch_1-cpu-defend-policy-policy1]undo auto-defend trace-type source-ip source-portvlan  
//Identify the attack source based on source MAC address.
[Switch_1-cpu-defend-policy-policy1]undo auto-defend protocol 8021x dhcp icmp igmp tcp telnet ttl-expired udp  
//Identify the attack source of the ARP attack.
[Switch_1-cpu-defend-policy-policy1]quit
[Switch_1]cpu-defend-policy policy1
[Switch_1]cpu-defend-policy policy1 global

c.         View attack source information.

[Switch_1]display auto-defend attack-source
Attack Source User Table (MPU):
------------------------------------------------------------------------------------------------
MacAddress       InterfaceName      Vlan:Outer/Inner      TOTAL
------------------------------------------------------------------------------------------------
0000-0000-00db   GigabitEthernet2/0/22         193           416
------------------------------------------------------------------------------------------------

The MAC address of the attack source is 0000-0000-00db, and the attack source is connected to GigabitEthernet2/0/22.

If the MAC address has a matching ARP entry, run the display arp | include 0000-0000-00db command to check its IP address.
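The idea behind threshold-based attack source tracing can be sketched as a per-source rate count. The Python below is a simplified model under stated assumptions: it counts packets per source MAC address in fixed one-second windows and flags any source that reaches the threshold. It does not model packet sampling, and it is not how the switch implements auto-defend. The host MAC address 0025-9e36-e8c1 is taken from the ARP table above only as a sample normal host.

```python
from collections import Counter

def find_attack_sources(packets, threshold_pps=30, window_seconds=1.0):
    """Return source MACs whose packet count in any window reaches the threshold.

    packets: iterable of (timestamp_seconds, source_mac) tuples.
    """
    per_window = Counter()
    for timestamp, source_mac in packets:
        window_index = int(timestamp // window_seconds)
        per_window[(source_mac, window_index)] += 1
    limit = threshold_pps * window_seconds
    return sorted({mac for (mac, _), count in per_window.items() if count >= limit})

# Synthetic traffic: one source sends 40 ARP requests within a second, another sends 5.
traffic = [(i / 40, "0000-0000-00db") for i in range(40)]
traffic += [(i / 5, "0025-9e36-e8c1") for i in range(5)]
print(find_attack_sources(traffic, threshold_pps=30))
```

Lowering the threshold catches slower sources at the cost of flagging more legitimate hosts, which matches the tuning note in the configuration comments above.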

Solution

- Configure a blacklist.

#
acl number 4000
 rule 10 permit type 0806 ffff source-mac 0000-0000-00db ffff-ffff-ffff
#
cpu-defend policy 1
 blacklist 1 acl 4000  
//Add the users with specified characteristics to the blacklist through an ACL. The switch discards the packets from the users in blacklist.
#
cpu-defend-policy 1 
cpu-defend-policy 1 global
#

- Configure the attack source tracing action.


cpu-defend policy policy1 
 auto-defend enable 
 auto-defend threshold 30 
 undo auto-defend trace-type source-ip source-portvlan 
 undo auto-defend protocol 8021x dhcp icmp igmp tcp telnet ttl-expired udp
 auto-defend action deny  
//Set the attack source tracing action. The switch discards all attack packets within the default interval, 300s.
#
 cpu-defend-policy policy1 global 
 cpu-defend-policy policy1 
#

1.1.5 PE VPNs Cannot Ping Each Other

Fault Symptom

In Figure 1-5, two loopback interfaces are created on each of the two PEs. The Loopback1 interfaces are public network interfaces, with the IP addresses 1.1.1.1/32 and 1.1.1.2/32, respectively. The Loopback2 interfaces are bound to the VPN instance test and have the IP addresses 10.1.1.1/24 and 10.1.1.2/24, respectively. The PEs cannot exchange VPN routes and cannot ping each other.

Figure 1-5 PE VPNs cannot ping each other

Cause Analysis

When a device has two routes to the same destination, a direct route and a BGP route, the device preferentially uses the direct route to create a local VPN routing entry. The ping fails because no BGP route exists in the VPN routing table.

Troubleshooting Procedure

1.         Run the display ip routing-table command on PE1 and PE2 to check the routes to the remote network segment. You can find that routes to the remote Loopback1's network segment exist in the routing table.

2.         Run the display ip routing-table vpn-instance vpn-instance-name command on PE1 and PE2 to check routes in the VPN routing table. The VPN routing table has only one route, 10.1.1.0/24 Direct, which is the route to Loopback2 of the local device. Note that the mask length is 24 bits, not 32 bits.

In this case, Loopback2's addresses of the two PEs are on the same network segment. Although each PE has received the VPN route, the PE considers that the BGP route is the same as the direct route because its Loopback2's address is on the same network segment as that of the remote Loopback2. The device preferentially uses the direct route to create a local VPN routing entry. The PEs fail to ping each other because no BGP route exists in the VPN routing table.
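The overlap can be checked with Python's standard ipaddress module. This is only an illustration of the addressing arithmetic, and the helper name is hypothetical. With a 24-bit mask both Loopback2 addresses fall in 10.1.1.0/24, so the remote prefix is identical to the local direct route. With a 32-bit mask the two prefixes are distinct.

```python
import ipaddress

def same_segment(local: str, remote: str) -> bool:
    """True when both interface addresses resolve to the same network prefix."""
    local_net = ipaddress.ip_interface(local).network
    remote_net = ipaddress.ip_interface(remote).network
    return local_net == remote_net

print(same_segment("10.1.1.1/24", "10.1.1.2/24"))  # prefixes collide
print(same_segment("10.1.1.1/32", "10.1.1.2/32"))  # prefixes are distinct
```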

Solution

Run the following commands on PE1 and PE2:

1.         Run the system-view command to enter the system view.

2.         Run the interface loopback loopback-number command to enter the Loopback2 interface view.

3.         Run the ip address ip-address { mask | mask-length } command to configure an IP address for Loopback2 and change the IP address mask length to 32 bits.

Conclusion

When a direct route and a BGP route are destined for the same network segment, the device installs only one of them, the direct route, in the VPN routing table.

1.1.6 The Huawei Switch Fails to Ping the Non-Huawei C3750 Switch

Fault Symptom

As shown in Figure 1-6, Switch is directly connected to C3750. They set up an OSPF neighbor relationship through VLAN 200 and advertise the routes of VLAN 100 and VLAN 300 to each other. The monitor server (172.19.2.2) performs ping operations to detect whether the server (172.19.3.2) is online.

Figure 1-6 Huawei switch fails to ping the directly connected C3750

A ping failure occurs about every 18 hours and recovers after about 0.5 hours, affecting the surveillance service.

Cause Analysis

Route aging on C3750 is abnormal. The route to 172.19.2.0, the network segment where the monitor server resides, is lost, causing the ping failure.

Troubleshooting Procedure

1.         Check traffic statistics. The ICMP request packets from the monitor server can be correctly forwarded by Switch but Switch does not receive ICMP reply packets. The problem may occur on C3750.

2.         View routing information on C3750. When the problem occurs, the route to the network segment where the monitor server is located disappears. As a result, the returned ICMP reply packets are discarded by C3750.

When the problem occurs, the following two Network LSAs exist in the LSDB on Switch, but do not exist on C3750:

Type      LinkState ID    AdvRouter        Age  Len   Sequence   Metric
Network   172.19.5.1      172.19.1.250      1256  32    80000208    0
Network   172.19.5.1      172.19.99.10      3600  32    800026C9    0

Switch floods the LSA advertised by 172.19.99.10 to all neighbors. When receiving this LSA, C3750 deletes the LSA advertised by 172.19.1.250, causing route loss in route calculation. After 30 minutes, Switch (172.19.1.250) updates the LSAs and advertises its own Network LSAs to C3750. Then the routes on C3750 are recovered.
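The overwrite described above can be modeled as a key collision. OSPF identifies an LSA by three fields: LS type, Link State ID, and Advertising Router. The Python sketch below contrasts a database keyed on all three fields with a hypothetical one keyed on only the first two. The second variant is an assumption about how the suspected C3750 behavior could arise, not a confirmed description of it.

```python
from typing import NamedTuple

class Lsa(NamedTuple):
    ls_type: str
    link_state_id: str
    adv_router: str
    age: int

FULL_KEY = ("ls_type", "link_state_id", "adv_router")  # standard LSA identity
SHORT_KEY = ("ls_type", "link_state_id")               # hypothetical faulty identity

def install(lsdb: dict, lsa: Lsa, key_fields) -> None:
    """Store the LSA under a key built from the chosen identity fields."""
    key = tuple(getattr(lsa, field) for field in key_fields)
    lsdb[key] = lsa

# The two Network LSAs from the LSDB output above.
own = Lsa("Network", "172.19.5.1", "172.19.1.250", 1256)
stale = Lsa("Network", "172.19.5.1", "172.19.99.10", 3600)

correct_db, faulty_db = {}, {}
for lsa in (own, stale):
    install(correct_db, lsa, FULL_KEY)
    install(faulty_db, lsa, SHORT_KEY)

print(len(correct_db), len(faulty_db))
```

With the full key both LSAs coexist. With the shortened key the later arrival replaces the earlier one, and once the surviving LSA is aged out, nothing is left to describe the network.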

Pay attention to the following points:

- The OSPF protocol defines three essential elements in an LSA: Type, LinkStateID, and AdvRouter. Therefore, Switch considers that the LSAs advertised by 172.19.99.10 and 172.19.1.250 are different. C3750 may consider that the two LSAs are the same; therefore, it overwrites the LSA advertised by 172.19.1.250 with the LSA advertised by 172.19.99.10. In addition, the aging time of the LSA advertised by C3750 is 3600; therefore, C3750 ages this LSA out, causing a route loss.

- The LSA advertised by 172.19.99.10 has a DC flag.

Type      : Network
Ls id     : 172.19.5.1
Adv rtr   : 172.19.99.10
Ls age    : 3600
Len       : 32
Options   :  DC  E
seq#      : 800026c9
chksum    : 0xd55
Net mask  : 255.255.255.0
Attached Router    172.19.99.10
Attached Router    172.19.8.1

According to RFC 1793, when the DoNotAge bit (the highest bit in the Age field) is set to 1, the LSA is not aged and does not need to be deleted immediately, even if the advertiser is unavailable.

When are such LSAs deleted?

The problem occurs when all the following conditions are met:

- The LSA has existed in the LSDB for at least 3600s.

- There is no reachable route to the LSA advertiser.
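The DoNotAge bit and the two removal conditions listed above can be expressed compactly. This Python sketch uses the constants defined for OSPF (MaxAge of 3600 seconds, DoNotAge as the high bit 0x8000 of the 16-bit LS age field). The function names are hypothetical and the logic is a simplified reading of the conditions in this post.

```python
MAX_AGE = 3600       # seconds, OSPF MaxAge
DO_NOT_AGE = 0x8000  # high bit of the 16-bit LS age field (RFC 1793)

def has_do_not_age(ls_age_field: int) -> bool:
    """True if the DoNotAge bit is set in the raw LS age field."""
    return bool(ls_age_field & DO_NOT_AGE)

def may_remove_do_not_age_lsa(seconds_in_lsdb: int, originator_reachable: bool) -> bool:
    """Both conditions must hold: held for at least MaxAge and originator unreachable."""
    return seconds_in_lsdb >= MAX_AGE and not originator_reachable

print(has_do_not_age(0x8000 | 100))            # DoNotAge set, age 100
print(has_do_not_age(3600))                    # plain age of 3600, bit not set
print(may_remove_do_not_age_lsa(3600, False))  # both conditions met
```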

Solution

- Change the OSPF network type of the interfaces that connect Switch and C3750 to P2P, to avoid interference from incorrect LSAs.

- Change the IP addresses of the interfaces between Switch and C3750. This also avoids interference from incorrect LSAs.


 


user_2790689
Created Dec 5, 2016 01:37:06

thank you for sharing.