What to do if the CNA storage plane is unreachable?

This post is about what to do if the CNA storage plane is unreachable. Please see the solution below.

 

Background and diagnostic


The network of the storage plane on the CNA node is disconnected.


Problem analysis


None.


Root cause


The possible causes are as follows:


1. The network service is restarted on the CNA.


2. The FC database is damaged.


3. The physical network adapter is abnormal.


4. The external link is abnormal.


Impact and risks


Describe the impact and risks of the problem on the services.


Solution description


In normal cases, the IP address of the storage plane is configured on a system interface named iscsi-X, which is a virtual network adapter. A host may have multiple storage system interfaces, and X starts from 0, so a host may have both iscsi-0 and iscsi-1.


Storage system interfaces are classified into the following types:


1. OVS forwarding mode. In this mode, iscsi-X is connected to a virtual switch, and the virtual switch is connected to the bond device. No IP address is configured on the bond or on the eth devices that form the bond.


2. Linux sub-interface mode. A VLAN sub-interface is a virtual device created on top of a physical NIC or an aggregation of physical NICs. It has independent Layer 2 communication capabilities, and its packets are sent and received through the underlying physical NIC or NIC aggregation. In this mode, iscsi-X is a VLAN device created with ip link add link $bond name iscsi-X type vlan id XX, so it is also attached to a specified bond device.


Step 1. Confirm the storage plane mode on the FC interface. On the host page, click 'Configuration' - 'System Interface', then open 'Connection Settings' of the storage interface to view the switching mode.


Linux


Step 2. Log in to the host as the root user. Then go to Step 3 to check the storage plane networking.


Step 3. View the networking of the storage plane.


In OVS forwarding mode, run ovs-vsctl show; the output is similar to Figure 1. In Linux sub-interface mode, run cat /proc/net/vlan/config; the output is similar to Figure 2.


Pay attention to the following two points:


1. iscsi-X exists and uses the VLAN assigned to the storage plane.


2. The uplink used by iscsi-X is the expected one.


If the output does not show the iscsi-X device, the FC database may be abnormal and needs to be checked first.


ovs-vsctl show


Figure 1. Output of ovs-vsctl show

cat /proc/net/vlan/config


Figure 2. Output of cat /proc/net/vlan/config
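
For Linux sub-interface mode, the two checks in Step 3 can be sketched as a small shell helper. This is a minimal sketch, not part of the original procedure: the function name list_storage_vlans is hypothetical, and it assumes the usual three-column "name | VLAN ID | uplink" layout of /proc/net/vlan/config.

```shell
#!/bin/sh
# List storage-plane VLAN sub-interfaces from the kernel VLAN table.
# Usage: list_storage_vlans [config_file]
# Prints one line per iscsi-X device: "<device> <vlan id> <uplink>".
# The function name is hypothetical; the default path is the file used in Step 3.
list_storage_vlans() {
    config_file="${1:-/proc/net/vlan/config}"
    awk -F'|' '
        # Data rows have three "|"-separated fields; header lines do not.
        NF == 3 {
            for (i = 1; i <= 3; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
            if ($1 ~ /^iscsi-[0-9]+$/) print $1, $2, $3
        }
    ' "$config_file"
}
```

Running list_storage_vlans on the host prints nothing if no iscsi-X device exists, which corresponds to the case where the FC database should be checked first.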


Step 4. Run route -n to check whether the storage plane route is configured on iscsi-X. If a storage plane route is configured on the bond or eth device, this may have been caused by a restart of the network service. In that case, run ifconfig $bond 0 or ifconfig $eth 0 from the BMC remote console to clear the IP address of the bond and eth devices, and make sure that the storage plane route is configured only on iscsi-X.


route -n

[Screenshot: output of route -n]
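
The route check in Step 4 can be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the storage network destination must be supplied by the operator, and the standard eight-column route -n layout (with the interface in the last column) is assumed.

```shell
#!/bin/sh
# Print the interface of every route whose destination equals the given
# storage network, reading `route -n` output from standard input.
# Usage: route -n | storage_route_ifaces <destination>
storage_route_ifaces() {
    # Columns: Destination Gateway Genmask Flags Metric Ref Use Iface
    awk -v dest="$1" '$1 == dest { print $NF }'
}

# Succeed only if at least one matching route exists and every matching
# route is on an iscsi-X interface.
# Usage: route -n | storage_route_ok <destination>
storage_route_ok() {
    ifaces=$(storage_route_ifaces "$1")
    [ -n "$ifaces" ] || return 1
    for iface in $ifaces; do
        case "$iface" in
            iscsi-[0-9]*) ;;        # expected: route on a storage interface
            *) return 1 ;;          # route on a bond or eth device
        esac
    done
    return 0
}
```

If storage_route_ok fails, storage_route_ifaces shows which bond or eth device still carries a storage plane route.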


Step 5. If Step 3 shows that the storage plane network adapter is connected to a bond, run the following command to check the bond mode. If active/standby mode is used, also check which network adapter is currently active. In the following figure, active-backup indicates active/standby mode, and Currently Active Slave: eth0 indicates that the active network adapter is eth0. In addition, check whether the speed of each network adapter in the bond is normal. A speed lower than expected, for example 100Mbps, indicates that the network adapter is abnormal, the network cable is not properly connected, or the switch port is abnormal. In this case, you can run the ifconfig command with the name of the abnormal bond or eth device followed by down (for example, ifconfig $eth down) to shut it down, so that traffic is sent and received only on network adapters with a normal speed.

cat /proc/net/bonding/$bond

[Screenshot: output of cat /proc/net/bonding/$bond]


In active/standby mode, packets are sent and received through the active network adapter, and packets received on the standby network adapter are discarded, so packet loss on the standby network adapter is normal. In other modes, packets are sent and received on all Ethernet devices that make up the bond, distributed according to the hash rule of the mode. If packet loss occurs on any network adapter in these modes, the network may be abnormal.
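
The fields examined in Step 5 can be pulled out of the bonding status file with a short helper. This is only a sketch: the function name bond_summary is hypothetical, and it assumes the common "Bonding Mode", "Currently Active Slave", "Slave Interface" and "Speed" labels in /proc/net/bonding/$bond.

```shell
#!/bin/sh
# Summarize bond mode, active adapter, and per-adapter speed.
# Usage: bond_summary /proc/net/bonding/<bond>
bond_summary() {
    awk -F': ' '
        /^Bonding Mode:/           { print "mode=" $2 }
        /^Currently Active Slave:/ { print "active=" $2 }
        /^Slave Interface:/        { slave = $2 }      # remember current adapter
        /^Speed:/ && slave != ""   { print "slave=" slave " speed=" $2 }
    ' "$1"
}
```

An adapter whose speed line is lower than the others is the candidate to shut down as described in Step 5.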


Step 6. Run the following command several times to check whether the number of packets sent and received by the network adapter increases, and whether the dropped counter increases. In bond active/standby mode, check only the active network adapter; otherwise, check all network adapters used by the bond. If the packet counters do not increase, check the switch configuration and analyze why the switch does not send packets to the network adapter.

ifconfig $eth

[Screenshot: output of ifconfig $eth]
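
Comparing two ifconfig outputs by eye is error-prone, so the counter comparison in Step 6 can be sketched with a helper. This sketch reads the same counters from /sys/class/net/<iface>/statistics instead of parsing ifconfig, which is a substitution and not the method described in this post; the function names are hypothetical.

```shell
#!/bin/sh
# Print "rx_packets tx_packets rx_dropped tx_dropped" for one interface.
# Usage: read_nic_counters <iface> [sysfs_net_dir]
read_nic_counters() {
    stats="${2:-/sys/class/net}/$1/statistics"
    echo "$(cat "$stats/rx_packets") $(cat "$stats/tx_packets")" \
         "$(cat "$stats/rx_dropped") $(cat "$stats/tx_dropped")"
}

# Print how much each counter grew between two samples.
# Usage: counter_deltas "<before>" "<after>"
counter_deltas() {
    echo "$1 $2" | awk '{
        printf "rx_packets=+%d tx_packets=+%d rx_dropped=+%d tx_dropped=+%d\n",
               $5 - $1, $6 - $2, $7 - $3, $8 - $4
    }'
}

# Example on a host (interface name and interval are placeholders):
#   before=$(read_nic_counters eth0); sleep 5; after=$(read_nic_counters eth0)
#   counter_deltas "$before" "$after"
```

Packet counters that stay at +0 point to the switch side, while a growing rx_dropped value leads to the ethtool check that follows.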


If the dropped counter increases and the adapter is not the standby NIC of an active/standby bond, run ethtool -S $eth to check which counter is increasing. The common cases are increases in rx_missed_errors and rx_crc_errors.

[Screenshot: output of ethtool -S $eth]
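
Because ethtool -S can print a long list of statistics, a small filter helps to watch only the two counters named above. This is a sketch with a hypothetical function name; counter names are driver-specific, so a given NIC may not expose both.

```shell
#!/bin/sh
# Keep only rx_missed_errors and rx_crc_errors from `ethtool -S` output.
# Usage: ethtool -S <eth> | rx_error_counters
rx_error_counters() {
    awk -F': *' '
        { name = $1; gsub(/^[ \t]+/, "", name) }   # strip leading indentation
        name == "rx_missed_errors" || name == "rx_crc_errors" {
            print name "=" $2
        }
    '
}
```

Running the filter twice a few seconds apart shows which of the two counters is actually increasing.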


If rx_missed_errors increases, the instantaneous traffic is so large that the physical NIC has reached its processing limit and drops packets. Run sar -n DEV 1 or view the historical dump logs to check the NIC traffic. Check whether a virtual machine back-end NIC is receiving heavy traffic at the same time; if so, the traffic may be caused by an attack involving that virtual machine. If the last column, rxmcst/s, is large (for example, hundreds of thousands), the traffic may be caused by a network loop. In either case, analyze the source of the traffic and limit it on the physical switch or firewall.

[Screenshot: output of sar -n DEV 1]


If rx_crc_errors increases, the NIC, network cable, or switch port may be abnormal. To work around the problem, you can use the ifconfig command to bring the NIC down so that the bond uses the other, normal NICs.


Step 7. If the virtual switch network is normal, no packets are lost during sending and receiving, and the NIC driver reports no errors, the external switch is most likely causing the link to be abnormal. Look for a pattern across hosts: for example, does the network exception also occur on other hosts connected to the same switch? If the NICs that form the bond are connected to switches or switch boards, you can bring one NIC down so that traffic goes through the other NICs. If the problem no longer occurs, the external link of the NIC that was brought down is abnormal. A more direct way is to upload the tcpdump tool to the CNA and capture packets on the physical NIC. If no abnormality is found in the preceding steps and tcpdump does not capture the packets sent from the external side, contact the engineers responsible for the switch for analysis. For the packet capture command, refer to: tcpdump -i ethXX host $external_IP and $local_IP -n.
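
The capture command can be assembled with a small helper so that the interface and addresses are filled in consistently. This is a sketch with a hypothetical function name. It adds the -e option, which is not in the command above: -e makes tcpdump print the link-level header, so an 802.1Q VLAN tag is visible in the output when one is present.

```shell
#!/bin/sh
# Print (without running) a tcpdump command for the storage plane check.
# Usage: capture_cmd <iface> <external_ip> <local_ip>
capture_cmd() {
    # -n: do not resolve names; -e: show link-level headers (VLAN tags)
    printf 'tcpdump -i %s -n -e host %s and host %s\n' "$1" "$2" "$3"
}
```

Printing the command first allows it to be reviewed before it is run on the CNA from a root shell.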

                           

Example


The storage plane of three newly added servers at a site was unreachable. VLAN 101 was configured for the storage plane, and the uplink was the physical NIC eth3. When a normal external storage plane IP address was pinged from the abnormal host and packets were captured on the physical NIC, the packet content showed that the ARP broadcast packets sent by the host carried the VLAN tag.


[Screenshots: captured ARP broadcast packet sent by the host, carrying the VLAN tag]


The received ARP reply packet does not carry a VLAN tag.


[Screenshots: captured ARP reply packet, without a VLAN tag]


The cause was finally confirmed to be a problem with the VLAN configuration on the switch: the VLAN tag was stripped from packets sent to the server. The problem was resolved by correcting the switch configuration.


Summary and suggestions


None.

