How to Analyze Bit Errors on Storage Ports

Latest reply: Aug 9, 2019 09:05:14

Symptom:

The storage device reports that the bit error rate of a port is too high and the port status changes to fault.

FC host port (controller enclosure CTE0, controller A, port ID H0) has too many bit errors.

The analysis procedure is as follows:

Analyzing Storage Logs

1. Export the storage logs or running data: Setting -> Export data -> Running data -> Download.

2. In the running data, search for the affected port, for example CTE0, controller A, port ID H0. The entry for the port shows the TX and RX power of its optical module. Use the transmit and receive power values to locate the problem:

1) If the transmit power (Tx power) is lower than 300.0 uW, the optical module is faulty. Replace the optical module. (Some versions display the value without the decimal point, in which case the threshold reads 3000.)

2) If the receive power (Rx power) is lower than 300.0 uW (3000 in versions that omit the decimal point), the optical fiber or the peer optical module is faulty. If you do not want to check the switch, you can replace them directly. Otherwise, check the switch logs first.

3) If both the transmit and receive power are normal, the link between the port and the switch is healthy, and the bit errors are being propagated from other switch ports that communicate with this port. (Bit errors can be propagated. A common misunderstanding is that a port reporting bit errors means the link between that port and the switch must be faulty. That is wrong.) Check the other ports on the switch. For details, see Analyzing SNS Switches below.
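The three rules above can be sketched as a small decision function. This is only an illustration: the name `diagnose_port`, its parameters, and the returned strings are hypothetical, and the 300.0 uW threshold is the empirical value from this post rather than a vendor specification. Readings are assumed to be in uW already, so a value printed without the decimal point (3000) would have to be divided by 10 first.

```python
THRESHOLD_UW = 300.0  # empirical threshold from this post, not a hard limit


def diagnose_port(tx_uw: float, rx_uw: float, threshold: float = THRESHOLD_UW) -> str:
    """Map optical power readings (in uW) to the suspected fault location."""
    if tx_uw < threshold:
        # Rule 1: low Tx power points at the local optical module.
        return "replace local optical module"
    if rx_uw < threshold:
        # Rule 2: low Rx power points at the fiber or the peer optical module.
        return "check fiber or peer optical module"
    # Rule 3: both readings are normal, so look at the other switch ports.
    return "check other switch ports"


# Hypothetical readings, not taken from a real device.
print(diagnose_port(tx_uw=520.3, rx_uw=120.0))
```

Tx is tested first because a weak local transmitter is the cheapest fault to confirm and fix.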

 

 

 

 

Analyzing SNS Switches

1. Collect the switch logs by running the supportshow command on the switch. The standard collection method is supportsave, which requires an FTP server. Then search the switch logs for the output of the following commands:

switchshow: shows all ports and the WWNs connected to the switch.

porterrshow: shows the bit error counters of all ports.

Check which ports have bit errors, and whether the crc err, crc g_eof, enc out, and pcs err counters of those ports keep increasing. The meaning of each counter is described below. You can also look up the counters online.
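Whether a counter "increases" can only be judged by comparing two collections taken some time apart. The sketch below assumes the counters have already been extracted from two porterrshow outputs into plain dictionaries; the function name, the dictionary layout, and the sample numbers are all hypothetical.

```python
WATCHED = ("crc_err", "crc_g_eof", "enc_out", "pcs_err")


def increasing_counters(before: dict, after: dict, watched=WATCHED) -> dict:
    """Return {port: {counter: delta}} for watched counters that grew."""
    result = {}
    for port, new_counts in after.items():
        old_counts = before.get(port, {})
        deltas = {}
        for name in watched:
            delta = new_counts.get(name, 0) - old_counts.get(name, 0)
            if delta > 0:
                deltas[name] = delta
        if deltas:
            result[port] = deltas
    return result


# Hypothetical values, not taken from a real switch.
snapshot_1 = {2: {"crc_err": 10, "enc_out": 400}, 5: {"crc_err": 0, "enc_out": 7}}
snapshot_2 = {2: {"crc_err": 25, "enc_out": 950}, 5: {"crc_err": 0, "enc_out": 7}}
print(increasing_counters(snapshot_1, snapshot_2))
```

Ports whose counters are non-zero but stable between the two snapshots are left out, since old errors alone do not show an active fault.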

Frames tx/rx: These counters represent the number of frames transmitted and received.

Enc_in: 8bit/10bit encoding errors inside a frame. Words inside of frames are encoded; if this encoding is corrupted or an error is detected, enc_in is generated. Minimum compliance with the link bit error rate specification on a link continuously receiving frames would cause approximately one error every 20 minutes. Reinitialisation or reboots of the associated Nx-port can also cause these errors.

Crc_err: CRC errors. The sending port calculates a check value for each frame with a mathematical formula, and the receiving port uses the same formula to recalculate and compare it. Statistically, crc_err and enc_out errors together imply a GBIC/SFP problem. Also see "bad_eof" below.

Too_long: FC frames are 2148 bytes maximum (SOF + header + 2112 bytes + CRC + EOF). This counter records frames that were longer than the FC maximum. If an EOF is corrupted or data generation is incorrect, a too_long error is reported.
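The 2148-byte maximum follows from the field sizes. As a quick check, assuming the standard Fibre Channel sizes of 4 bytes each for SOF, CRC and EOF and 24 bytes (6 words) for the header (the variable and function names are only illustrative):

```python
# Field sizes in bytes for a maximum-size Fibre Channel frame.
FRAME_FIELDS = {"SOF": 4, "header": 24, "payload_max": 2112, "CRC": 4, "EOF": 4}

max_frame_bytes = sum(FRAME_FIELDS.values())


def is_too_long(frame_bytes: int) -> bool:
    """True if a frame of this size would be counted as too_long."""
    return frame_bytes > max_frame_bytes


print(max_frame_bytes)
```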

Too_short: The too_short error statistics counter is incremented whenever a frame, bounded by an SOF and EOF, is received and the number of words between the SOF and EOF is less than 7 words (6 words header plus 1 word CRC), i.e. 36 bytes (not 48) including the SOF and EOF. This could be caused by the transmitter or an unreliable link.

Bad_eof: After a loss of synchronization error, continuous-mode alignment allows the receiver to re-establish word alignment at any point in the incoming bit stream while the receiver is operational. If such a re-alignment occurs, detection of the resulting error condition is dependent upon higher level functions (e.g. invalid CRC, missing EOF).

Enc_out: 8bit/10bit encoding errors that occurred in words (ordered sets) outside of the FC frame. Words outside of frames are encoded; if this encoding is corrupted or an error is detected, enc_out is generated. It indicates a problem if it increments faster than the link bit error rate allows, approximately once every 20 minutes for 1 Gbit/s. Statistically, enc_out errors on their own imply a cable/connector problem. Enc_out errors and crc_err together imply a GBIC/SFP problem. Such errors are also expected every time a user brings a port down and up (i.e. reboot host, power-cycle storage subsystem, unplug/plug cable, portdisable/portenable, etc.). Such errors will also be generated on a link which has a 1 Gbit/s port connected to a 2 Gbit/s port when autonegotiation is turned off.
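The statistical rule of thumb for enc_out and crc_err can be written down as a tiny helper. This is a sketch only: the function name and return strings are hypothetical, and the inputs are assumed to be counter increases measured over a period in which the port was not brought down and up, since those events also generate such errors.

```python
def suspect_component(enc_out_delta: int, crc_err_delta: int) -> str:
    """Apply the enc_out / crc_err heuristic to counter increases."""
    if enc_out_delta > 0 and crc_err_delta > 0:
        # Both counters growing together implies a GBIC/SFP problem.
        return "GBIC/SFP"
    if enc_out_delta > 0:
        # enc_out growing on its own implies a cable/connector problem.
        return "cable/connector"
    # The heuristic says nothing when enc_out is not growing.
    return "no conclusion from enc_out"


print(suspect_component(enc_out_delta=550, crc_err_delta=15))
```

The result is a starting point for the checks, not a verdict; the optical power readings should still be compared.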

Disc c3: Discard class 3 errors can be generated by the switch when devices send frames without performing FLOGI first or with an invalid destination. This error just reports that a discard occurred.

Link-fail: If a port remains in the LR Receive State for a period of time greater than a timeout period (R_A_TOV), a link reset protocol timeout shall be detected which results in a link failure condition (enter the NOS transmit state). The link failure also indicates that loss of signal or loss of sync lasting longer than the R_A_TOV value was detected while not in the offline state.

Loss sync: Synchronisation failures on either bit or transmission-word boundaries are not separately identifiable and cause loss-of-synchronisation errors. Such errors are also expected every time a user brings a port down and up (i.e. reboot host, power-cycle storage subsystem, unplug/plug cable, portdisable/portenable, etc.).

Loss sig: Occurs when a signal is transmitted but none is being received on the same port. Such errors are also expected every time a user brings a port down and up (i.e. reboot host, power-cycle storage subsystem, unplug/plug cable, portdisable/portenable, etc.).

Frjt:




 

2. If bit errors occur on some ports, those ports may be propagating bit errors to the storage port. To check whether they communicate with the storage port, look up the zone configuration.

Effective configuration:

This section of the log shows the current zone configuration of the switch. Use the WWNs to check whether the storage port is in the same zone as the port with bit errors.
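The zone check amounts to finding zones that contain both WWNs. A minimal sketch, assuming the effective configuration has already been transcribed into a dictionary of zone name to member WWNs (zones that use aliases or domain,port members are not handled); the function name, zone names, and WWNs below are made up for illustration.

```python
def shared_zones(zones: dict, wwn_a: str, wwn_b: str) -> list:
    """Return the names of zones that contain both WWNs (case-insensitive)."""
    a, b = wwn_a.lower(), wwn_b.lower()
    hits = []
    for name, members in zones.items():
        normalized = {member.lower() for member in members}
        if a in normalized and b in normalized:
            hits.append(name)
    return sorted(hits)


# Hypothetical zone data, not taken from a real switch.
zones = {
    "zone_host1_stor": ["10:00:00:00:c9:aa:bb:01", "20:08:00:11:22:33:44:55"],
    "zone_host2_stor": ["10:00:00:00:c9:aa:bb:02", "20:08:00:11:22:33:44:55"],
}
storage_wwn = "20:08:00:11:22:33:44:55"
error_port_wwn = "10:00:00:00:C9:AA:BB:02"
print(shared_zones(zones, storage_wwn, error_port_wwn))
```

An empty list means the two ports do not share a zone, so the port with bit errors is unlikely to be the source of the errors seen on the storage port.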

3. Find the cause of the fault on the port where bit errors occur. Search the log for that port, for example:

port 2 (bit error port number):

Check the optical module of the port with bit errors. The rules are similar to those for the optical module in the storage system.

1) If the transmit power (Tx power) is lower than 300.0uw, the optical module is faulty. Replace the optical module first.

2) If the receive power (Rx power) is lower than 300.0uw, it indicates that the optical fiber or the peer optical module is faulty. In this case, replace the cable and peer optical module.


PS:

1. There may be other devices, such as an ODF, between the switch and the device. In this case, there are more link segments to check, and the ODF itself may also be faulty.

2. The optical power threshold of 300 uW is only an empirical value. A reading below 300 uW does not necessarily mean there is a problem. Judge it together with the error counters: if no errors are reported, the module can remain in use.

3. If the methods above do not resolve the problem, a swap test (exchanging components one at a time) is also an effective method.



Werido
Created Aug 9, 2019 09:05:14
