Method used to check a slow disk

You can check a slow disk as follows:
1. Checking the OSM alarm
Check on the OSM management interface whether there is a slow disk alarm whose ID is 5613. If the alarm exists, check whether the slow disk has been isolated (that is, the disk has completed reconstruction). If the slow disk has not been isolated, refer to the relevant disk replacement guides to manually replace the disk.
2. Checking the SES log
Collect the SES log of the storage device by obtaining SES_log.txt and its bak (backup) files from the /OSM/log_conf_local/log/cur_debug directory. Check the slow I/O records and the I/O latency distribution by searching for the keyword Disk IO Delay.
--------------------------Disk IO Delay Count------2012-01-10 02:30:52--------------------
Disk IO Delay Count Threshold: [300ms] [500ms] [700ms] [1000ms]
[0][2][3LM4JYJJ00009844V79S][3, 5, 15, 1]
The above information shows that within five minutes, the disk in slot (0,2) had three I/Os with latency over 300 ms, five with latency over 500 ms, 15 with latency over 700 ms, and one with latency over 1000 ms.
If a disk appears frequently in these records with long I/O latency, refer to the relevant disk replacement guides to manually replace the disk.
If you have any questions, contact technical support engineers.
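The Disk IO Delay records above can be scanned with a small script. The following is a minimal Python sketch, assuming each record has exactly the shape of the sample line ([x][y][serial][c1, c2, c3, c4]) and that the four counts line up with the 300/500/700/1000 ms thresholds in the header. The names parse_delay_record, scan_ses_log, and DelayRecord are hypothetical, and the real SES log format may differ between firmware versions.

```python
import re
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple

# Thresholds (ms) taken from the sample "Disk IO Delay Count Threshold" header.
THRESHOLDS_MS = (300, 500, 700, 1000)

# Matches records shaped like: [0][2][3LM4JYJJ00009844V79S][3, 5, 15, 1]
RECORD_RE = re.compile(r"^\[(\d+)\]\[(\d+)\]\[(\w+)\]\[([\d,\s]+)\]$")


class DelayRecord(NamedTuple):
    slot: Tuple[int, int]    # slot as written in the log, e.g. (0, 2)
    serial: str              # disk serial number
    counts: Dict[int, int]   # threshold in ms -> number of I/Os reported for it


def parse_delay_record(line: str) -> Optional[DelayRecord]:
    """Parse one Disk IO Delay Count record; return None for any other line."""
    match = RECORD_RE.match(line.strip())
    if match is None:
        return None
    first, second, serial, raw_counts = match.groups()
    counts = [int(value) for value in raw_counts.split(",") if value.strip()]
    if len(counts) != len(THRESHOLDS_MS):
        return None
    return DelayRecord((int(first), int(second)), serial, dict(zip(THRESHOLDS_MS, counts)))


def scan_ses_log(lines: Iterable[str]) -> List[DelayRecord]:
    """Collect every delay record found in an SES log."""
    records = []
    for line in lines:
        record = parse_delay_record(line)
        if record is not None:
            records.append(record)
    return records


if __name__ == "__main__":
    sample = "[0][2][3LM4JYJJ00009844V79S][3, 5, 15, 1]"
    print(parse_delay_record(sample))
```

Passing the lines of SES_log.txt to scan_ses_log yields one record per disk and statistics interval; a disk whose serial number keeps reappearing is a candidate for the replacement described above.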
3. Checking the message log
Collect the message log of the storage device by obtaining the message file and its bak (backup) files from the /OSM/log_conf_local/log/cur_debug directory. Search for the keyword long time.
Jun 20 14:45:25 OceanStor kernel: [21086119188]mptscsih SLOW IO INFO: cost long time (13135), host id(0), channel id(0), scsi id (14), lun id(0), io lenth (524288), io mode(1), io lba(0x215321088)
This record shows that an I/O on the SCSI device with scsi id (14) was suspended for a long time.
To locate the corresponding disk, log in to the debug mode of the storage device and run lsscsi to obtain the drive letter corresponding to the SCSI ID. Then log in to the MML mode and run dev disk enclosure ID to obtain the drive letter corresponding to each slot ID.
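Tallying these records per SCSI ID makes a repeatedly affected device easy to spot. A minimal Python sketch, assuming every slow I/O line contains the fields shown in the sample (cost long time (N) and scsi id (N)). The cost value is kept exactly as logged, since its unit is not stated here; parse_slow_io and count_slow_io_by_scsi_id are hypothetical helper names.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Field patterns copied from the sample line; a space before "(" is optional.
COST_RE = re.compile(r"cost long time\s*\((\d+)\)")
SCSI_ID_RE = re.compile(r"scsi id\s*\((\d+)\)")


def parse_slow_io(line: str) -> Optional[Tuple[int, int]]:
    """Return (scsi_id, cost) for a line containing the keyword 'long time', else None."""
    if "long time" not in line:
        return None
    cost = COST_RE.search(line)
    scsi_id = SCSI_ID_RE.search(line)
    if cost is None or scsi_id is None:
        return None
    return int(scsi_id.group(1)), int(cost.group(1))


def count_slow_io_by_scsi_id(lines: Iterable[str]) -> Dict[int, int]:
    """Count slow I/O records per SCSI ID; a frequently repeated ID points at a suspect disk."""
    counts: Counter = Counter()
    for line in lines:
        parsed = parse_slow_io(line)
        if parsed is not None:
            counts[parsed[0]] += 1
    return dict(counts)
```

The SCSI ID with the highest count can then be mapped to a slot with lsscsi and dev disk as described above.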
4. Checking a slow disk
If slow I/O records appear frequently in the logs (SES log and message log), and the times of these records are close to the time when services were affected (for example, video freezes), the disk is likely the one affecting services, that is, the slow disk.
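The correlation in step 4 can be expressed as a simple time-window check. The sketch below is a Python illustration under two assumptions that are not part of the procedure above: "close to" means within five minutes of the incident, and "frequently" means at least three records in that window. Both values (window and min_records) and all function names are hypothetical and should be tuned to the actual case. Because message-log timestamps such as Jun 20 14:45:25 omit the year, the caller passes it in.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def parse_syslog_time(stamp: str, year: int) -> datetime:
    """Parse a message-log timestamp like 'Jun 20 14:45:25'; the year must be supplied."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def records_near_incident(record_times: Iterable[datetime], incident: datetime,
                          window: timedelta = timedelta(minutes=5)) -> List[datetime]:
    """Return the slow I/O record times that fall within +/- window of the incident."""
    return [t for t in record_times if abs(t - incident) <= window]


def looks_like_slow_disk(record_times: Iterable[datetime], incident: datetime,
                         window: timedelta = timedelta(minutes=5),
                         min_records: int = 3) -> bool:
    """Hypothetical rule: enough slow I/O records close to the service-impact time."""
    return len(records_near_incident(record_times, incident, window)) >= min_records
```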

Other related questions:
Method used to identify slow disks in the storage system
Method used to identify slow disks in the storage system: Slow disks are disks with poor performance in the storage system. A slow disk degrades the performance of the RAID group where it resides, and can even degrade the performance of the whole service system. To keep the performance of the storage system stable, perform the following steps to identify slow disks and replace them:
1. Log in to the storage device in command line mode and enter the debug mode.
For OceanStor S2600 V100R001, S5000 V100R001, S2600 R5C02, and S5000 R5C02 storage systems, enter the user name and password, and then run debug in the CLI to enter the debug mode.
For S5500T, S5600T, S5800T, S6800T, S3900, S5900, and S6900 storage systems, enter ibc_os_hs as the user name and Storage@21st as the password to enter the debug mode in the CLI.
2. Run iostat to check the disk usage, the service time of each I/O, the average waiting time of I/O requests, and the number of I/Os waiting to be processed.
Note: If a disk's number of I/Os waiting to be processed, average waiting time of I/O requests, service time per I/O, and disk usage are all greater than those of the other disks, that disk is a slow disk.
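The comparison in step 2 can be automated once the per-disk iostat values have been extracted. This Python sketch assumes the four values are already parsed into a dictionary per disk (iostat column names and layout differ between versions, so parsing is left out), and it replaces "greater than other disks" with a concrete, hypothetical rule: a disk is flagged when every metric exceeds factor times the median of the other disks. The metric keys, the factor of 2.0, and find_slow_disks are illustrative choices.

```python
from statistics import median
from typing import Dict, List

# Hypothetical keys for: I/Os waiting, average wait, service time per I/O, disk usage.
METRICS = ("avgqu_sz", "await", "svctm", "util")


def find_slow_disks(stats: Dict[str, Dict[str, float]], factor: float = 2.0) -> List[str]:
    """Flag disks whose every metric exceeds factor x the median of the other disks."""
    slow = []
    for disk, values in stats.items():
        others = [v for d, v in stats.items() if d != disk]
        if not others:
            continue  # nothing to compare against
        if all(values[m] > factor * median(o[m] for o in others) for m in METRICS):
            slow.append(disk)
    return sorted(slow)
```

Comparing against the median of the other disks, rather than their maximum, keeps a single outlier from hiding another.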

Method used to check whether a disk is faulty
Method used to check whether a disk is faulty: If a disk is faulty, its red indicator is on and alarms are displayed on the ISM GUI. You can also check the message log by searching for the keyword hardware error.
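Searching the message log for this keyword can also be scripted. A minimal Python sketch, assuming the collected message log is a plain-text file read line by line; the case-insensitive match and the helper name find_hardware_errors are choices made for this example.

```python
from typing import Iterable, List


def find_hardware_errors(lines: Iterable[str], keyword: str = "hardware error") -> List[str]:
    """Return the log lines that contain the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]
```

For example, open the collected message file with open(path, encoding="utf-8", errors="replace") and pass the file object to find_hardware_errors.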

Method used to check the copyback information about the disk
Log in to the CLI and run showdisk -e enclosure id -s slot id to view the copyback information about the disk, where enclosure id is the enclosure number of the disk and slot id is the slot number of the disk. If copyback is in progress, its progress is displayed in the Copyback Progress field of the command output. For example, to check the copyback information about the disk whose enclosure ID is 0 and slot ID is 3, run showdisk -e 0 -s 3.

Solution if a slow disk exists
Slow disk policies of the current T series products are as follows: The system collects statistics on I/Os read from or written to each disk. If reading or writing an I/O takes 10 s (configurable) or longer, the number of such I/Os on a disk exceeds 10 (configurable) within 30 minutes (configurable), and the interval between such I/Os is longer than 3 s (configurable), the system reports a slow disk. If rejecting the disk does not cause a RAID failure, the disk is rejected. Recommended action: If the slow disk has not been removed and reinserted, collect the S.M.A.R.T. information about the disk and replace it. Send the slow disk back to Huawei R&D engineers for analysis.
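The counting rule in this policy can be pictured as a sliding window. The following Python sketch is an interpretation, not the product's implementation: it uses the default values quoted above (10 s, more than 10 I/Os, 30 minutes, 3 s) and assumes that "the interval between I/Os is longer than 3 s" means a slow I/O is counted only if it occurs more than 3 s after the previously counted one. SlowDiskDetector and record_io are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque


@dataclass
class SlowDiskDetector:
    """Sliding-window sketch of the slow disk policy (all defaults configurable)."""
    latency_threshold_s: float = 10.0   # an I/O this slow or slower counts as slow
    count_threshold: int = 10           # report when the count in the window exceeds this
    window_s: float = 30 * 60           # 30-minute statistics window
    min_interval_s: float = 3.0         # assumed minimum gap between counted slow I/Os
    _slow_times: Deque[float] = field(default_factory=deque, init=False, repr=False)

    def record_io(self, timestamp_s: float, latency_s: float) -> bool:
        """Record one completed I/O; return True if the disk should be reported as slow."""
        if latency_s >= self.latency_threshold_s:
            # Assumption: slow I/Os closer than min_interval_s to the last counted one
            # belong to the same event and are not counted again.
            if not self._slow_times or timestamp_s - self._slow_times[-1] > self.min_interval_s:
                self._slow_times.append(timestamp_s)
        # Forget slow I/Os that have left the statistics window.
        while self._slow_times and timestamp_s - self._slow_times[0] > self.window_s:
            self._slow_times.popleft()
        return len(self._slow_times) > self.count_threshold
```

Feeding a disk's I/O completions into record_io in time order returns True once all conditions hold; changing the four fields mirrors the configurable thresholds.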

Method used to isolate disks
Method used to isolate disks:
1. Locating procedure
This method is used to isolate disks that affect system services or the proper running of the system.
a. In most bilateral isolation cases, it is recommended that the pre-failure configuration be used.
b. For error code isolation and intermittent disconnection isolation, unilateral isolation and access are used instead of slow disk isolation.
2. Solution
a. Bilateral isolation and access (simulating disk rejection and acceptance)
Note: Disk rejection is mostly used for RAID recovery. In other cases, it is recommended that the pre-failure configuration be used for disk rejection.
Isolation: Log in to the MML mode of the storage device and run dev setdiskout enclosure ID slot ID, specifying the enclosure number and slot number of the disk.
Access: Log in to the MML mode of the storage device and run dev setdiskin enclosure ID slot ID, specifying the enclosure number and slot number of the disk.
b. Unilateral isolation and access
Note: This mode is mostly used for error code isolation and intermittent disconnection isolation, which usually occur as unilateral isolation (that is, the disk is left with only one link). In this case, connect the disk to the system through a link and continue to use it. If the disk is left with a single link again later, replace it.
Log in to the isolated controller, and then:
Unilateral isolation: Log in to the MML mode of the storage device and run dev setdiskout enclosure ID slot ID 1.
Unilateral access: Log in to the MML mode of the storage device and run dev setdiskin enclosure ID slot ID 1.
After unilateral isolation, the disk indicator on the OSM page turns yellow. After unilateral access, the disk indicator returns to normal.
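Because the bilateral and unilateral commands differ only in the trailing 1, a small helper can build the MML command strings and reduce typing mistakes. This Python sketch only formats text; it does not connect to the storage device. The helper name, the action names isolate and access, and the assumption that the enclosure ID and slot ID are passed as space-separated integers are illustrative, not taken from the product documentation.

```python
def isolation_command(action: str, enclosure_id: int, slot_id: int,
                      unilateral: bool = False) -> str:
    """Build a dev setdiskout/setdiskin command string for the MML mode."""
    commands = {"isolate": "dev setdiskout", "access": "dev setdiskin"}
    if action not in commands:
        raise ValueError(f"action must be 'isolate' or 'access', got {action!r}")
    parts = [commands[action], str(enclosure_id), str(slot_id)]
    if unilateral:
        # Unilateral isolation/access appends 1 and is run on the isolated controller.
        parts.append("1")
    return " ".join(parts)


if __name__ == "__main__":
    print(isolation_command("isolate", 0, 3))                   # dev setdiskout 0 3
    print(isolation_command("access", 0, 3, unilateral=True))   # dev setdiskin 0 3 1
```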
