Solution if a slow disk exists


Slow disk policies of the current T series products are as follows:

The system collects statistics on the I/Os read from or written to each disk. If an I/O takes 10s (configurable) to complete, the number of such I/Os on a disk exceeds 10 (configurable) within 30 minutes (configurable), and the interval between these I/Os is longer than 3s (configurable), the system reports the disk as a slow disk. If rejecting the disk does not cause a RAID failure, the disk is rejected.
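The policy above can be sketched in Python as a sliding-window counter. This is only an illustration under stated assumptions, not the product's implementation: the class name, method name, and parameter names are hypothetical, and the reading of the 3s condition (a slow I/O is counted only if it completes more than 3s after the previously counted one) is an interpretation of the text.

```python
from collections import deque


class SlowDiskDetector:
    """Hypothetical sketch of the slow-disk rule; defaults are the values quoted above."""

    def __init__(self, slow_io_s=10.0, max_slow_ios=10,
                 window_s=30 * 60, min_gap_s=3.0):
        self.slow_io_s = slow_io_s        # an I/O this slow (seconds) counts as slow
        self.max_slow_ios = max_slow_ios  # report when the count exceeds this value
        self.window_s = window_s          # statistics window in seconds
        self.min_gap_s = min_gap_s        # required spacing between counted slow I/Os
        self._slow = deque()              # timestamps of counted slow I/Os

    def record(self, timestamp_s, latency_s):
        """Record one completed I/O; return True if the disk should be reported."""
        if latency_s >= self.slow_io_s:
            # Assumption: count a slow I/O only if it is more than min_gap_s
            # after the previously counted slow I/O.
            if not self._slow or timestamp_s - self._slow[-1] > self.min_gap_s:
                self._slow.append(timestamp_s)
        # Discard counted slow I/Os that have fallen out of the window.
        while self._slow and timestamp_s - self._slow[0] > self.window_s:
            self._slow.popleft()
        return len(self._slow) > self.max_slow_ios


if __name__ == "__main__":
    detector = SlowDiskDetector()
    # Eleven 12-second I/Os, 5 seconds apart: the eleventh exceeds the limit of 10.
    for i in range(11):
        reported = detector.record(i * 5.0, 12.0)
    print(reported)
```

Because all four thresholds are constructor arguments, the sketch mirrors the fact that each value in the policy is configurable.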

Recommended action:
If the slow disk has not been removed and reinserted, collect S.M.A.R.T. information about the disk and replace it. Send the slow disk back to Huawei R&D engineers for analysis.

Other related questions:
Method used to check a slow disk
You can check a slow disk as follows:

1. Checking the OSM alarm
Check on the OSM management interface whether there is a slow disk alarm whose ID is 5613. If the alarm exists, check whether the slow disk is isolated (the disk has completed reconstruction). If the slow disk is not isolated, refer to relevant disk replacement guides to manually replace the disk.

2. Checking the SES log
Collect the SES log of storage devices by obtaining SES_log.txt and bak files under the /OSM/log_conf_local/log/cur_debug directory. Check slow I/O records and I/O distribution by searching for the keyword Disk IO Delay.

    --------------------------Disk IO Delay Count------2012-01-10 02:30:52--------------------
    Disk IO Delay Count Threshold: [300ms] [500ms] [700ms] [1000ms]
    [0][2][3LM4JYJJ00009844V79S][3, 5, 15, 1]

The above information shows that within five minutes, the disk in slot (0,2) has three I/Os of over 300 ms latency, five I/Os of over 500 ms latency, 15 I/Os of over 700 ms latency, and one I/O of over 1000 ms latency. A disk with long I/O latency may appear frequently in these records. Refer to relevant disk replacement guides to manually replace the disk. If you have any questions, contact technical support engineers.

3. Checking the message log
Collect the message log of storage devices by obtaining message and bak files under the /OSM/log_conf_local/log/cur_debug directory. Search for the keyword long time.

    Jun 20 14:45:25 OceanStor kernel: [21086119188]mptscsih SLOW IO INFO: cost long time (13135), host id(0), channel id(0), scsi id (14), lun id(0), io lenth (524288), io mode(1), io lba(0x215321088)

The I/O of the SCSI device with scsi id (14) is suspended. Log in to the debug mode of storage devices, enter lsscsi, and obtain the drive letters corresponding to the SCSI ID. Log in to the MML mode and enter dev disk enclosure ID to obtain the drive letters corresponding to the slot ID.

4. Checking a slow disk
If slow I/O records appear frequently in the logs (SES log and message log) and the time of these records is close to the time when services are affected (such as video freeze), the disk is probably the slow disk that affects services.
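When many log files have to be searched, the two record formats quoted above can be extracted with a short script. The following Python sketch is written only against the two sample lines shown in this article. The function names and the returned field names are hypothetical, the threshold list is copied from the sample header, and the unit of the "cost long time" value is not stated in the log, so it is returned as a raw number.

```python
import re

# Thresholds as printed in the sample "Disk IO Delay Count Threshold" header.
THRESHOLDS_MS = (300, 500, 700, 1000)

# Matches an entry such as: [0][2][3LM4JYJJ00009844V79S][3, 5, 15, 1]
SES_ENTRY = re.compile(r"\[(\d+)\]\[(\d+)\]\[(\w+)\]\[([\d,\s]+)\]")

# Matches the cost and SCSI ID in a "SLOW IO INFO" message log line.
SLOW_IO = re.compile(r"cost long time \((\d+)\).*?scsi id\s*\((\d+)\)")


def parse_ses_delay_entry(line):
    """Return slot, serial number, and per-threshold counts, or None if no match."""
    match = SES_ENTRY.search(line)
    if match is None:
        return None
    counts = [int(value) for value in match.group(4).split(",")]
    return {
        "slot": (int(match.group(1)), int(match.group(2))),
        "serial": match.group(3),
        "counts": dict(zip(THRESHOLDS_MS, counts)),
    }


def parse_slow_io(line):
    """Return the raw cost value and SCSI ID of a slow I/O record, or None."""
    match = SLOW_IO.search(line)
    if match is None:
        return None
    return {"cost": int(match.group(1)), "scsi_id": int(match.group(2))}


if __name__ == "__main__":
    print(parse_ses_delay_entry("[0][2][3LM4JYJJ00009844V79S][3, 5, 15, 1]"))
    print(parse_slow_io(
        "Jun 20 14:45:25 OceanStor kernel: [21086119188]mptscsih SLOW IO INFO: "
        "cost long time (13135), host id(0), channel id(0), scsi id (14), lun id(0)"
    ))
```

Counting how often each slot or SCSI ID appears in the parsed records, and comparing the timestamps with the time of the service impact, corresponds to step 4.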

Method used to identify slow disks in the storage system
Slow disks are disks with poor performance in the storage system. A slow disk degrades the performance of the RAID group where it resides, or even the performance of the whole service system. To ensure stable performance of the storage system, perform the following steps to identify slow disks and replace them:

1. Log in to the storage device in command line mode and enter the debug mode.
For OceanStor S2600 V100R001, S5000 V100R001, S2600 R5C02, and S5000R5C02 storage, after specifying the user name and password, run debug to enter the debug mode in the CLI.
For S5500T, S5600T, S5800T, S6800T, S3900, S5900, and S6900 storage systems, specify ibc_os_hs as the user name and Storage@21st as the password to enter the debug mode in the CLI.

2. Run iostat to check the disk usage, service time of each I/O, average waiting time of I/O requests, and the quantity of I/Os to be processed.

Note: If the quantity of I/Os to be processed, average waiting time of I/O requests, service time of each I/O, and disk usage of a disk are all greater than those of the other disks, this disk is a slow disk.
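The comparison described in the note can be sketched in Python once the iostat values have been collected for each disk. This is a minimal illustration, not a Huawei tool. The function name, the key names (util, svctm, await, avgqu_sz, chosen after the classic iostat -x column names), and the factor of 2 over the median of the other disks are all assumptions; the article itself only says "greater than other disks".

```python
from statistics import median

# Assumed mapping: disk usage, service time per I/O, average waiting time,
# and quantity of I/Os to be processed.
METRICS = ("util", "svctm", "await", "avgqu_sz")


def find_slow_disks(samples, factor=2.0):
    """Return the disks whose four metrics all exceed factor x the median of the others.

    samples maps a disk name to a dict holding one value per entry in METRICS.
    """
    slow = []
    for name, stats in samples.items():
        others = [s for other, s in samples.items() if other != name]
        if not others:
            continue  # nothing to compare a single disk against
        if all(stats[m] > factor * median([o[m] for o in others]) for m in METRICS):
            slow.append(name)
    return slow


if __name__ == "__main__":
    # Made-up values for illustration only.
    samples = {
        "sda": {"util": 20.0, "svctm": 4.0, "await": 6.0, "avgqu_sz": 0.5},
        "sdb": {"util": 22.0, "svctm": 5.0, "await": 7.0, "avgqu_sz": 0.6},
        "sdc": {"util": 98.0, "svctm": 40.0, "await": 120.0, "avgqu_sz": 8.0},
    }
    print(find_slow_disks(samples))
```

Requiring all four metrics to stand out follows the wording of the note and avoids flagging a disk that is merely busy.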

Method used to create RAID groups on the existing disks
You can create RAID groups on the existing disks as follows:

1. During RAID capacity planning, you must plan the RAID group level. When determining the RAID level, the following must be considered: reliability, performance, and disk utilization.

2. You must also plan the number of member disks in the RAID group. That is, you must meet requirements on the number of member disks at different RAID levels.

In FusionInsight, how to resolve the alarm ALM-12033 Slow Disk Fault?
http://support.huawei.com/ehedex/pages/DOC1000113260YEF0811N/04/DOC1000113260YEF0811N/04/resources/alarm/ALM-12033.html?ft=0&id=ALM-57002_2

In FusionInsight, when is the alarm ALM-12033 Slow Disk Fault generated?
The system runs the iostat command every second to monitor the disk I/O indicators. If the svctm value is greater than 100 ms more than 30 times within 60s, the disk is considered faulty and the alarm is generated.
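The alarm condition can be sketched in Python as a count over the most recent 60 one-second samples. This is only an illustration of the rule as described, not FusionInsight code; the class and parameter names are hypothetical.

```python
from collections import deque


class SvctmMonitor:
    """Hypothetical sketch of the ALM-12033 trigger condition."""

    def __init__(self, svctm_limit_ms=100.0, max_hits=30, window_samples=60):
        self.svctm_limit_ms = svctm_limit_ms
        self.max_hits = max_hits
        # One boolean per one-second sample; the oldest falls out automatically.
        self._window = deque(maxlen=window_samples)

    def add_sample(self, svctm_ms):
        """Add one per-second svctm reading; return True if the alarm condition holds."""
        self._window.append(svctm_ms > self.svctm_limit_ms)
        return sum(self._window) > self.max_hits


if __name__ == "__main__":
    monitor = SvctmMonitor()
    # Thirty-one consecutive readings of 150 ms: more than 30 hits within 60 samples.
    for _ in range(31):
        alarm = monitor.add_sample(150.0)
    print(alarm)
```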
