Method used to identify slow disks in the storage system

Slow disks are disks with poor performance in the storage system. A slow disk degrades the performance of the RAID group where it resides, or even the performance of the whole service system. To ensure stable performance of the storage system, perform the following steps to identify slow disks and replace them:
1. Log in to the storage device using the command line mode and enter the debug mode.
For OceanStor S2600 V100R001, S5000 V100R001, S2600 R5C02, and S5000 R5C02 storage systems, log in to the CLI with your user name and password, and then run debug to enter the debug mode.
For S5500T, S5600T, S5800T, S6800T, S3900, S5900, and S6900 storage systems, specify ibc_os_hs as the user name and Storage@21st as the password to enter the debug mode in the CLI.
2. Run iostat to check the disk usage, the service time of each I/O, the average waiting time of I/O requests, and the number of I/Os waiting to be processed.
Note:
If the number of I/Os waiting to be processed, the average waiting time of I/O requests, the service time of each I/O, and the disk usage of a disk are all markedly greater than those of other disks, the disk is a slow disk.
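The comparison described in the note can be automated. The following is a minimal Python sketch, not part of the storage CLI: the sample metric values and the 3x threshold are illustrative assumptions, and a disk is flagged only when every metric is well above the average of the other disks.

```python
# Hypothetical iostat-style metrics per disk: await (avg wait per I/O, ms),
# svctm (service time per I/O, ms), avgqu-sz (I/Os waiting), util (usage, %).
# Sample values are illustrative, not real device output.
disks = {
    "sda": {"await": 4.1, "svctm": 2.0, "avgqu-sz": 0.3, "util": 21.0},
    "sdb": {"await": 3.8, "svctm": 1.9, "avgqu-sz": 0.2, "util": 19.5},
    "sdc": {"await": 95.0, "svctm": 40.0, "avgqu-sz": 6.5, "util": 98.0},
}

def find_slow_disks(disks, factor=3.0):
    """Return disks whose every metric exceeds `factor` times the
    average of the same metric on all other disks."""
    slow = []
    for name, metrics in disks.items():
        others = [m for n, m in disks.items() if n != name]
        avg = {k: sum(o[k] for o in others) / len(others) for k in metrics}
        if all(metrics[k] > factor * avg[k] for k in metrics):
            slow.append(name)
    return slow

print(find_slow_disks(disks))  # ['sdc'] — high on all four metrics
```

In practice you would feed this from parsed iostat output collected over an interval, since a single sample can be skewed by a momentary burst.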

Other related questions:
Method used to check a slow disk
You can check a slow disk as follows:
1. Checking the OSM alarm
Check on the OSM management interface whether there is a slow disk alarm whose ID is 5613. If the alarm exists, check whether the slow disk is isolated (the disk has completed reconstruction). If the slow disk is not isolated, refer to the relevant disk replacement guide to manually replace the disk.
2. Checking the SES log
Collect the SES logs of the storage devices by obtaining the SES_log.txt and bak files in the /OSM/log_conf_local/log/cur_debug directory. Check slow I/O records and I/O distribution, and search for the keyword Disk IO Delay.
--------------------------Disk IO Delay Count------2012-01-10 02:30:52--------------------
Disk IO Delay Count Threshold: [300ms] [500ms] [700ms] [1000ms]
[0][2][3LM4JYJJ00009844V79S][3, 5, 15, 1]
The preceding information shows that within five minutes, the disk in slot (0,2) had three I/Os with over 300 ms latency, five I/Os with over 500 ms latency, 15 I/Os with over 700 ms latency, and one I/O with over 1000 ms latency. A disk with long I/O latency appears frequently in this log. Refer to the relevant disk replacement guide to manually replace the disk. If you have any questions, contact technical support engineers.
3. Checking the message log
Collect the message logs of the storage devices by obtaining the message and bak files in the /OSM/log_conf_local/log/cur_debug directory. Search for the keyword long time.
Jun 20 14:45:25 OceanStor kernel: [21086119188]mptscsih SLOW IO INFO: cost long time (13135), host id(0), channel id(0), scsi id (14), lun id(0), io lenth (524288), io mode(1), io lba(0x215321088)
The preceding information shows that an I/O of the SCSI device with scsi id (14) is suspended. Log in to the debug mode of the storage device and run lsscsi to obtain the drive letters corresponding to the SCSI IDs. Log in to the MML mode and run dev disk enclosure ID to obtain the drive letters corresponding to the slot IDs.
4. Determining a slow disk
If slow I/O records appear frequently in the logs (SES log and message log) and the time when such records appear is close to the time when services are affected (such as video freeze), the disk may be the one affecting services, that is, the slow disk.
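The Disk IO Delay counters in the SES log can be extracted mechanically. The following is a minimal Python sketch; the line format and the threshold values are assumed from the sample line shown above, and this is not an official tool.

```python
import re

# Sample SES log line from the answer above (format assumed from that sample):
# [enclosure][slot][serial number][counts per latency threshold]
line = "[0][2][3LM4JYJJ00009844V79S][3, 5, 15, 1]"
THRESHOLDS_MS = (300, 500, 700, 1000)  # from the "Threshold" header line

def parse_delay_count(line):
    """Return (enclosure, slot, serial, {threshold_ms: io_count}), or None."""
    m = re.match(r"\[(\d+)\]\[(\d+)\]\[(\w+)\]\[([\d,\s]+)\]", line)
    if not m:
        return None
    enclosure, slot, serial = int(m.group(1)), int(m.group(2)), m.group(3)
    counts = [int(c) for c in m.group(4).split(",")]
    return enclosure, slot, serial, dict(zip(THRESHOLDS_MS, counts))

enc, slot, serial, counts = parse_delay_count(line)
print(f"disk ({enc},{slot}) serial {serial}: {counts}")
```

Aggregating these counters across a full log file makes it easy to see which slot accumulates high-latency I/Os over time.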

Whether disks of S2600 storage systems can be used by S5000 storage systems
Whether disks of S2600 storage systems can be used by S5000 storage systems: For details, see FAQ-Can S2600 Disks Be Used by S5000.

Method used to migrate data on one disk of a RAID group to other disks in the storage system
You can migrate data on one disk of a RAID group to other disks in the storage system as follows:
When the disk status is Normal and you must migrate data on the disk to a hot spare disk, manually change the disk. Run the CLI command startdiskswap to migrate data on the source disk to a hot spare disk in the following format:
startdiskswap -se source disk enclosure ID -ss source disk slot ID -te target disk enclosure ID -ts target disk slot ID
For example, migrate data on the disk (1, 23) to the hot spare disk (1, 20) by running the following command:
startdiskswap -se 1 -ss 23 -te 1 -ts 20
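To avoid mistyping enclosure and slot IDs, the command can be assembled by a small helper. The following Python sketch only builds the command string shown in the answer above; the helper itself is hypothetical and the startdiskswap syntax is taken verbatim from that answer.

```python
def build_startdiskswap(src_enclosure, src_slot, tgt_enclosure, tgt_slot):
    """Build the startdiskswap CLI command string; does not execute it."""
    for v in (src_enclosure, src_slot, tgt_enclosure, tgt_slot):
        if not isinstance(v, int) or v < 0:
            raise ValueError(f"IDs must be non-negative integers, got {v!r}")
    return (f"startdiskswap -se {src_enclosure} -ss {src_slot} "
            f"-te {tgt_enclosure} -ts {tgt_slot}")

# The example from the answer: migrate disk (1, 23) to hot spare (1, 20).
print(build_startdiskswap(1, 23, 1, 20))
# startdiskswap -se 1 -ss 23 -te 1 -ts 20
```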

Method used to identify the cause of a damaged file system in the Linux host
You can locate the cause of a damaged file system as follows:
1. Issue Description
How can I identify the cause of a damaged file system on a Linux host?
2. Solution
Fault location and rectification: If the damaged file system is caused by the operating system, rectify the problem on the operating system side. If it is caused by storage disks, rectify the problem on the storage side. Other causes can also lead to a damaged file system.
a. If the damaged file system is located in interactive personality TV (IPTV), the following symptoms appear: when you enter the storage directory, failures such as an input/output error are displayed, or a file system fails to be mounted (you must ensure proper mapping of LUNs and disks added to hosts by the storage system).
b. The damaged file system is caused by an operating system failure. Check host logs by going to the /var/log directory and searching the compressed message log packages (search the latest message log first). Search for the keyword err in the host logs to check whether the following information is displayed (XFS is used as an example):
Feb 18 16:19:01 WX-BY-HMU2 kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 4534 of file fs/xfs/xfs_bmap.c. Caller 0xffffffff882c4f9c
If an internal error is found in the host logs, the error is caused by an operating system failure. Solution: Consult the relevant operating system personnel for troubleshooting. You can refer to maintenance documentation.
3. The damaged file system is caused by failures on the storage side. Check host logs by going to the /var/log directory and searching the compressed message log packages (search the latest message log first). Search for the keyword err in the host logs to check whether the following information is displayed (XFS is used as an example):
Dec 7 15:03:00 gdby2-hms01 kernel: end_request: I/O error, dev sdc, sector 2093665280
If an I/O error is found in the host logs, the error is caused by a disk fault in the storage system or a link fault between the host and the storage. Solution: Contact the storage R&D personnel for help.
4. Other Causes
The damaged file system is caused by powering on and restarting hosts and storage arrays after an abnormal power-off, or by a transmission medium fault such as fiber or cable damage, or by data transmission link recovery from disconnection. These scenarios may result in failed I/O delivery on the host and then a file system failure. Solution: Refer to maintenance documentation.
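The two log checks above (operating-system-side XFS internal error versus storage-side I/O error) amount to a simple line classifier. The following Python sketch uses the sample lines from the answer; the classification keywords are assumed from those samples and are not exhaustive.

```python
def classify_fs_damage(log_line):
    """Rough triage of a /var/log message line per the steps above."""
    if "internal error" in log_line:   # e.g. XFS XFS_WANT_CORRUPTED_GOTO
        return "operating system side"
    if "I/O error" in log_line:        # e.g. end_request: I/O error
        return "storage side or host-storage link"
    return "other cause"

os_line = ("Feb 18 16:19:01 WX-BY-HMU2 kernel: XFS internal error "
           "XFS_WANT_CORRUPTED_GOTO at line 4534 of file fs/xfs/xfs_bmap.c.")
storage_line = ("Dec 7 15:03:00 gdby2-hms01 kernel: end_request: "
                "I/O error, dev sdc, sector 2093665280")

print(classify_fs_damage(os_line))       # operating system side
print(classify_fs_damage(storage_line))  # storage side or host-storage link
```

A real triage script would additionally scan every rotated message log, since the most recent file may not cover the time when the fault occurred.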

What's the plan for configuring disk enclosures of V3 storage systems?
You can use eDesigner, a storage configuration tool, to plan the configuration of V3 storage disk enclosures. To obtain eDesigner, click Link.
