Problem and solution when the hot spare disk does not replace the faulty disk (without reconstruction)


You can perform the following operations when the hot spare disk does not replace the faulty disk (without reconstruction):
1. Fault location and rectification
a. When RAID 5 is configured, only one disk is allowed to be faulty in a disk domain at a time. If two disks are faulty, the RAID group fails.
b. Operations such as reconstruction, copyback, and precopy can be performed only in specific scenarios.
2. Solution
a. The failure is caused by a faulty RAID group.
If one disk in a RAID group is faulty, a hot spare disk is automatically associated with the faulty disk (run the CLI command showdisk -l to check that the status of the hot spare disk is used). Reconstruction of the faulty disk then starts.
During reconstruction, if another disk in the RAID group becomes faulty, the RAID group fails. The hot spare disk being used changes to free spare.
Recover the failed RAID group by referring to Troubleshooting RAID Group Failures.
b. Precopy and reconstruction cannot be performed simultaneously.
In current storage devices, only one disk can undergo reconstruction (likewise copyback or precopy) at a time. Disk reconstruction is processed serially, regardless of RAID group.
Only two different types of operations on different RAID groups can be processed concurrently, for example reconstruction and precopy, reconstruction and copyback, or copyback and precopy.

Other related questions:
Replacement of a malfunctioning coffer disk with a hot spare disk
A hot spare disk can replace a malfunctioning coffer disk if the two disks are of the same type and the hot spare disk provides a capacity equal to or larger than that of the coffer disk.
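
The replacement rule above can be expressed as a simple check. The following Python sketch is illustrative only: the Disk class, its field names, and the disk type labels are assumptions made for this example and are not part of any storage system interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    # Hypothetical fields, chosen only for this illustration.
    disk_type: str    # for example "SAS" or "SATA"
    capacity_gb: int  # usable capacity in GB


def can_replace_coffer_disk(spare: Disk, coffer: Disk) -> bool:
    """Return True if the hot spare satisfies both conditions stated above:
    same disk type, and capacity equal to or larger than the coffer disk."""
    return (
        spare.disk_type == coffer.disk_type
        and spare.capacity_gb >= coffer.capacity_gb
    )
```

For example, a 600 GB SAS hot spare passes the check for a 600 GB SAS coffer disk, while a 1000 GB SATA hot spare does not, because the disk types differ.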

Functions of hot spare disks
Definition and function of hot spare disks
While services are running, a RAID group may fail or become degraded because a disk enclosure unexpectedly loses power or a member disk is removed by mistake, causing service data loss. To ensure the reliability of the storage system, you are advised to create hot spare disks after creating RAID groups. A hot spare disk can replace a faulty disk, and the data is rebuilt onto the hot spare disk.

Problem and solution when disk isolation occurs
You can perform the following operations when disk isolation occurs.
The following causes may result in disk isolation: bit errors, repeated disk reinsertion, and a disk power connection problem.
1. Bit error
Check for bit errors on back-end SAS disks.
Search for the keywords err inc and disable disk phy in the SES log.
Note: phy:9 phymon***disable disk phy in the log shows that disk phy 9 is isolated. That is, the disk in slot 9 is isolated (phy0 to phy23 correspond to disks 0 to 23).
Troubleshooting:
a. Before removing a faulty disk, collect S.M.A.R.T. information.
b. If conditions permit, insert the isolated disk into another slot to check whether the isolation is caused by the disk or by the slot. If the isolation is caused by the disk, apply for disk replacement. If the isolation is caused by the slot, check whether the slot contains any foreign objects.
Check for bit errors on Fibre Channel disks.
Search the SES log for the keyword lcv, which indicates Fibre Channel bit errors. If HD 0 and lcv ffff are displayed, a large number of bit errors have occurred in slot 0 and caused disk isolation.
Back-end Fibre Channel bit errors can spread from the port to the disk. If a Fibre Channel disk is isolated, check whether bit errors occur on the port by using the following methods:
Check on the ISM.
Enter fc allinfo in MML mode.
Note: If any value displayed is not 0, bit errors exist.
If bit errors are detected on the port, verify whether bit errors are generated in the link. For details about how to verify, see the troubleshooting cases for a single link failure of the Fibre Channel enclosure disk caused by bit errors.
Troubleshooting:
If only one disk fails, verify the failure by using the above method. If the link has failed, replace the optical module and optical cables, and then verify the failure again. If the link has not failed, use the same method as for SAS disks.
If multiple disks are faulty, refer to the troubleshooting cases for a single link failure of the Fibre Channel enclosure disk caused by bit errors.
2. Reinserting disks repeatedly
Note: A disk can be isolated from the other disks if intermittent disconnections occur on it. Reinserting a disk repeatedly may therefore lead to disk isolation.
Verify whether the disk was reinserted many times within a short period. If so, the repeated reinsertion may be the cause of the isolation.
Troubleshooting: Reinsert the disk.
3. Disk power connection problem
Note: If the disk enclosure is subjected to violent shaking, the disk power connection may become loose and the disk may be isolated.
Troubleshooting: Contact R&D engineers for further analysis.
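
The SES log keyword search described above can be sketched in Python. This is a minimal illustration, not a supported tool. It assumes the log is available as plain text lines, and that an isolation entry contains phy:<number> followed later on the same line by disable disk phy, as in the note above. The function names and the sample line used in the usage note are hypothetical.

```python
import re

# Keywords named in this article for SAS and Fibre Channel bit errors.
SAS_KEYWORDS = ("err inc", "disable disk phy")
FC_KEYWORDS = ("lcv",)

# Assumed line shape: "phy:<n> ... disable disk phy".
PHY_ISOLATED = re.compile(r"phy:(\d+).*disable disk phy")


def find_keyword_lines(log_lines, keywords=SAS_KEYWORDS):
    """Return the log lines that contain any of the given keywords."""
    return [line for line in log_lines if any(k in line for k in keywords)]


def find_isolated_slots(log_lines):
    """Return the sorted slot numbers whose phy was reported as disabled.

    phy0 to phy23 correspond to disks 0 to 23, so the phy number is
    used directly as the slot number; other values are ignored.
    """
    slots = set()
    for line in log_lines:
        match = PHY_ISOLATED.search(line)
        if match:
            phy = int(match.group(1))
            if 0 <= phy <= 23:
                slots.add(phy)
    return sorted(slots)
```

For a log containing a line such as phy:9 phymon disable disk phy, find_isolated_slots reports slot 9. Passing FC_KEYWORDS to find_keyword_lines performs the equivalent search for Fibre Channel disks. Real SES log formats may differ, so adjust the pattern to the actual log before relying on the result.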

Whether OceanStor 9000 has hot spare or backup disks
OceanStor 9000 does not have any hot spare or backup disks. All storage nodes are provided for the same purpose. Raw data and redundant data are evenly stored on all nodes and disks of OceanStor 9000, and such data is available as recovery resources.
