Reason why the disk fault indicator is not on when the disk is faulty

A storage disk has a physical status and a logical status. The physical status reflects the condition of the disk hardware, so you are advised to replace the disk when a physical fault occurs. The logical status is assigned by the software; for example, the software marks the disk as logically faulty when I/O to the disk fails. In some cases, a disk with a logical fault can still be used.
If a disk is physically faulty, it is also logically faulty. Because the disk cannot be used, the software turns on the disk fault indicator to prompt you to replace the disk.
If a disk is only logically faulty, its physical status may be normal and the disk may still be usable. In that case, the disk fault indicator is not turned on.
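
The relationship between the two statuses and the indicator can be summarized in a few lines. This is only an illustrative sketch of the rule described above, not the storage system's actual logic; the class `DiskStatus` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DiskStatus:
    """Hypothetical model of a disk's status flags (names are assumptions)."""
    physical_fault: bool  # hardware-level fault
    io_failed: bool       # I/O to the disk has failed

    @property
    def logical_fault(self) -> bool:
        # A physical fault always implies a logical fault;
        # an I/O failure alone also makes the disk logically faulty.
        return self.physical_fault or self.io_failed

    @property
    def fault_indicator_on(self) -> bool:
        # Per the explanation above, the indicator is lit only for
        # physical faults, because only then must the disk be replaced.
        return self.physical_fault


logical_only = DiskStatus(physical_fault=False, io_failed=True)
print(logical_only.logical_fault, logical_only.fault_indicator_on)
```

A disk with only an I/O failure is reported as logically faulty while its indicator stays off, which matches the behavior this question asks about.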

Other related questions:
When multiple services are running on the same storage pool, why are its disk indicators blinking differently?
The blinking status of a disk indicator depends on the service model, RAID level, and CKG size. For example, you create a storage pool from a disk domain consisting of 20 disks and set its RAID policy to RAID 6 (12D+2P). You then create LUNs from the storage pool and run an all-4K service and a mixed (hybrid) service separately on the LUNs.
- For all-4K reads/writes, the whole file size is 4^4 x 100 x 4 KB, that is, 100 MB.
- For mixed reads/writes, the whole file size is 30,430 MB.
When the storage pool uses RAID 6 (12D+2P), the size of a CKG is 12 x 64 MB, that is, 768 MB. One CKG is therefore enough for all-4K read/write I/Os, and only the indicators of the 14 disks in that CKG blink in this service model. One CKG is not enough for mixed reads/writes: with the figures above, 30,430 MB / 768 MB is about 40, so many CKGs are required to hold the mixed service file. Because the storage system selects disks randomly for data reads/writes, these CKGs together span all the disks and all disk indicators blink.
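
The sizing arithmetic in this example can be checked with a short calculation. This is a rough sketch under the figures quoted in the answer (64 MB per disk per CKG, 12 data disks, 2 parity disks); the function names are hypothetical, and a real system's allocation may differ.

```python
import math

CHUNK_MB = 64      # per-disk contribution implied by "12 x 64 MB"
DATA_DISKS = 12    # RAID 6 (12D+2P)
PARITY_DISKS = 2


def ckg_size_mb(data_disks: int = DATA_DISKS, chunk_mb: int = CHUNK_MB) -> int:
    """Usable capacity of one CKG in MB."""
    return data_disks * chunk_mb


def ckgs_needed(file_mb: float, ckg_mb: int) -> int:
    """Smallest number of CKGs that can hold a file of the given size."""
    return math.ceil(file_mb / ckg_mb)


ckg_mb = ckg_size_mb()                      # 12 x 64 MB
all_4k_file_mb = 4**4 * 100 * 4 / 1024      # 4^4 x 100 x 4 KB, in MB
mixed_file_mb = 30430

print("CKG size (MB):", ckg_mb)
print("CKGs for all-4K file:", ckgs_needed(all_4k_file_mb, ckg_mb))
print("CKGs for mixed file:", ckgs_needed(mixed_file_mb, ckg_mb))
print("Disks in one CKG:", DATA_DISKS + PARITY_DISKS)
```

A workload that fits in one CKG touches only the disks of that CKG, whereas a workload that needs several CKGs is spread over more of the 20 disks in the disk domain.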

Reasons for disk corrosion
Disks are integrated electromechanical devices. Their printed circuit boards (PCBs) corrode easily in a corrosive environment. Similar electronic devices carry the same corrosion risk.

Reason why no events are displayed on the OSM page when disks are removed and inserted
1. Locating procedure:
a. After a disk is removed and inserted, no events are displayed on the storage system's OSM page.
b. Copyback is not performed after the faulty disk is replaced.
2. Solution:
a. Cause: This problem is caused by a software bug. After the disk is removed and inserted, SES does not report the event indicating the disk status. The problem is fixed in V100R001C02SPC031 (the corresponding controller version is 1.02.01.229.T08).
b. Workaround: Remove and insert the disk again, waiting longer than 15s between removal and insertion, and then check whether the disk runs properly. If possible, upgrade the storage software to 1.02.01.229.T08 or later.
c. Log in to the CLI and run showsys and showallver to check the device type and current SES version, and determine whether the device is at risk.
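
Once the controller version has been read from the CLI, it can be compared against the fixed version. The sketch below assumes that version strings are dot-separated fields that compare numerically from left to right, with letters such as the "T" in "T08" ignored. That ordering is an assumption, not documented behavior, and `parse_version` and `has_fix` are hypothetical helpers.

```python
import re

FIXED_VERSION = "1.02.01.229.T08"  # controller version named in the answer


def parse_version(version: str) -> tuple:
    """Split on dots and keep only the digits of each field (assumed format)."""
    return tuple(int(re.sub(r"\D", "", part) or 0) for part in version.split("."))


def has_fix(version: str, fixed: str = FIXED_VERSION) -> bool:
    """True if the version is the fixed one or later, under the assumed ordering."""
    return parse_version(version) >= parse_version(fixed)


# The second and third version strings are made-up samples for illustration.
for sample in ("1.02.01.229.T08", "1.02.01.228.T01", "1.02.01.230.T01"):
    print(sample, has_fix(sample))
```

If the real version scheme orders its fields differently, the comparison has to be adapted before it is relied on.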

Reason why only the first four disk indicators on the controller enclosure are blinking when the storage system is powered by BBUs
See the FAQ: Why are only the first four disk indicators on the controller enclosure blinking when the storage system is powered by BBUs?

Problem and solution when the hot spare disk does not replace the faulty disk (without reconstruction)
Perform the following operations when the hot spare disk does not replace the faulty disk (no reconstruction takes place):
1. Fault location:
a. With RAID 5, only one disk is allowed to be faulty in a disk domain at a time. If two disks are faulty, the RAID group fails.
b. Operations such as reconstruction, copyback, and precopy can be performed only in specified scenarios.
2. Solution:
a. The failure is caused by a failed RAID group. When one disk in a RAID group becomes faulty, a hot spare disk is automatically associated with the faulty disk (run the CLI command showdisk -l and check that the status of the hot spare disk is used), and reconstruction starts. If another disk in the RAID group becomes faulty during reconstruction, the RAID group fails and the hot spare disk in use changes back to free spare. To recover the failed RAID group, see Troubleshooting RAID Group Failures.
b. Precopy and reconstruction cannot be performed simultaneously in the same RAID group. On current storage devices, only one disk at a time can undergo reconstruction (the same applies to copyback and precopy), so operations of the same type are processed serially regardless of RAID group. Only two operations of different types on different RAID groups can run concurrently, for example reconstruction and precopy, reconstruction and copyback, or copyback and precopy.
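
The failure sequence in solution a can be illustrated with a toy state model. This is a simplified sketch, not the array's real state machine: it assumes a single RAID 5 group with one hot spare, does not model reconstruction completing, and uses hypothetical class and attribute names.

```python
from enum import Enum


class SpareState(Enum):
    FREE = "free spare"
    USED = "used"


class Raid5Group:
    """Toy model: RAID 5 tolerates exactly one faulty member disk."""

    def __init__(self) -> None:
        self.faulty_disks = 0
        self.failed = False
        self.spare = SpareState.FREE

    def disk_fails(self) -> None:
        self.faulty_disks += 1
        if self.faulty_disks == 1:
            # First fault: the hot spare is taken and reconstruction starts.
            self.spare = SpareState.USED
        else:
            # Second fault before reconstruction finishes: the group fails
            # and the hot spare is released back to the free state.
            self.failed = True
            self.spare = SpareState.FREE


group = Raid5Group()
group.disk_fails()
print(group.failed, group.spare.value)
group.disk_fails()
print(group.failed, group.spare.value)
```

After the second fault the spare reads as free again even though a disk is still faulty, which is why the hot spare appears not to have replaced the faulty disk.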
