Method for manually marking BST after a dual-disk failure

You can mark BST after a dual-disk failure as follows:
1. Issue Description
An array that supports the BST function has a dual-disk failure, and the RAID group fails.
Check whether the sense key of the failed disks is Medium Error and whether the Additional sense is not one of the following error codes: UNRECOVERED, DATA SYNCHRONIZATION MARK ERROR, and DATA SYNC ERROR - RECOMMEND REWRITE. Also check whether the disks are rejected. (If any one of the above error codes is displayed, the controller supports the BST function, and the system automatically marks the disks with BST and recovers them.)
To verify the sense key, search for the keyword Current sd in the /OSM/log/cur_debug/messages directory of controllers A and B and check whether the information following the keyword is Medium Error.
To verify the Additional sense, search for the keyword Additional sense in the /OSM/log/cur_debug/messages directory of controllers A and B and check whether the information following the keyword is any one of the above error codes.
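The two log checks above can be sketched in Python as follows. This is only an illustrative sketch, not a vendor tool: the function names are hypothetical, the exact layout of the log lines is not given in this article, and matching the error codes case-insensitively as substrings is an assumption.

```python
from typing import Iterable, List

# Additional sense strings listed in the issue description above.
AUTO_BST_CODES = (
    "UNRECOVERED",
    "DATA SYNCHRONIZATION MARK ERROR",
    "DATA SYNC ERROR - RECOMMEND REWRITE",
)


def text_after_keyword(lines: Iterable[str], keyword: str) -> List[str]:
    """Return the text that follows `keyword` on every line containing it."""
    return [line.split(keyword, 1)[1].strip() for line in lines if keyword in line]


def manual_marking_candidate(lines: Iterable[str]) -> bool:
    """True if a Medium Error is logged and no listed Additional sense code appears.

    Assumption: codes are compared case-insensitively as substrings,
    because the real log format is not documented here.
    """
    lines = list(lines)
    sense = text_after_keyword(lines, "Current sd")
    additional = text_after_keyword(lines, "Additional sense")
    has_medium_error = any("medium error" in s.lower() for s in sense)
    has_auto_code = any(
        code.lower() in a.lower() for a in additional for code in AUTO_BST_CODES
    )
    return has_medium_error and not has_auto_code
```

To use it, read the log text copied from each controller (for example with `open(path, errors="replace")`) and pass the lines to `manual_marking_candidate`. A result of `True` only suggests that the case matches the description above; the disk rejection status still has to be checked separately.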
2. Solution
Verify the sequence of disk failures on the management plane.
Restore the faulty disks that caused the RAID group to fail to the normal state (note: x and y are the user enclosure ID and slot ID of a faulty disk).
Restore the failed LUN to the normal state and the failed RAID group to the degraded state (note: x is the ID of the failed RAID group).
Check the message log of the array, search for the keyword OS_NotifyDiskBadEvent, and verify the address and length of the Medium Error (as shown in the following figure, address 426a0d is a 115-byte hexadecimal number).
Log in to the MML mode of the primary controller and manually mark the specified address of the faulty disks with BST (x = external enclosure ID × 32 + slot ID, y is the bad sector address in decimal, z is the length of the bad sector, and 1 is the logical sector).
Verify whether the manual marking is successful.
The marked disk is then used to reconstruct the first failed disk. Reads skip the BST-marked address on this disk, which avoids reconstruction failures. The data at the BST-marked address can be recovered through the fault tolerance mode of the file system and database on the hosts.
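The parameter arithmetic for the manual marking step can be sketched as follows. The log reports the address in hexadecimal, while the marking step expects a decimal address, and x is derived from the enclosure and slot IDs. The function names are hypothetical helpers for illustration, and the enclosure and slot values in the usage lines are made-up examples. Only the formula and the address 426a0d come from this article.

```python
def bst_disk_index(enclosure_id: int, slot_id: int) -> int:
    """x = external enclosure ID * 32 + slot ID, as stated in the marking step."""
    if enclosure_id < 0 or slot_id < 0:
        raise ValueError("enclosure_id and slot_id must be non-negative")
    return enclosure_id * 32 + slot_id


def bad_sector_address_decimal(hex_address: str) -> int:
    """Convert the hexadecimal address from the log into the decimal value y."""
    return int(hex_address, 16)


# Address from the article's example: hexadecimal 426a0d.
y = bad_sector_address_decimal("426a0d")
print(y)  # 4352525

# Made-up example: enclosure 1, slot 5.
x = bst_disk_index(1, 5)
print(x)  # 37
```

Checking the conversion before entering the values is worthwhile, because marking the wrong address would skip valid data instead of the bad sector.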
The above method is applicable to the following scenarios:
a. The service scenario has moderate requirements for data integrity.
b. The disk failure is caused by other failures that occur when a fixed area of the disk is read. In this case, mark the area with BST so that it is skipped during disk reading.
c. Use the method under the guidance of R&D engineers.
