I/O Errors Occur Because Host Cluster Reservation Information Is Not Cleared Before Heterogeneous Takeover of Third-Party Storage

Latest reply: Oct 12, 2021 05:26:18

[Symptom Description]

An OceanStor 5600 V3 takes over other storage devices through the heterogeneous takeover feature, and a volume mirror is created on the heterogeneous LUN. The customer reported that the OceanStor V3 storage system generated an alarm indicating that the volume mirror copy was disconnected abnormally, and that some directories in the cluster were inaccessible.

[Alarm Information]

2016-04-23 17:41 0xF02470010 Critical None The mirror copy (mirror LUN ID 85, mirror copy ID 130) is abnormally interrupted.

2016-04-23 17:41 0xF02470010 Critical 2016-04-23 17:41 The mirror copy (mirror LUN ID 98, mirror copy ID 157) is abnormally interrupted.

[Cause Description]

The external LUNs were mapped to a host cluster before being mapped to the V3 storage. When the LUNs were mapped to the V3 storage, the cluster reservation information on the external LUNs was not cleared. Consequently, when the V3 storage array delivers write I/Os to the external LUNs, a reservation conflict error is returned. The volume mirror copy is then disconnected unexpectedly and some directories in the cluster cannot be accessed.

[Diagnosis Method]

Event records and array logs show that the mirror LUN has two copies: the data of one copy resides on the external storage, and the data of the other resides on the V3 storage. The message log shows that the mirror LUN whose ID is 85 was disconnected from the copy whose ID is 130 because an error was returned when copy 130 was written. Copy 130 is the external LUN.

'[ERR][Write slave LUN return not ok; src objId(805306453), dst objId(130), lba(41296960),length(8),opCode(320962560),ret(-2147464317),retry(0).][PAIR]'

Search the log for the output printed when the mirror copy was created (keyword: initLunObjForCreate). The logical disk ID (ldid) of the mirror copy whose ID is 130 is 641.

'[INFO][OBJ_CTRL: Create LUN(id 130) object.][SYS][nodeCreateLunObj,5488][TP_SysRpcTPool_]'

'[WARN][Insert object(641) value(85) to red-black tree succeed.][VOL][insertObjectToVolRBTree]'

'[WARN][LUN(ID 130, uuid 6dee22db-de66-b5b7-d141-565276b9f0fc, type 0, origin LUN id 4294967295, user size 204800640, disk domain 0, pool 0, volume id 85).][LUN][initLunObjForCreate]'

Search for the keyword ldProcNewTmpDisk. The sdId of the logical disk whose ID is 641 is 176.

'[WARN][Allocate one user logic disk(ldId:641) succeed for disk(sdId:176, ctrlId:0).][BDM_LD][ldProcNewTmpDisk]'

The log contains the following error record for the external LUN (sdid 176):

'[INFO[MEH:req(ffff8801b9a795e8),sdid(176),path(7:-1:0:108),wrong(0x00000018),in(4),ret(1),sio(2),O:R:T(0:0:0),sk(0x0),a(0x0),aq(0x0).opc(0x0),info(0xffffffffffffffff][BDM_MP][mpPrintErrData]'

The error code 0x00000018 indicates that cluster reservation information exists on the external LUN and a reservation conflict error is returned for write I/Os delivered by OceanStor V3 storage.
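
The lookups above can be chained mechanically. The following Python sketch shows one possible way to do so, assuming the relevant message-log lines are available as plain text. The function name trace_failed_copy is hypothetical, and the way the IDs are linked (copy ID to volume ID to logical disk ID to sdId) is an interpretation of this case, not a Huawei tool. The sample lines are copied from the logs quoted above.

```python
import re

# Sample lines copied from this case; a real message log would be read from a file.
LOG_LINES = [
    "[ERR][Write slave LUN return not ok; src objId(805306453), dst objId(130), lba(41296960),length(8),opCode(320962560),ret(-2147464317),retry(0).][PAIR]",
    "[INFO][OBJ_CTRL: Create LUN(id 130) object.][SYS][nodeCreateLunObj,5488][TP_SysRpcTPool_]",
    "[WARN][Insert object(641) value(85) to red-black tree succeed.][VOL][insertObjectToVolRBTree]",
    "[WARN][LUN(ID 130, uuid 6dee22db-de66-b5b7-d141-565276b9f0fc, type 0, origin LUN id 4294967295, user size 204800640, disk domain 0, pool 0, volume id 85).][LUN][initLunObjForCreate]",
    "[WARN][Allocate one user logic disk(ldId:641) succeed for disk(sdId:176, ctrlId:0).][BDM_LD][ldProcNewTmpDisk]",
    "[INFO[MEH:req(ffff8801b9a795e8),sdid(176),path(7:-1:0:108),wrong(0x00000018),in(4),ret(1),sio(2),O:R:T(0:0:0),sk(0x0),a(0x0),aq(0x0).opc(0x0),info(0xffffffffffffffff][BDM_MP][mpPrintErrData]",
]


def trace_failed_copy(lines):
    """Follow the IDs from the failed write to the disk-level error code.

    Assumption: the copy is linked to its logical disk through the volume id
    that appears in both the initLunObjForCreate and insertObjectToVolRBTree lines.
    """
    text = "\n".join(lines)  # '.' does not cross newlines, so each match stays on one line
    copy_id = int(re.search(r"Write slave LUN return not ok;.*?dst objId\((\d+)\)", text).group(1))
    vol_id = int(re.search(rf"LUN\(ID {copy_id},.*?volume id (\d+)\)", text).group(1))
    ld_id = int(re.search(rf"Insert object\((\d+)\) value\({vol_id}\)", text).group(1))
    sd_id = int(re.search(rf"ldId:{ld_id}\) succeed for disk\(sdId:(\d+)", text).group(1))
    err = re.search(rf"sdid\({sd_id}\),.*?wrong\((0x[0-9a-fA-F]+)\)", text).group(1)
    return {"copy_id": copy_id, "ld_id": ld_id, "sd_id": sd_id, "error_code": err}


if __name__ == "__main__":
    print(trace_failed_copy(LOG_LINES))
```

For the sample lines, the sketch links copy 130 to logical disk 641, sdId 176, and error code 0x00000018, which matches the manual analysis.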

Root cause: During service migration at the site, after the LUNs of the external storage were unmapped from the cluster, they were mapped directly to the V3 storage without checking whether residual cluster reservation information remained on them. As a result, subsequent write I/Os fail.

[Solution]

Perform the following steps to rectify the fault:

1. Clear the cluster reservation information of external LUNs.

2. Clear the fault page of OceanStor V3 storage.


Step 1 Clear the cluster reservation information of the external LUN.


Note: After the reservation information on the external LUNs is cleared, the original cluster may fail to use the external LUNs if the heterogeneous takeover is later canceled. In that case, contact the upper-layer cluster software engineers for assistance. Evaluate the risks before performing this operation.


The method of clearing the reservation information varies according to storage versions. The following describes how to clear the reservation information of external LUNs on OceanStor V3 storage systems.


(1) V300R002C10SPC200 and later versions


A. Use SecureCRT, PuTTY, or Xshell to log in to the management IP address of the device and switch to engineer mode in the CLI.


admin:/>change user_mode current_mode user_mode=engineer


engineer:/>


B. Run the scan remote_lun command to scan for external LUNs.


engineer:/>scan remote_lun


Command executed successfully.


engineer:/>


C. Run the show lun_reserve general command to query the reservation information of the external LUN corresponding to the eDevLUN.


engineer:/>show lun_reserve general


  eDevLUN Id    Type


  -------------    -------------------------


  130            Exclusive Access


engineer:/>


D. Run the show lun_reserve general edevlun_id=* command to query the reservation information of the external LUN corresponding to a specific eDevLUN.


engineer:/>show lun_reserve general edevlun_id=130


eDevLUN Id : 130


Type         :  Exclusive Access


E. Run the change lun_reserve clear edevlun_id=* command to clear the reservation information of the external LUN.


engineer:/>change lun_reserve clear edevlun_id=130


DANGER: You are about to delete reservation information about the remote LUN. This operation disables SCSI 3 write protection for the LUN and data on the remote LUN may be damaged as a result.


Suggestion: Before performing this operation, ensure that all host services running on the remote LUN have been stopped and no other hosts are accessing the LUN.

Have you read danger alert message carefully?(y/n)y


Are you sure you really want to perform the operation?(y/n)y


Clear LUN 130 successfully.


F. After the deletion is successful, scan for external LUNs again and check whether the reservation information is successfully deleted.


engineer:/>scan remote_lun


Command executed successfully.


engineer:/>




engineer:/>show lun_reserve general


Command executed successfully.
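
When several eDevLUNs are involved, the table printed in step C can be turned into a list of clear commands for review. The following is a minimal Python sketch, assuming the output has the two-column layout shown above. The helper names parse_lun_reserve and clear_commands are hypothetical, and the generated commands are only produced as text for an engineer to check, not executed.

```python
import re

# Sample output copied from step C of this case.
SAMPLE = """\
engineer:/>show lun_reserve general

  eDevLUN Id    Type

  -------------    -------------------------

  130            Exclusive Access

engineer:/>
"""


def parse_lun_reserve(output):
    """Return (eDevLUN ID, reservation type) pairs found in the table rows."""
    rows = []
    for line in output.splitlines():
        # A data row starts with a numeric ID, followed by at least two spaces and the type.
        m = re.match(r"^\s*(\d+)\s{2,}(\S.*?)\s*$", line)
        if m:
            rows.append((int(m.group(1)), m.group(2)))
    return rows


def clear_commands(rows):
    """Build one clear command per reserved eDevLUN, using the syntax from step E."""
    return [f"change lun_reserve clear edevlun_id={lun_id}" for lun_id, _ in rows]


if __name__ == "__main__":
    for cmd in clear_commands(parse_lun_reserve(SAMPLE)):
        print(cmd)
```

An empty result after the final scan in step F corresponds to the case where no reservation rows remain.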




(2) Versions earlier than V300R002C10SPC200


No command is available on the OceanStor V3 storage system to clear the reservation information of external LUNs. You are advised to contact cluster software engineers to clear the reservation information. If the reservation information must be cleared on the OceanStor V3 storage system, perform the following operations:


Preparations:


A. You are advised to use the SecureCRT tool to enter the Storage mode. (See http://3ms.huawei.com/hi/group/2856321/blog_2116951.html?mapId=3366975)


B. Copy the sg_utility tool to any directory on the storage system and decompress it. (SG tool: http://3ms.huawei.com/hi/group/2856321/blog_2116983.html?mapId=3367067)


C. Go to the sg_utility directory and ensure that all scripts have the execute permission. If they do not, run the chmod 777 sg_utility/* command to add it. Then copy libsgutils.so.1 to /lib64 (cp libsgutils.so.1 /lib64/).




Procedure:


1. Remove the mirror copy (the copy of the external LUN).


On DeviceManager, choose Data Protection > Volume Mirror and remove the corresponding volume mirror.


2. On the storage system, use the sg tool to clear the cluster reservation information of the external LUN.


a. The sdId of the external LUN whose reservation information needs to be cleared is 176 (see the diagnosis above).


b. Run the ./sg_persist -i -r /dev/sd-176a command to check whether a reservation exists on the disk. The output is as follows:


Storage:~/sg_utility/sg_utility # ./sg_persist -i -r /dev/sd-176a


SUN       STK6580_6780      0780


Peripheral device type: disk


PR generation=0x1, Reservation follows:


Key=0x12345


scope: LU_SCOPE,  type: Write Exclusive


Key=0x12345 indicates that the key holding the reservation is 0x12345. type: Write Exclusive indicates that the reservation type is write exclusive.


c. Run the ./sg_persist -o -G -K 0x00 -S 0x12345 /dev/sd-176a command to register the key 0x12345 on the disk /dev/sd-176a. (Change 0x12345 based on the query result in step b.)


Storage:~/sg_utility/sg_utility # ./sg_persist -o -G -K 0x00 -S 0x12345 /dev/sd-176a


SUN       STK6580_6780      0780


Peripheral device type: disk


Note: Check the number of paths from the external LUN to the V3 storage, that is, how many links are connected to the V3 storage during heterogeneous takeover. If there are four links at the site, run the same command four times to ensure that the registration succeeds.


d. Run the ./sg_persist -o -C -K 0x12345 /dev/sd-176a command to clear the reservation and all registered keys on the disk.


Storage:~/sg_utility/sg_utility # ./sg_persist -o -C -K 0x12345 /dev/sd-176a


SUN       STK6580_6780      0780


Peripheral device type: disk


e. Run the ./sg_persist -i -r /dev/sd-176a command to check whether the reservation information on the disk has been cleared.


Storage:~/sg_utility/sg_utility # ./sg_persist -i -r /dev/sd-176a


SUN       STK6580_6780      0780


Peripheral device type: disk


PR generation=0x0, there is NO reservation held
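
Steps b to e can be summarized as a command plan derived from the query output. The following Python sketch assumes the sg_persist output format shown above. The helper names parse_reservation and plan_clear are hypothetical, and the sketch only builds the command strings used in this case (one registration per path, then clear, then verify) without running them.

```python
import re

# Sample output copied from step b of this case.
READ_OUTPUT = """\
SUN       STK6580_6780      0780
Peripheral device type: disk
PR generation=0x1, Reservation follows:
Key=0x12345
scope: LU_SCOPE,  type: Write Exclusive
"""


def parse_reservation(output):
    """Return (key, scope, type), or None when no reservation is held."""
    if "NO reservation held" in output:
        return None
    key = re.search(r"Key\s*=\s*(0x[0-9a-fA-F]+)", output, re.IGNORECASE)
    # Anchor on 'scope:' so that 'Peripheral device type: disk' is not mistaken for the type.
    held = re.search(r"scope:\s*(\w+),\s*type:\s*(.+)", output)
    if not key or not held:
        raise ValueError("unrecognized sg_persist output")
    return key.group(1), held.group(1), held.group(2).strip()


def plan_clear(output, device, path_count):
    """List the commands from steps c to e for the key found in the query output."""
    reservation = parse_reservation(output)
    if reservation is None:
        return []  # nothing to clear
    key = reservation[0]
    register = [f"./sg_persist -o -G -K 0x00 -S {key} {device}"] * path_count
    clear = f"./sg_persist -o -C -K {key} {device}"
    verify = f"./sg_persist -i -r {device}"
    return register + [clear, verify]


if __name__ == "__main__":
    for cmd in plan_clear(READ_OUTPUT, "/dev/sd-176a", path_count=4):
        print(cmd)
```

With four paths, as in the note under step c, the plan contains four identical registration commands followed by the clear and verification commands.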




Step 2 Clear the fault page of OceanStor V3 storage.


Because writes to the external LUN failed, a fault page exists on the OceanStor V3 storage system. After clearing the cluster reservation information of the external LUN, run the following commands to clear the fault page of the OceanStor V3 storage system:


admin:/>change user_mode current_mode user_mode=developer




developer:/>change lun_fault_page recover lun_id_list=85


Recover fault page of LUN 85 successfully.


developer:/>


Note: Enter the ID of the mirror LUN in lun_id_list.



Well done. Thanks for sharing.