In a Three-Network-Deployment Scenario with I/O Suspension Disabled, an I/O Error Fails to Be Returned Upon a Back-End Storage Network Fault

Symptom:

In a three-network deployment (the storage network is split into front-end and back-end storage networks) with I/O suspension (used with HCS or NFV) disabled, a back-end storage network fault causes the redundancy of a storage pool to exceed the threshold. As a result, the storage pool stays in the degraded state for a long time, and I/Os stall without any I/O error being returned.


Diagnosis:

  1. Check whether the system is configured with three networks (the storage network is split into front-end and back-end storage networks), whether I/O suspension (used with HCS or NFV) is disabled, and whether a back-end storage network fault alarm has been generated. If any of these conditions is not met, this section does not apply.

  2. Run the following command to query the MDC node to which the faulty storage pool belongs:

    mdc_cmd.sh 165 -1

    In the command output, find the mapping between the storage pool ID and the storage IP address of the owning MDC node, then log in to that MDC node.

  3. Run the following command to switch to the log directory of the MDC node:

    cd /var/log/dsware/plog/mdc/bak

    Check whether the message "record down will over redundancy, cannot down" is logged by the mdc_handle_debug_be_check_notify_event function around the time the fault occurred. If it is, this problem has occurred.
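
The log check in step 3 can be sketched as a small search helper. This is only an illustration: the function name find_redundancy_hits is hypothetical, and it assumes the logs in the directory are plain text (compressed archives would need to be unpacked first or searched with a tool such as zgrep). It matches the message text only, so confirm by eye that each hit comes from mdc_handle_debug_be_check_notify_event and falls around the time of the fault.

```shell
#!/bin/sh
# Hypothetical helper: list log lines that contain the message from step 3.
# Assumption: the files under the directory are uncompressed text logs.
find_redundancy_hits() {
    # Default to the MDC log directory named in step 3.
    log_dir="${1:-/var/log/dsware/plog/mdc/bak}"
    # -r: recurse, -n: show line numbers, -F: treat the pattern as a literal string.
    # grep exits with status 1 when nothing matches.
    grep -rn -F 'record down will over redundancy, cannot down' "$log_dir"
}
```

On the MDC node, calling find_redundancy_hits with no argument searches the directory from step 3. An empty result with a non-zero exit status means the message was not found in uncompressed logs.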


Cause:

When a back-end storage network fault would cause the redundancy of a storage pool to exceed the threshold, the current version of the system ignores the fault. With I/O suspension disabled, subsequent storage pool faults therefore cannot be reported, and I/Os keep retrying. As a result, no I/O error is returned.


Solution:

  1. Log in to the active FSM node using its floating IP address as user dsware.

  2. Run the following command to go to the specified directory:

    cd /opt/dsware/client/bin

  3. Run the following command to set the abort timeout interval:

    ./dswareTool.sh --op globalParametersOperation -opType modify -parameter abort_timeout:90

  4. Restore the back-end network of the faulty node and check whether services in the storage pool recover.
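
As a convenience, the command from step 3 can be assembled by a small helper that prints it for review instead of running it. This is a sketch: the function name abort_timeout_cmd is hypothetical, and the only value this procedure calls for is 90. The helper accepts other integers purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the dswareTool.sh command from step 3.
# The default of 90 is the value given in step 3; its unit is not stated there.
abort_timeout_cmd() {
    timeout="${1:-90}"
    # Reject anything that is not a non-negative integer.
    case "$timeout" in
        ''|*[!0-9]*)
            echo "timeout must be a non-negative integer" >&2
            return 1
            ;;
    esac
    printf '%s\n' "./dswareTool.sh --op globalParametersOperation -opType modify -parameter abort_timeout:${timeout}"
}
```

Review the printed line, then run it from /opt/dsware/client/bin on the active FSM node as described in steps 1 and 2.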


Check After Recovery:

Check whether the storage pool status is restored to normal.
