Removing Multiple Faulty Nodes from a Storage Pool

Problem Information

Table 1 Basic problem information

Item              Description
Storage type      Distributed storage
Product version   FusionStorage 8.0.0; FusionStorage 8.0.1
Problem type      Capacity reduction
Keyword           Storage capacity reduction; Downgrade; multi-node failures


Problem Symptom

Multiple nodes in a cluster are faulty, and the fault must be rectified by removing the faulty nodes from the storage pool.

Problem Diagnosis

N/A

Causes

N/A

Solution

Procedure

  1. Use PuTTY to log in to a management node through the management plane floating IP address and switch to user dsware.

  2. Run the cd /opt/dsware/client/bin command to switch to the operation directory.

  3. Change the node removal time on MDC nodes.

    Run the ./dswareTool.sh --op globalParametersOperation -opType modify -parameter g_node_out_timeout_normal:5 command, enter y to confirm the high-risk operation, and then enter user name admin and its password when prompted.

    [dsware@fsm01 bin]$ ./dswareTool.sh --op globalParametersOperation -opType modify -parameter g_node_out_timeout_normal:5
    This operation is high risk,please input y to continue:y
    [Tue Apr  7 17:42:21 CST 2020] DswareTool operation start.
    Enter User Name:admin
    Enter Password :
    Login server success.
    Operation finish successfully. Result Code:0
    The count of successful nodes is 10.
    The count of failed nodes is 0.
    [Tue Apr  7 17:42:37 CST 2020] DswareTool operation end.

  4. Switch the owning MDC node of the storage pool where a faulty node resides.

    Log in to a node in the control cluster and run the mdc_cmd.sh 165 -1 command to query the nodes where the MDCs of all storage pools reside.

    [root@HN_0_2 script]# mdc_cmd.sh 165 -1
    ./dsware_insight 0 4 10.183.160.226 10530 8 165 -1
    [2020-04-07 19:12:59][-]|        POOL_ID|         MDC_ID|    STORAGE IP_0|    STORAGE IP_1| PORT|   ASSIGNED_FLAG|  MAPPING_STATUS|VERSION   4|
    [2020-04-07 19:12:59][-]| POOL         0| MDC          2|   10.183.160.226|   10.183.160.226|10530|            TRUE|          IN_USE|

    Based on the POOL_ID, identify the node that hosts the MDC of the storage pool where the faulty node resides.

    Log in to that node and run the following command to reset the MDC process on the node.

    ps -ef | grep dsware_mdc | grep -v grep | awk -F " " '{print $2}' | xargs kill -9

  5. Wait for 5 minutes and check whether the faulty nodes are removed from the cluster. If the fault persists, contact technical support engineers.

  6. If the nodes are removed from the cluster, restore the node removal time to the default value.

    Run the ./dswareTool.sh --op globalParametersOperation -opType modify -parameter g_node_out_timeout_normal:10080 command, enter y to confirm the high-risk operation, and then enter user name admin and its password when prompted.

    [dsware@fsm01 bin]$ ./dswareTool.sh --op globalParametersOperation -opType modify -parameter g_node_out_timeout_normal:10080
    This operation is high risk,please input y to continue:y
    [Tue Apr  7 17:42:21 CST 2020] DswareTool operation start.
    Enter User Name:admin
    Enter Password :
    Login server success.
    Operation finish successfully. Result Code:0
    The count of successful nodes is 10.
    The count of failed nodes is 0.
    [Tue Apr  7 17:42:37 CST 2020] DswareTool operation end.

    Repeat step 4 to switch the owning MDC node of the storage pool where the faulty node resides.
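
The MDC lookup in step 4 can be scripted when several pools are listed. The following is a minimal sketch, not part of the product tooling: mdc_ip_for_pool is a hypothetical helper name, and it assumes the output of mdc_cmd.sh 165 -1 keeps the pipe-separated column layout shown in the sample above (POOL_ID in the first column, STORAGE IP_0 in the third).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with FusionStorage).
# Reads the output of `mdc_cmd.sh 165 -1` on stdin and prints the
# STORAGE IP_0 of the MDC that owns the pool ID given as $1.
# Assumes the pipe-separated layout shown in the step 4 sample.
mdc_ip_for_pool() {
    awk -F'|' -v pool="$1" '
        # Data rows look like "| POOL         0| MDC          2| <ip>| ..."
        $2 ~ /^ *POOL +[0-9]+ *$/ {
            id = $2
            gsub(/[^0-9]/, "", id)     # keep only the numeric pool ID
            ip = $4
            gsub(/ /, "", ip)          # strip column padding
            if (id + 0 == pool + 0) print ip
        }
    '
}

# Example (run on a control cluster node):
#   mdc_cmd.sh 165 -1 | mdc_ip_for_pool 0
```

The helper only reads the query output; logging in to the printed address and resetting the MDC process remain manual steps.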

Check After Recovery

Log in to the MDC node and run the following command to check whether the node removal time in the MDC configuration file is changed to 10080.

[root@HN_0_2 ~]# cat /opt/fusionstorage/persistence_layer/mdc/conf/mdc_conf.cfg |grep g_node_out_timeout_normal
g_node_out_timeout_normal=10080
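
When the check has to be repeated on several MDC nodes, it can be wrapped in a small function that compares the configured value with the expected one. This is a sketch under assumptions: check_node_out_timeout is a hypothetical name, and the configuration file is assumed to store the parameter as a key=value line, as in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with FusionStorage).
# $1: path to the MDC configuration file
# $2: expected value of g_node_out_timeout_normal (defaults to 10080)
# Returns 0 if the configured value matches, 1 otherwise.
check_node_out_timeout() {
    cfg="$1"
    expected="${2:-10080}"
    # Extract the value from the first "g_node_out_timeout_normal=<value>" line.
    actual=$(sed -n 's/^[[:space:]]*g_node_out_timeout_normal[[:space:]]*=[[:space:]]*//p' "$cfg" \
        | head -n 1 | tr -d '[:space:]')
    if [ "$actual" = "$expected" ]; then
        echo "g_node_out_timeout_normal=$actual (OK)"
        return 0
    fi
    echo "g_node_out_timeout_normal=${actual:-<not set>} (expected $expected)" >&2
    return 1
}

# Example (run on the MDC node):
#   check_node_out_timeout /opt/fusionstorage/persistence_layer/mdc/conf/mdc_conf.cfg
```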

Suggestion and Summary

After the faulty nodes are removed, you must restore the node removal time to the default value (10080).

Applicable Versions

All
