
Fast Restart of Elasticsearch

Latest reply: Nov 21, 2021 15:41:37

Hello all, 

This case describes how to restart an Elasticsearch cluster quickly.

Applicable Version

6.5.x

Context and Symptom

When an Elasticsearch cluster exceeds its recommended specifications or its shards are poorly planned, shard recovery after a cluster restart can take a long time. To start the cluster quickly, adjust the parameters and perform the operations below so that shard recovery does not need to copy data.

Impact and Severity

Elasticsearch is unavailable during the restart.

Restoration Duration

The duration varies depending on the cluster scale. For example:

Assume that the Elasticsearch cluster stores 200 TB of data. Without the following steps, restoring and loading the data after a restart takes more than 3 hours. With them, loading takes about 60 minutes, roughly three times faster.

Prerequisites

Prepare the following items before the restoration.

Table 1 Items to be prepared before the restoration

No. | Item | Operation
1 | Cluster account information | Apply for the password of cluster user admin.
2 | Node account information | Apply for the passwords of users omm and root on the cluster nodes.
3 | SSH remote login tool | Prepare a tool such as PuTTY or SecureCRT.
4 | Client | Install the client.

Fault Handling

You are advised to restart the cluster as follows: stop the write tasks and extend the restoration time limit for nodes that leave the cluster, refresh and flush the indexes, disable rebalancing and speed up shard recovery, restart the data nodes and then the master nodes, and restore the parameters after the cluster status returns to normal.

Procedure

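1. Stop the write tasks.

Stop the writing programs on the service side, or set the indexes to read-only on the platform side.

2. Set the indexes to read-only and extend the node restoration time limit.

Run the following command. Setting index.unassigned.node_left.delayed_timeout to 6h delays the reallocation of replica shards that become unassigned when a node leaves, so restarting the data nodes does not trigger new replica copies. If the writing programs have already been stopped on the service side, you do not need to set the index.blocks.write parameter.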

curl -XPUT "127.0.0.1:24100/*/_settings?pretty&master_timeout=120s" -H 'Content-Type: application/json' -d'
{
"index.blocks.write": "true" ,
"index.unassigned.node_left.delayed_timeout": "6h"
}'

After the command is executed, check that "acknowledged" : true is returned. If it is not, the settings were not applied to some indexes; run the command again. If the problem persists after four or five attempts, go to the next step.

{
"acknowledged" : true
}
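The retry rule above can be scripted. The following is a minimal POSIX shell sketch, not a FusionInsight tool: it assumes a non-security cluster reachable at 127.0.0.1:24100 (overridable through a hypothetical ES_URL variable), and the helper names, the attempt limit, and the 5-second pause are illustrative choices.

```shell
#!/bin/sh
# Minimal sketch for step 2: send the index settings request and retry until
# the response reports "acknowledged" : true. Helper names are hypothetical.

# Assumption: non-security mode, same address as the commands above.
ES_URL="${ES_URL:-127.0.0.1:24100}"

# Succeeds if the JSON text in $1 contains "acknowledged" : true.
# This is a text match, not a full JSON parser.
es_acknowledged() {
  printf '%s' "$1" | grep -Eq '"acknowledged"[[:space:]]*:[[:space:]]*true'
}

# retry_until_ack MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until its output is acknowledged or MAX_ATTEMPTS is used up,
# pausing 5 seconds between attempts.
retry_until_ack() {
  max=$1
  shift
  n=1
  while [ "$n" -le "$max" ]; do
    out=$("$@" 2>/dev/null) || true
    if es_acknowledged "$out"; then
      echo "acknowledged on attempt $n"
      return 0
    fi
    echo "attempt $n not acknowledged" >&2
    n=$((n + 1))
    if [ "$n" -le "$max" ]; then sleep 5; fi
  done
  return 1
}

# Same request as the step 2 command (write block plus 6h delayed timeout).
block_writes() {
  curl -s -XPUT "$ES_URL/*/_settings?master_timeout=120s" \
    -H 'Content-Type: application/json' -d'
{
  "index.blocks.write": "true",
  "index.unassigned.node_left.delayed_timeout": "6h"
}'
}
```

For example, retry_until_ack 5 block_writes sends the step 2 request up to five times. Matching the acknowledged flag with grep avoids depending on a JSON tool such as jq, at the cost of being a text match rather than real JSON parsing.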

3. Perform the refresh and flush operations.

a. Run _refresh so that recently indexed documents held in the memory buffer are written to new segments.

curl -XPOST "127.0.0.1:24100/_refresh?pretty"

Check that the value of failed in the command output is 0. If it is not, run the command again. If failures persist after four or five attempts, go to the next step.

"_shards" : {
"total" : 18,
"successful" : 18,
"failed" : 0
}

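b. Run the synced flush command, several times if necessary, so that a sync_id is generated for the shards. Shard copies that share the same sync_id can skip copying segment files during recovery.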

curl -XPOST "127.0.0.1:24100/*/_flush/synced?pretty&filter_path=_shards"

Check that the value of failed in the command output is 0. If it is not, run the command again. If failures persist after four or five attempts, go to the next step.

"_shards" : {
"total" : 18,
"successful" : 18,
"failed" : 0
}
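Both checks in step 3 follow the same pattern: repeat the request until the _shards section reports failed as 0, and give up after four or five attempts. A minimal POSIX shell sketch, assuming a non-security cluster at 127.0.0.1:24100 (hypothetical ES_URL variable) and hypothetical helper names:

```shell
#!/bin/sh
# Minimal sketch for step 3: repeat _refresh and the synced flush until the
# _shards section reports "failed" : 0. Helper names are hypothetical.

# Assumption: non-security mode, same address as the commands above.
ES_URL="${ES_URL:-127.0.0.1:24100}"

# Reads a response on stdin and prints the first "failed" count, which is
# the _shards.failed value in the responses shown above.
shards_failed() {
  grep -o '"failed"[[:space:]]*:[[:space:]]*[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# retry_until_no_failures MAX_ATTEMPTS COMMAND [ARGS...]
# Pauses 5 seconds between attempts.
retry_until_no_failures() {
  max=$1
  shift
  n=1
  while [ "$n" -le "$max" ]; do
    failed=$("$@" 2>/dev/null | shards_failed) || true
    if [ "$failed" = "0" ]; then
      echo "no failed shards on attempt $n"
      return 0
    fi
    echo "attempt $n: failed=${failed:-unknown}" >&2
    n=$((n + 1))
    if [ "$n" -le "$max" ]; then sleep 5; fi
  done
  return 1
}

# Same requests as steps 3a and 3b.
es_refresh() { curl -s -XPOST "$ES_URL/_refresh"; }
es_synced_flush() { curl -s -XPOST "$ES_URL/*/_flush/synced?filter_path=_shards"; }
```

For example: retry_until_no_failures 5 es_refresh; retry_until_no_failures 5 es_synced_flush. The calls are separated by ; rather than && because the procedure continues to the next step even if failures persist.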

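4. Disable rebalancing and speed up shard recovery.

Run the following command to disable rebalancing and raise the recovery limits. cluster.routing.allocation.node_initial_primaries_recoveries limits how many primary shards a node recovers in parallel from local data after a restart, cluster.routing.allocation.node_concurrent_recoveries limits concurrent shard recoveries per node, and indices.recovery.max_bytes_per_sec caps the recovery bandwidth: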

curl -X PUT "127.0.0.1:24100/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
"persistent" : {
"cluster.routing.rebalance.enable" : "none",
"cluster.routing.allocation.node_initial_primaries_recoveries":120,
"cluster.routing.allocation.node_concurrent_recoveries":60,
"indices.recovery.max_bytes_per_sec":"1gb"
}
}'
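To confirm that the persistent settings were stored, you can read them back with GET _cluster/settings?flat_settings=true, which returns each setting as a dotted key with a string value. The sketch below compares those values with the ones set above. It assumes a non-security cluster at 127.0.0.1:24100, the helper names are hypothetical, and it uses grep and sed text matching instead of a JSON parser.

```shell
#!/bin/sh
# Minimal sketch: check the step 4 persistent settings. Helper names are
# hypothetical.

# Assumption: non-security mode, same address as the commands above.
ES_URL="${ES_URL:-127.0.0.1:24100}"

# setting_value KEY: reads flat_settings JSON on stdin and prints the first
# string value stored under KEY.
setting_value() {
  pat=$(printf '%s' "$1" | sed 's/\./\\./g')   # escape dots for grep
  grep -o "\"$pat\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" | head -n 1 |
    sed 's/.*:[[:space:]]*"\(.*\)"$/\1/'
}

# Reads flat_settings JSON on stdin, prints one line per mismatch and
# returns 1 if any setting differs from the value used in step 4.
verify_step4_settings() {
  json=$(cat)
  rc=0
  for pair in \
      cluster.routing.rebalance.enable=none \
      cluster.routing.allocation.node_initial_primaries_recoveries=120 \
      cluster.routing.allocation.node_concurrent_recoveries=60 \
      indices.recovery.max_bytes_per_sec=1gb; do
    name=${pair%%=*}
    want=${pair#*=}
    got=$(printf '%s' "$json" | setting_value "$name")
    if [ "$got" != "$want" ]; then
      echo "$name: expected $want, got ${got:-<unset>}"
      rc=1
    fi
  done
  return $rc
}
```

For example, curl -s "127.0.0.1:24100/_cluster/settings?flat_settings=true" | verify_step4_settings prints nothing and returns zero when all four settings match.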


5. Restart the data nodes and then the master nodes.

a. On FusionInsight Manager, select the Elasticsearch data instances and restart them. Wait until the cluster status changes to green, which you can check with the following command:

curl -XGET "127.0.0.1:24100/_cluster/health?pretty"

b. Restart the active EsMaster instance first, and then the other EsMaster instances. Run the first command below to find the active EsMaster instance, and restart it on FusionInsight Manager. After the second command shows that the cluster status is green, restart the other EsMaster instances.

curl -XGET "127.0.0.1:24100/_cat/master?v"

curl -XGET "127.0.0.1:24100/_cluster/health?pretty"
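Waiting for green can be automated with the wait_for_status and timeout parameters of the _cluster/health API, which make the request block until the status is reached or the timeout expires. A minimal sketch, assuming a non-security cluster at 127.0.0.1:24100, hypothetical helper names, and a 60-second timeout per poll; the restarts themselves are still done on FusionInsight Manager.

```shell
#!/bin/sh
# Minimal sketch for step 5: wait for a green cluster and find the active
# master. Helper names are hypothetical.

# Assumption: non-security mode, same address as the commands above.
ES_URL="${ES_URL:-127.0.0.1:24100}"

# Prints the "status" value (green, yellow or red) of a _cluster/health
# response read on stdin.
health_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[a-z]*"' | head -n 1 | sed 's/.*"\([a-z]*\)"$/\1/'
}

# wait_for_green MAX_POLLS
# Each poll asks the health API to block for up to 60s waiting for green.
wait_for_green() {
  max=$1
  n=1
  while [ "$n" -le "$max" ]; do
    health=$(curl -s "$ES_URL/_cluster/health?wait_for_status=green&timeout=60s" | health_status) || true
    echo "poll $n: status=${health:-unreachable}"
    if [ "$health" = "green" ]; then return 0; fi
    # The node may be down while it restarts; avoid a tight loop.
    if [ -z "$health" ]; then sleep 10; fi
    n=$((n + 1))
  done
  return 1
}

# Prints the node column of "_cat/master?v" output read on stdin
# (header line: id host ip node).
master_node_name() {
  awk 'NR == 2 { print $4 }'
}
```

For example, wait_for_green 30 polls up to 30 times, and curl -s "127.0.0.1:24100/_cat/master?v" | master_node_name prints the node name of the active master.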

6. Restore the parameters and services after the cluster status returns to green.

a. Restore the index.unassigned.node_left.delayed_timeout and index.blocks.write parameters to their default values. Setting index.blocks.write to null resets it to its default (false), which removes the write block. If you did not set index.blocks.write in step 2, you do not need to reset it.

curl -XPUT "127.0.0.1:24100/*/_settings?pretty&master_timeout=120s" -H 'Content-Type: application/json' -d'
{
"index.unassigned.node_left.delayed_timeout": "10m",
"index.blocks.write": null
}'

b. Run the following command to restore the rebalance parameter to all:

curl -X PUT "127.0.0.1:24100/_cluster/settings?pretty&master_timeout=120s" -H 'Content-Type: application/json' -d'
{
"persistent" : { "cluster.routing.rebalance.enable" : "all" }
}'
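After step 6, you can confirm that no index is still write-blocked and that rebalancing is back to all. A minimal sketch, assuming a non-security cluster at 127.0.0.1:24100 and hypothetical helper names; it reads index.blocks.write through the get index settings API with a setting-name filter and counts indexes where the value is still "true". The recovery settings changed in step 4 are not reset by step 6 and are not checked here.

```shell
#!/bin/sh
# Minimal sketch for checking step 6. Helper names are hypothetical.

# Assumption: non-security mode, same address as the commands above.
ES_URL="${ES_URL:-127.0.0.1:24100}"

# Counts "index.blocks.write":"true" entries in flat_settings JSON on stdin.
write_blocked_count() {
  grep -o '"index\.blocks\.write"[[:space:]]*:[[:space:]]*"true"' | wc -l | tr -d ' '
}

# Prints the first cluster.routing.rebalance.enable value in flat_settings
# JSON on stdin (the persistent section is listed before the transient one).
rebalance_setting() {
  grep -o '"cluster\.routing\.rebalance\.enable"[[:space:]]*:[[:space:]]*"[a-z]*"' | head -n 1 |
    sed 's/.*"\([a-z]*\)"$/\1/'
}

# Returns zero only if no index is write-blocked and rebalancing is "all".
verify_restored() {
  blocked=$(curl -s "$ES_URL/_all/_settings/index.blocks.write?flat_settings=true" | write_blocked_count)
  rebalance=$(curl -s "$ES_URL/_cluster/settings?flat_settings=true" | rebalance_setting)
  echo "indexes still write-blocked: $blocked"
  echo "cluster.routing.rebalance.enable: ${rebalance:-<unset>}"
  [ "$blocked" = "0" ] && [ "$rebalance" = "all" ]
}
```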

Note:

The curl commands in the procedure are examples for a cluster in non-security mode. To choose the right form of the command, check the following settings.

Check whether the Elasticsearch cluster is in security mode.

On FusionInsight Manager, check whether the value of ELASTICSEARCH_SECURITY_ENABLE of Elasticsearch is true. If the value is true, the security mode is used.

Check whether earlier TLS versions are disabled for Elasticsearch.

On FusionInsight Manager, check whether the value of DISABLE_TLS_LOW_PROTOCOL/DISABLE_TLSV1_PROTOCOL of Elasticsearch is true. If the value is true, earlier TLS versions are disabled. (In versions earlier than C80SPC300, the configuration item is DISABLE_TLSV1_PROTOCOL.)

If security mode is used and earlier TLS versions are disabled, run:

curl -XGET --tlsv1.2 --negotiate -k -u: "https://127.0.0.1:24100/_cluster/health?pretty"

If security mode is used and earlier TLS versions are not disabled, run the following command, replacing <TLS version option> with the curl option for the TLS version used on the OS:

curl -XGET <TLS version option> --negotiate -k -u: "https://127.0.0.1:24100/_cluster/health?pretty"

If non-security mode is used, run:

curl -XGET "http://127.0.0.1:24100/_cluster/health?pretty"
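The choice among the three commands can be scripted from the two values read on FusionInsight Manager. A minimal sketch with hypothetical function names; for the case where earlier TLS versions are not disabled, it assumes that omitting an explicit TLS version option and letting curl negotiate one is acceptable, which you should verify for your OS and cluster.

```shell
#!/bin/sh
# Minimal sketch: build the curl options and URL scheme from the
# FusionInsight Manager values described above. Names are hypothetical.

# es_curl_opts SECURITY_ENABLE TLS_LOW_DISABLED  (each "true" or "false")
es_curl_opts() {
  if [ "$1" = "true" ]; then
    if [ "$2" = "true" ]; then
      echo "--tlsv1.2 --negotiate -k -u:"
    else
      # Assumption: no explicit TLS version option; curl and the OS negotiate one.
      echo "--negotiate -k -u:"
    fi
  fi
}

# es_scheme SECURITY_ENABLE
es_scheme() {
  if [ "$1" = "true" ]; then echo "https"; else echo "http"; fi
}

# es_health SECURITY_ENABLE TLS_LOW_DISABLED
es_health() {
  # $(es_curl_opts ...) is unquoted on purpose so the options split into words.
  curl -XGET $(es_curl_opts "$1" "$2") "$(es_scheme "$1")://127.0.0.1:24100/_cluster/health?pretty"
}
```

For example, es_health true true runs the same request as the first of the three commands above.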


Verification

If all instances are in normal status after the restart and _cluster/health reports the cluster status as green, the service has been restarted and the data has been restored.



The post is synchronized to: FusionInsight HD Troubleshooting


little_fish
Admin Created Mar 2, 2021 06:54:38

very good

Unicef
MVE Created Mar 2, 2021 07:17:42

Well done

olive.zhao
Admin Created Nov 15, 2021 03:16:57

Very clear for fast restart of Elasticsearch

MahMush
Moderator Author Created Nov 15, 2021 05:07:09

good one

user_4358465
Created Nov 21, 2021 15:41:37

Well detailed. Thanks for sharing it!