Hello all,
This case covers the fault "Multiple Elasticsearch Instances Are Faulty Because the Value of index.max_result_window Is Too Large".
Applicable Version
6.5.x
Context and Symptom
Query requests return a very large amount of data. As a result, multiple Elasticsearch instances raise the "heap memory threshold exceeded" alarm, some instances are suspended, two instances go offline, and a primary shard becomes unavailable.
Cause Analysis
1. The ISV services continuously send requests that each query 10 million records. A search request uses heap memory in proportion to from + size, so the heap memory of the Elasticsearch instances is quickly exhausted and the instances are suspended.
2. The default value of index.max_result_window is 10000, so a request of this size would normally be rejected. This suggests that the services changed the setting. Checking the index settings confirms that the value was raised to 2 billion.
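The check in step 2 can be sketched as a small script. This is only an illustration under stated assumptions: it expects the parsed JSON returned by GET /<index>/_settings in its default nested form (not flat_settings), the function name find_oversized_windows is made up for this example, and the sample response is hypothetical rather than taken from the affected cluster.

```python
from typing import Dict, List, Tuple

# Elasticsearch default for index.max_result_window
DEFAULT_MAX_RESULT_WINDOW = 10000


def find_oversized_windows(
    settings_response: Dict[str, dict],
    limit: int = DEFAULT_MAX_RESULT_WINDOW,
) -> List[Tuple[str, int]]:
    """Return (index, window) pairs whose max_result_window exceeds limit, largest first."""
    oversized = []
    for index_name, body in settings_response.items():
        index_settings = body.get("settings", {}).get("index", {})
        # Setting values come back as strings; a missing key means the default applies.
        window = int(index_settings.get("max_result_window", DEFAULT_MAX_RESULT_WINDOW))
        if window > limit:
            oversized.append((index_name, window))
    return sorted(oversized, key=lambda pair: pair[1], reverse=True)


# Hypothetical settings response, for illustration only.
sample = {
    "orders": {"settings": {"index": {"max_result_window": "2000000000"}}},
    "logs": {"settings": {"index": {"number_of_shards": "5"}}},
}
print(find_oversized_windows(sample))  # [('orders', 2000000000)]
```

Any index listed by the function accepts from + size requests larger than the default window and is a candidate for resetting.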
Solution
1. Restart the Elasticsearch cluster to release memory. The recovery takes about 50 minutes. Translog recovery of one primary shard takes 45 minutes, which is too long.
2. In the index settings, set index.max_result_window (the maximum number of records that a query can return) back to 10000.
curl -XPUT --negotiate -k -u : 'https://<ip>:<port>/<index_name>/_settings?pretty' -H 'Content-Type: application/json' -d '{"index.max_result_window": "10000"}'
3. The ISV changes the service query logic to retrieve full data sets in scroll mode, so that no single request returns an excessive number of results.
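The scroll approach in step 3 could look roughly like the sketch below. It is not the ISV's actual code. The send(method, path, body) callable is a hypothetical transport helper that issues an authenticated HTTP request and returns the parsed JSON response; the function name scroll_hits, the batch size of 1000, and the 1m keep-alive are illustrative choices. The request paths and bodies follow the Elasticsearch scroll API (initial search with ?scroll=, then POST /_search/scroll, then DELETE /_search/scroll).

```python
from typing import Callable, Iterator, Optional

# Hypothetical transport: send(method, path, json_body) -> parsed JSON response.
Send = Callable[[str, str, dict], dict]


def scroll_hits(
    send: Send,
    index: str,
    batch_size: int = 1000,
    keep_alive: str = "1m",
    query: Optional[dict] = None,
) -> Iterator[dict]:
    """Yield every matching hit, fetching batch_size documents per request."""
    body = {
        "size": batch_size,  # per-batch size, kept well below max_result_window
        "sort": ["_doc"],    # index order, the cheapest order for a full export
        "query": query or {"match_all": {}},
    }
    page = send("POST", f"/{index}/_search?scroll={keep_alive}", body)
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            page = send(
                "POST",
                "/_search/scroll",
                {"scroll": keep_alive, "scroll_id": scroll_id},
            )
            # The scroll ID may change between calls, so always use the latest one.
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        if scroll_id:
            # Release the search context on the cluster as soon as we are done.
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Each request holds only batch_size hits in memory, so the heap cost per request stays bounded no matter how many records the full export contains.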
Any other suggestions are welcome!