Symptom:
At one site, a large cluster is deployed with 246 EsNode instances. When a single node becomes unstable, the write speed of the entire ES cluster drops.
Cause analysis:
Analysis of the affected services and onsite fault locating showed that a single machine in the cluster was unstable. In addition, the number of index shards (segments) was set to 246, so the shards were evenly allocated across the EsNode instances, one per instance. A bulk write therefore sends data to every node and completes only after the slowest node responds. Because the unstable node writes slowly, the overall bulk write speed is low.
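The effect described above can be illustrated with a small model. This is a simplified sketch, not a measurement from the site: it assumes that every bulk request contains documents for every shard and that a request finishes only when its slowest shard has responded. The latency figures and the name of the unstable node (EsNode17) are hypothetical.

```python
def bulk_latency_ms(node_latency_ms, shard_to_node):
    """Return the latency of one bulk request that touches every shard.

    The request is only as fast as the slowest node that holds one of its shards.
    """
    touched_nodes = set(shard_to_node.values())
    return max(node_latency_ms[node] for node in touched_nodes)


# 246 EsNode instances, all healthy except one (hypothetical numbers).
nodes = [f"EsNode{i}" for i in range(1, 247)]
latency = {node: 20 for node in nodes}
latency["EsNode17"] = 900  # the unstable machine

# 246 shards, one per node: every bulk request hits the slow node.
shards_246 = {shard: nodes[shard] for shard in range(246)}
print(bulk_latency_ms(latency, shards_246))  # 900

# 12 shards placed on EsNode1..EsNode12: the slow node is not involved.
shards_12 = {shard: nodes[shard] for shard in range(12)}
print(bulk_latency_ms(latency, shards_12))  # 20
```

With fewer shards than nodes, an unstable node only affects the indexes whose shards happen to be placed on it.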
Solution:
Set the number of shards (segments) appropriately for the index instead of matching it to the number of EsNode instances. When a single node becomes faulty, migrate the hot shards on that node to other nodes.
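As a rough sketch of the migration step, the following Python builds the request bodies for two standard Elasticsearch APIs: the cluster reroute API (POST _cluster/reroute), which moves one shard, and the cluster settings API (PUT _cluster/settings), which excludes a node from shard allocation so that its shards drain to other nodes. The index name, shard number, and node names are hypothetical, and the code only prints the bodies; how they are sent (curl, a client library, authentication) depends on the deployment.

```python
import json


def build_move_command(index, shard, from_node, to_node):
    """Body for POST _cluster/reroute: move one shard to another node."""
    return {
        "commands": [
            {
                "move": {
                    "index": index,
                    "shard": shard,
                    "from_node": from_node,
                    "to_node": to_node,
                }
            }
        ]
    }


def build_exclude_setting(node_name):
    """Body for PUT _cluster/settings: stop allocating shards to a node.

    Existing shards on the node are relocated elsewhere. Set the value
    back to null after the node has recovered.
    """
    return {
        "persistent": {
            "cluster.routing.allocation.exclude._name": node_name
        }
    }


# Hypothetical names, for illustration only.
move_body = build_move_command("my_index", 16, "EsNode17", "EsNode18")
exclude_body = build_exclude_setting("EsNode17")

print(json.dumps(move_body, indent=2))
print(json.dumps(exclude_body, indent=2))
```

Moving a single shard suits the case where only a few hot shards are affected; excluding the node is simpler when everything on the unstable machine should be moved away.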