Symptom:
The manager page displays an alarm indicating that the number of file handles has reached the threshold. A check of the host that generated the alarm shows that the file handles occupied by the Es instance process on that host are not being released.
Cause analysis:
Write a script to monitor the file handle usage of all hosts. The script content is as follows:
    #!/bin/bash
    # Record the file handle usage of every host in the cluster once a minute.
    while [ 1 -eq 1 ]; do
        date >> file_count.txt
        sh /opt/FusionInsight_SetupTool/preinstall/tools/cluster/clustercmd.sh "cat /proc/sys/fs/file-nr" >> file_count.txt
        sleep 60
    done
Run the script on the machine where the setup tool (FusionInsight_SetupTool) is installed, because it uses the batch tool clustercmd.sh to check the file handle usage of the cluster index nodes.
Find the machine with too many handles in the monitoring output file file_count.txt.
On that machine, run the top command to find the Es instance whose CPU usage is too high, and then go to the fd directory of that process:
    cd /proc/<process ID>/fd
In the fd directory, run the ll command to view the index data directories that the open file descriptors point to, and find the UUID of the index. For example, the index UUID is: Index UUID: AY-XrpmQR*.
Check the index name corresponding to the UUID.
Check the index shard status based on the obtained index name.
For the index with the abnormal number of file handles, the primary and replica shards are abnormal as follows: after data writing stops, the number of files in the primary and replica shards is inconsistent. (The replica has fewer files than the primary; indexes that are still being written are excluded.)
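The fd inspection above can be tedious when a process holds many thousands of descriptors. The following is a minimal sketch that groups the open descriptors of one process by index UUID. It assumes that index files are stored under a path containing /indices/<UUID>/, which may differ in your deployment; the function names list_fd_targets and count_fds_by_index are illustrative and not part of any product tool.

```shell
#!/bin/bash
# Sketch: group the open file descriptors of one process by index UUID.
# Assumption: index files live under a path that contains /indices/<UUID>/.

# Print the target path of every file descriptor of the given PID, one per line.
list_fd_targets() {
    local pid="$1" fd
    for fd in /proc/"$pid"/fd/*; do
        readlink "$fd" 2>/dev/null
    done
    return 0
}

# Read paths on stdin and print "<count> <UUID>" lines, highest count first.
# Paths without an /indices/<UUID>/ component (sockets, pipes, logs) are ignored.
count_fds_by_index() {
    sed -n 's#.*/indices/\([^/]*\)/.*#\1#p' | sort | uniq -c | sort -rn | awk '{print $1, $2}'
}
```

For example, list_fd_targets <process ID> | count_fds_by_index lists the UUIDs that hold the most descriptors. A UUID can then be matched to an index name in the uuid column of curl 'http://127.0.0.1:24100/_cat/indices?v', and the shard status of that index can be viewed with curl 'http://127.0.0.1:24100/_cat/shards/<index name>?v', assuming the instance accepts unauthenticated requests on that port.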
Solution:
Restore the file descriptors on each host to prevent the instance host from breaking down because its file handles are used up.
Manually set the number of index replicas to 0 so that the system automatically releases the socket connections and the number of handles recovers:
    curl -XPUT 'http://127.0.0.1:24100/<index name>/_settings?pretty' -H 'Content-Type:application/json' -d '{"number_of_replicas": 0}'
After all replica documents on the node are deleted, restore the number of replicas:
    curl -XPUT 'http://127.0.0.1:24100/<index name>/_settings?pretty' -H 'Content-Type:application/json' -d '{"number_of_replicas": 1}'
Run the following command to check the cluster health status. If the value of status is green, the system is restored:
    curl -XGET 'http://127.0.0.1:24100/_cluster/health?pretty'
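When several indexes are affected, the recovery commands can be wrapped in small helper functions. The following is a minimal sketch under stated assumptions: the ES_URL variable and the function names set_replicas, health_status and wait_for_green are illustrative, the address is the one used in this case, and the instance is assumed to accept unauthenticated requests. The sketch does not detect when the replica documents have been deleted; confirm that manually before restoring the replica count.

```shell
#!/bin/bash
# Sketch of the recovery sequence. ES_URL and all function names are illustrative.
ES_URL="${ES_URL:-http://127.0.0.1:24100}"

# Set number_of_replicas for one index: set_replicas <index name> <count>
set_replicas() {
    local index="$1" replicas="$2"
    curl -s -XPUT "$ES_URL/$index/_settings?pretty" \
        -H 'Content-Type: application/json' \
        -d "{\"number_of_replicas\": $replicas}"
}

# Read a cluster health response on stdin and print the value of "status".
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Poll the cluster health every 10 seconds until it is green.
# Returns 0 on green, 1 if the given number of attempts (default 30) runs out.
wait_for_green() {
    local attempts="${1:-30}" i
    for ((i = 0; i < attempts; i++)); do
        if [ "$(curl -s -XGET "$ES_URL/_cluster/health?pretty" | health_status)" = "green" ]; then
            return 0
        fi
        sleep 10
    done
    return 1
}
```

A typical sequence is set_replicas <index name> 0, then, once the handles have been released and the replica documents deleted, set_replicas <index name> 1 followed by wait_for_green.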