This post covers one dimension of search performance: request rate and latency.
We can gauge the effectiveness of a cluster by measuring the rate at which it processes requests and the time each request takes.
When a cluster receives a request, it may need to access data in multiple shards across multiple nodes. Core indicators such as the rate at which the system processes and returns requests, the number of requests currently in progress, and the duration of requests are important factors for measuring cluster health.
The request process is divided into two phases:
The first is the query phase. The cluster distributes the request to a copy of each shard (primary or replica) in the index;
The second is the fetch phase. The query results are collected, processed, and returned to the user.
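The two phases above can be sketched with a toy in-memory model. This is only an illustration of the query-then-fetch idea, not how the cluster is implemented internally; the shard contents, scores, and function names are all made up for the example.

```python
import heapq

# Toy in-memory "shards": each maps a document ID to (score, document body).
# The layout and scores are invented for illustration.
SHARDS = [
    {"a1": (0.9, "doc a1"), "a2": (0.2, "doc a2")},
    {"b1": (0.7, "doc b1"), "b2": (0.95, "doc b2")},
]

def query_phase(shards, size):
    """Each shard reports only (score, shard number, doc ID) for its local top hits."""
    candidates = []
    for shard_no, shard in enumerate(shards):
        local = [(score, shard_no, doc_id) for doc_id, (score, _) in shard.items()]
        candidates.extend(heapq.nlargest(size, local))
    # The coordinating node merges the per-shard lists into a global top `size`.
    return heapq.nlargest(size, candidates)

def fetch_phase(shards, hits):
    """Retrieve the full documents only for the globally selected hits."""
    return [shards[shard_no][doc_id][1] for _, shard_no, doc_id in hits]

top = query_phase(SHARDS, size=2)
results = fetch_phase(SHARDS, top)
print(results)
```

The point of splitting the work this way is that the query phase moves only lightweight identifiers and scores, while the heavier document bodies are read just for the hits that made the final cut.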
Run the GET index_a/_stats command to view the statistics of the target index. The response is long, so it is not reproduced in full here. Try it out on your own cluster.
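As a minimal sketch, the search-related part of the stats response can be pulled out as follows. It assumes the usual response layout in which the counters sit under `_all.total.search`; the helper names and the numbers in `sample` are placeholders, not output from a real cluster.

```python
import json
import urllib.request

def fetch_index_stats(base_url, index):
    """Call GET <index>/_stats and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/{index}/_stats") as resp:
        return json.load(resp)

def extract_search_stats(stats):
    """Return the search counters aggregated over all shard copies of the index."""
    return stats["_all"]["total"]["search"]

# Trimmed stand-in for a real response; the values are placeholders.
sample = {"_all": {"total": {"search": {
    "query_total": 1200, "query_time_in_millis": 6000, "query_current": 0,
    "fetch_total": 450, "fetch_time_in_millis": 900, "fetch_current": 0,
}}}}

search = extract_search_stats(sample)
print(search["query_total"], search["query_time_in_millis"])
```

Against a live cluster, the call would look like `extract_search_stats(fetch_index_stats("http://localhost:9200", "index_a"))`, assuming the node listens on that address without authentication.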

The key indicators related to search request performance are as follows:
query_current: The number of queries currently in progress;
fetch_current: The number of fetches currently in progress;
query_total: The cumulative number of queries processed;
query_time_in_millis: The total time spent on all queries, in milliseconds;
fetch_total: The cumulative number of fetches processed (a count of fetch operations, not of records returned);
fetch_time_in_millis: The total time spent on all fetches, in milliseconds.
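Because the total and time fields are cumulative counters, request rate and average latency are obtained by comparing two snapshots taken some interval apart. The sketch below shows that arithmetic; the function name, the 10-second interval, and the snapshot values are assumptions chosen for illustration.

```python
def search_rates(before, after, interval_seconds):
    """Derive rates and average latencies from two snapshots of the search counters."""
    queries = after["query_total"] - before["query_total"]
    fetches = after["fetch_total"] - before["fetch_total"]
    query_ms = after["query_time_in_millis"] - before["query_time_in_millis"]
    fetch_ms = after["fetch_time_in_millis"] - before["fetch_time_in_millis"]
    return {
        "queries_per_second": queries / interval_seconds,
        "fetches_per_second": fetches / interval_seconds,
        # Guard against division by zero when nothing ran in the interval.
        "avg_query_ms": query_ms / queries if queries else 0.0,
        "avg_fetch_ms": fetch_ms / fetches if fetches else 0.0,
    }

# Placeholder snapshots taken 10 seconds apart (values invented for illustration).
before = {"query_total": 1000, "query_time_in_millis": 5000,
          "fetch_total": 400, "fetch_time_in_millis": 800}
after = {"query_total": 1200, "query_time_in_millis": 6000,
         "fetch_total": 450, "fetch_time_in_millis": 900}

rates = search_rates(before, after, interval_seconds=10)
print(rates)
```

The two `_current` fields are point-in-time gauges rather than counters, so they are read directly instead of being differenced.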