1.1 Rules
Elasticsearch Application Scenarios
1. The data to be searched includes structured data (RDS), semi-structured data (web pages and XML files), and unstructured data (logs, pictures, and images). For these data types, Elasticsearch can clean the data, perform word segmentation, and build inverted indexes, and then provide full-text search.
2. The search criteria are diverse (for example, many fields are involved), and common queries cannot meet requirements such as querying simple words and phrases, or multiple forms of words or phrases, in the full text.
3. Data is read much more often than it is written.
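To illustrate the inverted-index idea in item 1, the following toy Java sketch maps each term to the set of documents that contain it. It answers a multi-term query by intersecting those sets. This is not how Elasticsearch or Lucene implement indexing. The class name and the naive tokenizer (lowercasing and splitting on non-alphanumeric characters) are assumptions that stand in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ToyInvertedIndex {
    // term -> sorted ids of the documents that contain the term (the "postings list")
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Naive word segmentation: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Indexes a document: records docId in the postings list of every term it contains.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the ids of documents that contain every query term (AND semantics).
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Collections.<Integer>emptySet() : result;
    }
}
```

For example, suppose "Elasticsearch builds inverted indexes" is indexed as document 1 and "Inverted indexes speed up full-text search" as document 3. Then `search("inverted indexes")` returns `[1, 3]` without scanning the document texts.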
Import the required classes in Elasticsearch applications
Correct:
//Classes that need to be imported when RestClient is created:
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
//Classes that need to be imported when a request is sent:
import org.apache.http.HttpEntity;
import org.apache.http.entity.ContentType;
//Class that needs to be imported when a response is parsed:
import org.elasticsearch.client.Response;
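The following minimal sketch shows how these classes fit together. It assumes the Elasticsearch low-level REST client 6.4 or later, which provides `org.elasticsearch.client.Request`. The host, port, index name, and document are hypothetical. A security-mode cluster also needs the Kerberos setup described in this section before requests succeed.

```java
import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.HttpHost;
import org.apache.http.entity.ContentType;
import org.apache.http.nio.entity.NStringEntity;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;

public class EsRequestExample {

    // RestClient + RestClientBuilder: create the client (hypothetical plain-HTTP address).
    static RestClient createClient(String host, int port) {
        RestClientBuilder builder = RestClient.builder(new HttpHost(host, port, "http"));
        return builder.build();
    }

    // HttpEntity + ContentType: build a request that indexes one JSON document.
    static Request buildIndexRequest(String index, String id, String json) {
        Request request = new Request("PUT", "/" + index + "/_doc/" + id);
        HttpEntity entity = new NStringEntity(json, ContentType.APPLICATION_JSON);
        request.setEntity(entity);
        return request;
    }

    // Response: send the request and read the response body as a string.
    static String send(RestClient client, Request request) throws IOException {
        Response response = client.performRequest(request);
        return EntityUtils.toString(response.getEntity());
    }
}
```

A caller would create the client once, call `send(client, buildIndexRequest("my-index", "1", "{\"title\":\"hello\"}"))` for each request, and close the client when the application ends.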
If the cluster is installed in security mode, ensure that the time on the client is the same as that on the server
If the cluster is of the security edition and Kerberos authentication is required, the time on the server must be the same as that on the client. When comparing the two clocks, convert for any time zone difference between them. If the times are inconsistent, client authentication fails and subsequent service operations cannot be performed.
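Kerberos compares absolute times. Wall-clock readings from machines in different time zones must therefore be converted to a common reference before they are compared. The following minimal sketch uses `java.time`. The class name is illustrative, and how the server time is obtained is out of scope. The 5-minute tolerance in the test is an assumption: it is the MIT Kerberos default `clockskew`, but the cluster may be configured differently.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class ClockSkewCheck {

    // Converts a local wall-clock reading from a given time zone into an absolute instant.
    static Instant toInstant(LocalDateTime wallClock, ZoneId zone) {
        return wallClock.atZone(zone).toInstant();
    }

    // True if the absolute difference between the two instants does not exceed the tolerance.
    static boolean withinTolerance(Instant client, Instant server, Duration tolerance) {
        Duration skew = Duration.between(client, server).abs();
        return skew.compareTo(tolerance) <= 0;
    }
}
```

For example, 08:00 on 2024-01-01 in Asia/Shanghai (UTC+8) and 00:02 on the same day in UTC are only two minutes apart. The local clocks read eight hours apart, but `withinTolerance` still returns `true` for a 5-minute tolerance.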
When a user you created performs index data operations, configure authentication information for the user and grant the user read and write permissions on the indexes
If the cluster is of the security edition, Kerberos authentication is required to connect to the server. Configure the following system properties so that the client can log in to the KDC:
//Requires java.io.File; LOG is the logger of the enclosing class
private static void setSecConfig() throws Exception {
    //Kerberos configuration file: <working directory>/conf/krb5.conf
    String krb5ConfFile = System.getProperty("user.dir") + File.separator + "conf"
            + File.separator + "krb5.conf";
    LOG.info("krb5ConfFile: " + krb5ConfFile);
    System.setProperty("java.security.krb5.conf", krb5ConfFile);

    //JAAS login configuration file: <working directory>/conf/jaas.conf
    String jaasPath = System.getProperty("user.dir") + File.separator + "conf"
            + File.separator + "jaas.conf";
    LOG.info("jaasPath: " + jaasPath);
    System.setProperty("java.security.auth.login.config", jaasPath);
    //Let the underlying mechanism obtain credentials instead of requiring them in the current Subject
    System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");

    //add for ES security indication
    System.setProperty("es.security.indication", "true");
    LOG.info("es.security.indication is " + System.getProperty("es.security.indication"));
}
The krb5.conf and user.keytab files are obtained from the user management page of FusionInsight Manager. In jaas.conf, set principal to the user name and keyTab to the actual storage path of user.keytab.
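A jaas.conf entry for keytab login typically uses the JDK Kerberos login module `com.sun.security.auth.module.Krb5LoginModule`. As a hedged illustration, the Java helper below renders such an entry from a user name and a keytab path. The section name, the sample principal and path, and the exact option set are assumptions. Keep the section name and any extra options that the jaas.conf from FusionInsight Manager already uses, and change only `principal` and `keyTab`.

```java
public class JaasConfTemplate {

    // Renders one JAAS login section that logs in with a keytab via the JDK Kerberos module.
    // sectionName is hypothetical here; use the one your client's jaas.conf already defines.
    static String render(String sectionName, String principal, String keytabPath) {
        return sectionName + " {\n"
                + "  com.sun.security.auth.module.Krb5LoginModule required\n"
                + "  useKeyTab=true\n"
                + "  keyTab=\"" + keytabPath + "\"\n"   // actual storage path of user.keytab
                + "  principal=\"" + principal + "\"\n" // the user name
                + "  useTicketCache=false\n"
                + "  storeKey=true;\n"
                + "};\n";
    }
}
```

Use forward slashes in `keyTab`: the JAAS file parser treats backslashes inside quoted values as escape characters.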
The created user must have read and write permissions on the index to be operated. Either assign the administrator role to the user, or grant the user read and write permissions on the corresponding index.
If the cluster is of the security edition, set up a secure httpClientBuilder
If the cluster is of the security edition, set up a secure httpClientBuilder using the following code:
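//Fragment of the HTTP client creation logic: requestConfigBuilder, httpClientConfigCallback
//and isSecureMode are defined in the enclosing code; the DEFAULT_* values are RestClientBuilder constants
HttpAsyncClientBuilder httpClientBuilder = HttpAsyncClientBuilder.create()
        .setDefaultRequestConfig(requestConfigBuilder.build())
        //default settings for connection pooling may be too constraining
        .setMaxConnPerRoute(DEFAULT_MAX_CONN_PER_ROUTE)
        .setMaxConnTotal(DEFAULT_MAX_CONN_TOTAL)
        .setSSLContext(SSLContext.getDefault());
//Apply the application's own customizations, if a callback was registered
if (httpClientConfigCallback != null) {
    httpClientBuilder = httpClientConfigCallback.customizeHttpClient(httpClientBuilder);
}
//In security mode, apply the secure wrapper to the builder
if (isSecureMode) {
    wrapSecureHttpAsyncClientBuilder(httpClientBuilder);
}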
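In the fragment above, `httpClientConfigCallback` is the hook through which application code can customize the builder. With the standard low-level REST client, it is registered through `RestClientBuilder.setHttpClientConfigCallback`. The following minimal sketch assumes that standard API and uses the callback to raise the connection-pool limits mentioned in the comment. The limit values and the address are hypothetical.

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;

public class PooledClientFactory {

    // Hypothetical limits; size them to the application's real concurrency.
    static final int MAX_CONN_PER_ROUTE = 50;
    static final int MAX_CONN_TOTAL = 100;

    static RestClientBuilder builder(String host, int port) {
        return RestClient.builder(new HttpHost(host, port, "http"))
                // The callback receives the HttpAsyncClientBuilder and returns it after customization
                .setHttpClientConfigCallback(httpClientBuilder -> httpClientBuilder
                        .setMaxConnPerRoute(MAX_CONN_PER_ROUTE)
                        .setMaxConnTotal(MAX_CONN_TOTAL));
    }
}
```

The callback runs after the default pool settings are applied, so the values it sets take precedence. In the fragment above, the secure wrapper is applied after the callback.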
Close the RestClient before the application ends
When the application ends, invoke restClient.close() to release the client's connections and threads.
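`RestClient` implements `java.io.Closeable`, so a try-with-resources block guarantees that `close()` runs even if a request throws an exception. The following minimal sketch uses a hypothetical address. A long-running service that keeps one client for its whole lifetime could instead call `close()` from a JVM shutdown hook.

```java
import java.io.IOException;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

public class RestClientLifecycle {

    static void run(String host, int port) throws IOException {
        // Create the client once and reuse it for all requests.
        try (RestClient restClient = RestClient.builder(new HttpHost(host, port, "http")).build()) {
            // ... send requests with restClient.performRequest(...) here ...
        } // restClient.close() is called here automatically, also when an exception is thrown
    }

    public static void main(String[] args) throws IOException {
        run("localhost", 9200); // hypothetical address
    }
}
```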
1.2 Suggestions
Creating Elasticsearch Indexes
- To reduce the number of indexes and avoid huge mappings, store data with the same index structure in the same index.
- Do not place unrelated data in the same index, to avoid sparse fields.
These suggestions do not apply when you use parent/child relationships between documents, because that feature works only for documents in the same index.
Shard Division Policy
Each shard can process index and query requests. When setting the number of shards, consider the following:
- Keep the maximum capacity of a single shard at or below 30 GB.
- Determine the number of primary shards from the maximum data capacity of the index and the capacity of a single shard.
- To improve data reliability, set an appropriate number of replica shards.
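The sizing rule above amounts to a ceiling division. The following small sketch takes the 30 GB per-shard limit from the guideline and treats the expected maximum index size as an input you estimate yourself. The class and method names are illustrative.

```java
public class ShardPlanner {

    // Suggested upper bound for a single shard, from the guideline (30 GB).
    static final long MAX_SHARD_SIZE_GB = 30;

    // Smallest number of primary shards that keeps each shard at or below maxShardSizeGb:
    // ceil(maxIndexSizeGb / maxShardSizeGb), computed with integer arithmetic.
    static int primaryShards(long maxIndexSizeGb, long maxShardSizeGb) {
        if (maxIndexSizeGb <= 0 || maxShardSizeGb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (int) ((maxIndexSizeGb + maxShardSizeGb - 1) / maxShardSizeGb);
    }

    // Total shard copies in the cluster: each primary plus its replicas.
    static int totalShards(int primaryShards, int replicasPerPrimary) {
        return primaryShards * (1 + replicasPerPrimary);
    }
}
```

For example, an index expected to reach 200 GB needs `primaryShards(200, MAX_SHARD_SIZE_GB)` = 7 primary shards, because 6 × 30 GB = 180 GB would not be enough. With one replica each, the cluster holds `totalShards(7, 1)` = 14 shards.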
Once the number of primary shards is set for an index, it cannot be changed. The number of replica shards can be modified as required.
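Changing the replica count is a settings update on an existing index. The following hedged sketch uses the low-level REST client and assumes version 6.4 or later for `Request.setJsonEntity`. The index name and replica count are hypothetical. Only `number_of_replicas` is sent.

```java
import java.io.IOException;

import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class ReplicaSettings {

    // Builds PUT /{index}/_settings that changes only the number of replica shards.
    static Request updateReplicasRequest(String index, int replicas) {
        Request request = new Request("PUT", "/" + index + "/_settings");
        request.setJsonEntity("{\"index\":{\"number_of_replicas\":" + replicas + "}}");
        return request;
    }

    // Sends the update and returns the raw response body.
    static String apply(RestClient client, String index, int replicas) throws IOException {
        Response response = client.performRequest(updateReplicasRequest(index, replicas));
        return EntityUtils.toString(response.getEntity());
    }
}
```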
Do not return a large result set
Elasticsearch is designed as a search engine, so it is very good at returning the documents that best match a query. It is not suitable for retrieving all documents that match a particular query; for that, use the Scroll API.
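A scroll retrieval uses three kinds of requests:
- an initial search with a `scroll` keep-alive parameter,
- repeated `_search/scroll` calls, each passing the latest `_scroll_id`,
- a final clear-scroll request.

The following sketch only builds these requests with the low-level REST client (6.4 or later assumed). Extracting `_scroll_id` and `hits.hits` from each response needs a JSON library and is left out. The keep-alive value, batch size, and index name are hypothetical.

```java
import org.elasticsearch.client.Request;

public class ScrollExample {

    // Hypothetical keep-alive; it only needs to cover the processing time of one batch.
    static final String KEEP_ALIVE = "1m";

    // First request: a normal search that also opens a scroll context on the server.
    static Request initialSearch(String index, int batchSize) {
        Request request = new Request("POST", "/" + index + "/_search");
        request.addParameter("scroll", KEEP_ALIVE);
        // Sorting by _doc is the cheapest order when result order does not matter.
        request.setJsonEntity("{\"size\":" + batchSize
                + ",\"query\":{\"match_all\":{}},\"sort\":[\"_doc\"]}");
        return request;
    }

    // Follow-up request: fetches the next batch using the scroll id from the previous response.
    static Request nextBatch(String scrollId) {
        Request request = new Request("POST", "/_search/scroll");
        request.setJsonEntity("{\"scroll\":\"" + KEEP_ALIVE + "\",\"scroll_id\":\"" + scrollId + "\"}");
        return request;
    }

    // Final request: releases the scroll context once all batches have been read.
    static Request clearScroll(String scrollId) {
        Request request = new Request("DELETE", "/_search/scroll");
        request.setJsonEntity("{\"scroll_id\":\"" + scrollId + "\"}");
        return request;
    }
}
```

A client loop sends `nextBatch` with the most recent scroll id until a response contains no hits. It then sends `clearScroll` so the server can free the search context before the keep-alive expires.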

