Introduction
The Elasticsearch service supports multi-condition retrieval, statistics, and report generation for structured and unstructured text. A comprehensive monitoring system tracks key indicators for systems, clusters, and query performance, so users can focus on implementing service logic. The service applies to scenarios such as log search and analysis, spatio-temporal retrieval, time series retrieval and report generation, and intelligent search.
Elasticsearch has the following features:
- Powerful full-text search and highlight display
- Distributed real-time file storage, real-time analysis, and diversified search
- Scalability to hundreds of servers to process PB-level structured or unstructured data
- Diversified document processing: documents are stored in indexes and can be added, deleted, modified, and queried
- Rich geographical information search and geographical location aggregation
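To make the full-text search and highlight feature concrete, the following minimal sketch builds a search request body in the Elasticsearch query DSL. The field name `message` and the search text are hypothetical and chosen only for illustration. The sketch only constructs the JSON body and does not contact a cluster.

```python
import json


def build_highlight_query(field, text):
    """Build a search body that matches `text` in `field` and asks
    Elasticsearch to return highlighted fragments for that field."""
    return {
        # Full-text match against a single field.
        "query": {"match": {field: text}},
        # Request highlighting for the same field with default settings.
        "highlight": {"fields": {field: {}}},
    }


# "message" is a hypothetical field name used only for this example.
body = build_highlight_query("message", "connection timeout")
print(json.dumps(body, indent=2))
```

A body of this shape would typically be sent to the `_search` endpoint of an index.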
Structure
The Elasticsearch cluster solution consists of the EsMaster process and the EsNode1 through EsNode9 processes, as shown in Figure 1-1. Table 1-1 describes the modules.
Figure 1-1 Structure
Table 1-1 Module description
| Module | Description |
| --- | --- |
| Client | Communicates with the EsMaster and EsNode instance processes in the Elasticsearch cluster over HTTP or HTTPS to perform distributed collection and search. |
| EsMaster | Stores the metadata and index data of Elasticsearch. |
| EsNode1-9 | Stores the index data of Elasticsearch. |
| ZooKeeper cluster | Provides a heartbeat mechanism for processes in Elasticsearch clusters. |
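As an illustration of how a client might address an instance over HTTP or HTTPS, the sketch below prepares a search request with the Python standard library. The host name `esnode1.example.com`, the port `9200`, and the index name `app-logs` are assumptions made for this example and are not taken from this document. Authentication is omitted, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_search_request(host, port, index, body, use_https=True):
    """Prepare (but do not send) a POST request to an index's _search endpoint."""
    scheme = "https" if use_https else "http"
    url = f"{scheme}://{host}:{port}/{index}/_search"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Host, port, and index name are hypothetical placeholders.
req = build_search_request(
    "esnode1.example.com", 9200, "app-logs", {"query": {"match_all": {}}}
)
print(req.get_method(), req.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it. A real deployment would also need the credentials and certificates required by the cluster.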
Basic Concepts
- Index: A logical namespace in Elasticsearch that consists of one or more shards. Apache Lucene reads and writes the data in an index. An index is similar to a database in a relational database (RDB) instance. One Elasticsearch instance can contain multiple indexes.
- Type: When documents of different structures are stored in an index, the document type is used to look up the corresponding mapping information, which facilitates document storage. A type is similar to a table in a database. One index corresponds to one document type.
- Document: The basic unit of information that can be indexed. A document is the top-level (root) object serialized into JSON. A document is similar to a row in a database. A type contains multiple documents.
- Mapping: Restricts the type of each field and can be created automatically based on the data. A mapping is similar to a schema in a database.
- Field: The smallest unit of a document. A field is similar to a column in a database. Each document contains multiple fields.
- EsNode: An Elasticsearch node. A node is an Elasticsearch instance.
- EsMaster: The master node, which temporarily manages cluster-level changes such as creating or deleting indexes and adding or removing nodes. The master node does not participate in document-level changes or searches, so it does not become the bottleneck of the cluster when traffic increases.
- Shard: The smallest work unit in Elasticsearch. Documents are stored and referenced in shards.
- Primary shard: Each document in an index belongs to one primary shard. The number of primary shards determines the maximum amount of data that can be stored in the index.
- Replica shard: A copy of a primary shard. It prevents data loss caused by hardware faults and serves read requests, such as searching for or retrieving documents.
- Recovery: Data restoration or data redistribution. When a node is added or removed, Elasticsearch redistributes index shards based on the load of the corresponding physical servers. Data restoration is also performed when a faulty node is restarted.
- Gateway: The storage mode of Elasticsearch index snapshots. By default, Elasticsearch stores an index in memory and persists it to the local hard disk when the memory is full. A gateway stores index snapshots. When an Elasticsearch cluster is stopped and then restarted, the index backup data is read from the gateway. Elasticsearch supports multiple types of gateways, including local file systems (default), distributed file systems, Hadoop HDFS, and Amazon S3 cloud storage.
- Transport: The interaction mode between Elasticsearch internal nodes or clusters and the Elasticsearch client. By default, TCP is used. In addition, the HTTP (JSON format), Thrift, Servlet, Memcached, and ZeroMQ transmission protocols (integrated through plug-ins) are supported.
- ZooKeeper: Mandatory for Elasticsearch. It provides functions such as storage of security authentication information.
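The relationship between indexes, mappings, fields, documents, and shards described above can be sketched as follows. All names (`timestamp`, `level`, `message`, `doc-1`) and the shard counts are hypothetical. The mapping uses the typeless layout of Elasticsearch 7 and later; older releases nest `properties` under a type name. The shard selection is a simplified illustration of "hash of the routing key modulo the number of primary shards", with `zlib.crc32` as a stand-in for the hash function that Elasticsearch uses internally.

```python
import zlib

NUM_PRIMARY_SHARDS = 3  # assumed value for this example
NUM_REPLICAS = 1        # replica copies per primary shard (assumed)

# Index definition: shard settings plus a mapping that restricts field types.
index_body = {
    "settings": {
        "number_of_shards": NUM_PRIMARY_SHARDS,
        "number_of_replicas": NUM_REPLICAS,
    },
    "mappings": {
        "properties": {
            "timestamp": {"type": "date"},
            "level": {"type": "keyword"},
            "message": {"type": "text"},
        }
    },
}

# A document is a JSON object whose fields follow the mapping.
document = {
    "timestamp": "2024-01-01T00:00:00Z",
    "level": "ERROR",
    "message": "connection timeout",
}


def total_shard_copies(primaries, replicas):
    """Each primary shard has `replicas` copies in addition to itself."""
    return primaries * (1 + replicas)


def pick_primary_shard(routing_key, primaries):
    """Simplified routing: stand-in hash of the key modulo the primary count."""
    return zlib.crc32(routing_key.encode("utf-8")) % primaries


shard = pick_primary_shard("doc-1", NUM_PRIMARY_SHARDS)
print(f"doc-1 -> primary shard {shard} of {NUM_PRIMARY_SHARDS}")
print("shard copies:", total_shard_copies(NUM_PRIMARY_SHARDS, NUM_REPLICAS))
```

Because the target shard depends on the number of primary shards, changing that number would move existing documents to different shards under this scheme. This is one reason the primary shard count is usually chosen when the index is created.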

