Got it

Working Principles of Elasticsearch

Latest reply: Jun 1, 2022 01:46:52 1446 5 1 0 0

Hello, everyone!

This post will share with you the working principles of Elasticsearch.

The internal architecture of Elasticsearch

The Elasticsearch provides various access interfaces through RESTful APIs or other languages (such as Java), uses the cluster discovery mechanism, and supports script languages, and various plug-ins. The underlying layer is based on Lucene, with absolute independence of Lucene and stores indexes through local files, shared files, and HDFS, as shown in Figure 1-1.

Figure 1-1 Internal architecture

20180504140555339001.png


Inverted indexing

The conventional search method (forward indexing, as shown in Figure 1-2) starts from the key, and searches the information represented by the key that can meet the specific information in the search conditions. That is, the value is searched based on the key. Positive indexing is about finding the keyword by document number.

Figure 1-2 Forward indexing

20180504140555783002.png

 

In Elasticsearch (Lucene), the inverted indexing mode is used (as shown in Figure 1-3), that is, finding the key based on the value. In full-text search, the value is the search keyword and the place for storing keywords is called dictionary. The key is the document number list, through which documents with the search keyword can be filtered out. As shown in the following figure, inverted indexing means querying the document number based on the keyword and then querying the document based on the document number. This is similar to looking up a dictionary or querying the content of the specified page based on the contents of a book.

Figure 1-3 Inverted indexing

20180504140556570003.png

 

Distributed indexing flow

Figure 1-5 shows the process of Elasticsearch distributed indexing flow.

Figure 1-4 Distributed indexing flow

20180504140557764004.png

The process is as follows:

Phase 1: The client sends an index request to any node, for example, Node 1.

Phase 2: Node 1 determines the shard to store the file based on the request. Assume the shard is shard 0. Node 1 then forwards the request to Node 3 where primary shard P0 of shard 0 exists.

Phase 3: Node 3 executes the request on primary shard P0 of shard 0. If the request is successfully executed, Node 3 sends the request to all the replica shard R0 in Node 1 and Node 2 simultaneously. If all the replica shards successfully execute the request, a verification message is returned to Node 3. After receiving the verification messages from all the replica shards, Node 3 returns a success message to the user.

Distributed searching flow

The Elasticsearch distributed searching flow consists of two phases: Query and acquisition.

Figure 1-5 shows the query phase.

Figure 1-5 Query phase of the distributed searching flow

20180504140558367005.png

The process is as follows:

Phase 1: The client sends a retrieval request to any node, for example, Node 3.

Phase 2: Node 3 sends the retrieval request to each shard in the index adopting the polling policy. One of the primary shards and all of its replica shards is randomly selected to balance the read request load. Each shard performs retrieval locally and adds the sorting result locally.

Phase 3: Each shard returns the local result to Node 3. Node 3 combines these values and performs global sorting.

In the query phase, the data to be retrieved is located. In the acquisition phase, these data will be collected and returned to the client. Figure 1-6 shows the acquisition phase.

Figure 1-6 Acquisition phase of the distributed searching flow

20180504140558557006.png

 

The process is as follows:

Phase 1: After all data to be retrieved are located, Node 3 sends a request to related shards.

Phase 2: Each shard that receives the request from Node 3 reads the related files and return them to Node 3.

Phase 3: After obtaining all the files returned by the shards, Node 3 combines them into a summary result and returns it to the client.

Distributed bulk indexing flow

20180504140559459007.png

The process is as follows:

Phase 1: The client sends a bulk request to Node 1.

Phase 2: Node 1 constructs a bulk request for each shard and forwards the requests to the primary shard according to the request.

Phase 3: The primary shard executes the requests one by one. After one operation is complete, the primary shard forwards the new file (or deleted part) to the corresponding replica node, then moves on to the next operation. Replica nodes report all operations are complete. The Node reports to the requesting node. The requesting node sorts the response and returns it to the client.

Distributed bulk searching flow

20180504140600955008.png

The process is as follows:

Phase 1: The client sends a mget request to Node 1.

Phase 2: Node 1 constructs a multi-data retrieval request for each shard and forwards the requests to the primary shard or its replica shard based on the requests. When all replies are received, Node 1 constructs a response and returns it to the client.

Routing algorithm

Elasticsearch provides two routing algorithms:

  • Default route: shard=hash (routing) %number_of_primary_shards . In this routing policy, the number of received shards is limited. During capacity expansion, the number of shards needs to be multiplied (ES6.x). In addition, when creating an index, you need to specify the capacity to be expanded in the future. ES5.x does not support capacity expansion. ES7.x can be expanded freely.

  • Custom route: In this routing mode, the routing can be specified to determine the shard to which the file is written, or only the specified routing can be searched.

Balancing algorithm

Elasticsearch provides the automatic balance function for capacity expansion, capacity reduction, and data import scenarios. The algorithm is as follows:

weight_index(node, index) = indexBalance * (node.numShards(index) - avgShardsPerNode(index))

Weight_node(node, index) = shardBalance * (node.numShards() - avgShardsPerNode)

weight(node, index) = weight_index(node, index) + weight_node(node, index)

Single-node multi-instance deployment

Multiple Elasticsearch instances can be deployed on the same node, and differentiated from each other based on the IP address and port number. This method increases the usage of the single-node CPU, memory, and disk, and improves the Elasticsearch indexing and searching capability.

20180504140600359009.png

Cross-node replica allocation policy

When multiple instances are deployed on a single node and multiple replicas exist, replicas can only be allocated across instances. A single-point failure may occur. Solve this problem by adding the default configuration of cluster.routing.allocation.same_shard.host:true.

20180504140601683010.png

That's all, thanks!


  • x
  • convention:

jino94
Created May 18, 2018 02:45:25

Very informative!Working Principles of Elasticsearch-2666919-1
View more
  • x
  • convention:

sze_van
Created May 18, 2018 07:09:27

Thank for sharing:)
View more
  • x
  • convention:

II
Created May 22, 2018 01:30:02

:)
View more
  • x
  • convention:

user_2921761
Created Jun 27, 2018 01:40:56

good
View more
  • x
  • convention:

olive.zhao
Admin Created Jun 1, 2022 01:46:52

The implementation principle of Elasticsearch is very detailed. Thank you.
View more
  • x
  • convention:

Comment

You need to log in to comment to the post Login | Register
Comment

Notice: To protect the legitimate rights and interests of you, the community, and third parties, do not release content that may bring legal risks to all parties, including but are not limited to the following:
  • Politically sensitive content
  • Content concerning pornography, gambling, and drug abuse
  • Content that may disclose or infringe upon others ' commercial secrets, intellectual properties, including trade marks, copyrights, and patents, and personal privacy
Do not share your account and password with others. All operations performed using your account will be regarded as your own actions and all consequences arising therefrom will be borne by you. For details, see " User Agreement."

My Followers

Login and enjoy all the member benefits

Login

Block
Are you sure to block this user?
Users on your blacklist cannot comment on your post,cannot mention you, cannot send you private messages.
Reminder
Please bind your phone number to obtain invitation bonus.
Information Protection Guide
Thanks for using Huawei Enterprise Support Community! We will help you learn how we collect, use, store and share your personal information and the rights you have in accordance with Privacy Policy and User Agreement.