Dear all,
This post explains how Solr works and how it relates to other components in FusionInsight HD.
Inverted Indexing
Traditional search (which uses forward indexing, as shown in Figure 1) starts from a document and then looks up the information it contains. In this mode, values are found according to keys: a search based on forward indexing finds keywords by document number.
Figure 1 Forward indexing
Solr (Lucene) search uses inverted indexing (as shown in Figure 2). In this mode, keys are found according to values. In a full-text search, the values are the keywords to be searched; the structure that stores them is called the dictionary. The keys are document number lists, with which users can find the documents that contain the search keywords (values), as shown in the figure. A search based on inverted indexing finds document numbers by keyword and then finds documents by document number.
Figure 2 Inverted indexing
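The two index directions can be sketched in a few lines of Python; the mini-corpus and the `search` helper here are illustrative, not part of Solr:

```python
from collections import defaultdict

# Hypothetical forward index: document number -> text.
docs = {
    1: "solr builds inverted indexes",
    2: "hdfs stores solr index files",
}

# Build the inverted index: keyword -> set of document numbers.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        inverted[term].add(doc_id)

def search(term):
    """Return the document numbers that contain the keyword."""
    return sorted(inverted.get(term, set()))

print(search("solr"))  # -> [1, 2]
print(search("hdfs"))  # -> [2]
```

Looking up `docs[1]` answers "which words are in document 1" (forward), while `search("solr")` answers "which documents contain 'solr'" (inverted) without scanning every document.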
Distributed Indexing Operation Procedure
Figure 3 describes the Solr distributed indexing operation procedure.
Figure 3 Distributed indexing operation procedure
The procedure is as follows:
1. When initiating a document indexing request, the Client obtains the SolrCloud cluster state (the SolrServer cluster information) from the ZooKeeper cluster and, based on the Collection named in the request, selects any SolrServer that hosts that Collection.
2. The Client sends the document indexing request to a Replica of the related Shard in the Collection on that SolrServer.
3. If that Replica is not the Leader Replica, it forwards the request to the Leader Replica of the same Shard.
4. After indexing the documents locally, the Leader Replica routes the indexing request to the other Replicas of the Shard for processing.
5. If the target Shard of the documents is not the Shard that received the request, the Leader Replica forwards the indexing request to the Leader Replica of the target Shard.
6. After indexing the documents locally, the Leader Replica of the target Shard routes the indexing request to the other Replicas of that Shard for processing.
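The routing in steps 1–6 can be sketched as follows. The shard count, the CRC-based hash, and the replica names are simplifications for illustration; Solr's actual router assigns each Shard a hash range of the document's unique key:

```python
import zlib

NUM_SHARDS = 2

# Toy cluster map: shard number -> leader and follower replica names.
shards = {s: {"leader": f"replica-{s}-0",
              "followers": [f"replica-{s}-1", f"replica-{s}-2"]}
          for s in range(NUM_SHARDS)}

def target_shard(doc_id):
    # Deterministic hash routing; a stand-in for Solr's hash-range router.
    return zlib.crc32(doc_id.encode()) % NUM_SHARDS

def index(doc_id, receiving_shard):
    """Trace which replicas handle an indexing request (steps 3-6)."""
    shard = target_shard(doc_id)
    route = []
    if shard != receiving_shard:
        # Step 5: the request is forwarded to the target Shard's leader.
        route.append(f"forward to leader of shard {shard}")
    route.append(f"{shards[shard]['leader']} indexes locally")
    # Steps 4 and 6: the leader routes the request to the other Replicas.
    route += [f"{r} indexes copy" for r in shards[shard]["followers"]]
    return route
```

Calling `index(doc_id, receiving_shard)` with a mismatched shard produces the extra forwarding hop described in step 5; with a matching shard, the leader indexes locally and fans out directly.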
Distributed Search Operation Procedure
Figure 4 describes the Solr distributed search operation procedure.
Figure 4 Distributed search operation procedure
The procedure is as follows:
1. When initiating a search request, the Client obtains the SolrServer cluster information from ZooKeeper and then randomly selects a SolrServer that hosts the Collection.
2. The Client sends the search request to any Replica (not necessarily the Leader Replica) of a related Shard in the Collection on that SolrServer.
3. That Replica starts a distributed query: it converts the query into subqueries, one per Shard of the Collection (Figure 4 shows two Shards, Shard 1 and Shard 2), and distributes each subquery to any Replica (not necessarily the Leader Replica) of the corresponding Shard for processing.
4. Each subquery completes and returns its results.
5. The Replica that received the original query request merges the subquery results and sends the final result to the Client.
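Step 5's scatter-gather merge can be sketched in Python; the shard result lists and scores below are made up for illustration:

```python
import heapq

# Hypothetical per-shard hit lists, each already sorted by descending score.
shard_results = {
    "shard1": [("doc3", 0.9), ("doc1", 0.4)],
    "shard2": [("doc7", 0.8), ("doc2", 0.6)],
}

def merge(results, rows=3):
    """Merge sorted per-shard hits into one globally ranked top-`rows`
    list, as the first Replica does before replying to the Client."""
    merged = heapq.merge(*results.values(), key=lambda hit: -hit[1])
    return list(merged)[:rows]

print(merge(shard_results))  # -> [('doc3', 0.9), ('doc7', 0.8), ('doc2', 0.6)]
```

Because each shard returns its hits already sorted, the coordinator only needs a k-way merge rather than a full re-sort of all results.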
Relationship Between Solr and HDFS
Solr is a project of the Apache Software Foundation and a major component of the Apache Hadoop ecosystem. Solr can use the Hadoop Distributed File System (HDFS) as its index file storage system. Solr sits on the structured storage layer, and HDFS provides highly reliable storage support for Solr: all Solr index data files can be stored in HDFS.
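In stock Apache Solr, index storage on HDFS is enabled through the HdfsDirectoryFactory in solrconfig.xml. A minimal sketch is shown below; the NameNode URI is a placeholder, and FusionInsight HD may preconfigure or wrap this differently:

```xml
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <!-- Placeholder NameNode address; use your cluster's HDFS URI. -->
  <str name="solr.hdfs.home">hdfs://namenode:9000/solr</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
</directoryFactory>
```

With this factory in place, the index files for each core are written under the configured HDFS path instead of the local data directory.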
Relationship Between Solr and HBase
HBase stores massive amounts of data. It is a distributed, column-oriented storage system built on HDFS. When Solr indexes HBase data, the data is written to HDFS and indexes are created for it, with each index ID mapped to the HBase data through the rowkey. Because the rowkey is unique, each piece of index data corresponds to exactly one piece of HBase data, which enables full-text search over HBase data.
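The rowkey-as-index-ID mapping can be sketched as follows; the rows, field names, and `fetch_by_search` helper are hypothetical, not a FusionInsight API:

```python
# Hypothetical HBase rows: rowkey -> column values.
hbase_rows = {
    "user#001": {"name": "Alice", "bio": "searches with solr"},
    "user#002": {"name": "Bob", "bio": "stores data in hbase"},
}

# Build index documents whose unique id is the HBase rowkey, so every
# index entry maps back to exactly one HBase row.
solr_docs = [{"id": rowkey, **cols} for rowkey, cols in hbase_rows.items()]

def fetch_by_search(term):
    """Full-text search the index, then use the matching ids (rowkeys)
    to retrieve the original rows from HBase."""
    ids = [d["id"] for d in solr_docs
           if any(term in value for value in d.values())]
    return {rowkey: hbase_rows[rowkey] for rowkey in ids}

print(fetch_by_search("solr"))  # matches only the row with rowkey "user#001"
```

The search answers "which rowkeys match", and the rowkeys then serve as HBase lookup keys, which is why uniqueness of the rowkey is the bridge between the two systems.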
Thank you.