
HCIA-Big Data | HBase Key Processes and Highlights

Latest reply: May 3, 2022 04:02:23

Hello, friend!

This post covers the key processes and highlights of HBase.

HBase Key Processes 

Data Read and Write Process

When you write data, the request is assigned to the corresponding HRegionServer for execution.

Your data is first written to HLog and then to MemStore.

The commit() call returns to the client only after the operation has been written to HLog.

When you read data, the HRegionServer first checks the MemStore cache. If the data is not found in the MemStore cache, the HRegionServer searches the StoreFiles on the disk.
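The write and read path described above can be sketched roughly as follows. This is only a toy model in Python, not HBase code: the class ToyRegionServer, its put() and get() methods, and the use of plain dicts and lists for MemStore, HLog, and StoreFiles are all assumptions made for illustration.

```python
class ToyRegionServer:
    """Toy model of the write/read path; not HBase's real implementation."""

    def __init__(self):
        self.hlog = []        # write-ahead log: append-only list of (row, value)
        self.memstore = {}    # in-memory cache of recent writes
        self.storefiles = []  # flushed files, oldest first; each one is a dict

    def put(self, row, value):
        # Log the operation first, then update the in-memory cache.
        self.hlog.append((row, value))
        self.memstore[row] = value
        return True  # acknowledged only after the log append

    def get(self, row):
        # Check MemStore first, then fall back to StoreFiles (newest first).
        if row in self.memstore:
            return self.memstore[row]
        for storefile in reversed(self.storefiles):
            if row in storefile:
                return storefile[row]
        return None


server = ToyRegionServer()
server.storefiles.append({"row1": "old"})  # pretend an earlier flush happened
server.put("row2", "new")
print(server.get("row2"))  # served from MemStore
print(server.get("row1"))  # served from a StoreFile
print(server.get("row3"))  # not found anywhere
```

Searching the StoreFiles from newest to oldest is a simplification chosen here so that the most recent value wins.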

Cache Refreshing

The system periodically writes the content of the MemStore cache to a StoreFile on the disk, clears the cache, and writes a marker in the HLog.

A new StoreFile is generated each time the cache is flushed. Therefore, each Store contains multiple StoreFiles.

Each HRegionServer has its own HLog file. Each time the HRegionServer starts, it checks the HLog file to determine whether any write operation was performed after the most recent cache flush. If an update is detected, the data is written to MemStore and then flushed to a StoreFile. Finally, the old HLog file is deleted, and the HRegionServer starts providing services for you.
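To make the flush and startup check more concrete, here is a minimal sketch in Python. It is a toy model only: the FLUSH_MARKER tuple, the flush() and recover() helpers, and the list/dict data structures are hypothetical and do not reflect how HBase actually encodes its log.

```python
FLUSH_MARKER = ("__flush__", None)  # hypothetical marker format


def flush(hlog, memstore, storefiles):
    """Write MemStore to a new StoreFile, clear the cache, and mark the log."""
    if memstore:
        storefiles.append(dict(memstore))
        memstore.clear()
    hlog.append(FLUSH_MARKER)


def recover(hlog, storefiles):
    """Replay writes logged after the last flush marker, then persist them."""
    last_marker = -1
    for i, entry in enumerate(hlog):
        if entry == FLUSH_MARKER:
            last_marker = i
    memstore = {}
    for row, value in hlog[last_marker + 1:]:
        memstore[row] = value              # replay into MemStore first
    if memstore:
        storefiles.append(dict(memstore))  # then write a new StoreFile
    return []  # the old log is dropped; an empty log takes its place


hlog, memstore, storefiles = [], {}, []
hlog.append(("a", 1))
memstore["a"] = 1
flush(hlog, memstore, storefiles)
hlog.append(("b", 2))   # written after the flush
memstore["b"] = 2
memstore = {}           # simulate a crash: the cache content is lost
hlog = recover(hlog, storefiles)
print(storefiles)       # [{'a': 1}, {'b': 2}]
```

The write of "b" survives the simulated crash because it was logged after the last marker and is replayed on startup.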

Merging StoreFiles

A new StoreFile is generated each time data is flushed, so the number of StoreFiles grows and searches become slower.

The Store.compact() function is used to combine multiple StoreFiles into one.

The merge operation consumes a large amount of resources, so it is started only when the number of StoreFiles reaches a threshold.
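The threshold-triggered merge can be illustrated with a small Python sketch. The value of COMPACTION_THRESHOLD and the helper names compact() and maybe_compact() are assumptions for this example; the real Store.compact() works on files on disk, not on dicts.

```python
COMPACTION_THRESHOLD = 3  # hypothetical value chosen for this example


def compact(storefiles):
    """Merge StoreFiles (oldest first) into one; newer values win."""
    merged = {}
    for storefile in storefiles:
        merged.update(storefile)
    return [dict(sorted(merged.items()))]


def maybe_compact(storefiles):
    """Merge only when the number of StoreFiles reaches the threshold."""
    if len(storefiles) >= COMPACTION_THRESHOLD:
        return compact(storefiles)
    return storefiles


files = [{"a": 1}, {"b": 2}]
files = maybe_compact(files)     # below the threshold: unchanged
count_before = len(files)
files.append({"a": 9, "c": 3})
files = maybe_compact(files)     # threshold reached: merged into one file
print(count_before, files)       # 2 [{'a': 9, 'b': 2, 'c': 3}]
```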

Store Implementation

Store is the core of an HRegionServer.

Multiple StoreFiles are merged into a single StoreFile.

When a single StoreFile becomes too large, splitting is triggered. One parent region is split into two child regions.
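A rough sketch of the split step, in Python, is shown below. It assumes a region can be modelled as a sorted list of (row key, value) pairs and measures size in rows; MAX_STOREFILE_ROWS and maybe_split() are hypothetical names, and the split at the middle row is a simplification.

```python
MAX_STOREFILE_ROWS = 4  # hypothetical size limit, counted in rows for simplicity


def maybe_split(region):
    """Split a region (a sorted list of (row_key, value)) at its middle row."""
    if len(region) <= MAX_STOREFILE_ROWS:
        return [region]
    mid = len(region) // 2
    return [region[:mid], region[mid:]]


parent = [(f"row{i}", i) for i in range(6)]
children = maybe_split(parent)
print([[key for key, _ in child] for child in children])
# [['row0', 'row1', 'row2'], ['row3', 'row4', 'row5']]
```

Each child region covers a contiguous key range, so a row key still maps to exactly one region after the split.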


HLog Implementation

In a distributed environment, you need to take system failures into account. HBase uses HLog to make sure that the system can recover.

HBase configures one HLog file for each HRegionServer. The HLog is a write-ahead log (WAL).

Updated data can be written to the MemStore cache only after it has been written to the log. In addition, the data cached in MemStore can be written to the disk only after the corresponding log entries have been written to the disk.
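The second ordering rule (log entries reach the disk before the cached data is flushed) can be expressed as a simple check. The Python below is a toy model under stated assumptions: ToyWal, its sequence numbers, and can_flush() are hypothetical and only illustrate the rule.

```python
class ToyWal:
    """Toy write-ahead log that tracks how many entries are durable."""

    def __init__(self):
        self.entries = []     # appended log entries (still in memory)
        self.synced_upto = 0  # number of entries already written to disk

    def append(self, row, value):
        self.entries.append((row, value))
        return len(self.entries)  # sequence number of this entry

    def sync(self):
        # Pretend that every appended entry is now on disk.
        self.synced_upto = len(self.entries)


def can_flush(wal, memstore_max_seq):
    """MemStore may be flushed only if its newest edit is already synced."""
    return memstore_max_seq <= wal.synced_upto


wal = ToyWal()
seq = wal.append("row1", "v1")
print(can_flush(wal, seq))  # False: the log entry is not on disk yet
wal.sync()
print(can_flush(wal, seq))  # True: the log entry is durable
```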

HBase Highlights

Impact of Multiple HFiles


Read latency increases as the number of HFiles grows.

HBase Compaction 

Compaction is used to reduce the number of small files (HFiles) in the same column family of the same region to improve the read performance.

Compaction is classified into minor compaction and major compaction.

  • Minor: small-scale compaction. The number of files is bounded by a minimum and a maximum. Generally, small files from a contiguous time range are merged. Files are selected according to a selection algorithm.

  • Major: compaction of all HFiles in the column family of the region.
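The post does not say which selection algorithm is used, so the Python sketch below simply assumes a ratio-based heuristic to show the idea of choosing a contiguous run of small files. The function names select_minor() and select_major() and the parameters min_files, max_files, and ratio are hypothetical stand-ins, not HBase settings quoted from the post.

```python
def select_minor(sizes, min_files=3, max_files=10, ratio=1.2):
    """Pick a contiguous run of files (oldest first) for minor compaction.

    A file is skipped while it is larger than `ratio` times the total size
    of the newer files after it. The run starting at the first file that
    passes is selected, capped at `max_files`. Returns file indexes.
    """
    start = 0
    while start < len(sizes) and sizes[start] > ratio * sum(sizes[start + 1:]):
        start += 1
    selected = list(range(start, min(len(sizes), start + max_files)))
    return selected if len(selected) >= min_files else []


def select_major(sizes):
    """Major compaction takes every HFile of the column family."""
    return list(range(len(sizes)))


sizes = [1000, 50, 30, 20, 10]  # file sizes, oldest first
print(select_minor(sizes))      # [1, 2, 3, 4]: the large old file is skipped
print(select_major(sizes))      # [0, 1, 2, 3, 4]
```

With these numbers the minor compaction leaves the large, old file alone and merges only the four small recent files, which is much cheaper than rewriting everything.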


OpenScanner

In the OpenScanner process, two different scanners are created to read HFile and MemStore data.

  • The scanner corresponding to HFile is StoreFileScanner.

  • The scanner corresponding to MemStore is MemStoreScanner.
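How the two kinds of scanners might be combined during a read can be sketched as a sorted merge. This Python example is an assumption-laden toy: the generator functions borrow the scanner names only for readability, the "age" field is a hypothetical stand-in for real versioning, and heapq.merge replaces whatever merging structure HBase uses internally.

```python
import heapq


def store_file_scanner(hfile, age):
    """Yield (row, age, value) from one HFile in row order; larger age = older."""
    for row, value in sorted(hfile.items()):
        yield (row, age, value)


def mem_store_scanner(memstore):
    """Yield (row, 0, value) from MemStore; age 0 marks the newest data."""
    for row, value in sorted(memstore.items()):
        yield (row, 0, value)


def scan(memstore, hfiles):
    """Merge all scanners by row; for a repeated row keep the newest value."""
    scanners = [mem_store_scanner(memstore)]
    # hfiles is ordered oldest first, so the last file gets the smallest age.
    for age, hfile in enumerate(reversed(hfiles), start=1):
        scanners.append(store_file_scanner(hfile, age))
    last_row = None
    for row, _, value in heapq.merge(*scanners):
        if row != last_row:  # the first hit for a row is the newest one
            yield (row, value)
            last_row = row


memstore = {"b": "b-new"}
hfiles = [{"a": "a1", "b": "b-old"}, {"c": "c1"}]
print(list(scan(memstore, hfiles)))
# [('a', 'a1'), ('b', 'b-new'), ('c', 'c1')]
```

Every additional HFile adds one more scanner to the merge, which is one way to see why read latency grows with the number of HFiles.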


BloomFilter

BloomFilter is used to optimize some random read scenarios, that is, the Get scenario. It can quickly determine whether a piece of user data exists in a large data set, most of which cannot be loaded into memory.

BloomFilter can be wrong when it reports that a piece of data exists (a false positive). However, a result of "the data does not exist" is always reliable.

In HBase, BloomFilter data is stored in HFiles.
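The "may exist" versus "definitely does not exist" behaviour can be seen in a minimal Bloom filter. The Python below is a generic sketch, not the HBase implementation: the class ToyBloomFilter, the bit array size, the number of hashes, and the use of salted SHA-256 are all assumptions for illustration.

```python
import hashlib


class ToyBloomFilter:
    """Minimal Bloom filter: no false negatives, possible false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bloom = ToyBloomFilter()
for row in ["row1", "row2", "row3"]:
    bloom.add(row)
print(bloom.might_contain("row2"))  # True: added keys are never missed
```

On a Get, a filter like this lets the reader skip any file whose filter answers False for the requested row, so only the files that might contain the row need to be opened.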

Summary of HBase-related posts

Title

[FI Components] Basic Principle about Hbase

[HBase Emergency Failure Recovery]Data in a User Table Is Deleted or Abnormal

[HBase Emergency Failure Recovery]The Service Becomes Abnormal After HBase Table Data Is Manually Deleted

Spark Applications Fail to Access HBase in Another Cluster

"Failed to find any Kerberos tgt" Was Reported When the Spark Application Used AddResource to Access HBase Across Clusters

Apache Hive vs. Apache Hbase

Apache Phoenix: An SQL Driver for Hbase

HDFS vs Hbase

Install Phoenix on MRS HBase cluster and connect to superset

HBase REST API Invoking Example

HBase Full Migration Procedure

HBase Thrift API Invoking Example

Spark Reads Hive and Writes HBase Samples

Use Flume to consume Kafka topic data and store it in Hbase

What is the difference between the HBase and traditional databases?

That's all, thanks!

The post is synchronized to: HCIA-Big Data


SamB
Moderator Created Mar 15, 2022 04:41:13

Thanks for sharing
 
VinceD
Moderator Created Apr 2, 2022 16:03:50

interesting content.
 
Saqibaz
Created Apr 2, 2022 17:46:58

Thanks for sharing
 
Irshadhussain
Created Apr 2, 2022 18:10:20

Nice

olive.zhao
olive.zhao Created Apr 6, 2022 05:12:23 (0) (0)
Thanks!  
Irshadhussain
Created Apr 2, 2022 18:10:28

Thanks for Sharing

harisaliehsan
Created Apr 2, 2022 18:11:32

Thanks for sharing
 
gabo.lr
MVE Created Apr 2, 2022 19:05:27

Thanks for sharing!
 
user_4602619
Created Apr 2, 2022 19:14:25

Thanks for sharing
 
MahMush
Moderator Author Created Apr 30, 2022 16:04:46

Good collection of Hbase posts with highlights
 