
HCIA-Big Data | Introduction to HBase

Latest reply: May 12, 2022 11:25:09

Hello, everyone!

This post introduces HBase, a non-relational distributed database from the Hadoop open-source community that meets the requirements of large-scale, real-time data processing applications.

Introduction to HBase

HBase is a column-oriented distributed storage system that features high reliability, high performance, and high scalability.

  • HBase is suited to storing very large tables (billions of rows and millions of columns) and provides real-time data access.

  • HBase uses HDFS (Hadoop Distributed File System) as its underlying file storage system, on top of which it provides a distributed database that supports real-time read and write operations.

  • HBase uses ZooKeeper as its coordination service.

For a comparison between HBase and relational databases (RDBs), see What's the difference between HBase and RDB?

HBase Data Model

Put simply, applications store data in HBase as tables.

A table consists of rows and columns. Every column belongs to a column family.

The intersection of a row and a column is called a cell, and cells are versioned. The content of a cell is an indivisible byte array.

The row key of a table is also a byte array, so any value can serve as a row key, for example a string or a number serialized to bytes.

The rows of an HBase table are sorted by row key, in byte order (the raw bytes of the keys are compared lexicographically). Every table must have a row key, which acts as its primary key.
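The effect of sorting row keys as bytes can be illustrated with a small Python sketch. This is not HBase code, and the row keys are made up. It shows that string keys compare byte by byte, so "row10" sorts before "row2", and that a fixed-width big-endian encoding (an illustrative choice, not something the post prescribes) is one way to make numeric keys sort in numeric order.

```python
import struct

# Hypothetical row keys, held as raw bytes the way HBase treats them.
string_keys = [b"row2", b"row10", b"row1"]

# Python compares bytes objects lexicographically, byte by byte,
# which mirrors byte-order sorting of row keys.
sorted_string_keys = sorted(string_keys)


def encode_number(n: int) -> bytes:
    """Encode a non-negative integer as 8 big-endian bytes (illustrative choice)."""
    return struct.pack(">Q", n)


numeric_keys = [encode_number(n) for n in (2, 10, 1)]
sorted_numbers = [struct.unpack(">Q", key)[0] for key in sorted(numeric_keys)]

print(sorted_string_keys)
print(sorted_numbers)
```

Because the order is decided purely by bytes, row key design (prefixes, padding, encoding) directly determines which rows end up next to each other.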

HBase Table Structure

Figure: HBase Table Structure

Table: HBase uses tables to organize data. A table consists of rows and columns. Columns are grouped into column families.

Row: Each HBase table consists of multiple rows, and each row is identified by a row key.

Column family: The columns of an HBase table are grouped into one or more column families. The column family is the basic unit of access control.

Column qualifier: Data in a column family is located by column qualifiers (or columns).

Cell: In an HBase table, a cell is identified by the row key, column family, and column qualifier. Data stored in a cell has no data type and is treated as a byte array (byte[]).

Timestamp: A cell can store multiple versions of the same data. These versions are indexed by timestamp.
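As a rough mental model only (not the HBase client API), a table can be pictured as a map from (row key, column family, column qualifier) to a set of timestamped byte values. The function names `put` and `get`, the row key, and the values below are hypothetical and exist only for this sketch.

```python
from typing import Dict, Optional, Tuple

# (row key, column family, column qualifier) -> {timestamp: value}
CellKey = Tuple[bytes, bytes, bytes]
table: Dict[CellKey, Dict[int, bytes]] = {}


def put(row: bytes, family: bytes, qualifier: bytes, ts: int, value: bytes) -> None:
    """Store one version of a cell under the given timestamp."""
    table.setdefault((row, family, qualifier), {})[ts] = value


def get(row: bytes, family: bytes, qualifier: bytes,
        ts: Optional[int] = None) -> Optional[bytes]:
    """Return the version stored at ts, or the newest version if ts is None."""
    versions = table.get((row, family, qualifier))
    if not versions:
        return None
    if ts is None:
        ts = max(versions)  # newest timestamp wins
    return versions.get(ts)


put(b"com.example", b"contents", b"html", 1, b"<html>v1</html>")
put(b"com.example", b"contents", b"html", 2, b"<html>v2</html>")

latest = get(b"com.example", b"contents", b"html")
older = get(b"com.example", b"contents", b"html", ts=1)
```

Values go in and come out as plain bytes; any interpretation (string, number, serialized object) is left to the application.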

Conceptual View of Data Storage

Consider a table named webtable that contains two column families: contents and anchor. In this example, anchor has two columns (anchor:aa.com and anchor:bb.com), and contents has only one column (contents:html).

Figure: Conceptual View of Data Storage

Physical View of Data Storage

Although a table can be viewed conceptually as a collection of sparse rows, physically it is stored by column family. New columns can be added to a column family at any time without being declared in advance.
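The difference between the two views can be sketched in Python. The row keys and cell values are invented for illustration, and `to_physical` is a hypothetical helper that only imitates the idea of per-column-family storage; it says nothing about HBase's actual file format.

```python
from collections import defaultdict

# Conceptual view: one sparse row per row key; columns are "family:qualifier".
conceptual = {
    "com.example.www": {
        "contents:html": "<html>...</html>",
        "anchor:aa.com": "Example A",
    },
    "com.example.blog": {
        "anchor:bb.com": "Example B",  # no contents:html, so this row is sparse
    },
}


def to_physical(rows):
    """Group cells by column family, as a stand-in for per-family storage."""
    stores = defaultdict(dict)
    for row_key, columns in rows.items():
        for column, value in columns.items():
            family, qualifier = column.split(":", 1)
            stores[family][(row_key, qualifier)] = value
    return dict(stores)


physical = to_physical(conceptual)

# A new column needs no schema change: it is just another qualifier.
conceptual["com.example.blog"]["anchor:cc.com"] = "Example C"
physical_after_new_column = to_physical(conceptual)
```

Empty cells take no space in this model: the contents store simply has no entry for the row that lacks contents:html.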

Figure: Physical View of Data Storage

Row-based Storage

Row-based storage refers to data stored by rows in an underlying file system. Generally, a fixed amount of space is allocated to each row.

Advantage: Data can be added, modified, or read one whole row at a time.

Disadvantage: Some unnecessary data is obtained when data in a column is queried.

Figure: Row-based Storage

Column-based Storage

Column-based storage refers to data stored by columns in an underlying file system.

Advantage: Data can be read or calculated by column.

Disadvantage: When a row is read, multiple I/O operations may be required.
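The trade-off between the two layouts can be shown with a toy in-memory example. The records and function names are made up, and Python lists stand in for files on disk, so this only illustrates which data each access pattern has to touch.

```python
# Three made-up records with the same fields.
records = [
    {"id": 1, "name": "a", "age": 30},
    {"id": 2, "name": "b", "age": 25},
    {"id": 3, "name": "c", "age": 41},
]

# Row-based layout: the values of one record sit together.
row_store = [tuple(record.values()) for record in records]

# Column-based layout: the values of one column sit together.
column_store = {field: [record[field] for record in records] for field in records[0]}


def average_age_row_store(rows):
    # Every full row is touched even though only the last field is needed.
    return sum(row[2] for row in rows) / len(rows)


def average_age_column_store(columns):
    # Only the "age" column is touched.
    return sum(columns["age"]) / len(columns["age"])


def read_record_column_store(columns, index):
    # Rebuilding one row needs one lookup per column.
    return {field: values[index] for field, values in columns.items()}
```

Both layouts return the same answers; they differ in how much unrelated data is read to produce them.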

Figure: Column-based Storage

HBase Architecture

Figure: HBase Architecture

The HBase architecture consists of the following functional components:

  • Library functions (linked into each client)

  • HMaster

  • HRegionServer

The HMaster server manages and maintains the partition (region) information of HBase tables, maintains the HRegionServer list, allocates regions, and balances loads.

HRegionServer stores and maintains the regions allocated to it and processes read and write requests from clients.

The client does not read data from the HMaster. Instead, after obtaining the storage location of a region, the client reads data directly from the HRegionServer.

The client does not depend on the HMaster to locate regions either. It obtains region locations through ZooKeeper. Most clients never communicate with the HMaster at all, which reduces the load on the HMaster.


Table and Region

Initially, an HBase table has only one region. As the data volume increases, the table is split into multiple regions.

Region splitting is fast because, right after the split, the resulting regions still read the original storage file. A region starts reading its own new file only after the data has been asynchronously rewritten into an independent file.
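The reason a split can be quick is easier to see in a simplified model. In the sketch below, the class `Region`, the method `compact`, and the choice of the middle row as the split point are all assumptions made for illustration; they are not HBase's implementation or split policy. The two regions produced by a split only record a key range over the shared storage file, and the data is copied into an independent file later.

```python
class Region:
    """Toy region: an index range [start, end) over a sorted storage file."""

    def __init__(self, store_file, start, end):
        self.store_file = store_file  # sorted list of (row_key, value), possibly shared
        self.start, self.end = start, end

    def keys(self):
        return [key for key, _ in self.store_file[self.start:self.end]]

    def compact(self):
        """Rewrite this region's rows into its own independent file."""
        self.store_file = self.store_file[self.start:self.end]
        self.start, self.end = 0, len(self.store_file)


def split(region):
    """Split at the middle row; no data is copied at this point."""
    mid = (region.start + region.end) // 2
    return (Region(region.store_file, region.start, mid),
            Region(region.store_file, mid, region.end))


original_file = [(b"a", 1), (b"c", 2), (b"f", 3), (b"k", 4), (b"p", 5), (b"t", 6)]
parent = Region(original_file, 0, len(original_file))
lower, upper = split(parent)

shares_file_after_split = upper.store_file is original_file
upper.compact()  # in the real system this rewrite happens asynchronously
shares_file_after_compact = upper.store_file is original_file
```

The split itself only creates two small range descriptors, which is why it finishes quickly regardless of how much data the region holds.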

Figure: Table and Region

Region Positioning 

Regions are classified into Meta Regions and User Regions.

The Meta Region records the routing information of each User Region.

To locate the routing information needed to read or write region data, perform the following steps:

  • Find the Meta Region address.

  • Find the User Region address based on Meta Region.
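The second step can be sketched as a search over sorted start keys. The server names, the start keys, and the `locate` function are hypothetical, and the first step (finding the Meta Region itself) is assumed to have already happened.

```python
import bisect

# Step 1 (assumed done): the client already knows where the Meta Region lives.
# Step 2: the Meta Region maps each User Region's start key to a server.
# Entries are sorted by start key; b"" means "from the beginning of the table".
meta_region = [
    (b"", "regionserver-1"),
    (b"g", "regionserver-2"),
    (b"p", "regionserver-3"),
]
start_keys = [start for start, _ in meta_region]


def locate(row_key: bytes) -> str:
    """Return the server hosting the User Region whose key range contains row_key."""
    # The owning region is the last one whose start key is <= row_key.
    index = bisect.bisect_right(start_keys, row_key) - 1
    return meta_region[index][1]
```

For example, `locate(b"apple")` falls into the first region, while `locate(b"g")` falls exactly on the start key of the second one.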

Figure: Region Positioning

To speed up access, the hbase:meta table is saved in memory.

Assume that each row (a mapping entry) in the hbase:meta table occupies about 1 KB in memory, and the maximum size of each region is 128 MB.

In the two-layer structure, the hbase:meta table can therefore address 128 MB / 1 KB = 2^17 (131,072) regions.
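The capacity of the two-layer structure follows from the two figures assumed above. The short calculation below additionally assumes that hbase:meta itself occupies a single region of the maximum size, and that 1 KB and 1 MB mean 1024 bytes and 1024 KB.

```python
KB = 1024
MB = 1024 * KB

entry_size = 1 * KB         # assumed size of one hbase:meta row
max_region_size = 128 * MB  # assumed maximum region size

# Number of User Regions a single full hbase:meta region can describe.
addressable_regions = max_region_size // entry_size

# Total data those regions could hold if each were at its maximum size.
total_bytes = addressable_regions * max_region_size

print(addressable_regions, total_bytes)
```

Under these assumptions the structure covers 2^17 regions, which at 128 MB each corresponds to 2^44 bytes of user data.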

Client

The client contains the interface for accessing HBase and maintains the location information of the accessed regions in the cache to accelerate subsequent data access.

The client queries the hbase:meta table first, and determines the location of the region. After the required region is located, the client directly accesses the corresponding region (without passing through the HMaster) and initiates a read/write request.
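The benefit of the client-side cache can be sketched as follows. `CachingClient` and `meta_lookup` are hypothetical names, and `meta_lookup` is a stand-in for a real query against hbase:meta; the sketch only counts how often that query is needed.

```python
class CachingClient:
    """Toy client that remembers region locations it has already looked up."""

    def __init__(self, meta_lookup):
        self._meta_lookup = meta_lookup  # function: row_key -> (start, end, server)
        self._cache = []                 # cached (start, end, server) entries
        self.meta_queries = 0

    def locate(self, row_key: bytes) -> str:
        # Serve from the cache when a known region covers the key.
        for start, end, server in self._cache:
            if start <= row_key and (end is None or row_key < end):
                return server
        # Otherwise ask the meta table once and remember the answer.
        self.meta_queries += 1
        entry = self._meta_lookup(row_key)
        self._cache.append(entry)
        return entry[2]


def meta_lookup(row_key: bytes):
    # Hypothetical stand-in for a query against hbase:meta.
    if row_key < b"m":
        return (b"", b"m", "regionserver-1")
    return (b"m", None, "regionserver-2")


client = CachingClient(meta_lookup)
first = client.locate(b"apple")
second = client.locate(b"banana")  # same region, answered from the cache
queries_after_two_reads = client.meta_queries
third = client.locate(b"pear")     # different region, needs a new lookup
```

A real client must also drop a cached entry when the region has moved or been split; that invalidation step is left out here.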

HMaster HA

ZooKeeper helps elect one HMaster node as the active (primary) management node of the cluster and ensures that only one HMaster is active at any time, which prevents the HMaster from becoming a single point of failure (SPOF).
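The "only one active HMaster" guarantee can be pictured with a toy in-memory coordinator. This is deliberately not the ZooKeeper API: `ToyCoordinator` and its methods are invented for the sketch, which only models the rule that the first candidate to register becomes active and that a standby can take over once the active one fails.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service (not the ZooKeeper API)."""

    def __init__(self):
        self.active_master = None

    def try_become_active(self, name: str) -> bool:
        # Only one candidate can hold the active role at a time.
        if self.active_master is None:
            self.active_master = name
            return True
        return False

    def report_failure(self, name: str) -> None:
        # Models the registration disappearing when the active master dies.
        if self.active_master == name:
            self.active_master = None


coordinator = ToyCoordinator()
master1_active = coordinator.try_become_active("hmaster-1")
master2_active = coordinator.try_become_active("hmaster-2")  # stays on standby

coordinator.report_failure("hmaster-1")
master2_takeover = coordinator.try_become_active("hmaster-2")
```

The standby node does nothing until the active registration disappears, which is what keeps two masters from managing the cluster at once.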

HMaster

The HMaster server manages tables and regions by performing the following operations:

  • Handles users' operations on tables, such as creating, deleting, modifying, and querying tables.

  • Implements load balancing between different HRegionServers.

  • Adjusts the distribution of regions after they are split or merged.

  • Migrates regions away from faulty HRegionServers.
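The last duty in the list, moving regions off a faulty server, can be sketched with a simple least-loaded rule. The function `reassign_regions`, the server names, and the placement rule are assumptions for illustration; HBase's real balancer is considerably more elaborate.

```python
def reassign_regions(assignments, failed_server):
    """Move regions from a failed server to the least-loaded remaining servers."""
    result = {server: list(regions)
              for server, regions in assignments.items()
              if server != failed_server}
    if not result:
        raise ValueError("no healthy servers left")
    for region in assignments.get(failed_server, []):
        # Pick the server with the fewest regions; break ties by name.
        target = min(result, key=lambda server: (len(result[server]), server))
        result[target].append(region)
    return result


assignments = {
    "rs1": ["r1", "r2"],
    "rs2": ["r3"],
    "rs3": ["r4", "r5", "r6"],
}
after_failure = reassign_regions(assignments, "rs3")
```

After rs3 fails, its three regions are spread so that rs1 and rs2 end up with three regions each.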

HRegionServer

HRegionServer is the core module of HBase. It provides the following main functions:

  • Maintains the regions allocated to it.

  • Responds to users' read and write requests.

Another post about HBase architecture: [FI Components] Basic Principle about HBase.

Summary of HBase-related posts

  • [FI Components] Basic Principle about Hbase

  • [HBase Emergency Failure Recovery]Data in a User Table Is Deleted or Abnormal

  • [HBase Emergency Failure Recovery]The Service Becomes Abnormal After HBase Table Data Is Manually Deleted

  • Spark Applications Fail to Access HBase in Another Cluster

  • "Failed to find any Kerberos tgt" Was Reported When the Spark Application Used AddResource to Access HBase Across Clusters

  • Apache Hive vs. Apache Hbase

  • Apache Phoenix: An SQL Driver for Hbase

  • HDFS vs Hbase

  • Install Phoenix on MRS HBase cluster and connect to superset

  • HBase REST API Invoking Example

  • HBase Full Migration Procedure

  • HBase Thrift API Invoking Example

  • Spark Reads Hive and Writes HBase Samples

  • Use Flume to consume Kafka topic data and store it in Hbase

  • What is the difference between the HBase and traditional databases?


That's all, thanks!

The post is synchronized to: HCIA-Big Data

pupu.F
Created Apr 2, 2022 02:07:59

HBase is a distributed, column-oriented storage system built on the Hadoop Distributed File System (HDFS). The column-based HBase features high reliability, performance, and scalability.

olive.zhao
Created Apr 2, 2022 02:08:17

Yes!

zj5000
Created Apr 6, 2022 08:20:31

Thanks for your sharing!

wissal
MVE Created May 9, 2022 12:22:24

Very interesting to know, learned

Vien
Created May 9, 2022 14:07:27

Thanks for sharing

olive.zhao
Created May 12, 2022 13:40:31

KasimAbubakr
Created May 10, 2022 04:16:21

Thank you.

DienLg
Created May 12, 2022 11:25:09

Good share

olive.zhao
Created May 12, 2022 13:40:40

Thanks!
