Hello, friend!
This post covers HBase performance tuning and common HBase shell commands.
HBase Performance Tuning
Row Key
Row keys are stored in lexicographic (byte) order. When designing row keys, take full advantage of this sorting: keep data that is frequently read together in adjacent rows, and keep data that is likely to be accessed soon close together as well.
For example, if recently written data is the most likely to be accessed, you can make the timestamp part of the row key. Because rows are sorted in ascending order, use Long.MAX_VALUE - timestamp rather than the raw timestamp, so that newly written data sorts first and is hit quickly when read.
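The reversed-timestamp idea can be sketched in a few lines. This is only an illustration in Python, not HBase client code: the "device42#" prefix, the sample timestamps, and the helper name reversed_ts_key are assumptions made up for the example, and sorted() on byte strings stands in for the way HBase orders rows.

```python
import struct

LONG_MAX = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def reversed_ts_key(prefix: bytes, timestamp_ms: int) -> bytes:
    """Build a row key whose byte order places newer timestamps first."""
    reversed_ts = LONG_MAX - timestamp_ms
    # Fixed-width big-endian encoding keeps numeric order and byte order
    # aligned for the non-negative values produced here.
    return prefix + struct.pack(">q", reversed_ts)


# Three writes for the same (hypothetical) device, oldest to newest.
timestamps = [1_700_000_000_000, 1_700_000_060_000, 1_700_000_120_000]
keys = [reversed_ts_key(b"device42#", ts) for ts in timestamps]

# Rows are kept sorted by row key bytes; sorted() on bytes mimics that.
rows_in_store_order = sorted(keys)

# Decode the timestamps back to confirm the newest write comes first.
newest_first = [
    LONG_MAX - struct.unpack(">q", key[-8:])[0] for key in rows_in_store_order
]
print(newest_first)
```

A fixed-width binary encoding is used because a variable-length decimal string would not sort in numeric order.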
Unlike a relational database, an HBase table has only one key: the row key. A row key design that exploits the sort order, keeping data that is read together or accessed recently in adjacent rows, can therefore improve read efficiency considerably.
Creating HBase Secondary Index
HBase has only one index, and it is on the row key.
There are three methods for accessing rows in the HBase table:
Access through a single row key.
Access through a row key interval.
Full table scan.
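The three access paths above can be modeled with a toy in-memory table. This is a sketch in Python, not the HBase API: the table contents, the "info:name" column, and the function names are assumptions for illustration (only the name Zhangsan comes from this post). The range is treated as start-inclusive and stop-exclusive, which matches the default behavior of an HBase scan.

```python
from bisect import bisect_left

# A toy table. Sample rows are made up for the example.
table = {
    b"user#0001": {"info:name": "Zhangsan"},
    b"user#0002": {"info:name": "Lisi"},
    b"user#0003": {"info:name": "Wangwu"},
    b"user#0010": {"info:name": "Zhaoliu"},
}
sorted_keys = sorted(table)  # rows are kept in row key order


def get_row(row_key):
    """Access through a single row key: a direct lookup."""
    return table.get(row_key)


def range_scan(start_key, stop_key):
    """Access through a row key interval: start inclusive, stop exclusive."""
    lo = bisect_left(sorted_keys, start_key)
    hi = bisect_left(sorted_keys, stop_key)
    return [(key, table[key]) for key in sorted_keys[lo:hi]]


def full_scan(predicate):
    """Full table scan: every row is examined, whatever the predicate."""
    return [(key, table[key]) for key in sorted_keys if predicate(table[key])]


single = get_row(b"user#0002")
interval = range_scan(b"user#0002", b"user#0010")
by_name = full_scan(lambda row: row["info:name"] == "Zhangsan")
```

Only the last call can answer a question about a column value, and it has to touch every row to do so.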
Hindex Secondary Index
Hindex is a Java-based HBase secondary index developed by Huawei and is compatible with Apache HBase 0.94.8. The current features are as follows:
Multiple table indexes.
Multiple column indexes.
Index based on some column values.
This enhanced feature is called a secondary index. Normally, we use the row key, together with the column family and column qualifier, to locate a row and its cells.
But suppose we have a record for a person named Zhangsan and want to find all the information about him, knowing only his name and not his row key.
This makes querying very inconvenient, because without the row key the only option is a full table scan. So at Huawei, we built a secondary index that stores the association between the column to be searched and the row key in an index table. More specifically, the name column serves as the index key and the row key is the value. In other words, we invert the key-value pair so that the row key can be looked up by name.
Does that solve the problem mentioned above? It does, and the idea is simple but effective. The column value becomes the key and the original row key becomes the value: we use the new key to find the row key, then use the row key to find the rest of the information, so only two queries are needed.
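The inverted lookup described above can be sketched with two dictionaries. This is a simplified Python model of the idea, not Hindex's actual implementation: the sample rows, the "info:city" column, and the helper names build_index and find_by_name are assumptions for illustration.

```python
# Data table: row key -> columns. Sample values are made up.
data_table = {
    "row-001": {"info:name": "Zhangsan", "info:city": "Shenzhen"},
    "row-002": {"info:name": "Lisi", "info:city": "Beijing"},
    "row-003": {"info:name": "Zhangsan", "info:city": "Shanghai"},
}


def build_index(table, column):
    """Invert the table: column value -> list of row keys holding it."""
    index = {}
    for row_key, columns in table.items():
        index.setdefault(columns[column], []).append(row_key)
    return index


# Index table keyed by name, with the original row keys as values.
name_index = build_index(data_table, "info:name")


def find_by_name(name):
    """Two lookups: name -> row keys, then row key -> full row."""
    row_keys = name_index.get(name, [])               # query 1: index table
    return [data_table[key] for key in row_keys]      # query 2: data table


zhangsan_rows = find_by_name("Zhangsan")
```

The index maps each name to a list of row keys because several rows may share the same column value.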
Common HBase Shell Commands
create: creating a table in HBase.
list: listing all tables in HBase.
put: adding data to a specified cell, identified by table, row, and column.
scan: browsing the rows of a table.
get: obtaining the value of a cell based on the table name, row, column, timestamp, time range, and version number.
enable/disable: enabling or disabling a table.
drop: deleting a table (the table must be disabled first).
That's all, thanks!