Hello, everyone!
Today I'm going to introduce you to the MOB feature of MRS HBase.
In practical applications, we need to store data of many sizes, such as images and documents. HBase can generally store data smaller than 10 MB, and its read and write performance is best for data smaller than 100 KB. When the values stored in HBase are larger than 100 KB, or approach 10 MB, the same number of rows adds up to a much larger total volume of data. This triggers frequent compactions and splits, which consume a lot of CPU and disk I/O, and performance degrades severely.
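To make the effect of value size concrete, here is a rough back-of-the-envelope sketch in Java. It assumes, for illustration only, that a region splits whenever it reaches a fixed size of 10 GB (a common default for hbase.hregion.max.filesize, but check your cluster), and it ignores compression, keys and other per-cell overhead. The row count and value sizes are made up.

```java
class RegionEstimate {
    // Assumption: a region splits once it reaches a fixed size.
    // 10 GB is an illustrative value for hbase.hregion.max.filesize.
    static final long MAX_REGION_BYTES = 10L * 1024 * 1024 * 1024;

    /** Rough number of regions needed to hold rows * valueBytes bytes of values. */
    static long estimateRegions(long rows, long valueBytes) {
        long total = rows * valueBytes;
        return (total + MAX_REGION_BYTES - 1) / MAX_REGION_BYTES; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        long[] valueSizes = {10L * 1024, 100L * 1024, 1024L * 1024}; // 10 KB, 100 KB, 1 MB
        for (long size : valueSizes) {
            System.out.printf("%d rows x %d bytes -> about %d region(s)%n",
                    rows, size, estimateRegions(rows, size));
        }
    }
}
```

With 1,000,000 rows, the sketch gives about 1 region for 10 KB values, 10 for 100 KB values and 98 for 1 MB values: the row count is the same, but the data, and with it the split and compaction work, grows roughly a hundredfold.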
MOB (Medium-sized Objects) are values between 100 KB and 10 MB in size. HBase MOB stores these values directly, in HFile format, on a file system such as HDFS, and manages the files centrally with the ExpiredMobFileCleaner and Sweeper tools. The regular HBase store keeps only a reference as the cell value: the address and size of the MOB data. Because the large values no longer pass through the regular store, compactions and splits happen far less often, which greatly improves performance.
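The reference kept in the regular store can be pictured with the following Java sketch. It is a simplified model, not HBase's actual code: it assumes the reference value is a 4-byte length of the real value followed by the MOB file name, and the file name used in main is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class MobReference {
    final int valueLength; // size of the real value stored in the MOB file
    final String fileName; // name of the MOB file that holds the real value

    MobReference(int valueLength, String fileName) {
        this.valueLength = valueLength;
        this.fileName = fileName;
    }

    /** Encodes the reference as [4-byte big-endian length][UTF-8 file name]. */
    byte[] encode() {
        byte[] name = fileName.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Integer.BYTES + name.length)
                .putInt(valueLength)
                .put(name)
                .array();
    }

    /** Decodes a value produced by encode(). */
    static MobReference decode(byte[] refValue) {
        ByteBuffer buf = ByteBuffer.wrap(refValue);
        int length = buf.getInt();
        byte[] name = new byte[buf.remaining()];
        buf.get(name);
        return new MobReference(length, new String(name, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Hypothetical file name, for illustration only.
        MobReference ref = new MobReference(512 * 1024, "hypothetical-mob-file-name");
        byte[] stored = ref.encode(); // this small array is what the regular store keeps
        MobReference back = decode(stored);
        System.out.println(back.valueLength + " bytes in " + back.fileName);
    }
}
```

Whatever the size of the real value, the reference is only four bytes plus the file name, so compactions and splits of the regular store move very little data.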
HBase enables the MOB feature by default, but MOB storage applies only to column families that are configured for it. To use MOB, specify it on the target column family when you create a table or modify the table's attributes.
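To enable MOB storage for a column family in Java code:
import org.apache.hadoop.hbase.HColumnDescriptor;

// Column family "f" stores values larger than the MOB threshold as MOB files
HColumnDescriptor hcd = new HColumnDescriptor("f");
hcd.setMobEnabled(true);
hcd.setMobThreshold(102400L); // threshold in bytes (100 KB)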
To enable MOB from the HBase shell, set IS_MOB and MOB_THRESHOLD on the column family. MOB_THRESHOLD is specified in bytes, so '102400' means that values larger than 100 KB are stored as MOB files:
hbase(main):009:0> create 't3',{NAME => 'd', MOB_THRESHOLD => '102400', IS_MOB => 'true'}
0 row(s) in 0.3450 seconds
=> Hbase::Table - t3
hbase(main):010:0> describe 't3'
Table t3 is ENABLED
t3
COLUMN FAMILIES DESCRIPTION
{NAME => 'd', MOB_THRESHOLD => '102400', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', IS_MOB => 'true', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
1 row(s) in 0.0170 seconds
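The role of MOB_THRESHOLD can be sketched in Java as a single comparison. This illustrates the rule rather than reproducing HBase code: as far as we know, HBase applies it when the MemStore is flushed and skips non-Put cells such as delete markers, which this sketch ignores. The class and method names are hypothetical.

```java
class MobRouting {
    enum Target { REGULAR_STORE, MOB_FILE }

    /**
     * Simplified routing rule for a MOB-enabled column family: values strictly larger
     * than the threshold (in bytes) go to a MOB file, everything else stays in the regular store.
     */
    static Target route(int valueLength, long mobThresholdBytes) {
        return valueLength > mobThresholdBytes ? Target.MOB_FILE : Target.REGULAR_STORE;
    }

    public static void main(String[] args) {
        long threshold = 102400L; // MOB_THRESHOLD => '102400' from the shell example
        int[] sizes = {4 * 1024, 100 * 1024, 100 * 1024 + 1, 2 * 1024 * 1024};
        for (int size : sizes) {
            System.out.println(size + " bytes -> " + route(size, threshold));
        }
    }
}
```

Under this rule, a value of exactly 102400 bytes stays in the regular store; only strictly larger values become MOB data.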
Parameter configuration
In MRS Manager, choose "Cluster > Name of the desired cluster > Service > HBase > Configuration" and click "Configure all". Then enter a parameter name in the search box to find it.
Parameter | Description | Default |
---|---|---|
hbase.mob.file.cache.size | Number of open MOB file handles the cache can hold. A larger value lets the cache keep more handles, which reduces how often files are opened and closed; a value that is too large leaves too many file handles open. This parameter is configured on the RegionServer. | 1000 |
hbase.mob.cache.evict.period | Interval, in seconds, at which the MOB cache evicts cached MOB files. | 3600 |
hbase.mob.cache.evict.remain.ratio | Ratio of the number of files the MOB cache retains after an eviction to the cache capacity. Eviction is triggered when the number of cached files exceeds hbase.mob.file.cache.size. | 0.5 |
hbase.master.mob.ttl.cleaner.period | Execution period of ExpiredMobFileCleanerChore, in seconds. The default is one day (86400 seconds). When the chore runs, MOB files whose data has exceeded the column family's time to live (TTL) are deleted by the expired MOB file cleaner. | 86400 |
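How hbase.mob.file.cache.size and hbase.mob.cache.evict.remain.ratio interact can be sketched in Java. This is a hypothetical, simplified model, not HBase's MobFileCache: it assumes least-recently-used files are evicted first and uses a plain Object in place of a real file handle.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MobFileCacheSketch {
    private final int capacity;       // role of hbase.mob.file.cache.size
    private final double remainRatio; // role of hbase.mob.cache.evict.remain.ratio
    // accessOrder = true: iteration runs from least to most recently accessed entry
    private final LinkedHashMap<String, Object> handles = new LinkedHashMap<>(16, 0.75f, true);

    MobFileCacheSketch(int capacity, double remainRatio) {
        this.capacity = capacity;
        this.remainRatio = remainRatio;
    }

    /** Returns the cached handle for a file, "opening" it on a miss. */
    Object open(String fileName) {
        Object handle = handles.get(fileName); // a hit marks the file as recently used
        if (handle == null) {
            handle = new Object();             // stand-in for a real open file handle
            handles.put(fileName, handle);
            evictIfNeeded();
        }
        return handle;
    }

    /** Called on insert here; a periodic task (every evict.period seconds) could call it too. */
    List<String> evictIfNeeded() {
        List<String> evicted = new ArrayList<>();
        if (handles.size() <= capacity) {
            return evicted;
        }
        int keep = (int) (capacity * remainRatio);
        Iterator<Map.Entry<String, Object>> it = handles.entrySet().iterator();
        while (handles.size() > keep && it.hasNext()) {
            evicted.add(it.next().getKey()); // least recently used first
            it.remove();                     // a real cache would close the file here
        }
        return evicted;
    }

    boolean isCached(String fileName) { return handles.containsKey(fileName); }

    int size() { return handles.size(); }

    public static void main(String[] args) {
        MobFileCacheSketch cache = new MobFileCacheSketch(4, 0.5);
        for (String f : new String[] {"a", "b", "c", "d"}) {
            cache.open(f);
        }
        cache.open("a"); // hit: "a" becomes the most recently used file
        cache.open("e"); // 5 files > capacity 4, so evict down to 4 * 0.5 = 2 files
        System.out.println(cache.size() + " " + cache.isCached("a") + " " + cache.isCached("e"));
    }
}
```

With a capacity of 4 and a ratio of 0.5, opening a fifth file shrinks the cache to the 2 most recently used files, so the next few opens do not immediately trigger another eviction. With the defaults (1000 and 0.5), the same model would keep 500 handles after an eviction.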
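The check that ExpiredMobFileCleanerChore performs on each run can be sketched as a cutoff comparison. This is a hypothetical simplification: the real cleaner has additional rules (for example, a column family with TTL => 'FOREVER', as in the describe output above, has no expired data), and the file names and timestamps here are invented.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ExpiredMobFileSketch {
    /** Returns the files whose data was written before now - ttl, i.e. whose data has expired. */
    static List<String> expiredFiles(Map<String, Instant> writeTimes, Duration ttl, Instant now) {
        Instant cutoff = now.minus(ttl);
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Instant> e : writeTimes.entrySet()) {
            if (e.getValue().isBefore(cutoff)) {
                expired.add(e.getKey()); // a real cleaner would remove this file
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-10T00:00:00Z");
        Map<String, Instant> files = new LinkedHashMap<>();
        files.put("mob-file-1", Instant.parse("2024-01-01T00:00:00Z")); // hypothetical names
        files.put("mob-file-2", Instant.parse("2024-01-09T12:00:00Z"));
        // Column family TTL of 7 days; the chore itself would run every
        // hbase.master.mob.ttl.cleaner.period seconds (86400 by default).
        System.out.println(expiredFiles(files, Duration.ofDays(7), now));
    }
}
```

Note that the cleaner period and the TTL are independent: the period only controls how often the check runs, while the column family's TTL decides which files are old enough to delete.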
That's all, thanks!