[Technical Deep Dive] Hive MetaStore metadata database analysis

Latest reply: Mar 30, 2021 02:15:23

Hello, everyone!

Today I'm going to introduce you to DAYU.


One: Metadata & Metastore

Metadata is data that describes other data. In Hive, it covers the databases, tables, and table fields created with Hive. Metadata is stored in a relational database, such as Hive's built-in Derby or a third-party database such as MySQL.

Metastore is a metadata service. Clients connect to the metastore service, and the metastore connects to the MySQL database to access the metadata. With the metastore service in place, multiple clients can connect at the same time, and they do not need to know the username and password of the MySQL database. They only need to connect to the metastore service.
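The indirection described above can be sketched in a few lines. This is only a toy model, not Hive code: `MetastoreService`, `MetastoreClient`, and the `tables` table are hypothetical names, and an in-memory SQLite database stands in for MySQL. The point it shows is that only the service holds the database connection, while any number of clients talk to the service.

```python
import sqlite3
from typing import List


class MetastoreService:
    """Toy stand-in for the metastore service (hypothetical, not the Hive API)."""

    def __init__(self, db_path: str) -> None:
        # Assumption: SQLite replaces MySQL here. In a real deployment the
        # database username and password would be configured on this side only.
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS tables (db_name TEXT, tbl_name TEXT)"
        )

    def create_table(self, db_name: str, tbl_name: str) -> None:
        self._conn.execute("INSERT INTO tables VALUES (?, ?)", (db_name, tbl_name))
        self._conn.commit()

    def get_all_tables(self, db_name: str) -> List[str]:
        rows = self._conn.execute(
            "SELECT tbl_name FROM tables WHERE db_name = ? ORDER BY tbl_name",
            (db_name,),
        )
        return [name for (name,) in rows]


class MetastoreClient:
    """Toy client: it knows the service, never the database credentials."""

    def __init__(self, service: MetastoreService) -> None:
        self._service = service

    def list_tables(self, db_name: str) -> List[str]:
        return self._service.get_all_tables(db_name)


service = MetastoreService(":memory:")
service.create_table("default", "orders")
service.create_table("default", "customers")

# Two independent clients share the same service and see the same metadata.
client_a = MetastoreClient(service)
client_b = MetastoreClient(service)
```

In the real system the client-to-service hop is a network call rather than a direct method call, but the separation of responsibilities is the same.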

Hive MetaStore is a metadata service that provides metadata for Apache Hive, and it is part of the Apache Hive open source project. It can now also run as a standalone service and is not limited to Hive: third-party systems can use it as their metadata database service as well.

Two: Schema

As a metadata database, Hive MetaStore has a schema that largely follows the schema conventions of database systems. Its hierarchy is: Catalog->Database->[Schema->]Table->Partition->Field. The Schema layer is not yet formally used, at least not in Hive. For the Catalog layer, only the default Catalog is currently used, and Hive SQL has no syntax for it. However, Hive MetaStore is not limited to serving Hive as a metadata database, and other systems can make full use of these layers. The Catalog layer corresponds to what is now commonly called a data catalog.
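To make the hierarchy concrete, here is a minimal sketch that models it with plain data classes. It is an illustration only: the class and attribute names are hypothetical and do not mirror the real Hive MetaStore tables or Thrift objects, the optional Schema layer is left out because the post says it is not used in Hive, and fields are attached to the table rather than to each partition for simplicity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Field:
    name: str
    type_name: str


@dataclass
class Partition:
    # Partition key/value pairs, for example {"dt": "2021-03-30"}.
    values: Dict[str, str]


@dataclass
class Table:
    name: str
    fields: List[Field] = field(default_factory=list)
    partitions: List[Partition] = field(default_factory=list)


@dataclass
class Database:
    name: str
    tables: Dict[str, Table] = field(default_factory=dict)


@dataclass
class Catalog:
    name: str
    databases: Dict[str, Database] = field(default_factory=dict)


def qualified_name(catalog: Catalog, db_name: str, table_name: str) -> str:
    """Walk Catalog -> Database -> Table; raises KeyError if a level is missing."""
    table = catalog.databases[db_name].tables[table_name]
    return f"{catalog.name}.{db_name}.{table.name}"


# Illustrative values only; the catalog name here is an assumption.
catalog = Catalog("hive")
catalog.databases["default"] = Database("default")
catalog.databases["default"].tables["orders"] = Table(
    "orders",
    fields=[Field("id", "bigint"), Field("amount", "double")],
    partitions=[Partition({"dt": "2021-03-30"})],
)
```

Resolving a table means walking every level from the top, which is also why a lookup fails as soon as any one level is missing.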

In addition, Hive versions before 3.x support the Index function, but it has been disabled since 3.x. So although the Hive MetaStore schema in 3.x still defines the Index table structure, it no longer provides the related API.

Three: Architecture

[Figure: Hive MetaStore architecture diagram]

In this architecture:

1. Hive MetaStore uses Thrift as its RPC server to expose services to clients.

2. Hive MetaStore uses DataNucleus (a JDO middleware) as its database middleware to interact with the database.

3. Hive MetaStore currently supports several relational databases as the final metadata store: Derby, Microsoft SQL Server (mssql), MySQL, Oracle, and PostgreSQL (postgres).

4. ThreadLocal is used extensively in the Hive MetaStore code so that each thread has its own resources and threads do not affect each other.

5. Hive MetaStore is developing a metadata caching mechanism to improve efficiency and avoid the need to interact with the database every time

6. Because Hive MetaStore is stateless when no local cache is used, it supports distributed multi-instance deployment. The local cache it provides, however, is not distributed. To improve efficiency in a distributed multi-instance deployment, Huawei's internal team has introduced Redis as a distributed cache, so that caching also works across instances.

7. The metadata schema designed by Hive MetaStore has many levels and is deeply nested. For an object such as a table, fetching the full table details therefore requires many internal round trips to the database, which affects performance to some extent.

8. Hive MetaStore provides tool classes to initialize and upgrade the metadata database

9. Hive MetaStore still depends on HDFS, so it does not work properly without HDFS.

10. Hive MetaStore currently has two operating modes: a standalone mode, and a local mode embedded in HiveServer.

11. Both the client and the server of Hive MetaStore use a Proxy, built on the reflection mechanism, to implement retries.
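The retry-through-a-proxy idea in point 11 can be illustrated with a short sketch. Hive itself is written in Java, so this Python version is an analogy, not the actual implementation: `RetryingProxy` and `FlakyStore` are hypothetical names, and the retry count, delay, and choice of `ConnectionError` are assumptions made for the example. The proxy looks up each method on the wrapped object by name at call time and wraps it in a retry loop.

```python
import time


class RetryingProxy:
    """Wrap any object and retry its method calls on a chosen exception type."""

    def __init__(self, target, retries=3, delay_seconds=0.0, retry_on=ConnectionError):
        self._target = target
        self._retries = retries
        self._delay_seconds = delay_seconds
        self._retry_on = retry_on

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself, so every method
        # of the wrapped object is resolved reflectively here.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def call_with_retry(*args, **kwargs):
            last_error = None
            for attempt in range(1, self._retries + 1):
                try:
                    return attr(*args, **kwargs)
                except self._retry_on as error:
                    last_error = error
                    if attempt < self._retries:
                        time.sleep(self._delay_seconds)
            # All attempts failed: surface the last error to the caller.
            raise last_error

        return call_with_retry


class FlakyStore:
    """Hypothetical backend that fails a fixed number of times before succeeding."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def get_table(self, name):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("transient failure")
        return name.upper()


store = FlakyStore(failures=2)
proxy = RetryingProxy(store, retries=3)
result = proxy.get_table("orders")  # succeeds on the third attempt
```

Because the retry logic lives in the proxy, the wrapped class needs no retry code of its own, and the same wrapper can be applied on both the client and the server side.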
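Point 4 mentions ThreadLocal. As a rough analogy in Python, `threading.local` gives each thread its own copy of a resource. The names below (`get_handler`, `handle_request`, `run_worker`) are hypothetical and only illustrate the pattern of per-thread state; they are not taken from the Hive MetaStore code.

```python
import threading

_local = threading.local()


def get_handler():
    """Return this thread's private handler, creating it on first use."""
    if not hasattr(_local, "handler"):
        _local.handler = {"owner": threading.current_thread().name, "requests": 0}
    return _local.handler


def handle_request():
    # Each thread increments only its own counter, so no lock is needed.
    handler = get_handler()
    handler["requests"] += 1
    return handler


def run_worker(results, index, request_count):
    for _ in range(request_count):
        handler = handle_request()
    results[index] = handler["requests"]


results = [0, 0, 0]
threads = [
    threading.Thread(target=run_worker, args=(results, i, 5)) for i in range(3)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Each worker ends with its own count of 5, and the main thread's handler is untouched, which is the isolation the post describes.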


