Hello, everyone!
Today I'm going to introduce you to DAYU.
One: Metadata & Metastore
Metadata is data that describes other data. In Hive, it covers the databases, tables, and table fields created with Hive. Metadata is stored in a relational database, such as Hive's built-in Derby or a third-party database such as MySQL.
Metastore is a metadata service, and its role is: the client connects to the metastore service, and the metastore connects to the MySQL database to access metadata. With the metastore service, multiple clients can connect at the same time, and these clients do not need to know the username and password of the MySQL database, only need to connect to the metastore service.
Hive MetaStore is the metadata service that provides metadata for Apache Hive, and it belongs to the Apache Hive open source project. It can now run as a standalone service and is not limited to Hive: third-party systems can also use it as their metadata service.
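To make the split of responsibilities concrete, here is a minimal sketch that renders two hive-site.xml style fragments: one for the metastore service, which holds the database credentials, and one for a client, which only knows the metastore address. The property names are standard Hive settings, but the host names, user name, and password are placeholder assumptions, and the `render_hive_site` helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def render_hive_site(properties):
    """Render a dict of settings as hive-site.xml style text (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Metastore service side: the only place that knows the database credentials.
# All values below are placeholders.
server_settings = {
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://db.example.com:3306/metastore",
    "javax.jdo.option.ConnectionUserName": "hive_user",
    "javax.jdo.option.ConnectionPassword": "change-me",
}

# Client side: only the Thrift address of the metastore service.
client_settings = {
    "hive.metastore.uris": "thrift://metastore.example.com:9083",
}

server_xml = render_hive_site(server_settings)
client_xml = render_hive_site(client_settings)
print(client_xml)
```

The client fragment contains no database user name or password, which is the point made above: clients only need to reach the metastore service.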
Two: Schema
As a metadata database, Hive MetaStore has a schema that basically follows the schema conventions of database systems. Its hierarchy is: Catalog->Database->[Schema->]Table->Partition->Field. The Schema layer is not yet formally used, at least not in Hive. As for Catalog, only the default Catalog is currently used, and Hive SQL has no syntax for it. However, because Hive MetaStore is not limited to Hive, other systems can make full use of the Catalog layer, which is what is now commonly called a data catalog.
In addition, Hive versions before 3.x support the Index feature, but it was disabled starting with 3.x. So although the Hive MetaStore schema in 3.x still defines the Index table structures, it does not provide the related API.
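The hierarchy described in this section can be pictured as nested objects. The following is only an illustrative sketch, not the actual MetaStore table layout: the class names mirror the levels named above, the optional Schema layer is left out because Hive does not use it, and the catalog, database, and table names are made-up examples.

```python
import dataclasses
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Field:
    name: str
    type: str


@dataclass
class Partition:
    values: List[str]  # one value per partition key


@dataclass
class Table:
    name: str
    fields: List[Field] = dataclasses.field(default_factory=list)
    partition_keys: List[Field] = dataclasses.field(default_factory=list)
    partitions: List[Partition] = dataclasses.field(default_factory=list)


@dataclass
class Database:
    name: str
    tables: Dict[str, Table] = dataclasses.field(default_factory=dict)


@dataclass
class Catalog:
    name: str
    databases: Dict[str, Database] = dataclasses.field(default_factory=dict)


def qualified_names(catalog):
    """List every table as catalog.database.table."""
    return [
        f"{catalog.name}.{db.name}.{table.name}"
        for db in catalog.databases.values()
        for table in db.tables.values()
    ]


# Example data (all names are placeholders).
orders = Table(
    name="orders",
    fields=[Field("id", "bigint"), Field("amount", "double")],
    partition_keys=[Field("dt", "string")],
    partitions=[Partition(["2024-01-01"]), Partition(["2024-01-02"])],
)
catalog = Catalog("hive", {"sales": Database("sales", {"orders": orders})})
names = qualified_names(catalog)
print(names)
```

Reaching a single field means walking through every level above it, which also hints at why fetching full table details can take many lookups.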
Three: Architecture

Key points:
1. Hive MetaStore uses Thrift as its RPC server to provide services to external clients
2. Hive MetaStore uses DataNucleus (a JDO middleware) as database middleware to interact with the database
3. Hive MetaStore currently supports a variety of relational databases as the final metadata storage: Derby, MSSQL, MySQL, Oracle, and PostgreSQL
4. ThreadLocal is used extensively in the Hive MetaStore code to ensure that the resources of each thread are exclusive and do not affect each other
5. Hive MetaStore is developing a metadata caching mechanism to improve efficiency and avoid the need to interact with the database every time
6. Since Hive MetaStore is stateless (when the local cache is not used), it supports distributed multi-instance deployment. However, the local cache it provides does not work across instances. To improve efficiency in distributed multi-instance deployments, Huawei's internal team has introduced Redis as a distributed cache, so that caching can still be used in that setup
7. The metadata schema designed by Hive MetaStore has many levels and is deeply nested. Therefore, for objects like tables, obtaining the full table details requires many internal round trips to the database, which affects performance to a certain extent
8. Hive MetaStore provides tool classes to initialize and upgrade the metadata database
9. Hive MetaStore still depends on HDFS, so it will not work properly without HDFS
10. Hive MetaStore currently has two operating modes: an independent standalone mode and a local mode embedded in HiveServer
11. Both the client and the server of Hive MetaStore use a Proxy, built with the reflection mechanism, to implement a retry mechanism
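
The retry-through-proxy idea in the last point can be sketched in a few lines. Hive itself is written in Java and uses Java's proxy and reflection facilities. The Python version below is only an analogy that intercepts method calls with `__getattr__`. `RetryingProxy`, `FlakyClient`, and their parameters are hypothetical names invented for this illustration, not Hive classes.

```python
import time


class RetryingProxy:
    """Wrap any object and retry its method calls on selected errors."""

    def __init__(self, target, retries=3, delay_seconds=0.0,
                 retry_on=(ConnectionError,)):
        self._target = target
        self._retries = retries
        self._delay_seconds = delay_seconds
        self._retry_on = retry_on

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself,
        # so every method of the wrapped object is looked up dynamically.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def call_with_retry(*args, **kwargs):
            last_error = None
            for attempt in range(1, self._retries + 1):
                try:
                    return attr(*args, **kwargs)
                except self._retry_on as err:
                    last_error = err
                    if attempt < self._retries:
                        time.sleep(self._delay_seconds)
            raise last_error

        return call_with_retry


class FlakyClient:
    """Stand-in for a metastore client that fails twice, then succeeds."""

    def __init__(self):
        self.calls = 0

    def get_table(self, db, table):
        self.calls += 1
        if self.calls < 3:
            raise ConnectionError("metastore unavailable")
        return f"{db}.{table}"


client = FlakyClient()
proxy = RetryingProxy(client, retries=3)
result = proxy.get_table("sales", "orders")
print(result, "after", client.calls, "calls")
```

Because the retry logic lives in the proxy, the wrapped client needs no retry code of its own, and the same wrapper can sit in front of either side of the connection.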
