As mentioned above, a flow table such as a vehicle traffic record table stores a huge number of records, and the number will quickly increase as time
goes by. Therefore, you need to create these tables as partitioned tables by date, and
application programs must use partition conditions as much as possible when scanning the
tables.
The following SQL statement is submitted by an application program to the database to
display tables and graphs on the homepage when a user logs in to the XX video big data
system:
SELECT count(1) FROM sc_etcptjd WHERE lower(sjly) = 'p';
Note that the WHERE clause does not use the distribution key enter_time as a filter
condition. In this case, each time a user logs in to the system, the application program fully
scans the flow table storing 1 billion vehicle traffic records, costing high I/O resources.
In such a flow table, the inventory data is constant, and only the incremental vehicle traffic
records of the day keep changing. The system only needs to calculate the number of inventory
records of the previous day at midnight 12:00 every day and saves it in a configuration table.
When a user later logs in to the system, the system only measures the number of incremental
records in the partition of the day in real time, and adds the number to the number of
inventory records in the configuration table to obtain the final result. The modified SQL
statement is as follows:
SELECT count(1) FROM sc_etcptjd WHERE lower(sjly) = 'p'
AND enter_time = CURRENT_DATE;
Adding a partition filter condition in the WHERE clause avoids full scan on a flow table in
peak service hours, which greatly improves user experience in homepage login.