Hello, everyone!
This post will share with you a case about Huawei FusionInsight hash join data is too large to rewrite.
Because the amount of data is too large, the original business is 5 billion and 5 billion two tables
to do full table join, hash join data is too large.
The original SQL is:
select T3.name as "region", Case t2.InsuredSex when 2 then 'female' else 'male' end as "gender", Count(*) as "quantity" From basicpolicy t1,insured t2,area_table t3 Where t1.PolicyNo=t2.PolicyNo And t3.id=t1.EffPrefectureCode Group by t3.name,t2.InsuredSex;
When the work_mem memory is set to 12GB, the data is downloaded, and the execution time is 3126085ms.
When the work_mem memory is set to 22GB, the data has no disk, and the execution time is 892271ms.
For this, analyze the business logic, modify the SQL, group by gender, and then slicing through union all.
After rewriting SQL:
select t3.name as "region", t4.sex as "gender", t4.num as "quantity" From area_table t3, (select t1.EffPrefectureCode, 'Female'::text as sex, Count(t1.PolicyNo) as num From basicpolicy t1,insured t2 Where t1.PolicyNo=t2.PolicyNo And t2.InsuredSex=2 Group by t1.EffPrefectureCode) t4 Where t3.id=t4.EffPrefectureCode Union all Select t3.name as "region", t4.sex as "gender", t4.num as "quantity" From area_table t3, (select t1.EffPrefectureCode, 'Male'::text as sex, Count(t1.PolicyNo) as num From basicpolicy t1,insured t2 Where t1.PolicyNo=t2.PolicyNo And t2.InsuredSex=2 Group by t1.EffPrefectureCode) t4 Where t3.id=t4.EffPrefectureCode;
Work_mem memory uses 15GB, data has no download, execution time: 498279ms.
That's all, thanks!
