Symptom
The Spark task that calculates daily SDRs takes a long time to run. After the cluster capacity expansion increased the number of components, setting the mapreduce.input.fileinputformat.list-status.num-threads optimization parameter in the product did not improve performance on the FI side. The cause therefore needs to be analyzed.
Solution
The getMoreSplits method has no optimization parameters. The time it spends traversing files grows with the number of files, so you are advised to reduce the number of files on the service side to speed up the task.
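
One way to reduce the number of files is to merge small files into larger ones before the task reads them. The following is a minimal sketch of how such a merge could be planned, not the product's own mechanism. The function name plan_compaction, the file names, and the sizes are hypothetical and used only for illustration. The actual merge would be carried out with whatever tooling the service side already uses.

```python
from typing import Dict, List


def plan_compaction(file_sizes: Dict[str, int], target_bytes: int) -> List[List[str]]:
    """Group files into batches to be merged into one output file each.

    A batch is closed when adding the next file would push it past
    target_bytes. A single file larger than the target gets its own batch.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")

    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0

    # Iterate in name order so the plan is deterministic.
    for name in sorted(file_sizes):
        size = file_sizes[name]
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size

    if current:
        batches.append(current)
    return batches


# Hypothetical input: five files, sizes in arbitrary units.
sizes = {
    "part-0000": 10,
    "part-0001": 20,
    "part-0002": 100,
    "part-0003": 5,
    "part-0004": 5,
}
plan = plan_compaction(sizes, target_bytes=32)
print(len(sizes), "files ->", len(plan), "files after merging")
```

In this example the five input files are planned into three merged files, so a traversal has fewer entries to visit. The target size is an assumption to be tuned for the actual data.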