Hello, everyone!
Today I'm going to introduce you to DAYU.
In the Data Lake Insight (DLI) service, users analyze and process data with SQL most of the time, but sometimes the processing logic is especially complex and cannot be expressed in SQL. In such cases, you can write Spark jobs for the analysis. This article uses an example to describe how to submit a Spark job through Data Lake Factory (DLF).
Create a DLI cluster
Before running Spark jobs, you need to create a DLI cluster under the DLI service. The DLI cluster provides the physical resources required to run Spark jobs.
Note: 1 CU of queue capacity is equivalent to 4 cores and 16 GB of memory.
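To make the sizing concrete, the CU-to-resource conversion can be sketched as a small illustrative helper (not part of any DLI SDK):

```python
def cu_to_resources(cu):
    """Convert a DLI queue capacity in CU to (cores, memory_gb).

    Based on the rule above: 1 CU = 4 cores and 16 GB of memory.
    """
    return 4 * cu, 16 * cu

# For example, a 16 CU queue provides 64 cores and 256 GB of memory.
print(cu_to_resources(16))
```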

Get the Spark job code
The Spark job code demonstrated here comes from the official Spark examples: https://github.com/apache/spark/blob/branch-2.1/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala. The logic of the example is to compute an approximate value of π. You can download the full Spark source and compile the jar package yourself, or download the jar package directly from the Maven repository: http://repo.maven.apache.org/maven2/org/apache/spark/spark-examples_2.10/1.1.1/spark-examples_2.10-1.1.1.jar.
Once you have the jar package, upload it to an OBS bucket. Then create a resource on DLF associated with spark-examples_2.10-1.1.1.jar.
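The logic of SparkPi can be sketched in plain Python, without Spark's parallelism: sample random points in the square [-1, 1] × [-1, 1] and count how many fall inside the unit circle; that fraction approximates π/4.

```python
import random

def estimate_pi(n, seed=42):
    """Monte Carlo estimate of pi, mirroring the logic of SparkPi.

    Samples n points uniformly in [-1, 1] x [-1, 1] and counts those
    inside the unit circle; pi is approximately 4 * inside / n.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n

print(estimate_pi(100_000))  # close to 3.14159
```

The real SparkPi distributes the sampling loop across the cluster's executors, which is why the job benefits from the queue resources configured above.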
Create a DLF job
Create a DLF job that contains a DLI Spark node, and fill in the DLI Spark node parameters.
[Parameter description]
Job running resources: limits the maximum CPU and memory resources the DLI Spark node can use while running (the maximum CPU is also capped by the queue capacity of the DLI cluster). Reference: https://support.huaweicloud.com/api-uquery/uquery_02_0114.html#uquery_02_0114__zh-cn_topic_0103343292_zh-cn_topic_0102902454_table1656812183429
Main class of the job: the main class of the DLI Spark job; in this example it is org.apache.spark.examples.SparkPi.
Jar package parameters: the entry parameters passed to the main class of the DLI Spark job. Leave blank if there are none.
Spark job running parameters: the startup parameters of the Spark job. This parameter is currently unused and will be removed later.
Log path: the OBS path where the run logs of the DLF job are stored.
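As a concrete illustration, the node parameters for this example might be filled in as follows. The OBS bucket name and paths are hypothetical placeholders; substitute your own:

```
Job running resources:    8 CU (32 cores, 128 GB)
Main class of the job:    org.apache.spark.examples.SparkPi
Jar package:              obs://my-bucket/jars/spark-examples_2.10-1.1.1.jar
Jar package parameters:   (blank)
Log path:                 obs://my-bucket/logs/
```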
Run the DLF job
After the job runs, right-click the DLI Spark node and choose View Log to see the node's run log.
