
[ Technical Dry Goods ] How to develop a Spark job on DLF


Hello, everyone!

Today I'm going to introduce a feature of DAYU: developing Spark jobs on Data Lake Factory (DLF).


With the Data Lake Insight (DLI) service, users analyze and process data with SQL most of the time, but occasionally the processing logic is too complex to express in SQL. In such cases, you can write a Spark job for the analysis. This article walks through an example of submitting a Spark job from Data Lake Factory (DLF).


Create a DLI cluster
Before running Spark jobs, you need to create a DLI cluster under the DLI service. The DLI cluster provides the physical resources required to run Spark jobs.
Note: 1 CU of queue capacity is equivalent to 4 cores and 16 GB of memory.
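The queue-sizing arithmetic from the note above can be sketched as follows (a minimal illustration; `queue_resources` is a hypothetical helper, not a DLI API):

```python
# Per the note: 1 CU of queue capacity = 4 cores and 16 GB of memory.
CORES_PER_CU = 4
MEM_GB_PER_CU = 16

def queue_resources(cus: int) -> tuple[int, int]:
    """Return (total cores, total memory in GB) for a queue of `cus` CUs."""
    return cus * CORES_PER_CU, cus * MEM_GB_PER_CU

# For example, a 16 CU queue provides 64 cores and 256 GB of memory.
print(queue_resources(16))
```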

DLI cluster.JPG

Get the Spark job code
The Spark job code demonstrated here comes from the official Spark examples: https://github.com/apache/spark/blob/branch-2.1/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala. The example computes an approximation of π. You can download the full Spark source and compile the jar package yourself, or download the jar directly from the Maven repository: http://repo.maven.apache.org/maven2/org/apache/spark/spark-examples_2.10/1.1.1/spark-examples_2.10-1.1.1.jar.
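The SparkPi example estimates π by Monte Carlo sampling: it throws random points into the square spanning [-1, 1] on both axes, counts the fraction that land inside the unit circle, and multiplies by 4. Here is a minimal pure-Python sketch of the same logic (the real example distributes the sampling with Spark; this version just shows the math):

```python
import random

def estimate_pi(samples: int, seed: int = 42) -> float:
    """Monte Carlo estimate of pi: the fraction of random points in the
    [-1, 1] x [-1, 1] square that fall inside the unit circle, times 4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples

print(estimate_pi(100_000))  # prints a value close to 3.14
```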

Once you have the jar package, upload it to an OBS bucket. Then, on DLF, create a resource associated with spark-examples_2.10-1.1.1.jar.
Create resource.JPG 

Create a DLF job
Create a DLF job that contains a DLI Spark node, and fill in the DLI Spark node parameters.

Spark job.JPG 
[Parameter description]
  • Job running resources: the maximum CPU and memory resources the DLI Spark node can use while running (the CPU resources actually available are also limited by the queue capacity of the DLI cluster). Reference: https://support.huaweicloud.com/api-uquery/uquery_02_0114.html#uquery_02_0114__zh-cn_topic_0103343292_zh-cn_topic_0102902454_table1656812183429
  • Main class of the job: the main class of the DLI Spark job; in this example, org.apache.spark.examples.SparkPi.
  • Jar package parameters: the entry arguments passed to the main class of the DLI Spark job. If there are none, leave this blank.
  • Spark job running parameters: the startup parameters of the Spark job. This field is currently unused and will be removed in a future release.
  • Log path: the OBS path where the running logs of the DLF job are stored.
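Put together, the node settings for this example might look like the following sketch. The field names and the OBS paths here are illustrative only; they are not DLF's actual configuration or export format:

```json
{
  "nodeType": "DLI Spark",
  "mainClass": "org.apache.spark.examples.SparkPi",
  "jarResource": "spark-examples_2.10-1.1.1.jar",
  "mainClassArgs": [],
  "logPath": "obs://my-bucket/dlf-logs/"
}
```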

Run the DLF job
After the job is scheduled, right-click the DLI Spark node and choose View Log to see the node's running log.

Node log.JPG

