
Guide to Clearing and Restoring Spark Jobhistory Records?

Created: Jul 25, 2019 06:35:55 | Latest reply: Jul 25, 2019 06:38:32 (problem resolved)


Symptom

  • The Spark application failed to be submitted. The driver log shows that the /sparkJobHistory directory of the Spark application exceeds the threshold. The exception is as follows:

    The directory item limit of /sparkJobHistory is exceeded: limit=1048576 items=1048576

  • A check of the /sparkJobHistory directory in HDFS shows that the number of files has reached hundreds of thousands. The residual files are far older than the log retention period configured for the Spark service in spark.history.fs.cleaner.maxAge (7 days by default), which indicates that the cleaner thread never started.
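The symptom can be confirmed from the client before touching anything. A minimal sketch, assuming the standard `hdfs dfs -count` output format (`DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME`) and the default per-directory item limit of 1048576 from the exception above:

```shell
# Parse one line of "hdfs dfs -count" output and report whether the
# per-directory item limit (1048576, as seen in the driver log) is reached.
check_item_limit() {
  line="$1"
  limit=1048576
  # DIR_COUNT + FILE_COUNT together count against the directory item limit.
  items=$(echo "$line" | awk '{print $1 + $2}')
  if [ "$items" -ge "$limit" ]; then
    echo "over-limit:$items"
  else
    echo "ok:$items"
  fi
}

# On a live cluster you would feed it the real output:
#   check_item_limit "$(hdfs dfs -count /sparkJobHistory)"
```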

Featured Answers
tanglei
Created Jul 25, 2019 06:38:32

Hey there!

Fault handling

1. Manually delete the event-log files to restore service availability.

2. Increase the value of spark.eventLog.group.size to reduce how often new event-log files are generated.

3. Increase the JobHistory memory parameter SPARK_DAEMON_MEMORY to 10 GB or larger, and decrease spark.history.fs.cleaner.maxAge to reduce the number of retained files.

Operation guide

Check whether all running Spark services can be stopped.

  • If yes, go to Solution 1. (This solution deletes the JobHistory directory. As a result, historical information about all Spark tasks is lost, and Spark tasks that are running may fail.)

  • If no, go to Solution 2. (This solution does not affect ongoing tasks, but you must ensure that no new Spark tasks are submitted while it is in progress. The manual operations are complex and the workload is heavy; contact R&D engineers.)

Solution 1

1. On the Manager page, go to the Spark service page and stop the JobHistory instance.


Go to the Spark service configuration page and change the value of SPARK_DAEMON_MEMORY under JobHistory to 10 GB. (This step is mandatory: it prevents the JobHistory cleaner thread from failing due to insufficient memory while it clears the event logs.)

Change the value of spark.history.fs.cleaner.maxAge to 4d. (This step is optional. The parameter sets the maximum duration for which JobHistory keeps logs; a smaller value means fewer files in the JobHistory directory.) Save the configuration.
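In file terms, the two changes above amount to something like the following. This is a sketch only; the exact file locations and field names depend on the cluster's management tooling, and in this environment the values are set through the Manager UI rather than by hand:

```properties
# spark-env.sh for the JobHistory instance (environment variable):
# raise the daemon heap so the event-log cleaner does not run out of memory.
SPARK_DAEMON_MEMORY=10g

# spark-defaults.conf: keep application logs for at most 4 days.
spark.history.fs.cleaner.enabled=true
spark.history.fs.cleaner.maxAge=4d
```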


2. Log in to the client and run the kinit admin command to authenticate as user admin. Then run the following command to move the /sparkJobHistory directory aside as a backup:

hdfs dfs -mv /sparkJobHistory /sparkJobHistory-bak

3. Re-create the JobHistory directory and fix its ownership and permissions.

hdfs dfs -mkdir /sparkJobHistory
hdfs dfs -chown spark:hadoop /sparkJobHistory
hdfs dfs -chmod 777 /sparkJobHistory

4. Start the JobHistory instance.


5. After JobHistory has started, delete the backed-up JobHistory directory by running the following command on the client:

hdfs dfs -rm -r /sparkJobHistory-bak

Set the client parameter spark.eventLog.group.size for the Spark Streaming service (the default value is 30, configured in $client_home/Spark/spark/conf/spark-defaults.conf) from 30 to 3000. Do not change spark.eventLog.group.size for non-Spark-Streaming jobs.
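That client-side change can be scripted. A hedged sketch, not an official tool: the helper below edits whatever spark-defaults.conf path it is given, replacing an existing setting or appending one if the key is absent.

```shell
# Set spark.eventLog.group.size to 3000 in a spark-defaults.conf file.
# The caller supplies the path; in this guide that would be
# $client_home/Spark/spark/conf/spark-defaults.conf.
set_group_size() {
  conf="$1"
  if grep -q '^spark\.eventLog\.group\.size' "$conf"; then
    # Key already present: rewrite the whole line in place.
    sed -i 's/^spark\.eventLog\.group\.size.*/spark.eventLog.group.size = 3000/' "$conf"
  else
    # Key absent: append it.
    echo 'spark.eventLog.group.size = 3000' >> "$conf"
  fi
}

# Usage on the client:
#   set_group_size "$client_home/Spark/spark/conf/spark-defaults.conf"
```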

After changing the parameter value, submit the job again.

