Producer Fails to Send Data, and No Exception Is Recorded in Server Logs

Hello, everyone! 

In this post, I will share how to troubleshoot a problem where the producer fails to send data when using Kafka.

Context and Symptom

Error description on the client:

The client frequently reported errors indicating that Kafka failed to send data, and the fault did not recover for a long time. After Kafka was restarted, services recovered. The following exception was displayed on the client:

can not send messages!

Error description on the server:

The engineer logged in to FusionInsight Manager, chose Services > Kafka, and checked the values of Total Number of Partitions That Are Not Fully Synchronized and Total Request Rate.

If the two parameters are not displayed on the right, click Customize, select the two items, and click OK.

As shown in the figure above, the number of partitions that were not fully synchronized increased suddenly, and the total request rate decreased. This indicated that the server was abnormal, so the following troubleshooting steps needed to be performed.

Information Collection

Collecting jstack information from the faulty Kafka process

  1. Run the following command to query the ID of the Broker process.

    ps -ef|grep kafka

  2. Log in to each background node of Kafka and run the following commands:

    su - omm

    jstack <Kafka process ID> >> /tmp/kafka.jstack

    It is recommended that you record the jstack information every 10 seconds; three jstack snapshots are required. Then collect the kafka.jstack files.

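The two steps above can be sketched as a single script. This is a minimal sketch, not taken from the product documentation: the `[k]afka` grep pattern (which keeps the grep process itself out of the match) and the exact process name matched are assumptions that may need adjusting for your deployment.

```shell
#!/bin/sh
# Sketch: find the Broker PID, then append three jstack snapshots,
# taken 10 seconds apart, to /tmp/kafka.jstack (run as user omm).
# The "[k]afka" bracket trick prevents grep from matching itself.
BROKER_PID=$(ps -ef | grep '[k]afka\.Kafka' | awk '{print $2}' | head -n 1)
for i in 1 2 3; do
    jstack "$BROKER_PID" >> /tmp/kafka.jstack
    sleep 10
done
```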

Changing the log level of Kafka to DEBUG and keeping the configuration for five minutes

  1. Log in to FusionInsight Manager, choose Services > Kafka > Service Configuration, and set Type to All.

    Enter the name of the parameter to be modified in the search box on the right.

  2. Change the log level of the following parameters to DEBUG: kafka.log.level, kafka.network.requestchannel.log.level, kafka.request.log.level, and root.log.level.

  3. Click Save in the upper left corner, and click OK in the displayed dialog box.

  4. Wait for 5 minutes, change the value back to INFO, and save the configuration.

  5. On FusionInsight Manager, choose System > Log Download.

    Set Services to kafka:broker as shown in the preceding figure.

    You do not need to configure the Hosts parameter; all hosts where Kafka is deployed are selected by default.

    Set Time to the period from one hour before the exception occurred to the current time.

Obtaining detailed information about all partitions

  1. Run the following command to go to the client installation directory:

    cd Installation directory of the Kafka client

  2. Run the following command to configure environment variables:

    source bigdata_env

  3. If the cluster is in security mode, run the following command to authenticate the user. In normal mode, user authentication is not required.

    kinit Component service user

  4. Run the following command on the Kafka client to collect partition-describe.log:

    kafka-topics.sh --describe --zookeeper 26.3.X.X:24002/kafka > /tmp/partition-describe.log
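Once partition-describe.log has been collected, the lines of interest are those where the in-sync replica list (Isr) is shorter than the replica list (Replicas). A minimal awk sketch; the field positions assume the standard "Topic: ... Partition: ... Leader: ... Replicas: ... Isr: ..." layout of kafka-topics.sh --describe output:

```shell
# Sketch: print partitions whose in-sync replica set (Isr) is smaller
# than the full replica set (Replicas), i.e. partitions that are not
# fully synchronized. split() returns the number of comma-separated
# replica IDs in each list, so the comparison counts replicas.
awk '$7 == "Replicas:" && $9 == "Isr:" &&
     split($8, r, ",") > split($10, s, ",")' /tmp/partition-describe.log
```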

To sum up, you need to collect kafka.jstack, partition-describe.log, and the Broker service logs.

Possible Causes

  1. The network is abnormal.

  2. The Broker service is faulty.

Cause Analysis

  1. The network was checked, and no exception was found.

  2. No error was reported in Broker logs.

  3. When the jstack file was checked, the following error information was found:

    The error information indicated that a deadlock had occurred between two request threads.

    This is a known open-source issue; for details, see https://issues.apache.org/jira/browse/KAFKA-6042.
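When the JVM detects a deadlock, jstack appends a standard banner to its output, so the collected snapshots can be checked quickly. A minimal sketch, using the file path collected earlier:

```shell
# Sketch: count deadlock reports in the collected snapshots. The JVM
# prints this exact banner line when jstack detects a Java-level
# deadlock, so a nonzero count confirms the condition.
grep -c 'Found one Java-level deadlock' /tmp/kafka.jstack
```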

Solution

  1. Temporarily change the number of I/O threads on the server to 1 to prevent deadlocks between multiple threads. Note that this operation affects server performance.

    Log in to FusionInsight Manager, choose Services > Kafka > Service Configuration > Type > All, enter num.io.threads in the search box in the upper right corner, and change the value to 1.

    After the modification, click Save Configuration, and click OK. (Note: Restarting affected services or instances impacts the cluster, so it is recommended that you manually restart the Kafka cluster after negotiating with the customer. The modification takes effect only after the restart.)

    To manually restart the service, log in to FusionInsight Manager and choose Services > Kafka > More Actions > Restart Service.

  2. This problem has been fixed in the FusionInsight Tool V100R002C80SPC002 patch. You are advised to install the official patch before starting production services.


We warmly welcome you to enjoy our community!
