Hello, everyone!
In this post, I would like to introduce a Kafka secondary development fault: "Kafka Service Exception due to Timeout in Connection to ZooKeeper".
[Applicable Versions]
6.5.x
[Symptom]
A user runs a Kafka command to list the current topics, and the query fails.
The following error message is displayed:
[root@***** kafka-client]# kafka-topics.sh --zookeeper 30.3.X.X:24002/kafka --list
Exception in thread "main" org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 30000
    at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1223)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:155)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:129)
    at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:90)
    at kafka.utils.ZkUtils$.apply(ZkUtils.scala:71)
    at kafka.admin.TopicCommand$.main(TopicCommand.scala:53)
    at kafka.admin.TopicCommand.main(TopicCommand.scala)
    ..........
The exception persists even after the zookeeper.session.timeout.ms value is changed from 30000 ms to 80000 ms, which means the change does not take effect.
[Fault Locating]
The analysis is as follows:
The new zookeeper.session.timeout.ms value does not take effect because this value is hard-coded in Kafka. HBase has a similar issue.
The connection to ZooKeeper takes more than 30 seconds, possibly because of the DNS service.
To further verify the connection timeout, a small program was used to read information about a znode on ZooKeeper. Its output is shown in the following log. Establishing the connection takes approximately 40 s (40518 - 459 = 40059 ms), which supports the conclusion that the DNS service is the likely cause.
459 [Thread-0] INFO org.apache.zookeeper.Login - TGT refresh sleeping until: Thu Oct 27 14:24:39 CST 2016
40518 [main-SendThread(**.*.***.**:24002)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server **.*.***.**/**.*.***.**:24002. Will attempt to SASL-authenticate using Login Context section 'Client'
40527 [main-SendThread(**.*.***.**:24002)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /**.*.***.**:55380, server: **.*.***.**/**.*.***.**:24002
40536 [main-SendThread(**.*.***.**:24002)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server **.*.***.**/**.*.***.**:24002, sessionid = 0x1800915ac8954a3a, negotiated timeout = 4000
zk root dir:[hive, hadoop-flag, storm, fgc, zookeeper, yarn-leader-election, hadoop-ha, loader, oozie, thriftserver, services, hwbackup, hbaseindexer, hiveserver2, solr, kafka, rmstore, hbase]
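The 40 s figure comes from comparing the leading millisecond counters of consecutive log lines. The following minimal Java sketch automates that comparison. It assumes each relevant line starts with an elapsed-milliseconds field followed by a bracketed thread name, as in the log above. The class name LogGapFinder is hypothetical, and the sample lines are abbreviated copies of the log.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogGapFinder {

    // Assumed line shape: "<elapsed ms> [<thread>] ...", e.g. "40518 [main-SendThread(...)] INFO ..."
    private static final Pattern LEADING_MILLIS = Pattern.compile("(\\d+)\\s*\\[");

    /**
     * Returns {indexOfLaterLine, gapMs} for the largest gap between consecutive
     * timestamped lines, or null if fewer than two lines carry a timestamp.
     */
    static long[] largestGap(List<String> lines) {
        long prevMillis = -1;
        long[] best = null;
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = LEADING_MILLIS.matcher(lines.get(i));
            if (!m.lookingAt()) {
                continue; // skip lines without a leading timestamp, e.g. "zk root dir:[...]"
            }
            long millis = Long.parseLong(m.group(1));
            if (prevMillis >= 0) {
                long gap = millis - prevMillis;
                if (best == null || gap > best[1]) {
                    best = new long[] {i, gap};
                }
            }
            prevMillis = millis;
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "459 [Thread-0] INFO org.apache.zookeeper.Login - TGT refresh sleeping until: ...",
                "40518 [main-SendThread(...)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection ...",
                "40527 [main-SendThread(...)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established ...",
                "40536 [main-SendThread(...)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete ...",
                "zk root dir:[hive, hadoop-flag, ...]");
        long[] gap = largestGap(lines);
        if (gap == null) {
            System.out.println("Not enough timestamped lines");
            return;
        }
        System.out.println("Largest gap: " + gap[1] + " ms, ending at: " + lines.get((int) gap[0]));
    }
}
```

On the sample lines, this reports a 40059 ms gap that ends at the "Opening socket connection" line. This matches the observation that the delay occurs before the socket is opened, which is where name resolution would happen, and not during the ZooKeeper session handshake itself.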
[Solution]
Provide the customer with the section "ZooKeeper Service Abnormal After DNS Installation" from the Fault Management documentation. This allows the customer to help the service team locate the cause, which lies in the DNS service installed on the client.
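To check on the client whether name resolution is the slow step, a stdlib-only probe can stand in for a full ZooKeeper client program. The following Java sketch takes this approach. It times a forward DNS lookup, a reverse lookup, and a plain TCP connect separately. If a lookup is slow but the connect is fast, DNS rather than ZooKeeper or the network is the likely culprit. The class name ZkDnsCheck and the default host and port (localhost, 24002) are hypothetical; pass the real ZooKeeper address as arguments. Checking reverse lookups is an additional assumption here, because slow PTR queries are a common source of similar delays. Run the probe in a fresh JVM, because the JVM caches successful lookups.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

public class ZkDnsCheck {

    /** Time in ms for a forward (name to address) lookup of the host. */
    static long timeForwardLookup(String host) throws UnknownHostException {
        long start = System.nanoTime();
        InetAddress.getAllByName(host);
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Time in ms for a reverse (address to name) lookup; assumption: PTR queries may also be slow. */
    static long timeReverseLookup(String host) throws UnknownHostException {
        // Forward lookup first; may be served from the JVM cache if already resolved.
        InetAddress address = InetAddress.getByName(host);
        long start = System.nanoTime();
        address.getCanonicalHostName(); // triggers a reverse lookup
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Time in ms to open and close a TCP connection, excluding name resolution. */
    static long timeTcpConnect(String host, int port, int timeoutMs) throws IOException {
        InetSocketAddress address = new InetSocketAddress(host, port); // resolves before timing starts
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(address, timeoutMs);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 24002;
        try {
            System.out.println("Forward lookup of " + host + ": " + timeForwardLookup(host) + " ms");
            System.out.println("Reverse lookup of " + host + ": " + timeReverseLookup(host) + " ms");
            // 30000 ms mirrors the connection timeout reported in the exception above.
            System.out.println("TCP connect to port " + port + ": " + timeTcpConnect(host, port, 30_000) + " ms");
        } catch (IOException e) {
            System.out.println("Check failed: " + e);
        }
    }
}
```

For example, `java ZkDnsCheck <zookeeper-host> 24002` on the affected client. The probe measures each step on its own, so a multi-second lookup time is visible directly instead of being hidden inside the overall 30000 ms connection timeout.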