Hello, everyone!
This post describes a case in which data occasionally fails to be written because the operating system runs out of random (ephemeral) ports.
[Applicable Versions]
6.5.x
[Symptom]
Solr index data is stored in HDFS. Writing index data occasionally fails.
[Fault Locating]
Check the Solr log on the server. An error message indicates that a block of an HDFS index file cannot be read:
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-223864209-128.X.X.206-1488793177181:blk_1088343569_14603067 file=/user/solr/SolrServer1/hbaseCol_120G_All/core_node4/data/index/_gm_Lucene50_0.tim
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1011)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1123)
    at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1450)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1412)
    at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:78)
    at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:107)
    at org.apache.solr.store.hdfs.HdfsDirectory$HdfsIndexInput.readInternal(HdfsDirectory.java:214)
    at org.apache.solr.store.blockcache.CustomBufferedIndexInput.refill(CustomBufferedIndexInput.java:192)
    at org.apache.solr.store.blockcache.CustomBufferedIndexInput.readBytes(CustomBufferedIndexInput.java:94)
    at org.apache.solr.store.blockcache.CustomBufferedIndexInput.readBytes(CustomBufferedIndexInput.java:68)
    at org.apache.solr.store.blockcache.BlockDirectory$CachedIndexInput.readIntoCacheAndResult(BlockDirectory.java:208)
    at org.apache.solr.store.blockcache.BlockDirectory$CachedIndexInput.fetchBlock(BlockDirectory.java:195)
    at org.apache.solr.store.blockcache.BlockDirectory$CachedIndexInput.readInternal(BlockDirectory.java:179)
    at org.apache.solr.store.blockcache.CustomBufferedIndexInput.refill(CustomBufferedIndexInput.java:192)
    at org.apache.solr.store.blockcache.CustomBufferedIndexInput.readByte(CustomBufferedIndexInput.java:46)
    at org.apache.lucene.store.DataInput.readVInt(DataInput.java:125)
    at org.apache.solr.store.blockcache.CustomBufferedIndexInput.readVInt(CustomBufferedIndexInput.java:161)
    at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock(SegmentTermsEnumFrame.java:157)
    at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next(SegmentTermsEnum.java:962)
    at org.apache.lucene.index.MultiTermsEnum.pushTop(MultiTermsEnum.java:275)
    at org.apache.lucene.index.MultiTermsEnum.next(MultiTermsEnum.java:301)
    at org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.next(FilterLeafReader.java:195)
    at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:438)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:198)
    at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:193)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:95)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4089)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3664)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
    | org.apache.solr.common.SolrException.log(SolrException.java:150)
Check the DataNode log of HDFS. A large number of error messages indicate failures to allocate random connection ports:
2017-11-27 05:10,771 | WARN | commitScheduler-13-thread-1 | Connection failure: Failed to connect to /xxx.xxx.xxx.xxx:25009 for file /user/solr/SolrServer1/hbaseCol_120G_All/core_node4/data/index/_32h_Lucene50_0.tim for block BP-223864209-128.X.X.206-1488793177181:blk_1088390207_14649705:java.net.BindException: Cannot assign requested address | org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1215)java.net.BindException: Cannot assign requested address
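The "Cannot assign requested address" in this log means the client side could not obtain a local ephemeral port. A quick way to see how large that pool is on the affected host (a sketch assuming a standard Linux procfs layout):

```shell
# Print the kernel's client-side ephemeral port range and its size.
# "Cannot assign requested address" occurs when this pool is exhausted.
awk '{ printf "ephemeral port range: %d-%d (%d ports)\n", $1, $2, $2 - $1 + 1 }' \
    /proc/sys/net/ipv4/ip_local_port_range
# On a busy node, also count sockets holding ports in TIME_WAIT, e.g.:
#   ss -tan state time-wait | wc -l
# A count close to the range size confirms port exhaustion.
```

If the TIME_WAIT count approaches the range size, the exhaustion diagnosis below is confirmed.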
The preceding logs show that Solr fails to connect to the DataNode because the operating system has run out of random (ephemeral) ports. As a result, Solr index data occasionally fails to be written to HDFS.
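To see why a busy DataNode can hit this limit, here is a rough back-of-the-envelope check. The figures are illustrative Linux defaults, not measurements from this cluster:

```shell
# Illustrative defaults: ephemeral range 32768-60999, 60 s TIME_WAIT hold.
low=32768; high=60999
range=$((high - low + 1))      # total ephemeral ports available to clients
hold=60                        # seconds a closed socket keeps its port in TIME_WAIT
# If every closed connection holds its port for $hold seconds, the node can
# only sustain roughly range/hold new outbound connections per second.
max_rate=$((range / hold))
echo "ephemeral ports: $range, sustainable new connections/s: $max_rate"
# → ephemeral ports: 28232, sustainable new connections/s: 470
```

A Solr node merging segments against many HDFS blocks can easily exceed a few hundred new connections per second, which is why shortening the hold time and reusing TIME_WAIT ports (the solution below) relieves the pressure.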
[Solution]
Add the following settings to the /etc/sysctl.conf file on each DataNode host:
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_timestamps = 1
Run the sysctl -p command to make the modifications take effect.
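The two steps above can be scripted so they are safe to re-run. This is a sketch: it appends each setting only if it is not already present, and it defaults to a temporary file so it can be tried unprivileged; on a real DataNode, set SYSCTL_CONF=/etc/sysctl.conf and run it as root:

```shell
# Idempotently add the tuning parameters (SYSCTL_CONF is an assumed
# variable; point it at /etc/sysctl.conf on the actual DataNode host).
CONF=${SYSCTL_CONF:-$(mktemp)}
for kv in \
    "net.ipv4.tcp_syncookies = 1" \
    "net.ipv4.tcp_tw_reuse = 1" \
    "net.ipv4.tcp_tw_recycle = 1" \
    "net.ipv4.tcp_fin_timeout = 30" \
    "net.ipv4.tcp_timestamps = 1"
do
    # append each line only if an identical line is not already there
    grep -qxF "$kv" "$CONF" || echo "$kv" >> "$CONF"
done
echo "Settings written to $CONF; run 'sysctl -p' as root to apply."
```

Because of the grep guard, running the script twice does not duplicate entries in the file.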
Hope you can learn from it, thank you!