
case share


1 Basic Information

SR ticket NO.

 

Accident Description

During an inspection of the OceanStor 9000, the customer found a coredump file for the snas_rep process on one of the cluster nodes.

Accident Time

2019/03/20

Product

OceanStor 9000 V300R006C20SPC200

 

 


2 Problem Description

During an inspection of the OceanStor 9000, the customer found a coredump file for the snas_rep process on one of the cluster nodes. The root cause needs to be identified.

 


3 Problem Analysis

The inspection report shows the backend IP (10.68.10.17) of the node where the coredump file was generated.

194920r7d9na2q57ae95zn.png

Log in to the node with PuTTY and decompress the coredump file. The file is small (330 MB) before decompression but very large (more than 100 GB) afterwards, so the coredump is suspected to be caused by a thread leak that is a known issue in this version.

194920kmuf98gjuhbggkxf.png

Running the "rnm shownodeinfo" command shows that the remote replication master node of the cluster has node ID 10. Log in to that node (the backend IP corresponding to node 10 can be obtained with cat /proc/monc_nodemap), then count the threads used by the snas_rep process on the master node with ps -eLF | grep snas_rep | wc -l. The count is more than 3000, which confirms the thread leak.

 

194920rxl0oazvnonr4na0.png
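The thread-count check above can also be scripted. The following is only a minimal sketch, assuming a Linux node where /proc/<pid>/status exposes a "Threads:" field and pgrep is available. The function names are hypothetical, and the default limit of 3000 is simply the figure observed in this case, not an official threshold.

```shell
#!/bin/sh
# Sketch: report the thread count of every process with a given name.
# Assumes a Linux /proc filesystem; all function names are hypothetical.

# Print the "Threads:" value from a /proc/<pid>/status-style file.
threads_in_status() {
    awk '/^Threads:/ { print $2 }' "$1"
}

# Succeed (exit 0) when thread count $1 exceeds limit $2.
# The default limit of 3000 is only the figure seen in this case.
exceeds_limit() {
    [ "$1" -gt "${2:-3000}" ]
}

# Print "<pid> <threads>" for each process named exactly $1,
# flagging any process whose thread count exceeds the limit.
report_threads() {
    for pid in $(pgrep -x "$1"); do
        n=$(threads_in_status "/proc/$pid/status")
        [ -n "$n" ] || continue   # process exited in the meantime
        if exceeds_limit "$n"; then
            echo "$pid $n LEAK-SUSPECTED"
        else
            echo "$pid $n"
        fi
    done
}

# Usage on the remote replication master node:
#   report_threads snas_rep
```

Unlike ps | grep | wc -l, this reads the count per process and does not include the grep command itself in the result.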

 

 

 

 


4 Root Cause

During each remote replication pair synchronization, a thread pool is created to query the internal database of the replication task. The thread pool releases its resources automatically after 15 minutes without use (the last-use time is updated on every use).

 

In the version currently running on the live network, some threads are not handled correctly when the remote replication database thread pool is automatically released, so 10 threads are leaked each time. After the system has been running for a long time, the leaked threads make the virtual memory of the remote replication process grow too large, and virtual memory allocation fails when a new thread is created.

 

When the remote replication database thread pool is initialized, thread creation therefore fails. The code enters an error branch that handles the thread pool incorrectly, which causes the snas_rep process to coredump.

 


 

 

5 Solutions

The problem is fixed in patch version V300R006C20SPC300.

 

Hot patch download link:

https://support.huawei.com/enterprise/en/software/23643152-SW2000090781

 

 

 

Summary

The issue is fixed by installing the hot patch, which takes about 30 minutes and has no impact on production services.
