
"Invalid argument" error occurs on OceanStor 9000 when running the command "ls" on a directory mounted via NFSv3

Latest reply: Apr 1, 2017 09:19:49
When the Linux command ‘ls’ is used to list a specific directory that is mounted via the NFSv3 protocol through the front-end IP address of node4, the command receives no response for about 2 minutes and then prints ‘reading directory .: Invalid argument’ on the screen.

1  According to the packets captured on the client side, the request sent by the Linux command “ls” receives no response, and the client retransmits the request.

http://support.huawei.com/enterprise/product/images/d8d23b76bd46486e9231c524a1377afe

2  Further analysis of the logs shows that the unlock message OPEN_LOCK generated by the Linux command ‘ls’ is not answered by the lock server, which decides which node performs the unlock operation, so the client receives no response. OPEN_LOCK is used whenever a file or directory is opened.

http://support.huawei.com/enterprise/product/images/8da1961d105e4e58a14026a87fe7057a

3  According to the logs of the protocol module, when the number of retransmissions received by the OceanStor 9000 reaches the timeout limit, the OceanStor 9000 replies NFS3ERR_INVAL to the client. This status means an invalid or unsupported argument for an operation; in this case it is a side effect of the timeout.
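To illustrate why the client ends up printing ‘Invalid argument’, here is a minimal sketch of how an NFSv3 status could be turned into the message a user sees. The status values are those defined in RFC 1813; the mapping table and the function name `describe_nfs3_status` are assumptions made for illustration and are not taken from the OceanStor 9000 or from the Linux NFS client source.

```python
import errno
import os

# NFSv3 status values as defined in RFC 1813 (only a few are listed here).
NFS3_OK = 0
NFS3ERR_IO = 5
NFS3ERR_INVAL = 22

# Assumed, simplified client-side mapping from NFSv3 status to a local errno.
_STATUS_TO_ERRNO = {
    NFS3ERR_IO: errno.EIO,
    NFS3ERR_INVAL: errno.EINVAL,
}


def describe_nfs3_status(status):
    """Return the text a command such as 'ls' would show for this status."""
    if status == NFS3_OK:
        return "OK"
    # Unknown statuses fall back to a generic I/O error in this sketch.
    local_errno = _STATUS_TO_ERRNO.get(status, errno.EIO)
    return os.strerror(local_errno)


# On a typical Linux client this is the familiar "Invalid argument" text.
print(describe_nfs3_status(NFS3ERR_INVAL))
```

The point of the sketch is that the message on the screen only reflects the status code returned by the server; it says nothing about the timeout that actually triggered it.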

http://support.huawei.com/enterprise/product/images/4a25a6f15b51457394d5270f64ac2834

4  Analysis of the lock module logs shows an RPC link problem from node4 to node4 (that is, to itself), which leaves the unlock message unanswered. When a node starts, it creates RPC links from itself to all nodes in order to transfer unlock commands. All nodes are lock servers. Concretely, when a client runs the command “ls” on the S002 directory through an IP address of node4, node4 must obtain a lock from a lock server in order to access the directory. In this case, the lock server is node4 itself.

 snasmessages_1

In summary, the issue is caused by the faulty RPC link from node4 to node4. When the Linux command ‘ls’ accesses this directory in the NFSv3 share, the system uses that faulty link to reach the lock server. NFSv4 works normally because the NFSv4 process does not use the lock module.
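The failure pattern can be modelled with a small sketch. Everything in it is hypothetical: the class and exception names, the four-node cluster size, and the table that assigns the S002 directory to node4 as its lock server are assumptions for illustration, not the OceanStor 9000 implementation. It only shows why access through node4 fails while access through another node succeeds.

```python
class LockTimeout(Exception):
    """Raised when the lock server never answers (hypothetical name)."""


class Cluster:
    def __init__(self, nodes, lock_owner):
        self.nodes = list(nodes)
        # directory -> node acting as lock server for it (assumed assignment)
        self.lock_owner = dict(lock_owner)
        # One RPC link per ordered pair of nodes, including node -> itself.
        self.link_ok = {(src, dst): True
                        for src in self.nodes for dst in self.nodes}

    def open_dir(self, entry_node, directory):
        """Open a directory through entry_node, fetching the lock first."""
        server = self.lock_owner[directory]
        if not self.link_ok[(entry_node, server)]:
            # The lock request is never answered, so the client times out.
            raise LockTimeout(f"no reply on link {entry_node} -> {server}")
        return f"{directory} opened via {entry_node}, lock from {server}"


cluster = Cluster(["node1", "node2", "node3", "node4"], {"S002": "node4"})
cluster.link_ok[("node4", "node4")] = False  # the faulty self-link

print(cluster.open_dir("node1", "S002"))  # a different link is used
try:
    cluster.open_dir("node4", "S002")
except LockTimeout as exc:
    print("request hangs:", exc)
```

In this model only the (node4, node4) pair is broken, so the same directory is reachable through any other front-end node.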

      

Root Cause

The issue was caused by a problematic RPC link that is used to obtain locks, so running "ls" on a directory cannot get the lock from the lock server. The bug that creates a problematic RPC link has a very low probability of occurring.

Solution

[Workaround]
Mount NFS shares using NFSv4, or through the front-end IP addresses of other nodes.
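As a rough sketch of the two workaround options, the helper below builds the corresponding Linux `mount` command lines. The IP addresses, export path, and mount point are placeholders, and the helper name `build_mount_command` is hypothetical; substitute the values of your own environment. The script only prints the commands and does not run them.

```python
import shlex


def build_mount_command(server_ip, export, mount_point, nfs_version="4"):
    """Return the argument list for mounting an NFS export on Linux."""
    return ["mount", "-t", "nfs", "-o", f"vers={nfs_version}",
            f"{server_ip}:{export}", mount_point]


# Option 1: keep the node4 front-end IP but switch to NFSv4 (placeholder IP).
via_nfs4 = build_mount_command("192.0.2.14", "/share", "/mnt/share")

# Option 2: stay on NFSv3 but use another node's front-end IP (placeholder IP).
via_other_node = build_mount_command("192.0.2.11", "/share", "/mnt/share",
                                     nfs_version="3")

for command in (via_nfs4, via_other_node):
    print(shlex.join(command))
```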

[Future solution]
The issue is resolved in the latest version, V100R001C30SPC200. Consider upgrading once this software version becomes the recommended version on the Huawei website.

 

