Hi team, here's a new case.
Symptom
When a user attempts to add a storage node, the system does not respond for a long time and then reports the alarm "Failed to connect to the network of the node to be added".
Diagnosis
1. Check whether the information entered by the user, especially the IP address, is correct.
2. View the log /var/log/deploy/console/run/servicetool_console.log and check whether it contains the following exception information: com.jcraft.jsch.JSchException:channel is not openned and preCheck result is -1.
3. Log in to the primary management node and use SSH to log in to the target IP address. Check whether the login hangs for a long time.
4. View the log /var/log/deploy/console/run/servicetool_console.log and check whether it contains the keywords "ping xxx from xxx result is x" and "The network connection to xxx fails".
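The log checks in steps 2 and 4 can be combined into one search. The following is a minimal sketch, not a tool shipped with the product: the function name scan_console_log is hypothetical, and the search patterns are taken from the keywords quoted above.

```shell
# Hypothetical helper: search a servicetool console log for the failure
# signatures quoted in the diagnosis steps.
# Usage: scan_console_log [path]   (defaults to the log path from this case)
scan_console_log() {
    log="${1:-/var/log/deploy/console/run/servicetool_console.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # -n prints the line number of each hit; -E allows alternation.
    # grep exits with status 1 when none of the signatures is found.
    grep -nE 'JSchException|preCheck result is|ping .* from .* result is|The network connection to .* fails' "$log"
}
```

If the function prints nothing and returns 1, none of the signatures is present and the cause is probably outside the scope of this case.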
Causes
The possible causes are as follows:
1. The SSH daemon (SSHD) on the server queries the DNS for the host name that corresponds to the connecting IP address. If the DNS is unavailable or has no matching record, the query waits until it times out, which takes a long time.
2. Authentication with the gssapi-with-mic method also takes time.
3. To find out which of these steps is slow, run the ssh -v host command and watch where the output pauses. To measure the total connection time, run the time ssh root@host exit command.
4. If the log contains the keywords "ping" and "network connection ... fails", the common (non-root) user has no permission to run the ping command.
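The timing check in item 3 can be repeated before and after each configuration change to see whether the change helped. The sketch below assumes a hypothetical helper name, elapsed_seconds; the ssh options BatchMode and ConnectTimeout in the usage comment are standard OpenSSH client options, and root@host is the placeholder used above.

```shell
# Hypothetical helper: run a command, discard its output, and print how many
# whole seconds it took. The command's exit status is passed through.
elapsed_seconds() {
    start=$(date +%s)
    status=0
    "$@" >/dev/null 2>&1 || status=$?
    end=$(date +%s)
    echo "$((end - start))"
    return "$status"
}

# Example (not executed here):
#   elapsed_seconds ssh -o BatchMode=yes -o ConnectTimeout=60 root@host exit
```

BatchMode=yes prevents the client from waiting at a password prompt, so the measured time reflects connection setup and not user input.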
Solution
A slow SSH connection can have several causes.
Apply the settings below one at a time and retest after each change instead of modifying all of them at once. Stop when the problem is resolved. After each modification, run the systemctl restart sshd.service command to restart the SSHD service so that the change takes effect.
Configure management nodes.
Disable the DNS reverse resolution.
On many Linux systems, the SSHD service performs a reverse DNS lookup of the client address by default, which takes a long time when the DNS is slow or unreachable. You are advised to disable the DNS reverse resolution.
# vi /etc/ssh/sshd_config
UseDNS no
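Instead of editing the file by hand, the same change can be scripted. This is a minimal sketch under stated assumptions: the function name set_ssh_option is hypothetical, the file is assumed to use the whitespace-separated "Keyword value" form, and it is assumed to contain no Match blocks that set the same keyword. Commented-out lines are left untouched.

```shell
# Hypothetical helper: set "Keyword value" in an sshd_config-style file.
# The first active line for the keyword is replaced in place, later duplicates
# are dropped, and the line is appended if the keyword is not set at all.
# A backup copy is written to <file>.bak first.
set_ssh_option() {
    file=$1; key=$2; value=$3
    cp -p "$file" "$file.bak" || return 1
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        tolower($1) == tolower(k) {          # keywords are case-insensitive
            if (!done) { print k, v; done = 1 }
            next
        }
        { print }
        END { if (!done) print k, v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps owner and mode
    rm -f "$tmp"
}
```

For example, set_ssh_option /etc/ssh/sshd_config UseDNS no applies the setting shown above. Running sshd -t afterwards checks the file for syntax errors before the service is restarted.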
Disable GSS authentication.
The gssapi-with-mic authentication method is a common source of delay. Disabling GSS authentication may shorten the time needed to establish the SSH connection.
# vi /etc/ssh/sshd_config
GSSAPIAuthentication no
Modify the nsswitch.conf file.
This change modifies the order in which host names are resolved. The original configuration hosts: files dns means that the /etc/hosts file is consulted first. If the name is not recorded in the hosts file, the DNS is queried. If the DNS cannot be reached, the query returns only after it times out, which causes the long wait. Removing dns from this line restricts resolution to the /etc/hosts file.
# vi /etc/nsswitch.conf
Find and change
hosts: files dns
to
hosts: files
Note: If the management node needs to access other servers by domain name, do not make this change, because those names can no longer be resolved through the DNS.
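The nsswitch.conf change above can be sketched as a small script. The function name drop_dns_from_hosts is hypothetical. As a deliberate restriction, the sketch only rewrites a hosts line that is exactly files dns, as described in this case, and leaves any other configuration untouched.

```shell
# Hypothetical helper: change "hosts: files dns" to "hosts: files" in an
# nsswitch.conf-style file. Any other hosts line is left as it is.
# A backup copy is written to <file>.bak first.
drop_dns_from_hosts() {
    file=$1
    cp -p "$file" "$file.bak" || return 1
    tmp=$(mktemp) || return 1
    awk '
        $1 == "hosts:" && NF == 3 && $2 == "files" && $3 == "dns" {
            print "hosts: files"
            next
        }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

After running drop_dns_from_hosts /etc/nsswitch.conf, the getent hosts command can be used to confirm that a name listed in /etc/hosts still resolves.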
Modify the resolv.conf file.
Delete all unused name server IP addresses from the /etc/resolv.conf file. If the management node has been configured with two network adapters, delete the addresses that belong to the adapter that is not in use.
Modify the hosts file.
Add the IP address and hostname of the target node to the /etc/hosts file on the management node.
Enable the IgnoreRhosts parameter.
The IgnoreRhosts parameter makes the SSHD service ignore the per-user .rhosts and .shosts files during host-based authentication. Setting the parameter to yes skips these lookups and may shorten the connection time.
# vi /etc/ssh/sshd_config
IgnoreRhosts yes
Configure the target node.
Modify the hosts file.
Add the IP addresses and domain names of all management nodes to the /etc/hosts file of the target node so that the target node can resolve these addresses locally without querying the DNS.
# vi /etc/hosts
192.168.100.11 storagefsm1.com
192.168.100.12 storagefsm2.com
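When several nodes need the same entries, appending them with a script avoids duplicates. The following is a minimal sketch: the function name add_hosts_entry is hypothetical, and the addresses in the usage example are the sample values shown above.

```shell
# Hypothetical helper: append "IP name" to a hosts file unless the name is
# already listed on a non-comment line.
add_hosts_entry() {
    file=$1; ip=$2; name=$3
    # Compare whole whitespace-delimited fields so that "node1" does not
    # match "node10". Field 1 is the IP address, so names start at field 2.
    if awk -v n="$name" '
            /^[ \t]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit (found ? 0 : 1) }
        ' "$file"; then
        echo "$name is already present in $file" >&2
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

For example, add_hosts_entry /etc/hosts 192.168.100.11 storagefsm1.com adds the first entry shown above and does nothing when it is run a second time.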
Modify the ssh_config file.
The gssapi-with-mic authentication method is a common source of delay. Disabling GSS authentication in the SSH client configuration may shorten the time needed to establish the SSH connection.
# vi /etc/ssh/ssh_config
GSSAPIAuthentication no
Grant the user permission to run the ping command.
If the common user has no permission to run the ping command, run the following command as the root user to set the setuid bit on the ping binary.
chmod u+s /bin/ping
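Before and after running chmod, the setuid bit can be checked with the standard -u file test. This is a minimal sketch: the function name check_setuid is hypothetical, and /bin/ping is the path used in this case.

```shell
# Hypothetical helper: report whether a file has the setuid bit set.
# Prints "setuid" (status 0), "no-setuid" (status 1), or "missing" (status 2).
check_setuid() {
    bin=$1
    if [ ! -e "$bin" ]; then
        echo "missing"
        return 2
    fi
    if [ -u "$bin" ]; then
        echo "setuid"
    else
        echo "no-setuid"
        return 1
    fi
}
```

For example, check_setuid /bin/ping should print setuid after the chmod command above has been run.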
Check After Recovery
After the modification, add the target node again for deployment.
The node is added successfully provided that the other compliance checks pass.
Suggestion and Summary
Problems that look like program faults can be caused by Linux performance or configuration issues.
Therefore, when troubleshooting a program problem, also check environmental factors such as the DNS, SSH, and permission settings.