Hello, everyone!
This post shares some test cases about TCP hang. I hope you find it useful.
[DBService] No switchover occurs when a TCP hang is injected
Verification environment
167.52.0.61-63;
Main DBService node: 167.52.0.62
Standby DBService node: 167.52.0.61
Verification process
Scenario 1: Construct a TCP send/receive hang on the physical IP 167.52.0.62 of the main DB node
Phenomenon
A. The IP xxx.xx.0.62 responds to ping, but SSH login fails.
B. The DB floatip also responds to ping, and SSH login through it succeeds; after logging in to the main DB node through the DB floatip, local access to gaussdb via JDBC/gsql succeeds.
C. On standby node xxx.xx.0.61, running the gsql command to access the main DB node remotely succeeds.
D. On standby node xxx.xx.0.61, remote access to gaussdb through JDBC also succeeds.
Scenario 2: Construct a TCP send/receive hang on the floating IP of the main DB node
Phenomenon
A. The IP xxx.xx.0.62 responds to ping and accepts SSH login; after logging in to the main DB node, local access to gaussdb via JDBC/gsql succeeds.
B. The DB floatip responds to ping, but SSH login fails.
C. Running gsql commands on the main/standby nodes to access the main DB node remotely fails.
D. gaussdb cannot be accessed remotely through JDBC from the main/standby nodes.
Result analysis
1. In Scenario 1, DBService works normally, and both local and remote database access are normal.
2. In Scenario 2, DBService works normally locally, but remote access is abnormal: other nodes cannot access the main DB through JDBC or other means.
Solution discussion
For Scenario 2, the following approach can be used:
1. Add an ordinary (non-OMM) user to DBService, and use this user so that the main and standby DB nodes can access gaussdb via floatip (remote access for the OMM user has been disabled in the current version because of a security red-line requirement).
2. When HA checks the status of the gaussdb process on the main DB node, add a JDBC/gsql remote access to the database through floatip to check whether floatip can provide normal database access.
3. If the remote access fails, return a resource exception and let HA arbitration perform the main/standby switchover.
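The check in steps 2 and 3 could be sketched roughly as follows. This is only an illustration, not the actual DBService HA script: the function names `probe_tcp` and `check_floatip_resource`, the retry count, the timeout and the "normal"/"abnormal" return values are all assumptions. It also only verifies that a TCP connection to floatip can be established within a timeout; a real check would go further and run a query through gsql or JDBC with the ordinary user from step 1.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False


def check_floatip_resource(float_ip: str, port: int,
                           retries: int = 3, timeout: float = 3.0) -> str:
    """Hypothetical HA hook: report "abnormal" only if every attempt fails."""
    for _ in range(retries):
        if probe_tcp(float_ip, port, timeout):
            return "normal"
    # All attempts failed: HA arbitration could then trigger a switchover.
    return "abnormal"
```

The HA status check would call `check_floatip_resource` with the floatip and the database listening port of the environment. Retrying a few times before reporting "abnormal" is a design choice to avoid a switchover on a single transient failure.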
Final conclusion
Network experts confirmed that when ping packets are not lost (or only a few are lost) but TCP suffers heavy packet loss or hangs, the cause usually lies in the protocol layer or above, for example a coding bug, so the scenarios above have a low probability of occurring.
That's all, thanks!
