Problem Information
Item | Description |
Storage type | Distributed storage |
Product version | FusionStorage 8.0.0 and later |
Problem type | Value-added feature |
Keyword | Remote replication interruption; Remote device disconnection |
Problem Symptom
An alarm is generated indicating that a remote replication pair or consistency group is disconnected. The remote replication pair or consistency group is in FAULTY status and the running status is To be synchronized.
Problem Diagnosis
On DeviceManager, choose Monitor > Alarms and Events > Alarms and check whether an alarm is generated indicating that the remote device is disconnected.

If yes, go to the next step.
If no, the fault is caused by other issues. This document is not applicable.
Locate the active replication node. On any replication node, run the diagnose_usr --set-cli command, and then run the ls command to check the lsid of the dms process. In the following example, the lsid value is 125.

Run the rsf showcls command to check the IP address of the active node.

Search for ngcDevHeartbeatTimer in the /var/log/dr/rep/{bak,run} directory of the active node. Check whether loss heartbeat is printed.
zgrep -a ngcDevHeartbeatTimer /var/log/dr/rep/{bak,run}/*
If yes, the replication link heartbeat is lost due to a network fault. Go to Solution.
If no, go to the next step.
Search for lost cm connect in the logs on all replication nodes and check whether kill process is printed.
zgrep -a "lost cm connect" /var/log/dr/rep/{bak,run}/*
If yes, the replication process exits due to heartbeat loss between the cm client and cm server. Go to Solution.
If no, go to the next step.
Run the following command on each replication node to check whether the drop rules exist:
iptables -L | grep 12100
If yes, the replication link is disconnected due to the firewall. Go to Solution.
If no, the fault is caused by other issues. This document is not applicable.
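The log checks in the steps above can be wrapped in a small helper. The following is a minimal sketch, not part of the product tooling: the function name log_has is hypothetical, and it assumes that the phrase to look for (for example, loss heartbeat) appears on the same log line as the search key (for example, ngcDevHeartbeatTimer). If the phrase is printed on a separate line, inspect the zgrep output manually instead.

```shell
#!/bin/sh
# Hypothetical helper: succeed if any log line that matches KEY also contains PHRASE.
# Usage: log_has KEY PHRASE FILE...
log_has() {
    key=$1
    phrase=$2
    shift 2
    # zgrep reads plain and gzip-compressed files; -a treats binary data as text.
    # Errors (for example, a glob that matched no file) are discarded.
    zgrep -a -e "$key" "$@" 2>/dev/null | grep -q -e "$phrase"
}

# Heartbeat check on the active node (same-line assumption, see above).
if log_has ngcDevHeartbeatTimer "loss heartbeat" /var/log/dr/rep/bak/* /var/log/dr/rep/run/*; then
    echo "loss heartbeat found: check the replication network"
fi
```

The helper only reports whether a match exists. To see the timestamps of the matching lines, run the zgrep commands from the steps above directly.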
Causes
The replication link between clusters is disconnected, causing the disconnection of remote replication pairs or consistency groups.
Solution
If the scenario in step 2 or step 3 of Problem Diagnosis applies, rectify the network fault. After rectification, wait for 2 minutes and then check whether the alarm indicating remote device interruption is cleared.
If yes, no further action is required.
If no, go to 3.
If the scenario in step 4 of Problem Diagnosis applies, clear the firewall drop rules on each node where they exist. For example:
iptables -D OUTPUT -p tcp --dport 12100 -j DROP
After clearing the drop rules on all nodes, wait for 2 minutes and then check whether the alarm indicating remote device interruption is cleared.
If yes, no further action is required.
If no, go to 3.
Contact technical support engineers.
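When drop rules exist in more than one chain, the delete commands can be derived from the rule listing instead of being typed by hand. The following is a minimal sketch under stated assumptions: the function name drop_rules_to_delete_cmds is hypothetical, and the drop rules are assumed to match port 12100 with --dport or --sport. It reads the output of iptables -S, which prints rules in command form, and prints one iptables -D command per matching DROP rule. It does not run the commands, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: read `iptables -S` output on stdin and print a delete
# command for every DROP rule that matches the given port (default 12100).
drop_rules_to_delete_cmds() {
    port=${1:-12100}
    grep -E -e "--[ds]port $port( |\$)" \
        | grep -e '-j DROP' \
        | sed 's/^-A /iptables -D /'
}

# Demonstration with sample input in `iptables -S` format.
printf '%s\n' \
    '-P INPUT ACCEPT' \
    '-A OUTPUT -p tcp -m tcp --dport 12100 -j DROP' \
    '-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT' \
    | drop_rules_to_delete_cmds 12100
# prints: iptables -D OUTPUT -p tcp -m tcp --dport 12100 -j DROP
```

On a replication node, the same function can be fed with iptables -S | drop_rules_to_delete_cmds 12100. Review the printed commands before running them.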
Check After Recovery
On DeviceManager, check that the status of the remote replication pairs or consistency groups is Synchronizing or Normal and that host services are running properly.
Suggestion and Summary
N/A
Applicable Versions
All


