
How to clear ALM-316010198 Data Replication Failure in the OMMHA System?

Latest reply: Apr 30, 2021 04:12:01
  1. Check whether the eSight alarm list contains an alarm indicating that file replication failed in the OMMHA two-node cluster. If it does, rectify the fault described in the alarm information.

  2. Check whether the network between the active and standby nodes is reachable. If the network is unreachable, rectify the network fault.



    1. Log in to the active eSight server as the ossuser user.

    2. Ensure that the standby server is running properly and the network is reachable. To check reachability, ping both the system IP address and the heartbeat IP address of the standby server.

      ossuser@eSightServer:~> ping -c 4 10.137.63.225
      PING 10.137.63.225 (10.137.63.225) 56(84) bytes of data.
      64 bytes from 10.137.63.225: icmp_seq=1 ttl=64 time=0.477 ms
      64 bytes from 10.137.63.225: icmp_seq=2 ttl=64 time=0.439 ms
      64 bytes from 10.137.63.225: icmp_seq=3 ttl=64 time=0.437 ms
      64 bytes from 10.137.63.225: icmp_seq=4 ttl=64 time=0.384 ms
      
      --- 10.137.63.225 ping statistics ---
      4 packets transmitted, 4 received, 0% packet loss, time 2999ms
      rtt min/avg/max/mdev = 0.384/0.434/0.477/0.036 ms
      ossuser@eSightServer:~> ping -c 4 192.168.122.1
      PING 192.168.122.1 (192.168.122.1) 56(84) bytes of data.
      64 bytes from 192.168.122.1: icmp_seq=1 ttl=64 time=0.018 ms
      64 bytes from 192.168.122.1: icmp_seq=2 ttl=64 time=0.016 ms
      64 bytes from 192.168.122.1: icmp_seq=3 ttl=64 time=0.015 ms
      64 bytes from 192.168.122.1: icmp_seq=4 ttl=64 time=0.014 ms
      
      --- 192.168.122.1 ping statistics ---
      4 packets transmitted, 4 received, 0% packet loss, time 2998ms
      rtt min/avg/max/mdev = 0.014/0.015/0.018/0.005 ms
  3. To rule out intermittent network disconnection, wait 10 minutes after the network connection is restored, then check whether the alarm is cleared.

    • If yes, the environment has recovered and no manual rectification is required.

    • If no, the environment is abnormal and must be rectified manually as described in the rest of this post.
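
The reachability check in step 2 can be scripted. The following is a minimal sketch, assuming the Linux ping summary format shown in the sample output above. The names packet_loss and host_reachable are hypothetical helpers, not part of eSight or OMMHA.

```shell
#!/bin/sh
# Print the packet-loss percentage found in ping's summary line (read from stdin).
packet_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Succeed only if all 4 probes sent to host $1 are answered (0% loss).
host_reachable() {
    loss=$(ping -c 4 "$1" 2>/dev/null | packet_loss)
    [ "$loss" = "0" ]
}

# Parse the summary line taken from the sample output in this post.
echo "4 packets transmitted, 4 received, 0% packet loss, time 2999ms" | packet_loss
```

On the active server, host_reachable 10.137.63.225 && host_reachable 192.168.122.1 would cover the example system and heartbeat addresses used in this post. Re-running the check at intervals during the 10-minute observation window in step 3 is one way to spot intermittent loss.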

Determine the correct host.


If the data replication failure alarm remains unhandled for a long time and you perform O&M operations on eSight during this period (for example, adding a device), the data on the active and standby nodes diverges. If an active/standby switchover then occurs, the original standby server becomes the new active server and the results of those O&M operations are lost. For example, the added device disappears.

To prevent this problem, query service data based on run logs and select the eSight server with the latest service data as the active server.


  1. Log in to the eSight web page.

  2. Check whether key service information, such as NE configurations, performance data, and alarms, is normal.

  3. If the service data is lost, for example, the added device is lost, contact Huawei technical support.

Select a suitable rectification procedure based on the following scenarios:


  • No requirement on the service interruption time

    Log in to the active server as the root user through the management port or VNC, then run the following commands:

    cd /opt/ommha/config

    sh config.sh

  • No service interruption

    Method 1: Log in to the active server and synchronize data.

    Log in to the active server as the root user through the management port or VNC, then run the following commands:

    cd /opt/ommha/config

    sh config_online.sh

    Method 2: Log in to the standby server and synchronize data.
  1. Log in to the standby server using the management port or VNC as the ossuser user.

  2. Cancel the bash timeout setting.

    unset TMOUT

  3. Run the following command to query the OMMHA status and ensure that the local server is the standby server:

    sh /opt/ommha/ha/bin/status.sh

  4. Run the following command to stop OMMHA:

    sh /opt/ommha/ha/bin/stop.sh

  5. Run the following commands to rebuild the database:

    cd /opt/eSightZenith/

    sed -i "s#ENABLE_SYSDBA_LOGIN[[:space:]]=.*#ENABLE_SYSDBA_LOGIN = TRUE#g" data/cfg/zengine.ini

    python app/bin/zctl.py -t kill

    rm -f data/archive_log/arch*.arc

    python app/bin/zctl.py -t build

    If the following information is displayed, the database is successfully rebuilt:

    Successfully build database
  6. Start OMMHA.

    sh /opt/ommha/ha/bin/start.sh

  7. Log out of the system.
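
Two details of step 5 can be tried out before touching a live standby server. The sketch below replays the sed expression from the procedure on sample lines fed through standard input, and wraps the documented success message in a small check. The sample FALSE values are assumptions rather than content from a real zengine.ini, and enable_sysdba and build_succeeded are hypothetical helper names.

```shell
#!/bin/sh
# Same substitution as in step 5, but reading from stdin instead of
# editing data/cfg/zengine.ini in place, so it is safe to experiment with.
enable_sysdba() {
    sed "s#ENABLE_SYSDBA_LOGIN[[:space:]]=.*#ENABLE_SYSDBA_LOGIN = TRUE#g"
}

# Succeed only if the text on stdin contains the success message
# quoted in the procedure.
build_succeeded() {
    grep -q "Successfully build database"
}

# Sample lines (assumed values). [[:space:]] matches exactly one whitespace
# character, so a line with no space before "=" is left unchanged.
printf 'ENABLE_SYSDBA_LOGIN = FALSE\n' | enable_sysdba
printf 'ENABLE_SYSDBA_LOGIN=FALSE\n' | enable_sysdba
```

The first line comes back as ENABLE_SYSDBA_LOGIN = TRUE and the second is printed unmodified, so it may be worth confirming how the parameter is written in zengine.ini after running the sed command. If the output of the build command is captured to a file, build_succeeded could gate step 6 so that OMMHA is started only after the rebuild reports success. Whether zctl.py prints that message to standard output or standard error is not stated in this post.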


Thanks for sharing.