
VM HA failed

Latest reply: Dec 12, 2021 12:25:24

Hello, everyone!

This post describes how VM HA works and how to troubleshoot a VM HA failure.

HA rebuilding mechanism at the IaaS layer:

- The system detects that the host status is abnormal and initiates a VM rebuilding request.

- The system selects a suitable target host.

- On the target host, detect the source VM using the detection VLAN on the physical network plane bound to the source VM.

- If the detection result is normal, the source VM is still alive. In this case, the VM does not need to be rebuilt, and the process ends.

- If the detection fails, the source VM does not exist. In this case, rebuild the VM.

- Rebuild the VM on the target host: create a new VM based on the flavor and network plane of the source VM, and create a new system disk. After the VM is created, attach the data disk of the source VM to the new VM.

- After the source host is recovered, destroy the source VM. The process ends.
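The decision part of the flow above can be sketched as a small shell function. This is only an illustration of the logic, not the platform's implementation; the function name and the state labels ("normal", "abnormal", "ok", "fail") are hypothetical.

```shell
# Print the next HA step for the given observations (illustrative only).
#   $1: host state reported by the system ("normal" or "abnormal")
#   $2: result of probing the source VM from the target host ("ok" or "fail")
ha_next_step() {
    if [ "$1" != "abnormal" ]; then
        echo "none"      # host is healthy, so no rebuild request is initiated
    elif [ "$2" = "ok" ]; then
        echo "end"       # source VM is still alive, so the process ends
    else
        echo "rebuild"   # source VM is gone, so rebuild it on the target host
    fi
}
```

For example, `ha_next_step abnormal fail` prints `rebuild`, which corresponds to the case where detection over the detection VLAN fails.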

 

The task corresponding to HA is called rescheduler.


Obtain the request ID of the rescheduler task.

1. Log in to a controller node and import environment variables.

2. Run the following command to check whether the rescheduler task is generated.

nova instance-action-list VM_UUID


If no reschedule task exists, HA is not triggered. Continue to analyze.

If reschedule is displayed, HA is triggered. Record the request ID.
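If you check many VMs, the request ID can be pulled out of the command output with a small filter. The sketch below assumes that the action name is in the first table column and the request ID in the second; verify this against the output in your environment, because the column layout can differ between versions. The function name is hypothetical.

```shell
# Print the request ID of every reschedule action found in
# `nova instance-action-list` output read from stdin.
# Assumption: Action is the first table column, Request_ID the second.
reschedule_request_ids() {
    awk -F'|' '
        NF >= 3 {
            action = $2; req = $3
            gsub(/^[ \t]+|[ \t]+$/, "", action)   # trim cell padding
            gsub(/^[ \t]+|[ \t]+$/, "", req)
            if (action ~ /^reschedule/) print req
        }'
}
# Example (on the controller node, after importing environment variables):
# nova instance-action-list VM_UUID | reschedule_request_ids
```

Empty output means that no reschedule task was found, that is, HA was not triggered.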

Search for the selected target host.

VM HA involves two hosts: the source host, where the VM was originally running, and the destination host, where the VM will run after it is rebuilt.

Method 1: View the nova-scheduler log to check whether the destination host is selected.

Run the following command to query the source host:

zgrep req-id /var/log/fusionsphere/component/nova-scheduler/* | grep ignore

Run the following command to search for the target host. Generally, the first host listed in the WeighedHost output is the selected host.

zgrep req-id  /var/log/fusionsphere/component/nova-scheduler/* | grep WeighedHost

If no weighed host is selected, check why each host was filtered out by the scheduler. Generally, the reason is that the resources of other hosts are insufficient. For example, the hard board is insufficient and CPU core binding is involved.
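The scheduler check can be wrapped in a helper that also tells you, through its exit status, whether any WeighedHost line exists for the request. This is a minimal sketch: the function name is hypothetical, and the log directory defaults to the path used in this post.

```shell
# Search nova-scheduler logs for one request ID and print its WeighedHost lines.
# Exit status is 0 if at least one line was found, non-zero otherwise.
#   $1: request ID
#   $2: log directory (optional, defaults to the path used in this post)
scheduler_weighed_hosts() {
    swh_req_id=$1
    swh_log_dir=${2:-/var/log/fusionsphere/component/nova-scheduler}
    # zgrep also reads rotated .gz logs; fall back to grep if it is missing.
    if command -v zgrep >/dev/null 2>&1; then swh_tool=zgrep; else swh_tool=grep; fi
    "$swh_tool" "$swh_req_id" "$swh_log_dir"/* 2>/dev/null | grep WeighedHost
}
# Example:
# scheduler_weighed_hosts req-id || echo "no WeighedHost line: check the filter results"
```

A non-zero exit status is the cue to look at why each host was filtered out.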

After the host is found, obtain the IP address of the host based on the host ID. In the KVM scenario, run cps host-list | grep HOST_NAME to query the IP address of the host, and then log in to the host.

If FusionCompute is connected, run the cps template-instance-list --service nova fc-nova-compute00X command to locate the host where the active component is running, and log in to that host in the same way as in the KVM scenario.

Method 2: Check whether the target host is selected in action-list.

nova instance-action VM_UUID req-id

control01:/home/fsp # nova instance-action 30deb2ba-8d26-4900-a665-e3eea155ce5c req-383a21aa-0930-438f-bdec-02b829882a29
+---------------+----------------------------------------------------+
| Property      | Value                                              |
+---------------+----------------------------------------------------+
| action        | create                                             |
| events        | [{u'event': u'compute__do_build_and_run_instance', | //Create a VM on the selected host.
|               |   u'finish_time': u'2020-03-25T00:55:41.773988',   |
|               |   u'result': u'Success@control3',                  |
|               |   u'start_time': u'2020-03-25T00:49:57.367914',    |
|               |   u'traceback': None},                             |
|               |  {u'event': u'select_destinations',                |
|               |   u'finish_time': u'2020-03-25T00:49:57.121248',   |
|               |   u'result': u'Success@control1',                  | //Host selection phase, indicating that the nova-scheduler on control1 is used.
|               |   u'start_time': u'2020-03-25T00:49:56.434078',    |
|               |   u'traceback': None}]                             |
| instance_uuid | 30deb2ba-8d26-4900-a665-e3eea155ce5c               |
| message       | -                                                  |
| project_id    | 312d453b12b54faa8cde67aaf5d99e76                   |
| request_id    | req-383a21aa-0930-438f-bdec-02b829882a29           |
| start_time    | 2020-03-25T00:49:52.081854                         |
| updated_at    | 2020-03-25T00:55:41.773988                         |
| user_id       | cc9b0ab0d1e042e5beba8de97f682aa4                   |
+---------------+----------------------------------------------------+
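To read the events at a glance, the output can be reduced to one line per event. The sketch below relies on the u'event' / u'result' layout of the sample output above and assumes that each event line is followed by exactly one result line; other versions may print the events differently. The function name is hypothetical.

```shell
# Print "<event> <result>" for each event in `nova instance-action` output
# read from stdin.
# Assumption: events are printed as u'event': u'...' followed later by
# u'result': u'...', as in the sample output in this post.
action_events() {
    sed -n \
        -e "s/.*u'event': u'\([^']*\)'.*/\1/p" \
        -e "s/.*u'result': u'\([^']*\)'.*/\1/p" |
    paste -d' ' - -    # join each event line with the result line after it
}
# Example:
# nova instance-action VM_UUID req-id | action_events
```

In the sample above, the part of the result after @ is the node that handled the event: control1 for select_destinations and control3 for the build step.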

 

Log in to the host, view the compute log, and find the error information.

KVM host:

zgrep req-id /var/log/fusionsphere/component/nova-compute/*

FusionCompute host:

zgrep req-id /var/log/fusionsphere/component/fc-nova-compute00X/*
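Because the two host types differ only in the log directory, a small helper can pick the directory for you. This is a sketch with a hypothetical function name; for FusionCompute you pass the actual component name yourself, since the 00X suffix depends on your environment.

```shell
# Print the compute log directory for a host type.
#   compute_log_dir kvm            -> .../nova-compute
#   compute_log_dir fc COMPONENT   -> .../COMPONENT (for example, your fc-nova-compute instance)
compute_log_dir() {
    cld_base=/var/log/fusionsphere/component
    case $1 in
        kvm)
            echo "$cld_base/nova-compute" ;;
        fc)
            [ -n "$2" ] || { echo "usage: compute_log_dir fc COMPONENT" >&2; return 2; }
            echo "$cld_base/$2" ;;
        *)
            echo "usage: compute_log_dir kvm | fc COMPONENT" >&2
            return 2 ;;
    esac
}
# Example:
# zgrep req-id "$(compute_log_dir kvm)"/*
```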

This is my solution. How about yours? Go ahead and share it with us!



S_Noch
Created May 18, 2021 03:51:12

Good post

olive.zhao
Created May 18, 2021 03:54:48

Thanks!

Unicef
Created Dec 12, 2021 12:25:24

GOOD SHARE