Hello, everyone!
This post walks you through how VM HA works and how to troubleshoot a failed VM HA (rebuild) task.
HA rebuilding mechanism at the IaaS layer:
- The system detects that the host status is abnormal and initiates a VM rebuilding request.
- The scheduler selects a proper target host.
- On the target host, probe whether the source VM is still alive, using the detection VLAN on the physical network plane bound to the source VM.
- If the probe succeeds, the source VM is still alive. In this case, the VM does not need to be rebuilt, and the process ends.
- If the probe fails, the source VM no longer exists. In this case, rebuild the VM.
- Rebuild the VM on the target host: create a new VM based on the flavor and network plane of the source VM, along with a new system disk. After the VM is created, attach the data disks of the source VM to the new VM.
- After the source host is recovered, destroy the source VM. The process ends.
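The decision logic above can be sketched as a small shell function. The probe below is a hypothetical stand-in for the detection-VLAN liveness check; the real check runs over the physical network plane bound to the source VM.

```shell
# Hypothetical stand-in for the detection-VLAN probe: in reality the target
# host probes the source VM over the bound physical network plane.
probe_source_vm() {
  [ "$1" = "alive" ]
}

# Decide whether HA needs to rebuild the VM, following the flow above.
decide_ha_action() {
  if probe_source_vm "$1"; then
    echo "skip-rebuild"   # probe succeeded: source VM still alive, end the process
  else
    echo "rebuild"        # probe failed: source VM is gone, rebuild on the target host
  fi
}

decide_ha_action alive    # prints skip-rebuild
decide_ha_action dead     # prints rebuild
```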
The Nova task corresponding to HA is called reschedule.
Obtain the request ID of the reschedule task.
1. Log in to a controller node and import environment variables.
2. Run the following command to check whether a reschedule task was generated:
nova instance-action-list VM_UUID
If no reschedule action exists, HA was not triggered; continue analyzing from other angles.
If a reschedule action is displayed, HA was triggered. Record its request ID.
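As a sketch, the request ID of the reschedule row can be pulled out of the action list with awk. The table below is illustrative sample output (the action names and request IDs are assumed for demonstration), not real command output:

```shell
# Illustrative sample of `nova instance-action-list VM_UUID` output; the
# columns and values here are assumed for demonstration only.
sample='+------------+------------------------------------------+
| Action     | Request_ID                               |
+------------+------------------------------------------+
| create     | req-0aaaaaaa-0000-0000-0000-000000000000 |
| reschedule | req-383a21aa-0930-438f-bdec-02b829882a29 |
+------------+------------------------------------------+'

# Take the Request_ID field of the reschedule row and strip the padding spaces.
req_id=$(printf '%s\n' "$sample" | awk -F'|' '/reschedule/ {gsub(/ /, "", $3); print $3}')
echo "$req_id"   # prints req-383a21aa-0930-438f-bdec-02b829882a29
```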
Search for the selected target host.
This step is needed because VM HA involves two hosts: the source host where the VM was originally running and the destination host where it will be rebuilt.
Method 1: View the nova-scheduler log to check whether the destination host is selected.
Run the following command to query the source host:
zgrep req-id /var/log/fusionsphere/component/nova-scheduler/* | grep ignore
Run the following command to search for the target host. Generally, the first host before weight is the selected host.
zgrep req-id /var/log/fusionsphere/component/nova-scheduler/* | grep WeighedHost
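The exact log format varies by Nova version, but as a sketch, the first host in the WeighedHost list can be extracted like this (the log line below is illustrative, not taken from a real log):

```shell
# Illustrative WeighedHost log line; real lines differ slightly across versions.
line="Weighed [WeighedHost [host: control3, weight: 1.0], WeighedHost [host: control2, weight: 0.5]]"

# grep -o prints every match on its own line; the first one is the selected host.
host=$(printf '%s\n' "$line" | grep -o 'host: [^,]*' | head -n 1 | cut -d' ' -f2)
echo "$host"   # prints control3
```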
If no host appears with a weight, check the scheduler log for the reason why each host was filtered out. Generally, the resources of the other hosts are insufficient; for example, a host's hardware resources are exhausted, or CPU core binding is involved.
After the host is found, obtain its IP address based on the host ID. In the KVM scenario, run the cps host-list | grep host_name command to query the IP address of the host, and then log in to the host.
If FusionCompute is connected, run the cps template-instance-list --service nova fc-nova-compute00X command to locate the host where the active component is running, and log in to that host (the login method is the same as in the KVM scenario).
Method 2: Check whether the target host is selected in action-list.
nova instance-action VM_UUID req-id
control01:/home/fsp # nova instance-action 30deb2ba-8d26-4900-a665-e3eea155ce5c req-383a21aa-0930-438f-bdec-02b829882a29
+---------------+----------------------------------------------------+
| Property      | Value                                              |
+---------------+----------------------------------------------------+
| action        | create                                             |
| events        | [{u'event': u'compute__do_build_and_run_instance', | //Create a VM on the selected host.
|               | u'finish_time': u'2020-03-25T00:55:41.773988',     |
|               | u'result': u'Success@control3',                    |
|               | u'start_time': u'2020-03-25T00:49:57.367914',      |
|               | u'traceback': None},                               |
|               | {u'event': u'select_destinations',                 |
|               | u'finish_time': u'2020-03-25T00:49:57.121248',     |
|               | u'result': u'Success@control1',                    | //Host selection phase: the nova-scheduler on control1 handled this request.
|               | u'start_time': u'2020-03-25T00:49:56.434078',      |
|               | u'traceback': None}]                               |
| instance_uuid | 30deb2ba-8d26-4900-a665-e3eea155ce5c               |
| message       | -                                                  |
| project_id    | 312d453b12b54faa8cde67aaf5d99e76                   |
| request_id    | req-383a21aa-0930-438f-bdec-02b829882a29           |
| start_time    | 2020-03-25T00:49:52.081854                         |
| updated_at    | 2020-03-25T00:55:41.773988                         |
| user_id       | cc9b0ab0d1e042e5beba8de97f682aa4                   |
+---------------+----------------------------------------------------+
Log in to the host, view the compute log, and find the error information.
KVM host:
zgrep req-id /var/log/fusionsphere/component/nova-compute/*
FusionCompute host:
zgrep req-id /var/log/fusionsphere/component/fc-nova-compute00X/*
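The two searches can be wrapped in one helper that tries both locations. This is only a sketch: it uses grep, so substitute zgrep for gzip-rotated logs, and the log root is a parameter (defaulting to the FusionSphere path) so the function can be exercised anywhere.

```shell
# Search both possible compute-log directories (KVM: nova-compute,
# FusionCompute: fc-nova-compute00X) for a request ID. The optional second
# argument overrides the log root; it defaults to the FusionSphere path.
grep_compute_logs() {
  req_id=$1
  root=${2:-/var/log/fusionsphere/component}
  for d in "$root"/nova-compute "$root"/fc-nova-compute00X; do
    # Use zgrep instead of grep if the logs are gzip-rotated.
    [ -d "$d" ] && grep -rh "$req_id" "$d"
  done
  return 0
}

# On a real host:
# grep_compute_logs req-383a21aa-0930-438f-bdec-02b829882a29
```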
This is my solution. How about yours? Go ahead and share it with us!