Hello, everyone!
In this post I want to share a case in which VM live migration fails because the VM generates dirty memory pages faster than the migration bandwidth can transfer them.
Problem Description
Engineers start a live migration of a VM, but after two hours the migration has still not completed and the VM status remains Migrating.
Problem Analysis
1. Run the following command to check the VM ID.
nova list | grep vm_name
2. Run the following command to check the VM request ID.
nova instance-action-list VMID
3. Run the following command to check the VM migration status. In the command, requestID is the request ID obtained in step 2.
nova instance-action VMID requestID

4. Check the nova-compute logs on both the source host and the destination host of the migration.
cd /var/log/fusionsphere/component/nova-compute/
nova-compute_error.log.100

5. Run the following command to check the VM instance name.
nova show VMID | grep instance
6. Log in to the source host where the VM resides.
According to the command output, the VM's dirty pages are continuously being transferred, but the migration never finishes.
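Steps 2 and 3 above can be wrapped in a small helper. This is only a sketch: the function name show_migration_actions is made up for illustration, and it assumes the nova CLI is installed and the admin credentials are already loaded in the shell.

```shell
# Hypothetical helper (not part of any product): list the actions recorded
# for a VM, or show one action in detail when a request ID is also given.
show_migration_actions() {
    local vm_id="$1"        # VM ID from "nova list"
    local request_id="$2"   # optional request ID from the action list

    if [ -z "$vm_id" ]; then
        echo "usage: show_migration_actions VMID [requestID]" >&2
        return 1
    fi

    if [ -z "$request_id" ]; then
        # Step 2: list all actions so the request ID can be read off.
        nova instance-action-list "$vm_id"
    else
        # Step 3: show the details of a single action.
        nova instance-action "$vm_id" "$request_id"
    fi
}
```

Call it first with only the VM ID to find the request ID of the live migration action, then again with both values to see the details.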

Root Cause
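The VM generates dirty memory pages faster than the migration bandwidth can transfer them, so the amount of memory that still has to be copied never drops low enough for the migration to complete.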
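To make the root cause concrete, here is a rough back-of-the-envelope check. It assumes a simplified pre-copy model in which each copy round has to resend the pages dirtied during the previous round. The function name and the MB/s figures are illustrative only, not measurements from this case.

```shell
# Simplified model: while one round of data is being sent, the VM dirties
# (dirty rate / bandwidth) times that amount again. The remaining data
# only shrinks when that ratio is below 1.
migration_converges() {
    local dirty_rate="$1"   # dirty page generation rate, MB/s (integer)
    local bandwidth="$2"    # migration bandwidth, MB/s (integer)

    if [ "$dirty_rate" -lt "$bandwidth" ]; then
        echo "converges"
    else
        echo "does not converge"
    fi
}
```

For example, migration_converges 300 100 prints "does not converge", which matches the situation in this case, while migration_converges 50 100 prints "converges".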
Solution Description
The following workarounds are provided:
Terminate live migration.
1. Log in to the controller node and run the following command to query the ID of the VM migration task whose status is running. VMID indicates the VM ID.
nova migration-list|grep VMID
2. Run the following command to manually stop the live migration. In the command, taskID is the migration task ID obtained in step 1.
nova live-migration-abort VMID taskID
3. Run the command from step 1 again to check whether the live migration has been stopped.
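The three steps above can be combined in a short script. This is a sketch under assumptions: the function name abort_running_migration is hypothetical, and the parsing assumes that nova migration-list prints a table whose first column is the migration task ID and that the row of an active migration contains the word running. Check the output format in your environment before relying on it.

```shell
# Hypothetical helper: find the running migration of a VM and abort it.
# Assumes the first table column of "nova migration-list" is the task ID.
abort_running_migration() {
    local vm_id="$1"
    local task_id

    if [ -z "$vm_id" ]; then
        echo "usage: abort_running_migration VMID" >&2
        return 1
    fi

    # Rows are split on "|": field 1 is empty, field 2 is the first column.
    task_id=$(nova migration-list | grep -F -- "$vm_id" | grep -w running \
        | awk -F'|' '{ gsub(/ /, "", $2); print $2; exit }')

    if [ -z "$task_id" ]; then
        echo "no running migration found for $vm_id" >&2
        return 1
    fi

    # Step 2: abort the migration task that was found.
    nova live-migration-abort "$vm_id" "$task_id"
}
```

After it returns, run the query from step 1 again to confirm that the task is no longer in the running state.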
Manually pause the VM to stop dirty page generation and complete the live migration.
1. Log in to the compute node where the VM resides and run the following command to find the ID of the VM to be migrated, based on the instance name obtained in step 5 of Problem Analysis.
virsh list

2. Run the following command to pause the VM for a few seconds so that no new dirty pages are generated and the live migration can complete. In the command, instanceID is the ID shown next to the VM's instance name in step 1, and second is the number of seconds to keep the VM paused. Note that the VM does not run during the pause.
virsh suspend instanceID;sleep second;virsh resume instanceID

3. Run the following command to check whether the migration is complete:
nova migration-list|grep VMID
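The suspend, sleep and resume sequence from step 2 can also be wrapped in a function that checks its arguments before touching the VM. This is a minimal sketch: the name pause_vm_briefly is made up for illustration, and it only uses the virsh suspend and virsh resume commands already shown above.

```shell
# Hypothetical wrapper around the suspend/sleep/resume sequence.
# It validates the arguments first and does not sleep if suspend fails.
pause_vm_briefly() {
    local domain="$1"    # virsh domain ID or name from "virsh list"
    local seconds="$2"   # how long to keep the VM paused

    if [ -z "$domain" ]; then
        echo "usage: pause_vm_briefly DOMAIN SECONDS" >&2
        return 1
    fi

    # Accept only a whole number of seconds.
    case "$seconds" in
        ''|*[!0-9]*)
            echo "usage: pause_vm_briefly DOMAIN SECONDS" >&2
            return 1
            ;;
    esac

    virsh suspend "$domain" || return 1
    sleep "$seconds"
    virsh resume "$domain"
}
```

For example, pause_vm_briefly 23 5 would pause the domain with ID 23 for five seconds (23 is just a placeholder). Start with a short pause and repeat if the migration still has not completed.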
This is my solution. How about yours? Go ahead and share it with us!