GPU VMs fail to start


GPU VMs fail to start, and a message is displayed indicating that the GPU device could not be created.


[Keywords]


GPU


[Applicable version]


FusionCompute 6.3 and 6.5


[Confidentiality]


Service channel-level


[Description]


The GPU is configured.


[Alarm information]


The vGPU device is abnormal.

 

[Analysis]


1. Check the vna-worker.log file on the CNA node. The log shows that the query for the vGPU device path fails with the message 'No such file or directory'.

 



2. Check whether the file exists in the corresponding directory on the host. In this case, the file does not exist.
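The two checks above can be sketched in Python. This is only a minimal sketch and not part of FusionCompute: the case does not give the location of the log or the vGPU device path, so both are passed in by the caller, and the function names are hypothetical.

```python
import os

# Message quoted in the analysis above.
ERROR_TEXT = "No such file or directory"


def find_missing_file_errors(log_lines):
    """Return the log lines that contain the 'No such file or directory' message."""
    return [line.rstrip("\n") for line in log_lines if ERROR_TEXT in line]


def check_vgpu_path(log_path, device_path):
    """Report whether the log shows the error and whether the device path exists.

    Both arguments are supplied by the caller; their real values depend on the host.
    """
    with open(log_path, encoding="utf-8", errors="replace") as log_file:
        errors = find_missing_file_errors(log_file)
    return {"log_errors": errors, "path_exists": os.path.exists(device_path)}
```

If the log contains the error and path_exists is False, the host matches the symptom described in this case.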

 



[Cause]


The NVIDIA GPU driver is incorrectly installed.


NVIDIA provides three types of drivers: for Windows, for Linux, and for Linux with KVM. The CNA node requires the Linux with KVM driver. The driver file is named NVIDIA-Linux-x86_64-xxx.xx-vgpu-kvm.run, and the driver version must be 6.2. If any other driver is installed, the GPU may not be usable.
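Before installation, the installer file name can be checked against the pattern above. The following is a minimal sketch: the function name is hypothetical, and the regular expression assumes that "xxx.xx" stands for dot-separated version digits.

```python
import re

# Pattern taken from the file name quoted above; "xxx.xx" is assumed to be version digits.
KVM_HOST_INSTALLER = re.compile(r"^NVIDIA-Linux-x86_64-(\d+(?:\.\d+)+)-vgpu-kvm\.run$")


def kvm_host_driver_version(file_name):
    """Return the version embedded in a Linux with KVM installer name, or None."""
    match = KVM_HOST_INSTALLER.match(file_name)
    return match.group(1) if match else None
```

A file name that ends in -vgpu-kvm.run yields its version string. Any other installer name yields None, which indicates that the package is not the Linux with KVM driver.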


[Procedure]


1. Uninstall the GPU driver: Migrate all VMs off the CNA node, and then run the nvidia-uninstall command.


2. Install the correct GPU driver.
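The procedure can be sketched as a small Python helper. This is an illustrative sketch under several assumptions: the helper names are hypothetical, the installer is assumed to be started with sh in the usual way for .run packages, both tools may prompt for confirmation, and the commands must be run as root only after all VMs have been migrated off the node. The helper prints the commands by default and executes them only on request.

```python
import subprocess


def reinstall_commands(installer_path):
    """Build the two commands from the procedure: uninstall, then run the installer."""
    return [
        ["nvidia-uninstall"],
        ["sh", installer_path],
    ]


def reinstall_driver(installer_path, dry_run=True):
    """Print the commands; execute them only when dry_run is False."""
    for command in reinstall_commands(installer_path):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the sequence if a command fails.
            subprocess.run(command, check=True)
```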


[Conclusion]


FusionCompute supports only GPU driver version 6.2.


To verify that the correct GPU driver is installed on the CNA node, perform the following steps:


1. Run the nvidia-smi command on the host to query information about the driver.

 




2. Check the mapping between the driver version and the GPU version on the official NVIDIA website:


https://docs.nvidia.com/grid/index.html
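The verification steps above can be sketched in Python. This is a minimal sketch with hypothetical function names. It relies on the nvidia-smi options --query-gpu=driver_version and --format=csv,noheader, and it leaves the expected version as a parameter to be looked up on the NVIDIA page above rather than hard-coding it.

```python
import subprocess


def parse_driver_versions(output):
    """Return the distinct, non-empty driver versions in nvidia-smi CSV output."""
    versions = []
    for line in output.splitlines():
        version = line.strip()
        if version and version not in versions:
            versions.append(version)
    return versions


def installed_driver_versions():
    """Ask nvidia-smi for the driver version reported for each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_driver_versions(result.stdout)


def driver_matches(expected_version):
    """Return True when every GPU reports exactly the expected driver version."""
    return installed_driver_versions() == [expected_version]
```

If nvidia-smi is missing or fails, subprocess.run raises an exception, which on its own suggests that the host driver is not installed correctly.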



