1、What are the specifications of GPU instances? Which graphics cards are used?
GPU-accelerated ECSs include graphics-accelerated ECSs (G series) and computing-accelerated ECSs (P series). The G series applies to graphics acceleration scenarios such as 3D animation rendering and CAD. The P series applies to machine learning, deep learning, and scientific computing.
Category | Instance | GPU graphics card | Application scenario | Number of GPUs on the host | Remark |
Graphics-accelerated | G5 | NVIDIA V100 (GPU virtualization) | Cloud desktop, image rendering, 3D visualization, and heavy-load graphics design | 2 cards | Mainstream delivery. After G5 instances are provisioned, you need to purchase a vDWS license from NVIDIA to use the GPUs. |
Graphics-accelerated | G3 | NVIDIA M60 (GPU passthrough) | Same as G5 | 2 cards (4 x M60 cores) | M60 graphics cards are discontinued. Remaining instances are being sold out, and capacity will not be expanded. |
Graphics-accelerated | G1 | NVIDIA M60 (GPU virtualization) | Same as G5 | 1 card (2 x M60 cores) | M60 graphics cards are discontinued. Remaining instances are being sold out, and capacity will not be expanded. |
Computing-accelerated | P2v | NVIDIA V100 NVLink (GPU passthrough) | Machine learning, deep learning, training and inference, scientific computing, seismic analysis, computational finance, rendering, multimedia encoding and decoding | 8 cards | Mainstream delivery |
Computing-accelerated | P1 | NVIDIA P100 (GPU passthrough) | Same as P2v | 6 cards | P100 graphics cards are out of production. Remaining instances are being sold out, and capacity will not be expanded. |
Computing-accelerated | Pi1 | NVIDIA P4 (GPU passthrough) | Same as P2v | 6 cards | Mainstream delivery |
Computing-accelerated | P2vs | NVIDIA V100 NVLink 32G (GPU passthrough) | Same as P2v | 8 cards | |
Computing-accelerated | Pi2 | NVIDIA T4 (GPU passthrough) | Same as P2v | 8 cards | |
Computing-accelerated | P2s | NVIDIA V100 32G (GPU passthrough) | Same as P2v | 8 cards | |
AI-accelerated | Ai1 | Ascend 310 | Machine vision, speech recognition, and natural language processing, supporting scenarios such as smart retail, smart campus, robot cloud brain, and safe city | 8 cards (32 chips) | |
AI-accelerated | Ais | Ascend 310 (to be added) | Same as Ai1 | | |
AI-accelerated | Kai1 | Ascend 310 + ARM (OBT) | Same as Ai1 | 6 cards (24 chips) | |
AI-accelerated | Kai1s | Ascend 310 + ARM (OBT) | Same as Ai1 | 6 cards (24 chips) | |
2、Which parameters are used to analyze the performance of a graphics card?
Graphics card architecture: Kepler, Maxwell, Pascal, Volta, and Turing are successive generations of the NVIDIA GPU architecture. A new-generation graphics card generally outperforms a previous-generation card of the same tier in technology, architecture, and performance.
Number of CUDA cores: The number of CUDA cores determines the parallel processing capability of the GPU. In parallel computing workloads such as deep learning and machine learning, more CUDA cores generally mean better performance.
Video memory capacity: The video memory capacity determines the amount of data that can be loaded onto the GPU. If the video memory already meets the customer's service requirements, adding more does not greatly improve service performance. In deep learning and machine learning training scenarios, the video memory size determines the amount of training data that can be loaded at a time, so it is important for large-scale training.
Video memory bandwidth: The higher the video memory bandwidth, the faster data moves between the video memory and the GPU cores, and the better the processing performance of the graphics card.
Other metrics: In addition to these common metrics, NVIDIA provides capabilities optimized for specific scenarios, such as Tensor Cores and RT Cores. For example, Tensor Cores are dedicated to accelerating tensor operations in deep learning training.
To evaluate a graphics card, weigh each of these metrics against the customer's service requirements.
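The relationship between video memory and the amount of training data loaded at a time can be illustrated with rough arithmetic. The sketch below is illustrative only: the function name, the fixed model overhead, and the per-sample memory cost are all assumptions supplied by the caller, and real frameworks add allocator and workspace overhead that this estimate ignores.

```python
def max_batch_size(video_memory_mib, model_overhead_mib, per_sample_mib):
    """Rough upper bound on the number of samples per batch that fit in video memory.

    All three inputs are assumptions provided by the caller. Measure them on
    the real workload instead of relying on this estimate.
    """
    if per_sample_mib <= 0:
        raise ValueError("per_sample_mib must be positive")
    free_mib = video_memory_mib - model_overhead_mib
    if free_mib <= 0:
        return 0  # the model alone does not fit in video memory
    return int(free_mib // per_sample_mib)
```

For example, with a hypothetical 4096 MiB of fixed overhead and 96 MiB per sample, `max_batch_size(16384, 4096, 96)` gives 128 samples on a 16 GiB card, while `max_batch_size(32768, 4096, 96)` gives 298 on a 32 GiB card.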
3、How do I recommend GPU instances on the cloud based on the GPU host configuration used by the customer?
Selecting a scenario: First determine the customer's service scenario and software type, and then narrow the instance range accordingly, for example, to the G series or the P series.
Selecting a graphics card:
If the graphics card model used by the customer is available on the cloud, directly recommend the cloud instance that provides that graphics card.
If the graphics card model is not available on the cloud, analyze the performance of the graphics card that the customer uses offline (see question 2 and the FAQ "What are the common offline consumer graphics cards? What are the mappings with the graphics cards provided on the cloud?"), and then select a graphics card type and instance on the cloud.
Selecting the configuration:
After the graphics card model is selected, select the instance specifications based on the customer's requirements for CPU, memory, disk, and number of graphics cards per VM.
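The selection steps above can be sketched as a small lookup. This is a minimal illustration, not an official sizing tool: the instance list is copied from the table in question 1, the `discontinued` flag follows its Remark column, and the names `INSTANCES`, `SCENARIO_SERIES`, and `recommend` are hypothetical.

```python
# Instance data taken from the table in question 1 of this FAQ.
INSTANCES = [
    {"name": "G5", "series": "G", "gpu": "V100", "discontinued": False},
    {"name": "G3", "series": "G", "gpu": "M60", "discontinued": True},
    {"name": "G1", "series": "G", "gpu": "M60", "discontinued": True},
    {"name": "P2v", "series": "P", "gpu": "V100", "discontinued": False},
    {"name": "P1", "series": "P", "gpu": "P100", "discontinued": True},
    {"name": "Pi1", "series": "P", "gpu": "P4", "discontinued": False},
    {"name": "P2vs", "series": "P", "gpu": "V100", "discontinued": False},
    {"name": "Pi2", "series": "P", "gpu": "T4", "discontinued": False},
    {"name": "P2s", "series": "P", "gpu": "V100", "discontinued": False},
]

# Step 1: the scenario picks the series (labels here are assumptions).
SCENARIO_SERIES = {"graphics": "G", "compute": "P"}


def recommend(scenario, gpu_model=None):
    """Return (instance names, exact_match).

    exact_match is False when the customer's GPU model is not offered on the
    cloud; the returned names are then only the candidates to compare by hand
    using the metrics from question 2.
    """
    series = SCENARIO_SERIES[scenario]
    candidates = [
        inst for inst in INSTANCES
        if inst["series"] == series and not inst["discontinued"]
    ]
    if gpu_model is not None:
        # Step 2: prefer instances that offer the same GPU model.
        exact = [i for i in candidates if i["gpu"] == gpu_model.upper()]
        if exact:
            return [i["name"] for i in exact], True
    return [i["name"] for i in candidates], False
```

Step 3 (CPU, memory, disk, and GPU count per VM) is left out because the available flavors are not listed in this FAQ.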
4、NVIDIA Addresses
Tesla driver: https://www.nvidia.com/Download/index.aspx
GRID driver (an NVIDIA GRID account is required): https://docs.nvidia.com/grid/index.html
CUDA Toolkit (select the version as required): https://developer.nvidia.com/cuda-toolkit-archive
GPU compute capability: https://developer.nvidia.com/cuda-gpus
GPUs supported by video codecs: https://developer.nvidia.com/video-encode-decode-gpu-support-matrix
NVIDIA GRID account registration: https://enterpriseproductregistration.nvidia.com/?LicType=EVAL&ProductFamily=vGPU&ncid=undefined
To purchase a commercial license, see https://www.ingrammicro.com.cn/Brand/Detail?BrandId=137.
5、What Are the Tesla Driver and the vGPU/GRID Driver, and in Which Scenarios Are They Mainly Used?
NVIDIA provides two drivers for Tesla series GPUs:
Tesla driver: It is free of charge and is used for computing acceleration. It supports CUDA. Typical applications are AI workloads such as deep learning.
GRID driver: It requires a purchased license and is used for graphics acceleration. It supports standard graphics interfaces such as OpenGL, DirectX, and Vulkan. After the vDWS license is configured for an instance, CUDA is also supported.
Typical scenarios for the GRID driver include high-resolution desktops, 3D design, and games, as well as software or applications that require both graphics and computing acceleration.
6、What Are the Restrictions on Installing the GRID Driver?
G1 (except g1.2xlarge.8) and G5 (except g5.8xlarge.4) instances use GPU virtualization (vGPU). In vGPU scenarios, the GRID driver version in the VM must match the GRID driver version on the host. Currently, G1 supports GRID 4.1, and G5 supports GRID 7.1. Select the matching version when you install the driver. Otherwise, the graphics card may not be recognized or the driver may be incompatible.
G3, P1, Pi1, P2, and P2v instances use GPU passthrough. If you need to install the GRID driver on these instances to support graphics acceleration, you can choose which GRID driver version to install. The list of supported versions is available on the NVIDIA official website.
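The version rule above can be written as a small check. This is a sketch under stated assumptions: the GRID versions and the two excepted flavors come from this FAQ, while the function names and any other flavor string used with them (such as `g5.2xlarge.2`) are hypothetical.

```python
# vGPU instance families and the GRID version this FAQ lists for each.
VGPU_GRID_VERSION = {"g1": "4.1", "g5": "7.1"}

# Flavors this FAQ names as exceptions to the vGPU rule.
VGPU_EXCEPTIONS = {"g1.2xlarge.8", "g5.8xlarge.4"}


def required_grid_version(flavor):
    """Return the GRID version a vGPU flavor must use, or None if unconstrained."""
    flavor = flavor.strip().lower()
    if flavor in VGPU_EXCEPTIONS:
        return None
    family = flavor.split(".", 1)[0]
    return VGPU_GRID_VERSION.get(family)


def grid_driver_ok(flavor, installed_grid_version):
    """True if the installed GRID version satisfies the host-matching rule.

    For passthrough flavors this only means "no host constraint"; the version
    must still be one that NVIDIA supports for the GPU model.
    """
    required = required_grid_version(flavor)
    return required is None or installed_grid_version == required
```

A deployment script could call `grid_driver_ok` before installing a driver package and stop early on a mismatch, rather than finding out after a reboot that the card is not recognized.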
7、Why Can't the GPU of a GPU Instance Be Detected Using DirectX 11?
Check the driver type installed on the instance. The NVIDIA Tesla driver is a computing acceleration driver and does not support graphics acceleration interfaces such as OpenGL, DirectX, and Vulkan.
To obtain graphics acceleration support, you must install the NVIDIA GRID driver. Install the GRID driver on the GPU instance by referring to the GRID driver installation guide, configure the GRID license, and then run the test again.
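8、Why Does nvidia-smi Show Non-Zero GPU Usage When No GPU Program Is Running on a GPU ECS?
Run the nvidia-smi -pm 1 command to enable persistence mode. In this mode, the driver stays loaded in memory even when no program is using the GPU, so it does not have to be initialized each time nvidia-smi is run.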
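To confirm that the setting took effect, the persistence state can be read back per GPU. The sketch below is a minimal example, assuming a Linux instance with the NVIDIA driver installed and root privileges; the helper names are hypothetical, and the parser assumes the two-column CSV output produced by the query shown in `QUERY`.

```python
import subprocess

# Query the index and persistence state of every GPU as headerless CSV.
QUERY = ["nvidia-smi", "--query-gpu=index,persistence_mode", "--format=csv,noheader"]


def parse_persistence(csv_text):
    """Map GPU index -> True if persistence mode is reported as Enabled."""
    modes = {}
    for line in csv_text.strip().splitlines():
        index, mode = [field.strip() for field in line.split(",")]
        modes[int(index)] = (mode == "Enabled")
    return modes


def enable_persistence():
    """Enable persistence mode on all GPUs, then return the state read back."""
    subprocess.run(["nvidia-smi", "-pm", "1"], check=True)
    result = subprocess.run(QUERY, check=True, capture_output=True, text=True)
    return parse_persistence(result.stdout)
```

Note that the setting made by `nvidia-smi -pm 1` does not survive a reboot, so it usually needs to be reapplied at startup.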
9、What VDI remote access software and protocols are available?
VNC: supports remote access to GPU instances, but the performance is poor. A free option (TightVNC) is available.
RealVNC Server (paid), RealVNC Viewer
TightVNC (free of charge)
HP RGS: paid. A trial version is available, but it displays an opaque watermark on the screen and does not support some high-performance features.
TeamViewer
10、What are the common graphics stress test tools for GPUs?
Unigine_Heaven-4.0
redshift_v2.6.12
vraybench_1.0.8_win_x64-gui
3dmark-v2-6-6238
11、How Do I View the GPU Frequency of a GPU VM?
Run the nvidia-smi dmon command. In the output, mclk is the video memory clock frequency and pclk is the processor clock frequency.
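If the frequencies need to be collected by a script, the dmon output can be parsed by column name. The sketch below is hedged: `SAMPLE` is an illustrative capture rather than output from a real instance, the column set varies between driver versions, and `parse_dmon` is a hypothetical helper that relies only on the header row naming the `gpu`, `mclk`, and `pclk` columns.

```python
# Illustrative output of "nvidia-smi dmon -c 1"; real columns depend on the driver.
SAMPLE = """\
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    33     -     0     0     0     0   877   135
"""


def _to_int(value):
    """dmon prints '-' for unsupported metrics; map that to None."""
    return int(value) if value.isdigit() else None


def parse_dmon(text):
    """Return one dict per sample row with 'gpu', 'mclk' and 'pclk' (MHz)."""
    columns = None
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "#":
            # The first header row names the columns; the second gives units.
            if columns is None and len(fields) > 1 and fields[1] == "gpu":
                columns = fields[1:]
            continue
        if columns is None:
            continue  # data before any header: cannot be interpreted
        row = dict(zip(columns, fields))
        samples.append({
            "gpu": _to_int(row["gpu"]),
            "mclk": _to_int(row["mclk"]),
            "pclk": _to_int(row["pclk"]),
        })
    return samples
```

On a real instance, the text would come from running `nvidia-smi dmon -c 1`, where `-c 1` limits the output to a single sample.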
12、How Do I View GPU Devices in a VM?
Windows:
If the driver has been installed, run the cd C:\Program Files\NVIDIA Corporation\NVSMI command and then the nvidia-smi command in the cmd window.
If no driver is installed, run the wmic path win32_pnpentity where "deviceid like '%PCI\VEN_10DE%'" get name,deviceid command in the cmd window.
Linux: Run the lspci | grep -i nvidia command or the nvidia-smi command.
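On Linux, the same check can be scripted. The following is a minimal sketch, assuming `lspci` is installed (it is part of the pciutils package); the function names are hypothetical, and the filter simply mirrors `grep -i nvidia`.

```python
import subprocess


def nvidia_devices(lspci_output):
    """Return the lspci lines that mention NVIDIA, case-insensitively."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if "nvidia" in line.lower()
    ]


def list_nvidia_devices():
    """Run lspci on the local machine and return its NVIDIA lines."""
    result = subprocess.run(["lspci"], check=True, capture_output=True, text=True)
    return nvidia_devices(result.stdout)
```

An empty list from `list_nvidia_devices()` means that no NVIDIA PCI device is visible to the VM, which points to a problem with the instance itself rather than with the driver.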
13、Why Cannot a VM Obtain a License After a License Is Configured?
Check whether the IP address and port number of the license server are correct.
Check whether the VM can reach the license server over the network and whether the license port is open between them.
Check whether the device driver is faulty.
If the M60 server is used, check whether the license contains GRID-Virtual-WS 2.0.
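The network and port check can be done from inside the VM with a plain TCP connection attempt. This is a minimal sketch: `can_reach` is a hypothetical helper, and the host and port passed to it are whatever values are configured for your license server, not values stated in this FAQ.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A `False` result points to a wrong address or port, a firewall or security group rule, or a license server that is not running. A `True` result only shows that the port is open, so the driver and license content still need to be checked.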
14、Why Does the VNC Tool Fail to Connect After the Local MSTSC Session Is Disconnected from a VM?
Check whether the VNC server service is running. This problem occurs when the VNC server is not configured to start automatically.