GPU virtualisation is becoming increasingly essential, in both server and desktop scenarios. And automotive too, but that’s a write-up for another day.
There are different approaches to reach this end goal, with different tradeoffs. Some of the approaches aren’t applicable to all platforms. This blog post isn’t intended to be an exhaustive overview.
Exposing a device more or less directly
The choices in this category mostly, but not necessarily, rely on the guest VM being on the same machine as the physical GPU. For example, PCIe encapsulation over Ethernet, as done on Fungible DPUs, can be used to have the GPU and the VM using it on separate physical machines.
Passthrough w/ IOMMU isolation
When taking this route, a GPU can be used by only a single (host or virtual) machine. The regular driver set is retained. Care is necessary to prevent the virtual machine from being able to reflash the GPU or tamper with the GPU hardware in other ways.
Example cloud instances: the average GPU instance at your public cloud provider.
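As a rough illustration of what IOMMU isolation means in practice, here is a minimal sketch that lists IOMMU groups. It assumes a Linux host, which this post does not otherwise require, and reads the `/sys/kernel/iommu_groups` layout exposed by the kernel. The function names `iommu_groups` and `group_of` are made up for this example. Devices sharing a group with the GPU generally have to be handed to the same VM, so it is worth checking what else sits in that group.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    root_path = Path(root)
    if not root_path.is_dir():
        # No IOMMU support, or the IOMMU is disabled on this host.
        return groups
    for group_dir in root_path.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(
                entry.name for entry in devices_dir.iterdir()
            )
    return groups


def group_of(pci_address, groups):
    """Return (group number, other devices in that group), or None if absent."""
    for number, devices in groups.items():
        if pci_address in devices:
            return number, [d for d in devices if d != pci_address]
    return None
```

Calling `group_of("0000:01:00.0", iommu_groups())` with your GPU's PCI address (the one here is a placeholder) shows which other devices would come along with it.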
Hardware-assisted vGPU solutions can expose multiple vGPU devices from a single physical GPU, notably but not necessarily using SR-IOV. This involves trusting the on-device firmware to properly isolate contexts from each other. The host driver has to be trusted too.
Such a solution can, but doesn’t necessarily, involve static VRAM partitioning. When using NVIDIA’s vGPU solution, live migration of vGPU devices is supported.
On AMD GPUs, this is supported via the gim kernel driver. On NVIDIA GPUs, this is supported via the GRID driver set.
Example cloud instances: NVv4 instances on Microsoft Azure, AWS WorkSpaces Radeon GPU bundles.
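For the SR-IOV case specifically, a Linux host exposes generic `sriov_totalvfs` and `sriov_numvfs` attributes on the physical function in sysfs. The sketch below only reads them. It assumes a Linux host, and the helper name `sriov_capacity` is made up for this example. Vendor vGPU stacks may manage their virtual functions through their own tooling rather than these files, so treat this as a quick capability check and nothing more.

```python
from pathlib import Path


def sriov_capacity(pci_address, root="/sys/bus/pci/devices"):
    """Return (enabled VFs, maximum VFs), or None if the device has no SR-IOV attributes."""
    device = Path(root) / pci_address
    try:
        total = int((device / "sriov_totalvfs").read_text().strip())
        enabled = int((device / "sriov_numvfs").read_text().strip())
    except (FileNotFoundError, NotADirectoryError):
        # Device is missing, or it does not advertise the SR-IOV capability.
        return None
    return enabled, total
```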
Another class of virtual GPU solutions relies on synthetic GPU devices. These can be separated into two categories: serialising the command stream, or exposing a GPU model-specific API boundary.
Serialising the command stream
Serialising the command stream decouples the vGPU from the underlying hardware. This means that the underlying hardware can potentially change without needing to apply changes within the guest operating system.
Such solutions can potentially be implemented with little to no involvement from the GPU vendor, but they do involve different security trade-offs.
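To make the idea concrete, here is a toy sketch of command stream serialisation. The wire format, opcodes and function names are all invented for this example and are not what virgl, venus or any real implementation uses. The guest encodes API calls into a byte stream, and the host validates and decodes that stream before replaying it on whatever GPU it has.

```python
import struct

# Hypothetical opcodes for a toy protocol; not the virgl or venus wire format.
OP_CLEAR = 1
OP_DRAW = 2

HEADER = struct.Struct("<HH")  # opcode, payload length in bytes
LAYOUTS = {
    OP_CLEAR: struct.Struct("<4f"),  # r, g, b, a
    OP_DRAW: struct.Struct("<II"),   # first vertex, vertex count
}


def encode(commands):
    """Guest side: serialise (opcode, args) pairs into one byte stream."""
    out = bytearray()
    for opcode, args in commands:
        payload = LAYOUTS[opcode].pack(*args)
        out += HEADER.pack(opcode, len(payload)) + payload
    return bytes(out)


def decode(stream):
    """Host side: validate and decode the stream before replaying it."""
    commands, offset = [], 0
    while offset < len(stream):
        if len(stream) - offset < HEADER.size:
            raise ValueError("truncated header")
        opcode, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        layout = LAYOUTS.get(opcode)
        if layout is None or length != layout.size:
            raise ValueError("unknown opcode or bad payload length")
        if len(stream) - offset < length:
            raise ValueError("truncated payload")
        commands.append((opcode, layout.unpack_from(stream, offset)))
        offset += length
    return commands
```

The guest never sees anything hardware-specific here, which is the decoupling described above. The validation in `decode` is also where the security trade-off shows up: the host is parsing untrusted guest input, so every length and opcode has to be checked.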
RemoteFX vGPU was a vGPU infrastructure included in Windows. It has been removed in a security update because of architectural safety problems in the RPC layer.
virgl, included as part of Mesa, runs on the virtio-gpu transport. Vulkan API support on top of the virtio-gpu transport is also being developed as part of Mesa under the venus name, and can be enabled manually.
venus is supported as part of ChromeOS, notably for Crostini.
ParavirtualizedGraphics on macOS does rely on command stream serialisation.
These solutions are quite underinvested in today. They allow the GPU and the VM using it to be on different physical machines.
A canonical example is indirect GLX, which is not to be used in modern software. Another example is AWS’s Elastic GPUs product, which isn’t great in terms of API support.
The Graphics Streaming Kit from Google supports using either the virtio-gpu transport or a network transport. And it supports Vulkan too! However, GPGPU doesn’t seem to be a focus. This solution is used as part of the Android Emulator (for AVDs).
Exposing the KM -> UM interface to a VM
This set of solutions involves exposing the existing kernel-mode to user-mode GPU driver API boundary to virtual machines.
It is the solution taken for GPU-P, the GPU paravirtualisation infrastructure included as part of the Hyper-V portfolio today. Unlike RemoteFX vGPU, this solution involves having the GPU-specific user-mode driver run in the guest context.
GPU-P is notably used for Windows Sandbox, Windows Defender Application Guard and WSL2 GPU acceleration support. It can also be enabled manually for regular virtual machines. Note that GPU-P explicitly does not support mismatched host and guest GPU driver versions today.