HMM in CUDA on the NVIDIA r530 drivers

Heterogenous memory management on GPUs allows to use the full host process’s address space within GPU device code, without requiring pinning or the allocation to even be done within CUDA.

Pages are dynamically migrated from the host to the device depending on use, without any programmer action being required.

A warning to keep in mind

// TODO: Bug 1750144: Enable HMM by default once the feature is formally
// enabled for production use.

// Support for HMM ( https://docs.kernel.org/mm/hmm.html ). HMM features are
// disabled in this driver because they are still in development. In order to
// experiment with the early, "alpha" version of HMM features in this driver,
// set "uvm_disable_hmm=0" on the kernel or module loading command line.
// However, again:
//
//    HMM FEATURES IN THIS DRIVER ARE UNSUPPORTED BECAUSE THEY ARE NOT READY.
//
//    ENABLING HMM IN THIS DRIVER MAY CAUSE YOUR KERNEL TO CRASH, CORRUPT DATA,
//    OR OTHERWISE MISBEHAVE. DO NOT ENABLE HMM HERE UNLESS YOU ARE FULLY
//    PREPARED TO DEAL WITH THE INEVITABLE BUGS AND LIMITATIONS IN THIS ALPHA
//    FEATURE.
//

How to test it?

Your OS should have Linux 6.1 or later with CONFIG_HMM_MIRROR and CONFIG_DEVICE_PRIVATE enabled. Don’t worry, those configuration flags are almost surely are already defined in the Linux kernel you’re using.

Then create a file, /etc/modprobe.d/nvidia-uvm.conf, and put this inside:

options nvidia-uvm uvm_disable_hmm=0

Then you should install the open-source kernel modules. HMM is not available on the proprietary NVIDIA kernel modules.

A small test program

#include <stdio.h>

__global__ void compute_this(int *pDataFromCpu)
{
    atomicAdd(pDataFromCpu, 1);
}

int main(void)
{
    int data = 1;
    compute_this<<<512,1000>>>(&data);
    cudaDeviceSynchronize();
    printf ("Results: %d\n", data);
    free(data);
    return 0;
}

If HMM is disabled, you’ll see a Xid error in dmesg and Results: 1. If HMM is enabled, you will see Results: 512001.

Can HMM do more?

Yes, HMM allows you for example to transparently use mmap‘d files from the GPU. Or do syscalls through io_uring, or…

The new possibilities make writing some kinds of code substantially easier. This includes fun new programming models such as Accelerating standard C++ on GPUs with stdpar.

HMM allows to share the same programming model of the Grace Hopper Superchip on a commodity platform.

Does this need an x86 machine?

No, this works on arm64 systems just fine too. Note however that this is not applicable to Tegra iGPUs, which currently use a totally separate RM (nvgpu). For a dGPU attached to a Tegra system, you can use the desktop RM, which does support HMM.

Which GPUs are required for this?

Turing (2018) or later on this driver release.