ROCm HIP on Windows?

Microsoft’s Antares code generator supports generating code for ROCm HIP on Windows among other targets.

How does it compile and execute that code when the ROCm HIP SDK for Windows isn’t publicly available to everyone?

The runtime side

https://github.com/microsoft/antares/blob/v0.3.x/backends/c-rocm_win64/include/backend.hpp (MIT licensed)

Loading the runtime

    ab::hLibDll = LoadLibrary(AMDHIP64_LIBRARY_PATH);
    CHECK(hLibDll, "Cannot find `" AMDHIP64_LIBRARY_PATH "` !\n");

The ROCm HIP runtime DLL is loaded at runtime via LoadLibrary instead of being linked directly, so building the backend does not require an import library from the SDK.
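For illustration, here is a minimal sketch of the same idea made portable. The runtime is opened by name at startup and entry points are resolved by string, so a missing library becomes a readable error instead of a program that fails to start. The helper names (`openLibrary`, `findSymbol`, `loadRuntime`) are hypothetical and not Antares’s API. On Windows they map to `LoadLibraryA`/`GetProcAddress`, and elsewhere to `dlopen`/`dlsym`.

```cpp
#include <cassert>
#include <cstdio>

#ifdef _WIN32
#include <windows.h>
using LibHandle = HMODULE;
#else
#include <dlfcn.h>
using LibHandle = void*;
#endif

// Hypothetical helper: open a shared library by name, nullptr on failure.
LibHandle openLibrary(const char* path) {
#ifdef _WIN32
    return LoadLibraryA(path);
#else
    return dlopen(path, RTLD_NOW | RTLD_LOCAL);
#endif
}

// Hypothetical helper: look up an exported symbol by name, nullptr if absent.
void* findSymbol(LibHandle lib, const char* name) {
#ifdef _WIN32
    return reinterpret_cast<void*>(GetProcAddress(lib, name));
#else
    return dlsym(lib, name);
#endif
}

// Rough equivalent of the CHECK above: report a missing library and let the
// caller decide what to do, rather than failing at process start-up.
bool loadRuntime(const char* path, LibHandle& out) {
    out = openLibrary(path);
    if (!out) {
        std::fprintf(stderr, "Cannot find `%s` !\n", path);
        return false;
    }
    return true;
}
```

A caller would pass something like `AMDHIP64_LIBRARY_PATH` and then resolve each driver entry point with `findSymbol`.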

Function pointers to AMD’s ROCm HIP driver

Without the SDK headers, each function is given a hand-written prototype.

The list currently used by Antares:

    LOAD_ONCE(hipSetDevice, int (*)(int));
    LOAD_ONCE(hipMalloc, int (*)(void*, size_t));
    LOAD_ONCE(hipModuleLoadData, int (*)(void*, const char*));
    LOAD_ONCE(hipModuleGetFunction, int (*)(void*, const void*, const char*));
    LOAD_ONCE(hipModuleLaunchKernel, int (*)(...));
    LOAD_ONCE(hipMemcpyHtoDAsync, int (*)(...));
    LOAD_ONCE(hipMemcpyDtoHAsync, int (*)(...));
    LOAD_ONCE(hipStreamSynchronize, int (*)(void*));
    LOAD_ONCE(hipEventCreate, int (*)(void*, int));
    LOAD_ONCE(hipEventRecord, int (*)(void*, void*));
    LOAD_ONCE(hipDeviceSynchronize, int (*)());
    LOAD_ONCE(hipEventElapsedTime, int (*)(float*, void*, void*));
    LOAD_ONCE(hipEventDestroy, int (*)(void*));
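Antares’s definition of `LOAD_ONCE` is not shown here, so the following is only a guess at the pattern its name suggests. A symbol is resolved by name the first time it is needed, cast to the hand-written prototype, and cached in a function-local `static`. To keep the sketch runnable without an AMD driver, `resolveSymbol` and `fakeSetDevice` are hypothetical stand-ins for `GetProcAddress` and the real `hipSetDevice`. As in the list above, `int` stands in for the `hipError_t` return type (`hipSuccess` is 0). HIP’s opaque handles such as `hipModule_t` and `hipStream_t` are pointer types, so they can be passed around as `void*`.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the driver's hipSetDevice: 0 (like hipSuccess) on success.
static int fakeSetDevice(int device) { return device >= 0 ? 0 : 1; }

// Counts lookups so the caching behaviour can be observed.
static int resolveCount = 0;

// Hypothetical stand-in for GetProcAddress(hLibDll, name): a tiny export table.
static void* resolveSymbol(const char* name) {
    ++resolveCount;
    static const std::map<std::string, void*> exports = {
        {"hipSetDevice", reinterpret_cast<void*>(&fakeSetDevice)},
    };
    auto it = exports.find(name);
    return it == exports.end() ? nullptr : it->second;
}

// Sketch of a LOAD_ONCE-style macro: a static local is initialised the first
// time control reaches it (thread-safely since C++11), so the lookup runs once.
#define LOAD_ONCE_SKETCH(name, type)            \
    using name##_fn = type;                     \
    static const name##_fn name =               \
        reinterpret_cast<name##_fn>(resolveSymbol(#name))

int setDevice(int device) {
    LOAD_ONCE_SKETCH(hipSetDevice, int (*)(int));
    if (hipSetDevice == nullptr) return -1;  // symbol missing from the "DLL"
    return hipSetDevice(device);
}
```

The type alias is needed because a function-pointer type such as `int (*)(int)` cannot be placed directly in front of a variable name inside a macro expansion.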

Compiling code targeting ROCm HIP

What does Antares do here, given that the ROCm HIP SDK for Windows is not public? The answer: it uses the Linux version of ROCm, running under WSL.

    std::string moduleCompile(const std::string &source) {
        // Write the generated device code to a temporary file.
        std::string path = "/tmp/.antares-module-tempfile.cu";
        FILE *fp = fopen(path.c_str(), "w");
        fwrite(source.data(), source.size(), 1, fp);
        fclose(fp);

        // The target GPU (e.g. gfx1030) is read from an "__AMDGFX__ gfx<N>" marker in the source.
        char amdgfx[] = "__AMDGFX__ gfx";
        const char *spec = strstr(source.data(), amdgfx);
        CHECK(spec != nullptr, "__AMDGFX__ is not found in Antares code for Windows ROCm.");
        std::string arch = "gfx" + std::to_string(std::atoi(spec + sizeof(amdgfx) - 1));

        // Compile with the Linux hipcc inside WSL; --genco emits a device-only code object.
        ab_utils::Process({"wsl.exe", "sh", "-cx", "\"/opt/rocm/bin/hipcc " + path + " --amdgpu-target=" + arch + " --genco -Wno-ignored-attributes -O2 -o " + path + ".out 1>&2\""}, 10);
        // Return the code object as raw bytes (presumably handed to hipModuleLoadData).
        return file_read((path + ".out").c_str());
    }

This means that the single-source programming model isn’t available. What remains is a more restricted, device-code-only model reminiscent of hipRTC (AMD’s NVRTC clone for runtime compilation of C++ device code).

Is that it? Where is the ROCm HIP SDK for Windows?

There are also hiprtBuildTraceProgram and hiprtBuildTraceGetBinary, part of HIP-RT (AMD’s ray tracing library).

You might also use hipRTC’s runtime compilation features directly; hashcat’s HIP backend is an example of doing so. AMD’s Orochi is another option. None of these allow the single-source model expected of modern GPGPU APIs: device code has to live in separate files.

Overall, this leaves users of AMD GPUs with poor options for GPGPU work on the world’s most used desktop operating system.

The other usual ROCm HIP limitations also apply. In particular, binaries are hardware-specific, and there is no IR option that stays compatible across GPU generations.
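Because a code object is compiled for one specific `gfx` target, an application that ships precompiled kernels has to carry one binary per supported GPU and pick the matching one at runtime. A minimal sketch of that selection follows. `selectCodeObject` and the in-memory table are hypothetical. In practice, the device’s architecture name would come from the runtime, for example the `gcnArchName` field of `hipDeviceProp_t`, which may carry feature suffixes such as `:xnack-`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using CodeObject = std::vector<unsigned char>;

// Hypothetical helper: choose the code object built for this device's target.
// Any ":feature" suffix on the reported name (e.g. ":xnack-") is ignored here
// for simplicity; a real loader may also need to match those features.
const CodeObject* selectCodeObject(const std::map<std::string, CodeObject>& binaries,
                                   const std::string& deviceArchName) {
    const std::string target = deviceArchName.substr(0, deviceArchName.find(':'));
    auto it = binaries.find(target);
    // No portable IR to fall back on: without an exact match, nothing can be loaded.
    return it == binaries.end() ? nullptr : &it->second;
}
```

A GPU whose target has no entry in the table, such as a newer generation released after the application was built, simply gets `nullptr`.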

(edit: added mentions of ROCm HIP support in hashcat and Orochi)
