AMD ROCm is a GPGPU compute stack that exposes two APIs: OpenCL and HIP. OpenCL and its upsides and downsides are better known, so I’ll focus on HIP in this blog post.
What is HIP?
HIP is a wholesale clone of the CUDA APIs, including the driver, runtime and library APIs. That’s not a bad thing: it acknowledges what the industry standard is, which makes porting easier.
Where does it differ?
Some major differences exist between the two:
- The function prefix was changed from cuda to hip (for example, cudaMalloc becomes hipMalloc). Some #define blocks used in CUDA were also changed to enums. This breaks straightforward source-level compatibility.
- There’s no intermediate ISA other than LLVM IR (which ROCm targets at a specific GPU model). In CUDA, the app developer can ship a PTX code slice that will run on later hardware without recompilation. Inline assembly in CUDA is written in PTX, so it is forward-compatible with newer hardware too. On AMD GPUs, inline assembly exposes the raw ISA, so it is not.
There are more differences, but those two are the architectural ones that will affect you the most during application porting. The latter means that AMD provides no translation tools if you use a programming model other than CUDA C++. You’ll also be affected if you use inline PTX sections, which will have to be ported by hand.
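To get a feel for how much inline PTX a code base contains before porting, a quick scan can help. The snippet below is a minimal sketch, not an AMD tool: the function name find_inline_asm and the sample kernel are made up for illustration, and the regular expression only looks for the usual asm(...) and asm volatile(...) spellings, so it can miss unusual formatting.

```python
from __future__ import annotations

import re

# Matches the start of a GCC-style inline assembly statement, which is how
# inline PTX is embedded in CUDA C++: asm(...), asm volatile(...), __asm__(...).
INLINE_ASM = re.compile(r"\b(?:asm|__asm__)\s*(?:volatile|__volatile__)?\s*\(")


def find_inline_asm(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line that opens inline asm."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if INLINE_ASM.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical CUDA snippet used only to exercise the scanner.
SAMPLE = '''\
__device__ unsigned lane_id() {
    unsigned id;
    asm volatile("mov.u32 %0, %%laneid;" : "=r"(id));
    return id;
}
'''

if __name__ == "__main__":
    for number, line in find_inline_asm(SAMPLE):
        print(f"line {number}: {line}")
```

Every line it reports is a spot where the PTX has to be rewritten for the AMD ISA or replaced with portable code.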
AOMP is an AMD implementation of the OpenMP API that runs on top of the HIP API. If you are using OpenMP on GPUs, that option is recommended. If you use Fortran, you’re mostly on your own for now.
What tools are used to port a CUDA C++ application to ROCm?
One of them, hipify-clang, is built on the Clang API. It requires the CUDA SDK to be installed, parses the full file, runs transformation matchers in a compiler pass and outputs an equivalent file that uses the ROCm APIs.
The other is hipify-perl. It doesn’t require the CUDA SDK to be installed and just runs transformation matchers sed-style.
Those tools generate a file that can be compiled using hipcc, which is the nvcc equivalent in ROCm HIP. You might still hit compatibility issues that require more manual porting.
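To illustrate what a sed-style translation amounts to, here is a toy sketch. It is not how hipify-perl is implemented: the RENAMES table is a tiny hand-picked subset of real CUDA-to-HIP name pairs, and toy_hipify is a hypothetical helper written for this post.

```python
import re

# A tiny, hand-picked subset of CUDA-to-HIP renames. The real tools carry a
# far larger table; this mapping is for illustration only.
RENAMES = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaError_t": "hipError_t",
    "cudaSuccess": "hipSuccess",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
}

# Whole-word matches only, longest names first, so cudaMemcpyHostToDevice is
# never rewritten as if it were cudaMemcpy followed by extra characters.
PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(name) for name in sorted(RENAMES, key=len, reverse=True))
    + r")\b"
)


def toy_hipify(source: str) -> str:
    """Replace every known CUDA name with its HIP counterpart."""
    return PATTERN.sub(lambda match: RENAMES[match.group(0)], source)


CUDA_SNIPPET = """\
#include <cuda_runtime.h>
cudaError_t err = cudaMalloc(&ptr, bytes);
cudaMemcpy(ptr, host, bytes, cudaMemcpyHostToDevice);
cudaFree(ptr);
"""

if __name__ == "__main__":
    print(toy_hipify(CUDA_SNIPPET))
```

Anything missing from the table is left untouched, which is one way the leftover compatibility issues show up: the file still references a CUDA name that hipcc doesn’t know.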
Why isn’t ROCm used more today then?
- Operating system support: ROCm is available on Linux only. HIP on Windows doesn’t have a public SDK. A beta driver became public very recently, which is used by Blender 3.0’s HIP backend in Cycles X.
- Hardware support: This is a big one. The current ROCm release only officially supports Vega 10 (Radeon Instinct MI25, which will go out of support in ROCm 5.0, approximately 4 years after release), Vega 20 (Radeon VII, Instinct MI50) and MI100 GPUs. Support for the Navi (RDNA/RDNA2) family of GPUs is not production-ready yet.
- Driver quality: Frequent regressions have significantly shaped users’ opinion of ROCm. The lack of official support for newer hardware is a significant factor too.
The combinations of hardware and software that ROCm supports add up to a meagre installed base, with few hobbyists who have their hands on a compatible configuration. Driver quality also had an impact on the number of people willing to test.
I tried to load TensorFlow/other app or library on my AMD GPU with ROCm installed on Linux and it didn’t work, why?
On a generation that is enabled (as in, present in the code base) but not supported by AMD, like every single APU, you might have to recompile the applications that you’re using. This is required to add a binary slice corresponding to the GPU architecture of those parts.
This means that you’ll have to know how to build from source (and have the CPU resources necessary), and have the knowledge to fix breakage. This situation manifests as a “binary not found for all GPUs” error on the system.
In that situation, you might have to recompile ROCm as a whole, and your application/libraries too.
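The check behind that error can be sketched as a set lookup. This is a simplified model, not ROCm’s actual loader logic: the helper names are hypothetical, and the target lists are assumptions for illustration, using the LLVM target names gfx900 (Vega 10), gfx906 (Vega 20), gfx908 (MI100) and gfx902 (a Raven Ridge APU).

```python
# Assumed list of architectures a prebuilt library was compiled for.
PREBUILT_TARGETS = ["gfx900", "gfx906", "gfx908"]


def missing_targets(compiled_targets, installed_gpus):
    """Return the installed GPU architectures that have no binary slice."""
    compiled = set(compiled_targets)
    return [arch for arch in installed_gpus if arch not in compiled]


def offload_arch_flags(targets):
    """Build one --offload-arch flag per architecture to add when rebuilding."""
    return [f"--offload-arch={target}" for target in targets]


if __name__ == "__main__":
    # Hypothetical machine with a single APU that the prebuilt binary skips.
    missing = missing_targets(PREBUILT_TARGETS, ["gfx902"])
    if missing:
        print("rebuild needed, extra compiler flags:", offload_arch_flags(missing))
    else:
        print("every installed GPU has a matching slice")
```

If the list of missing targets is not empty, the library has to be rebuilt with those architectures added to the compiler invocation. Real projects usually set this through their own build system options rather than raw flags.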
I just want some hardware to test, where can I rent some?
Disclaimer: I worked at AWS on a GPU-related team in the past, so my opinion below might be a bit biased.
Especially in this current GPU market, if you aren’t ready to buy a GPU, the best choice around to test ROCm seems to be AWS’s G4ad instances. These come with Navi12 GPUs, which are RDNA (1st gen) parts that include HBM2 memory.
While those are not officially supported by ROCm yet, they are enabled, making them one of the possible options to start application porting. However, if you are using the upper layers or bundled libraries like rocBLAS, you might still run into significant issues at this point in time.