CUDA pioneered the field and has mature toolkits. It is still evolving quickly, especially its higher-level APIs. Every GPU that NVIDIA sells supports CUDA.
NVIDIA’s HPC SDK (formerly PGI), which is Linux-only today, adds support for OpenACC, C++ standard parallelism (stdpar), and OpenMP (currently in beta).
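To give a feel for the directive-based model, here is a SAXPY (y = a·x + y) written with OpenMP target offload. This is a minimal sketch, assuming an OpenMP 4.5 or later compiler with offload support (with the HPC SDK that would be something like `nvc -mp=gpu`; other toolchains use their own flags). The function name `saxpy` is only illustrative. If the file is compiled without OpenMP, the pragma is ignored and the loop runs serially on the host; if OpenMP is enabled but no device is usable, the runtime falls back to the host by default.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i], offloaded with OpenMP target directives.
 * Assumption: an offload-capable OpenMP compiler; without OpenMP the
 * pragma is ignored and this is a plain serial loop on the host. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* Copy x to the device, copy y to the device and back afterwards,
     * then spread the iterations across teams and threads. */
#pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Calling `saxpy(4, 2.0f, x, y)` with x = {1, 2, 3, 4} and y = {1, 1, 1, 1} leaves y = {3, 5, 7, 9}. The same source builds with or without offload, which is the main appeal of the directive-based approach.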
One downside of NVIDIA’s HPC SDK licensing (inherited from the PGI license agreement) is this clause:
You shall strictly prohibit the further distribution of the Run-Time Files by users of an End-User Application
In some cases, this can prevent an application from being distributed at all, because a user cannot redistribute the whole app bundled with its required runtime files. The issue doesn’t apply to the CUDA SDK, which almost everyone uses.
The current effort for GPGPU programming on AMD hardware is ROCm. The officially supported APIs in addition to AMD’s own HIP are OpenMP and OpenACC.
It has some quite visible downsides:
- It is Linux-only, which alone removes it from consideration for a large part of the market.
- Binaries generated by the ROCm toolchain target the underlying hardware directly rather than an IR, so the software provider has to recompile them for each new GPU generation.
- Support for new hardware is spotty to non-existent for a long time after its release.
Together, these downsides reduce ROCm’s usefulness on the desktop to effectively nil; there, OpenCL remains the vendor-supported API for AMD GPU hardware.
oneAPI is supported on all recent Intel GPUs, although Intel has not yet released high-performance GPU hardware. Apart from Intel’s own Level Zero, the officially supported APIs are OpenMP and SYCL.
Level Zero uses SPIR-V as its IR, so existing applications can support future hardware seamlessly. Windows is supported too.
The Khronos Group provides industry standards usable by multiple vendors.
The OpenCL reset, known as OpenCL 3.0, hasn’t had a visible impact yet. Vulkan compute combined with SYCL could be a more viable way to get single binaries that run across multiple vendors while keeping a good developer experience.
OpenCL support in practice (section added after publication):
As of today, NVIDIA provides an OpenCL 1.2 implementation with extensions.
AMD provides a passable OpenCL 1.2 implementation and rather buggy OpenCL 2.x support (notably, debugging doesn’t work properly).
Intel provides an OpenCL 3.0 implementation for their GPUs.
OpenCL 1.2 also works on macOS, including on Apple Silicon Macs, but is documented as deprecated.
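Given how uneven the supported versions are, an application shipping OpenCL kernels may need to check what each device reports before choosing a code path. The sketch below parses a version string in the `OpenCL <major>.<minor> <vendor-specific information>` form that the OpenCL specification prescribes for `CL_DEVICE_VERSION`. Fetching that string with `clGetDeviceInfo` is left out so the example builds without an OpenCL SDK, and the helper names `parse_opencl_version` and `opencl_version_at_least` are hypothetical. Keep in mind that OpenCL 3.0 made most 2.x features optional, so a reported 3.0 does not by itself guarantee, for example, shared virtual memory; such features have to be queried individually.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a CL_DEVICE_VERSION-style string such as "OpenCL 1.2 <vendor info>".
 * Returns 1 and fills major/minor on success, 0 if the string does not match. */
int parse_opencl_version(const char *version, int *major, int *minor)
{
    assert(major != NULL && minor != NULL);
    if (version == NULL)
        return 0;
    /* "OpenCL" must match literally; the space directive skips whitespace. */
    return sscanf(version, "OpenCL %d.%d", major, minor) == 2;
}

/* Hypothetical helper: is the reported version at least want_major.want_minor?
 * Unparseable strings are treated as "not supported". */
int opencl_version_at_least(const char *version, int want_major, int want_minor)
{
    int major = 0, minor = 0;
    if (!parse_opencl_version(version, &major, &minor))
        return 0;
    return major > want_major || (major == want_major && minor >= want_minor);
}
```

A caller would typically keep an OpenCL 1.2 kernel path as the fallback whenever such a check fails.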
C++ AMP looks dead. It is vendor-independent and supported by Visual C++, but it was never updated past D3D11. Old ROCm versions supported it too.
Metal compute is limited to macOS/iOS/…, which considerably reduces its appeal in the GPGPU field, especially when GPU compute performance is involved.