{"id":65,"date":"2020-12-30T16:14:47","date_gmt":"2020-12-30T15:14:47","guid":{"rendered":"https:\/\/threedots.ovh\/blog\/?p=65"},"modified":"2020-12-30T16:31:59","modified_gmt":"2020-12-30T15:31:59","slug":"state-of-the-gpu-compute-apis-today","status":"publish","type":"post","link":"https:\/\/threedots.ovh\/blog\/2020\/12\/state-of-the-gpu-compute-apis-today\/","title":{"rendered":"State of the GPU compute APIs today"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><strong>NVIDIA:<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Pioneer of the field, mature toolkits. Still evolving quickly, especially for higher-level APIs. Every GPU that NVIDIA sells supports CUDA.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The HPC SDK, formerly known as PGI, which is Linux only today, adds support for OpenACC, C++ standard parallelism (stdpar) and OpenMP (support currently in beta).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">One of the downsides on NVIDIA&#8217;s HPC SDK licensing (inherited from the PGI licensing agreement) is this clause:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>You shall strictly prohibit the further distribution of the Run-Time Files by users of an End-User Application<\/p><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\">Which can prevent applications from being distributed at all in some cases, as a user cannot redistribute the whole app bundled with its required runtime files. This issue doesn&#8217;t apply to the CUDA SDK which almost everyone uses.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>AMD:<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The current effort for GPGPU programming on AMD hardware is ROCm. 
The officially supported APIs in addition to AMD&#8217;s own HIP are OpenMP and OpenACC.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">It has some quite visible downsides:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Linux-only, which alone removes it from consideration for a large part of the market.<\/li><li>Binaries generated by the ROCm toolchain target the underlying hardware directly rather than an IR, so the software provider has to recompile them for each new hardware generation.<\/li><li>Spotty to non-existent support for new hardware for quite a long time after release.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Those downsides reduce its utility to effectively nil on the desktop, where OpenCL remains the vendor-supported API for AMD GPU hardware.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Intel:<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">oneAPI is supported on all recent Intel GPUs, but Intel has not yet released high-performance GPU hardware. The officially supported APIs apart from Intel&#8217;s own Level Zero are OpenMP and SYCL.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">oneAPI&#8217;s Level Zero uses SPIR-V as an IR, allowing existing applications to support future hardware seamlessly. Windows is supported too.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Khronos:<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Provides industry standards usable by multiple vendors.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The reset known as OpenCL 3.0 hasn&#8217;t had a visible impact yet. Vulkan compute combined with SYCL could be a more viable path toward single binaries usable across multiple vendors, together with a good developer experience.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><em>OpenCL support in practice (section added after publication):<\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">As of today, NVIDIA provides an OpenCL 1.2 implementation with extensions. 
<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">AMD provides a passable OpenCL 1.2 implementation and quite buggy OpenCL 2.x support (notably doesn&#8217;t support debugging properly). <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Intel provides an OpenCL 3.0 implementation for their GPUs.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">OpenCL 1.2 also works on macOS, including on Apple Silicon Macs, but is documented as deprecated.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Microsoft:<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">C++ AMP looks like it&#8217;s dead. Vendor-independent, supported by Visual C++ but was never updated past D3D11. Was supported by old ROCm versions too.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Apple:<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Metal compute is macOS\/iOS\/&#8230; only which reduces its appeal in the GPGPU field quite a lot, especially when GPU compute performance is involved. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>NVIDIA: Pioneer of the field, mature toolkits. Still evolving quickly, especially for higher-level APIs. Every GPU that NVIDIA sells supports CUDA. The HPC SDK, formerly known as PGI, which is Linux only today, adds support for OpenACC, C++ standard parallelism (stdpar) and OpenMP (support currently in beta). 
One of the downsides on NVIDIA&#8217;s HPC SDK&hellip;&nbsp;<a href=\"https:\/\/threedots.ovh\/blog\/2020\/12\/state-of-the-gpu-compute-apis-today\/\" rel=\"bookmark\">Read More &raquo;<span class=\"screen-reader-text\">State of the GPU compute APIs today<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"neve_meta_sidebar":"","neve_meta_container":"","neve_meta_enable_content_width":"","neve_meta_content_width":0,"neve_meta_title_alignment":"","neve_meta_author_avatar":"","neve_post_elements_order":"","neve_meta_disable_header":"","neve_meta_disable_footer":"","neve_meta_disable_title":"","footnotes":""},"categories":[1],"tags":[5],"class_list":["post-65","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-gpus"],"_links":{"self":[{"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/posts\/65","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/comments?post=65"}],"version-history":[{"count":7,"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/posts\/65\/revisions"}],"predecessor-version":[{"id":73,"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/posts\/65\/revisions\/73"}],"wp:attachment":[{"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/media?parent=65"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/categories?post=65"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/tags?post=65"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","template
d":true}]}}