{"id":403,"date":"2022-05-07T01:37:31","date_gmt":"2022-05-06T23:37:31","guid":{"rendered":"https:\/\/threedots.ovh\/blog\/?p=403"},"modified":"2022-05-07T02:09:05","modified_gmt":"2022-05-07T00:09:05","slug":"rocm-hip-on-windows","status":"publish","type":"post","link":"https:\/\/threedots.ovh\/blog\/2022\/05\/rocm-hip-on-windows\/","title":{"rendered":"ROCm HIP on Windows?"},"content":{"rendered":"\n<p>Microsoft&#8217;s <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/microsoft\/antares\" target=\"_blank\">Antares<\/a> code generator supports generating code for ROCm HIP on Windows among other targets.<\/p>\n\n\n\n<p>How does it compile and execute that code when the ROCm HIP SDK on Windows isn&#8217;t publicly available to everyone?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The runtime side<\/h2>\n\n\n\n<p><a href=\"https:\/\/github.com\/microsoft\/antares\/blob\/v0.3.x\/backends\/c-rocm_win64\/include\/backend.hpp\">https:\/\/github.com\/microsoft\/antares\/blob\/v0.3.x\/backends\/c-rocm_win64\/include\/backend.hpp<\/a> &#8211; MIT licensed<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Loading the runtime<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>    ab::hLibDll = LoadLibrary(AMDHIP64_LIBRARY_PATH);\n    CHECK(hLibDll, \"Cannot find `\" AMDHIP64_LIBRARY_PATH \"` !\\n\");<\/code><\/pre>\n\n\n\n<p>The ROCm DLL is loaded at run time via <code>LoadLibrary<\/code>, rather than by linking directly against the ROCm runtime.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Function pointers to AMD&#8217;s ROCm HIP driver<\/h3>\n\n\n\n<p>No public headers are available, so each function is prototyped by hand for this purpose.<\/p>\n\n\n\n<p>The list currently used by Antares:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    LOAD_ONCE(hipSetDevice, int (*)(int));\n    LOAD_ONCE(hipMalloc, int (*)(void*, size_t));\n    LOAD_ONCE(hipModuleLoadData, int (*)(void*, const char*));\n    LOAD_ONCE(hipModuleGetFunction, int (*)(void*, const void*, const char*));\n    
LOAD_ONCE(hipModuleLaunchKernel, int (*)(...));\n    LOAD_ONCE(hipMemcpyHtoDAsync, int (*)(...));\n    LOAD_ONCE(hipMemcpyDtoHAsync, int (*)(...));\n    LOAD_ONCE(hipStreamSynchronize, int (*)(void*));\n    LOAD_ONCE(hipEventCreate, int (*)(void*, int));\n    LOAD_ONCE(hipEventRecord, int (*)(void*, void*));\n    LOAD_ONCE(hipDeviceSynchronize, int (*)());\n    LOAD_ONCE(hipEventElapsedTime, int (*)(float*, void*, void*));\n    LOAD_ONCE(hipEventDestroy, int (*)(void*));<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Compiling code targeting ROCm HIP<\/h3>\n\n\n\n<p>What does Antares do here, given that the ROCm HIP SDK on Windows is not public? The answer: it uses the Linux version running under WSL.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>  std::string moduleCompile(const std::string &amp;source) {\n    std::string path = \"\/tmp\/.antares-module-tempfile.cu\";\n    FILE *fp = fopen(path.c_str(), \"w\");\n    fwrite(source.data(), source.size(), 1, fp);\n    fclose(fp);\n\n    char amdgfx&#91;] = \"__AMDGFX__ gfx\";\n    const char *spec = strstr(source.data(), amdgfx);\n    CHECK(spec != nullptr, \"__AMDGFX__ is not found in Antares code for Windows ROCm.\");\n    std::string arch = \"gfx\" + std::to_string(std::atoi(spec + sizeof(amdgfx) - 1));\n\n    ab_utils::Process({\"wsl.exe\", \"sh\", \"-cx\", \"\\\"\/opt\/rocm\/bin\/hipcc \" + path + \" --amdgpu-target=\" + arch + \" --genco -Wno-ignored-attributes -O2 -o \" + path + \".out 1&gt;&amp;2\\\"\"}, 10);\n    return file_read((path + \".out\").c_str());\n  }<\/code><\/pre>\n\n\n\n<p>This means that the single-source programming model isn&#8217;t accessible. What you get instead is a more restricted, device-code-only model reminiscent of hipRTC (AMD&#8217;s NVRTC clone, for runtime compilation of C++ device code).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">This is it? 
Where is the ROCm HIP on Windows SDK?<\/h2>\n\n\n\n<p>There&#8217;s also <code>hiprtBuildTraceProgram<\/code> and <code>hiprtBuildTraceGetBinary<\/code> as part of HIP-RT.<\/p>\n\n\n\n<p>You might also try hipRTC&#8217;s runtime compilation features; an example of doing so is <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/retr0-13\/hashcat\/blob\/14f78d9910a103c996a0d8af6f969bb0640b3d6c\/src\/ext_hiprtc.c\" target=\"_blank\">present<\/a> in hashcat&#8217;s <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/retr0-13\/hashcat\/blob\/14f78d9910a103c996a0d8af6f969bb0640b3d6c\/src\/ext_hip.c\" target=\"_blank\">HIP backend<\/a>. Another option is AMD&#8217;s <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/GPUOpen-LibrariesAndSDKs\/Orochi\" target=\"_blank\">Orochi<\/a>. These options don&#8217;t allow the single-source style expected of modern GPGPU APIs, and require device code to be in separate files.<\/p>\n\n\n\n<p>Overall, this leaves users of AMD GPUs without good options for GPGPU work on the world&#8217;s most used desktop operating system.<\/p>\n\n\n\n<p>The usual ROCm HIP limitations also apply, including the fact that binaries are hardware-specific, with no IR option that is compatible across GPU generations.<\/p>\n\n\n\n<p>(edit: added mentions of ROCm HIP support in <em>hashcat<\/em> and <em>Orochi<\/em>)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Microsoft&#8217;s Antares code generator supports generating code for ROCm HIP on Windows among other targets. How does it compile and execute that code when the ROCm HIP SDK on Windows isn&#8217;t public at all, in a way accessible to everyone? 
The runtime side https:\/\/github.com\/microsoft\/antares\/blob\/v0.3.x\/backends\/c-rocm_win64\/include\/backend.hpp &#8211; MIT licensed Loading the runtime The ROCm DLL is loaded&hellip;&nbsp;<a href=\"https:\/\/threedots.ovh\/blog\/2022\/05\/rocm-hip-on-windows\/\" rel=\"bookmark\">Read More &raquo;<span class=\"screen-reader-text\">ROCm HIP on Windows?<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"neve_meta_sidebar":"","neve_meta_container":"","neve_meta_enable_content_width":"","neve_meta_content_width":0,"neve_meta_title_alignment":"","neve_meta_author_avatar":"","neve_post_elements_order":"","neve_meta_disable_header":"","neve_meta_disable_footer":"","neve_meta_disable_title":"","footnotes":""},"categories":[1],"tags":[],"class_list":["post-403","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/posts\/403","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/comments?post=403"}],"version-history":[{"count":11,"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/posts\/403\/revisions"}],"predecessor-version":[{"id":415,"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/posts\/403\/revisions\/415"}],"wp:attachment":[{"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/media?parent=403"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/categories?post=403"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/threedots.ovh\/blog\
/wp-json\/wp\/v2\/tags?post=403"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}