{"id":291,"date":"2022-03-29T11:56:52","date_gmt":"2022-03-29T09:56:52","guid":{"rendered":"https:\/\/threedots.ovh\/blog\/?p=291"},"modified":"2022-03-29T11:58:26","modified_gmt":"2022-03-29T09:58:26","slug":"opencl-on-metal-1-get_global_id","status":"publish","type":"post","link":"https:\/\/threedots.ovh\/blog\/2022\/03\/opencl-on-metal-1-get_global_id\/","title":{"rendered":"OpenCL on Metal #1 &#8211; get_global_id()"},"content":{"rendered":"\n<p>OpenCL today is considered as deprecated on macOS, let&#8217;s see if we can get OpenCL kernels compiled to AIR to run on Metal.<\/p>\n\n\n\n<p>Let&#8217;s start with the prototype of a basic kernel to add array A and B to an output array named result in both MSL and OpenCL C.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Prototype<\/h4>\n\n\n\n<p><strong>Metal<\/strong>: void add_arrays(device const float* inA, device const float* inB, device float* result, uint index [[thread_position_in_grid]])<\/p>\n\n\n\n<p><strong>OpenCL<\/strong>: __kernel void add_arrays(__global const float *a, __global const float *b, __global float *result)<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Let&#8217;s pass a precompiled to AIR OpenCL binary to Metal and see what happens?<\/h4>\n\n\n\n<pre class=\"wp-block-code\"><code>Compute pipeline state creation failed.\nError: Undefined symbols: llvm.agx2.thread.position.in.grid.A, referenced from: __entry_add_arrays<\/code><\/pre>\n\n\n\n<p>llvm.agx2.thread.position.in.grid.A is not available.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>; Function Attrs: nounwind optsize readnone\ndefine hidden i64 @_Z13get_global_idj(i32) local_unnamed_addr #0 {\n  %2 = tail call i32 @air.get_global_id.i32(i32 %0) #1\n  %3 = zext i32 %2 to i64\n  ret i64 %3\n}\n\n; Function Attrs: nounwind readnone\ndeclare i32 @air.get_global_id.i32(i32) local_unnamed_addr #1\n<\/code><\/pre>\n\n\n\n<p>What does that map to? It maps to air.get_global_id.i32. Which is called from get_global_id(int dim).<\/p>\n\n\n\n<p>Metal doesn\u2019t have an equivalent to get_global_id(), it passes through the index to the compute kernel.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">How is it done in AIR for Metal?<\/h4>\n\n\n\n<pre class=\"wp-block-code\"><code>; Function Attrs: norecurse nounwind\ndefine void @add_arrays(float addrspace(1)* noalias nocapture readonly, float addrspace(1)* noalias nocapture readonly, float addrspace(1)* noalias nocapture, i32) local_unnamed_addr #0 {\n  %5 = zext i32 %3 to i64\n  %6 = getelementptr inbounds float, float addrspace(1)* %0, i64 %5\n  %7 = load float, float addrspace(1)* %6, align 4, !tbaa !22\n  %8 = getelementptr inbounds float, float addrspace(1)* %1, i64 %5\n  %9 = load float, float addrspace(1)* %8, align 4, !tbaa !22\n  %10 = fadd fast float %9, %7\n  %11 = getelementptr inbounds float, float addrspace(1)* %2, i64 %5\n  store float %10, float addrspace(1)* %11, align 4, !tbaa !22\n  ret void\n}\n\n!8 = !{void (float addrspace(1)*, float addrspace(1)*, float addrspace(1)*, i32)* @add_arrays, !9, !10}\n!9 = !{}\n!10 = !{!11, !12, !13, !14}\n!11 = !{i32 0, !\"air.buffer\", !\"air.location_index\", i32 0, i32 1, !\"air.read\", !\"air.arg_type_size\", i32 4, !\"air.arg_type_align_size\", i32 4, !\"air.arg_type_name\", !\"float\", !\"air.arg_name\", !\"inA\"}\n!12 = !{i32 1, !\"air.buffer\", !\"air.location_index\", i32 1, i32 1, !\"air.read\", !\"air.arg_type_size\", i32 4, !\"air.arg_type_align_size\", i32 4, !\"air.arg_type_name\", !\"float\", !\"air.arg_name\", !\"inB\"}\n!13 = !{i32 2, !\"air.buffer\", !\"air.location_index\", i32 2, i32 1, !\"air.read_write\", !\"air.arg_type_size\", i32 4, !\"air.arg_type_align_size\", i32 4, !\"air.arg_type_name\", !\"float\", !\"air.arg_name\", !\"result\"}\n!14 = !{i32 3, !\"air.thread_position_in_grid\", !\"air.arg_type_name\", !\"uint\", !\"air.arg_name\", !\"index\"}\n<\/code><\/pre>\n\n\n\n<p>As expected in the function prototype, the index is the 4th argument, and is passed that way to the kernel.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Strategies around this<\/h4>\n\n\n\n<p>There&#8217;s three possible paths forward. <\/p>\n\n\n\n<p>The first path is figuring out what the OpenCL implementation does to enable use of get_global_id() during code generation.<\/p>\n\n\n\n<p>The second strategy is writing a compiler pass to have this information passed when needed throughout the program, as an additional argument.<\/p>\n\n\n\n<p>The third one is not using Apple&#8217;s compiler frontend, and going our own way. For that, an OpenCL-profile SPIR-V to AIR path might be preferable engineering-wise.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenCL today is considered as deprecated on macOS, let&#8217;s see if we can get OpenCL kernels compiled to AIR to run on Metal. Let&#8217;s start with the prototype of a basic kernel to add array A and B to an output array named result in both MSL and OpenCL C. Prototype Metal: void add_arrays(device const&hellip;&nbsp;<a href=\"https:\/\/threedots.ovh\/blog\/2022\/03\/opencl-on-metal-1-get_global_id\/\" rel=\"bookmark\">Read More &raquo;<span class=\"screen-reader-text\">OpenCL on Metal #1 &#8211; get_global_id()<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"neve_meta_sidebar":"","neve_meta_container":"","neve_meta_enable_content_width":"","neve_meta_content_width":0,"neve_meta_title_alignment":"","neve_meta_author_avatar":"","neve_post_elements_order":"","neve_meta_disable_header":"","neve_meta_disable_footer":"","neve_meta_disable_title":"","footnotes":""},"categories":[1],"tags":[],"class_list":["post-291","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/posts\/291","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/comments?post=291"}],"version-history":[{"count":7,"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/posts\/291\/revisions"}],"predecessor-version":[{"id":298,"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/posts\/291\/revisions\/298"}],"wp:attachment":[{"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/media?parent=291"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/categories?post=291"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/threedots.ovh\/blog\/wp-json\/wp\/v2\/tags?post=291"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}