
CUDA on NVIDIA Jetson – just not the same

Side note: OpenCL support doesn’t ship at all for NVIDIA Jetson platforms. As such, CUDA is the only available low-level GPGPU API there.

NVIDIA’s embedded Tegra SoCs – including Xavier – support CUDA. But that’s not the same CUDA as NVIDIA delivers elsewhere. Specific catches apply to using CUDA on the platform.

Supported CUDA versions

JetPack 4.6 ships with CUDA 10.2. This is the last version that Tegra X1 (used in Jetson TX1 and Jetson Nano) and Tegra X2 (used in Jetson TX2 and Jetson TX2 NX) will ever get.

JetPack 5.0 (currently in public preview) is the first version to ship with CUDA 11 support, specifically CUDA 11.4. This preview is supported on Xavier (used in Jetson AGX Xavier and Jetson Xavier NX) and later.

It isn’t possible to do an OTA upgrade via APT from JetPack 4.6 to 5.0; a full reflash of the Jetson device is required.

Container version incompatibility

The Jetson software distribution doesn’t guarantee that older Docker containers will run on newer operating system versions.

Before JetPack 5.0, NVIDIA used to mount the CUDA installation and libraries from the host instead of shipping them with the container.

As such, GPU functionality in container images from before JetPack 5.0 doesn’t work when they’re run on a newer host OS, unless they were built on specific configurations of the l4t-cuda or l4t-tensorrt base images.

L4T CUDA is different from SBSA CUDA

L4T (Linux for Tegra) CUDA is a different distribution from SBSA (Server Base System Architecture) CUDA. The latter is the regular distribution designed for use on systems with dedicated GPUs.

L4T CUDA maps to the aarch64-linux CUDA target. SBSA CUDA maps to the sbsa-linux CUDA target.
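
For reference, the two targets live in separate directories of a CUDA toolkit installation. A minimal sketch, assuming the default /usr/local/cuda install location (a cross-compilation host may carry both side by side):

```
/usr/local/cuda/targets/aarch64-linux/  (L4T target: headers and libraries for Tegra)
/usr/local/cuda/targets/sbsa-linux/     (SBSA target: headers and libraries for dGPU servers)
```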

Binary compatibility

Unlike NVIDIA dGPUs, where a SASS binary is guaranteed to run on a later GPU minor revision, such a guarantee isn’t provided on Tegra.

An sm_70 GPU binary will not run on Xavier’s integrated GPU; an sm_72 binary is needed instead. However, a compute_70 PTX slice (or earlier) can still be compiled by the PTX JIT to run on Xavier’s iGPU.
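
As a quick sanity check, a minimal CUDA program (my sketch, not from the original post) can report the compute capability the driver sees, confirming that Xavier’s iGPU is sm_72:

```
// check_cc.cu - print the device's compute capability.
// Suggested compile line (sm_72 SASS for Xavier's iGPU):
//   nvcc -gencode arch=compute_72,code=sm_72 check_cc.cu -o check_cc
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    // On Jetson AGX Xavier / Xavier NX this prints 7.2, i.e. sm_72.
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    return 0;
}
```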

Library compatibility

The sbsa-linux target doesn’t ship with SASS or PTX slices for NVIDIA Tegra iGPUs. Due to the binary-compatibility limitation described above, this means that while you might be able to run your own code from such an SBSA CUDA toolkit on Xavier’s iGPU, CUDA-X libraries such as cuBLAS and cuFFT shipped with it will not be functional.

This scenario isn’t broken on Orin yet, because Orin uses the latest GPU generation: the CUDA-X libraries do have a compute_80 PTX slice present, which the JIT can compile for Orin’s iGPU, averting the problem for the time being. You might however have to replace the PTX JIT library if you’re using a CUDA minor release later than 11.4.

Shipping binaries supporting both platforms

If you intend to ship binaries that work on both Tegra (Xavier and later) and dGPU platforms, and you use CUDA-X libraries, make sure to ship with a CUDA version supported by both (currently CUDA 11.4). You’ll have to ship separate sets of redistributable libraries for sbsa-linux and aarch64-linux, and you must not statically link the CUDA-X libraries.
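
To illustrate the dynamic-linking requirement, here’s a minimal cuBLAS SAXPY example (my sketch, not from the post); linking against the shared cuBLAS rather than the static one lets each platform load its own cuBLAS build at runtime:

```
// saxpy_cublas.cu - minimal cuBLAS usage, linked dynamically.
// Suggested build:  nvcc saxpy_cublas.cu -lcublas -o saxpy
// (avoid -lcublas_static when sharing binaries between L4T and SBSA)
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int n = 4;
    float hx[n] = {1, 2, 3, 4}, hy[n] = {1, 1, 1, 1};
    float *dx, *dy;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 2.0f;
    cublasSaxpy(handle, n, &alpha, dx, 1, dy, 1);  // y = alpha * x + y

    cudaMemcpy(hy, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("%g %g %g %g\n", hy[0], hy[1], hy[2], hy[3]);  // expect: 3 5 7 9

    cublasDestroy(handle);
    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```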

CUDA minor version compatibility

If you don’t rely on CUDA-X libraries and only intend to support JetPack 5.0 and later, just recompile and go with any CUDA 11.x version. Don’t forget to include GPU binary slices for sm_72 and sm_87 in this scenario.

If you’re using CUDA 11.4 or earlier, another option is to include PTX slices at a revision equal to or older than the Tegra GPUs you intend to support; see the sketch below.
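
As a concrete sketch (my example, not from the original post), the nvcc invocation documented in the header comment below builds SASS slices for Xavier (sm_72) and Orin (sm_87), plus a compute_70 PTX slice as a JIT fallback; a toolkit new enough to know sm_87 (CUDA 11.4+) is assumed:

```
// kernel.cu - trivial kernel to demonstrate a fatbinary covering Tegra iGPUs.
// Suggested compile line (example only, adjust the -gencode set to your targets):
//   nvcc kernel.cu -o kernel \
//     -gencode arch=compute_72,code=sm_72 \
//     -gencode arch=compute_87,code=sm_87 \
//     -gencode arch=compute_70,code=compute_70
//   (sm_72 = Xavier SASS, sm_87 = Orin SASS, compute_70 = PTX slice for the JIT)
#include <cuda_runtime.h>
#include <cstdio>

__global__ void hello() {
    printf("hello from the GPU\n");
}

int main() {
    hello<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```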

Or, if you use a later CUDA 11.x release, replace the libnvidia-ptxjitcompiler.so.1 symlink in /usr/local/nvidia/lib64 with one from a later (SBSA) driver release, so that it doesn’t interfere with the driver stack on the container host. This saves you from having to include SASS slices for the Tegra GPU architectures you intend to target when using that later CUDA 11.x toolkit. This currently breaks the ptxjit sample; the reason is still to be investigated.

Different drivers

The desktop and datacenter platforms use a (mostly) proprietary nvidia kernel driver. The Tegra platforms use an open-source GPU kernel-mode driver named nvgpu. The user-space libraries are also different builds.

NVIDIA doesn’t support having those two stacks on the same system at the same time. There’s no CUDA ICD loader to support that scenario.

Different container paths

NVIDIA’s GPU containers on NGC come in L4T (Tegra) and dedicated-GPU variants, and the two are mutually incompatible.

The driver libraries are stored in /usr/local/nvidia/lib64 for the sbsa-linux path, and in /usr/lib/aarch64-linux-gnu/tegra for the L4T path.

You can manually create /usr/local/nvidia/lib64 and fill it with the required symbolic links. You should then add the line dir, /usr/local/nvidia/lib64 to /etc/nvidia-container-runtime/host-files-for-container.d/drivers.csv, as shown below.
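
A minimal sketch of the line to append to drivers.csv (the rest of the file is left as shipped):

```
dir, /usr/local/nvidia/lib64
```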

You’ll then have working CUDA within the non-L4T Docker containers, with the catches described above.

You will probably find that the software you’re trying to run doesn’t include the appropriate binary slices, or that it tries to use CUDA-X libraries that don’t have slices usable on Tegra iGPUs, when it was built for an SBSA target.

But I’m using a Jetson Nano, what should I do to use CUDA 11?

Your platform is no longer supported by newer Linux for Tegra releases, so you’re officially stuck on CUDA 10.2 forever. For unofficial use, stay tuned…

(later edit: compute_[version] is for PTX slices, and sm_[version] is for SASS ones. Updated to clarify that you can use the PTX JIT from a newer driver, and described how CUDA-X libraries from SBSA builds currently work on Orin due to the presence of a PTX slice. Also noted that the CUDA-X libraries must not be statically linked to share binary builds between L4T and SBSA.)

