06 June, 2024

What’s New in CUDA.jl 5.4 for Julia?

The CUDA.jl package is the main programming interface for working with NVIDIA CUDA GPUs using Julia. It features a user-friendly array abstraction, a compiler for writing CUDA kernels in Julia, and wrappers for various CUDA libraries.

The recent update of CUDA.jl 5.4 features many memory-management enhancements that help improve performance of memory-heavy applications and facilitate the use of heterogeneous set-ups with multiple GPUs, or those using both CPU and GPU.

CUDA.jl v5.4 should be compatible with existing codebases, as it bumps only the minor version. The bulk of the release focuses on memory management. Let’s look at the highlights:

Eager garbage collection

Since Julia is a garbage-collected language, GPU allocations can fail if garbage piles up, necessitating a collection cycle. Earlier versions of CUDA.jl handled this at the allocation site, detecting out-of-memory errors and triggering garbage collection in response. This was far from ideal, as it could lead to significant pauses and bloated memory usage.

To mitigate this issue, CUDA.jl v5.4 tracks memory usage more accurately and uses that information to trigger the GC early, at moments when the application would otherwise be idle, for example while waiting for a kernel to finish. This has two advantages: it distributes the cost of garbage collection over time, and it can mask that cost behind other operations. The result is more predictable performance.
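Eager collection happens automatically, but it can still help to release large temporaries explicitly and to inspect what the memory pool is holding. The following is a minimal sketch, not a description of how CUDA.jl implements its heuristics; the function name `scratch_sum` is hypothetical, and the code only runs when a functional GPU is present.

```julia
using CUDA

# Hypothetical helper: compute a reduction with temporary GPU arrays
# and hand the memory back to the pool without waiting for the GC.
function scratch_sum(n::Integer)
    a = CUDA.ones(Float32, n)      # temporary input on the device
    b = a .* 2f0                   # broadcast allocates a second array
    total = sum(b)                 # reduction returns a host scalar
    CUDA.unsafe_free!(a)           # release the temporaries early;
    CUDA.unsafe_free!(b)           # a and b must not be used afterwards
    return total
end

if CUDA.functional()
    println(scratch_sum(1024))
    GC.gc()                        # collect unreachable Julia objects
    CUDA.reclaim()                 # return cached pool memory to the driver
    CUDA.memory_status()           # print current device memory usage
end
```

Explicit freeing is optional. It mainly pays off in tight loops that allocate large arrays, where relying on the collector alone would let memory usage climb between cycles.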

Tracked memory allocations

When using multiple GPUs, it is important to differentiate between the device that memory was allocated on and the device used to execute code. CUDA.jl 5.4 now tracks the device that owns the memory, as well as the stream last used to access it. This allows the package to handle memory usage in kernels or library functions correctly while keeping the user in control. It is especially valuable when using multiple GPUs, or when using multiple streams to make better use of individual GPUs.
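To make the distinction between the allocating device and the executing device concrete, here is a small sketch that keeps each array on the device that allocated it. The function name `per_device_sums` is hypothetical; the code iterates over whatever GPUs are visible, so it also works on a single-GPU machine.

```julia
using CUDA

# Hypothetical helper: allocate and reduce one array per visible GPU.
# Each array is created and used while its own device is active.
function per_device_sums(n::Integer)
    results = Float32[]
    for dev in CUDA.devices()
        device!(dev) do                  # temporarily switch the active device
            x = CUDA.ones(Float32, n)    # x is owned by `dev`
            push!(results, sum(x))       # executed on the same device
        end
    end
    return results
end

if CUDA.functional()
    println(per_device_sums(8))
end
```

Using an array after switching to a different device is the situation that ownership tracking is meant to handle; the sketch above sidesteps it by never letting an array leave its `device!` block.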

Unified memory iteration

In CUDA, unified memory allows memory to be accessed from both the CPU and the GPU. CUDA.jl 5.4 greatly improves the performance of using unified memory with CPU code. This feature is useful for incrementally porting code to the GPU without worrying about the performance of accessing memory from the CPU.
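As an illustration of incremental porting, the sketch below allocates a unified array, updates it with a GPU broadcast, and then iterates over it in an ordinary CPU loop. The function name `unified_demo` is hypothetical, and the example assumes the `unified=true` keyword of `cu` is available in the installed version.

```julia
using CUDA

# Hypothetical helper: mix a GPU operation with a plain CPU loop
# over the same unified-memory array.
function unified_demo()
    a = cu(Float32[1, 2, 3, 4]; unified=true)  # accessible from CPU and GPU
    a .*= 2f0                                  # runs as a GPU kernel
    CUDA.synchronize()                         # make sure the kernel finished
    total = 0f0
    # @allowscalar is a defensive guard in case scalar indexing is
    # restricted in the installed version.
    CUDA.@allowscalar for i in eachindex(a)
        total += a[i]                          # element-wise read from the CPU
    end
    return total
end

if CUDA.functional()
    println(unified_demo())
end
```

With device-only memory, a loop like this would be a performance trap; with unified memory it lets CPU-side code keep working while the hot paths are moved to the GPU one at a time.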

Other notable changes

CUDA.jl v5.4 includes many other changes, such as:

Initial support for automatic differentiation of heterogeneous host/device code using Enzyme.jl
CUDA.@profile now automatically detects an external profiler
Improvements in exception output
Improved handling of cached library handles under memory pressure
Tegra devices are now supported by our artifacts
Support for CUDA 12.5 has been added, as well as initial support for Julia 1.12

You can access the full feature list and enhancements here.

JuliaHub

About the Author

Dr. Tim Besard

Discover Dr. Tim Besard’s contributions to JuliaHub. Learn about GPU computing and high-performance code generation.