Backend Configuration
HadaMAG can run its kernels on different execution engines depending on your hardware and environment. We call these backends.
There are five available backends:
- Serial – single-threaded CPU execution.
- Threaded – multi-threaded CPU execution (uses Julia's Threads).
- MPIThreads – hybrid MPI + threads execution (requires MPI.jl).
- CUDA – GPU execution using CUDA (requires CUDA.jl). Only available for the SRE function at the moment.
- MPI_CUDA – hybrid MPI + GPU execution, for multiple nodes with GPUs (requires both MPI.jl and CUDA.jl). Only available for the SRE function at the moment.
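The Threaded backend uses however many threads the current Julia session was started with; you can check this with Threads.nthreads() (standard Julia, the value shown below is just an example):

julia> Threads.nthreads()   # set via `julia --threads=N` or the JULIA_NUM_THREADS environment variable
4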
You choose the backend with the backend keyword in user-facing functions, e.g.:
julia> SRE(ψ, q; backend = :auto)   # default is :auto

Backend types and symbols
Internally we define an abstract backend type and five concrete backend types:
abstract type AbstractBackend end
struct Serial <: AbstractBackend end
struct Threaded <: AbstractBackend end
struct MPIThreads <: AbstractBackend end
struct CUDAThreads <: AbstractBackend end
struct MPICUDAThreads <: AbstractBackend end

These map to the following keyword symbols:
- backend = :serial → Serial().
- backend = :threads → Threaded().
- backend = :mpi → MPIThreads().
- backend = :cuda → CUDAThreads().
- backend = :mpi_cuda → MPICUDAThreads().
- backend = :auto (default) → automatic selection.
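If you want the backend to depend on your environment but still be explicit, you can compute the symbol yourself. A minimal sketch (pick is just a local variable for illustration, not part of HadaMAG's API):

using HadaMAG

# Choose a CPU backend symbol based on the number of Julia threads available.
pick = Threads.nthreads() > 1 ? :threads : :serial

ψ = rand_haar(8; depth=2)
SRE(ψ, 2; backend = pick)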
You can force a specific backend by passing the corresponding symbol to user-facing functions:
julia> using HadaMAG
julia> ψ = rand_haar(8; depth=2)
StateVec{ComplexF64,2}(n=8, dim=256, mem=4.04 KiB)
julia> SRE(ψ, 2; backend = :threads)
[==================================================] 100.0% (256/256)
(3.7603466770760265, 1.3322676295501878e-15)

MPI backend
Julia’s package extensions let us ship MPI code without hard-requiring MPI for everyone. Instead, the extension HadaMAGMPIExt is loaded and activated automatically when MPI.jl is loaded in your session.
To use it, you just need to add and load MPI.jl:
julia> using HadaMAG
julia> using Pkg; Pkg.add("MPI"); using MPI
Precompiling HadaMAGMPIExt...
1 dependency successfully precompiled in 2 seconds. 341 already precompiled.
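You can check that the extension actually loaded; Base.get_extension is the standard way to query a package extension from Julia itself (it returns the extension module, or nothing if the extension has not been loaded):

using HadaMAG, MPI

# `nothing` means the extension did not load (e.g. MPI.jl is missing from the environment).
ext = Base.get_extension(HadaMAG, :HadaMAGMPIExt)
@assert ext !== nothing "HadaMAGMPIExt did not load - is MPI.jl installed in this environment?"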
Configuring MPI implementation

The MPI.jl package uses MPIPreferences.jl to decide which MPI implementation to load (a system MPI or a bundled JLL MPI).
- Use system MPI (e.g., OpenMPI or MPICH on a cluster):
julia> using MPIPreferences
julia> MPIPreferences.use_system_binary()

This will use the system MPI installation.
- Or use a bundled MPI (e.g., OpenMPI_jll):
julia> using MPIPreferences
julia> MPIPreferences.use_jll_binary()

You can check which MPI you’re using with:
julia> using MPI
julia> MPI.identify_implementation()
("MPICH", v"4.3.1")

Running with MPI
Here we show a minimal example of running HadaMAG with MPI on a cluster or laptop. Create a file run_mpi.jl with the following content:
using HadaMAG
using MPI
using Random
MPI.Initialized() || MPI.Init()
comm = MPI.COMM_WORLD; rank = MPI.Comm_rank(comm)
L = 16
Random.seed!(123) # Fix the seed so all ranks generate the same state. You could also generate the state on rank 0 and broadcast it (see the sketch after this script).
ψ = rand_haar(L; depth=5)
m2, lost_norm = SRE(ψ, 2; backend=:mpi, progress=false)
# Only print from rank 0
if rank == 0
println("SRE(ψ, 2) = ", m2)
println("Lost norm: ", lost_norm)
end
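The script fixes the RNG seed so every rank builds the same state. Alternatively, you can build the state only on rank 0 and broadcast it; a minimal sketch of that variant, using MPI.bcast (which serializes and broadcasts arbitrary Julia objects) and assuming a HadaMAG state can be serialized this way:

using HadaMAG
using MPI

MPI.Initialized() || MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# Build the state on rank 0 only, then send a copy to every other rank.
ψ = rank == 0 ? rand_haar(16; depth=5) : nothing
ψ = MPI.bcast(ψ, comm; root=0)

m2, lost_norm = SRE(ψ, 2; backend=:mpi, progress=false)
rank == 0 && println("SRE(ψ, 2) = ", m2)

Either way, every rank holds the same state before the distributed computation starts.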
Then run run_mpi.jl with mpiexec or mpirun. For example, to run with 4 processes:

mpirun -n 4 julia --project=yourproject run_mpi.jl

Or if you are on a cluster with SLURM, you can submit a job script like this:
srun --ntasks=4 --cpus-per-task=1 julia --project=yourproject run_mpi.jl

CUDA backend
The CUDA backend is only available for the SRE function at the moment. To use it, you just need to add and load CUDA.jl into your session:
julia> using HadaMAG
julia> using Pkg; Pkg.add("CUDA"); using CUDA

Then you can call SRE with backend = :cuda, and tune the batch and threads keyword parameters to optimize performance for your GPU hardware.
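Before running, you may want to confirm that CUDA.jl actually sees a usable device; CUDA.functional and CUDA.device are standard CUDA.jl calls:

using CUDA

# Abort early if no usable GPU/driver combination was detected.
CUDA.functional() || error("No usable GPU found - use a CPU backend instead")
@show CUDA.device()   # the currently selected GPU

Here is a minimal example: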
julia> using HadaMAG
julia> using CUDA
julia> ψ = rand_haar(12; depth=4)
StateVec{ComplexF64,2}(n=12, dim=4096, mem=64.00 KiB)
julia> batch_size = nthreads_per_device = 128   # Adjust these parameters depending on your GPU
128
julia> SRE(ψ, 2; backend = :cuda, progress=false, batch=batch_size, threads=nthreads_per_device)
(8.019371115855193, 3.774758283725532e-15)

Multi-GPU with MPI + CUDA
HadaMAG.jl also supports hybrid MPI + CUDA execution for multi-GPU systems. The MPI + CUDA backend is only available for the SRE function at the moment. To use it, you need to have both MPI.jl and CUDA.jl loaded in your session. After that, you can call SRE with backend = :mpi_cuda, and use it similarly to the MPI example above:
using HadaMAG
using MPI
using Random
MPI.Initialized() || MPI.Init()
comm = MPI.COMM_WORLD; rank = MPI.Comm_rank(comm)
L = 16
Random.seed!(123) # Fix seed so all ranks generate the same state. You could also generate a state on rank 0 and broadcast it.
ψ = rand_haar(L; depth=5)
batch_size = nthreads_per_device = 128 # Adjust these parameters depending on your GPU
m2, lost_norm = SRE(ψ, 2; backend=:mpi_cuda, progress=false, batch=batch_size, threads=nthreads_per_device)
# Only print from rank 0
if rank == 0
println("SRE(ψ, 2) = ", m2)
println("Lost norm: ", lost_norm)
end
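Launch it with one MPI rank per GPU. The exact flags depend on your scheduler and MPI launcher; a sketch assuming SLURM, 4 ranks with one GPU each, and that the script above was saved as run_mpi_cuda.jl (the file name is just an example):

srun --ntasks=4 --gpus-per-task=1 julia --project=yourproject run_mpi_cuda.jl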