Binary ONNXRuntime Installation, main branch (2024.03.08.)
The update to CUDA 12.4 (SPI-2517, SPI-2524, ATLINFR-5280) is turning into the last straw with ONNXRuntime for me.
So I thought I'd try something completely different. Microsoft builds ONNXRuntime for a few platforms itself. See:
https://github.com/microsoft/onnxruntime/releases
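For reference, picking up one of these releases is trivial. Just a sketch (the tarball name is my assumption for the Linux x86_64 GPU asset of the 1.17.1 release used below; check the release page for the exact file name):

# Download and unpack the pre-built Linux x86_64 GPU binaries.
# (Asset name assumed, not copied from the release page.)
curl -LO https://github.com/microsoft/onnxruntime/releases/download/v1.17.1/onnxruntime-linux-x64-gpu-1.17.1.tgz
tar -xzf onnxruntime-linux-x64-gpu-1.17.1.tgz
ls onnxruntime-linux-x64-gpu-1.17.1/lib/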
Also, Microsoft is actually pretty good about only exposing those symbols from its binaries that absolutely need to be exposed. So one of the arguments for building ONNXRuntime ourselves (that we need to make sure it uses the same version of Eigen, etc. that we pick up from LCG) is moot.
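A quick way to convince ourselves of this, just as a sketch using the library path from the build area shown below: list the exported dynamic symbols and check that the public C API (e.g. OrtGetApiBase) shows up, while nothing Eigen related does.

# The public C API entry point should be exported...
nm -D --defined-only ./x86_64-el9-gcc13-opt/lib/libonnxruntime.so | grep OrtGetApiBase
# ...while nothing Eigen related should leak out (count should be 0).
nm -D --defined-only ./x86_64-el9-gcc13-opt/lib/libonnxruntime.so | grep -ci eigen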
So... why not just do this? I'll need to do a proper test in a full Athena build before we can be serious about this proposal, but I think it could work. The binaries provided by Microsoft seem perfectly happy in our environment.
[bash][atspot01]:build > ./CMakeFiles/atlas_build_run.sh ldd -r ./x86_64-el9-gcc13-opt/lib/libonnxruntime.so
linux-vdso.so.1 (0x00007ffc13d3d000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fda17cdd000)
librt.so.1 => /lib64/librt.so.1 (0x00007fda17cd6000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fda17cd1000)
libstdc++.so.6 => /cvmfs/sft.cern.ch/lcg/releases/gcc/13.1.0-b3d18/x86_64-el9/lib64/libstdc++.so.6 (0x00007fda16800000)
libm.so.6 => /lib64/libm.so.6 (0x00007fda17bf6000)
libgcc_s.so.1 => /cvmfs/sft.cern.ch/lcg/releases/gcc/13.1.0-b3d18/x86_64-el9/lib64/libgcc_s.so.1 (0x00007fda17bd1000)
libc.so.6 => /lib64/libc.so.6 (0x00007fda16400000)
/lib64/ld-linux-x86-64.so.2 (0x00007fda17cf8000)
[bash][atspot01]:build > ./CMakeFiles/atlas_build_run.sh ldd -r ./x86_64-el9-gcc13-opt/lib/libonnxruntime_providers_shared.so
linux-vdso.so.1 (0x00007ffcfff50000)
libstdc++.so.6 => /cvmfs/sft.cern.ch/lcg/releases/gcc/13.1.0-b3d18/x86_64-el9/lib64/libstdc++.so.6 (0x00007fbca9c00000)
libm.so.6 => /lib64/libm.so.6 (0x00007fbca9f25000)
libgcc_s.so.1 => /cvmfs/sft.cern.ch/lcg/releases/gcc/13.1.0-b3d18/x86_64-el9/lib64/libgcc_s.so.1 (0x00007fbcaa234000)
libc.so.6 => /lib64/libc.so.6 (0x00007fbca9800000)
/lib64/ld-linux-x86-64.so.2 (0x00007fbcaa271000)
[bash][atspot01]:build > ./CMakeFiles/atlas_build_run.sh ldd -r ./x86_64-el9-gcc13-opt/lib/libonnxruntime_providers_cuda.so
linux-vdso.so.1 (0x00007ffc4171d000)
libcublasLt.so.12 => /cvmfs/sft.cern.ch/lcg/releases/cuda/12.4-4899e/x86_64-el9-gcc13-opt/lib64/libcublasLt.so.12 (0x00007fb6b2800000)
libcublas.so.12 => /cvmfs/sft.cern.ch/lcg/releases/cuda/12.4-4899e/x86_64-el9-gcc13-opt/lib64/libcublas.so.12 (0x00007fb6abc00000)
libcudnn.so.8 => /software/cudnn/8.9.7/x86_64-cuda12/lib/libcudnn.so.8 (0x00007fb6ab9dc000)
libcurand.so.10 => /cvmfs/sft.cern.ch/lcg/releases/cuda/12.4-4899e/x86_64-el9-gcc13-opt/lib64/libcurand.so.10 (0x00007fb6a5400000)
libcufft.so.11 => /cvmfs/sft.cern.ch/lcg/releases/cuda/12.4-4899e/x86_64-el9-gcc13-opt/lib64/libcufft.so.11 (0x00007fb693800000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb6eda49000)
librt.so.1 => /lib64/librt.so.1 (0x00007fb6eda44000)
libcudart.so.12 => /cvmfs/sft.cern.ch/lcg/releases/cuda/12.4-4899e/x86_64-el9-gcc13-opt/lib64/libcudart.so.12 (0x00007fb693400000)
libstdc++.so.6 => /cvmfs/sft.cern.ch/lcg/releases/gcc/13.1.0-b3d18/x86_64-el9/lib64/libstdc++.so.6 (0x00007fb693000000)
libm.so.6 => /lib64/libm.so.6 (0x00007fb6ed967000)
libgcc_s.so.1 => /cvmfs/sft.cern.ch/lcg/releases/gcc/13.1.0-b3d18/x86_64-el9/lib64/libgcc_s.so.1 (0x00007fb6ed942000)
libc.so.6 => /lib64/libc.so.6 (0x00007fb692c00000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb6eda66000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fb6ed93d000)
undefined symbol: Provider_GetHost (./x86_64-el9-gcc13-opt/lib/libonnxruntime_providers_cuda.so)
[bash][atspot01]:build >
The last complaint about the missing Provider_GetHost symbol is a bit baffling though. That symbol is provided by libonnxruntime_providers_shared.so, but for some reason libonnxruntime_providers_cuda.so does not link explicitly against that shared library.
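My guess is that this is intentional: presumably the core library loads libonnxruntime_providers_shared.so itself before the CUDA provider, so the symbol gets resolved at run time without an explicit DT_NEEDED dependency. Both halves of that picture can be checked like this (just a sketch, same paths as above):

# Provider_GetHost is exported by the "shared" provider library...
nm -D --defined-only ./x86_64-el9-gcc13-opt/lib/libonnxruntime_providers_shared.so | grep Provider_GetHost
# ...but the CUDA provider does not list that library as a dependency, which
# is why "ldd -r" flags the unresolved symbol.
readelf -d ./x86_64-el9-gcc13-opt/lib/libonnxruntime_providers_cuda.so | grep NEEDED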
But other than that, the 1.17.1 ONNXRuntime binaries seem happy with the combination of CUDA 12.4 + cuDNN 8.9.7. (Even with the downloaded binaries we can't use cuDNN 9.0.0!)
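The cuDNN 8 requirement is visible directly in the binary's dynamic section (quick check, same path as above):

# The pre-built CUDA provider explicitly asks for the cuDNN 8 soname, so a
# cuDNN 9 installation (libcudnn.so.9) cannot satisfy it.
readelf -d ./x86_64-el9-gcc13-opt/lib/libonnxruntime_providers_cuda.so | grep libcudnn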