
Binary ONNXRuntime Installation, main branch (2024.03.08.)

The update to CUDA 12.4 (SPI-2517, SPI-2524, ATLINFR-5280) is turning into the last straw with ONNXRuntime for me. 😦 Supporting CUDA 12.4 requires updating to the latest ONNXRuntime version, and that version does not build out of the box with our GCC 13 in C++20 mode. So once again we would need to sort out how to get that code to build.

So I thought I'd try something completely different. Microsoft builds ONNXRuntime for a few platforms itself. See:

https://github.com/microsoft/onnxruntime/releases

Also, Microsoft is actually pretty good at exposing only those symbols from its binaries that absolutely need to be exposed. So one of the arguments for building ONNXRuntime ourselves, namely that we need to make sure it uses the same version of Eigen, etc. that we use from LCG, is moot. 🤔

So... why not just do this? I'll need to do a proper test with this in a full Athena build before we can take this proposal seriously, but I think it could work. The binaries provided by Microsoft seem happy in our environment.

[bash][atspot01]:build > ./CMakeFiles/atlas_build_run.sh ldd -r ./x86_64-el9-gcc13-opt/lib/libonnxruntime.so
	linux-vdso.so.1 (0x00007ffc13d3d000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fda17cdd000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fda17cd6000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fda17cd1000)
	libstdc++.so.6 => /cvmfs/sft.cern.ch/lcg/releases/gcc/13.1.0-b3d18/x86_64-el9/lib64/libstdc++.so.6 (0x00007fda16800000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fda17bf6000)
	libgcc_s.so.1 => /cvmfs/sft.cern.ch/lcg/releases/gcc/13.1.0-b3d18/x86_64-el9/lib64/libgcc_s.so.1 (0x00007fda17bd1000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fda16400000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fda17cf8000)
[bash][atspot01]:build > ./CMakeFiles/atlas_build_run.sh ldd -r ./x86_64-el9-gcc13-opt/lib/libonnxruntime_providers_shared.so 
	linux-vdso.so.1 (0x00007ffcfff50000)
	libstdc++.so.6 => /cvmfs/sft.cern.ch/lcg/releases/gcc/13.1.0-b3d18/x86_64-el9/lib64/libstdc++.so.6 (0x00007fbca9c00000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fbca9f25000)
	libgcc_s.so.1 => /cvmfs/sft.cern.ch/lcg/releases/gcc/13.1.0-b3d18/x86_64-el9/lib64/libgcc_s.so.1 (0x00007fbcaa234000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fbca9800000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fbcaa271000)
[bash][atspot01]:build > ./CMakeFiles/atlas_build_run.sh ldd -r ./x86_64-el9-gcc13-opt/lib/libonnxruntime_providers_cuda.so 
	linux-vdso.so.1 (0x00007ffc4171d000)
	libcublasLt.so.12 => /cvmfs/sft.cern.ch/lcg/releases/cuda/12.4-4899e/x86_64-el9-gcc13-opt/lib64/libcublasLt.so.12 (0x00007fb6b2800000)
	libcublas.so.12 => /cvmfs/sft.cern.ch/lcg/releases/cuda/12.4-4899e/x86_64-el9-gcc13-opt/lib64/libcublas.so.12 (0x00007fb6abc00000)
	libcudnn.so.8 => /software/cudnn/8.9.7/x86_64-cuda12/lib/libcudnn.so.8 (0x00007fb6ab9dc000)
	libcurand.so.10 => /cvmfs/sft.cern.ch/lcg/releases/cuda/12.4-4899e/x86_64-el9-gcc13-opt/lib64/libcurand.so.10 (0x00007fb6a5400000)
	libcufft.so.11 => /cvmfs/sft.cern.ch/lcg/releases/cuda/12.4-4899e/x86_64-el9-gcc13-opt/lib64/libcufft.so.11 (0x00007fb693800000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb6eda49000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fb6eda44000)
	libcudart.so.12 => /cvmfs/sft.cern.ch/lcg/releases/cuda/12.4-4899e/x86_64-el9-gcc13-opt/lib64/libcudart.so.12 (0x00007fb693400000)
	libstdc++.so.6 => /cvmfs/sft.cern.ch/lcg/releases/gcc/13.1.0-b3d18/x86_64-el9/lib64/libstdc++.so.6 (0x00007fb693000000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fb6ed967000)
	libgcc_s.so.1 => /cvmfs/sft.cern.ch/lcg/releases/gcc/13.1.0-b3d18/x86_64-el9/lib64/libgcc_s.so.1 (0x00007fb6ed942000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fb692c00000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fb6eda66000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fb6ed93d000)
undefined symbol: Provider_GetHost	(./x86_64-el9-gcc13-opt/lib/libonnxruntime_providers_cuda.so)
[bash][atspot01]:build >

The last complaint about the missing Provider_GetHost symbol is a bit baffling though. That symbol is provided by libonnxruntime_providers_shared.so. But for some reason libonnxruntime_providers_cuda.so does not link explicitly against that shared library. 😕 So we'll need to set up the linking correctly for our own code. 🤔
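For the fuller test, it may help to scan the `ldd -r` output of every installed library automatically rather than by eye. The following is only a minimal sketch: the helper name `undefined_symbols` is made up for illustration, and it assumes the `undefined symbol: NAME (library)` line format seen in the transcript above.

```python
import re

# Matches the lines that `ldd -r` prints for unresolved symbols, e.g.
# "undefined symbol: Provider_GetHost\t(./lib/libfoo.so)".
# Assumption: the format is the one shown in the transcript above.
_UNDEFINED = re.compile(r"^undefined symbol:\s*(\S+)\s*\((.+)\)\s*$")


def undefined_symbols(ldd_output: str) -> list:
    """Return (symbol, library) pairs reported as undefined by `ldd -r`.

    Hypothetical helper, not part of any ATLAS or ONNXRuntime tooling.
    """
    found = []
    for line in ldd_output.splitlines():
        match = _UNDEFINED.match(line.strip())
        if match:
            found.append((match.group(1), match.group(2)))
    return found


# Two lines copied from the transcript above serve as sample input.
sample = (
    "\tlibdl.so.2 => /lib64/libdl.so.2 (0x00007fb6ed93d000)\n"
    "undefined symbol: Provider_GetHost\t"
    "(./x86_64-el9-gcc13-opt/lib/libonnxruntime_providers_cuda.so)\n"
)

for symbol, library in undefined_symbols(sample):
    print(f"{library}: unresolved {symbol}")
```

Feeding it the captured output of each `ldd -r` call would flag any library that still has unresolved symbols once our own linking setup is in place.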

But other than that, the 1.17.1 ONNXRuntime binaries seem happy with a combination of CUDA 12.4 + cuDNN 8.9.7. (Even with the downloaded binary we can't use cuDNN 9.0.0!)

Pinging @elmsheus and @dbakshig.
