Allow CUDA compilation for onnxruntime

AE Build SUCCESS
Build logfiles are available at Jenkins [AE-MERGE-REQUEST-CC7 #536]

added 2 commits

1730dad7 - Added FindCUDAToolkit.cmake and FindcuDNN.cmake to AtlasLCG.
1a90f7a4 - Updated onnxruntime to be able to build against LCG_101cuda, with CUDA support turned on.

So... This is not as simple a problem as I first thought.

Currently in our main nightly we set up CUDA "by hand", in the same way in which we set up the compiler: completely separately from whichever LCG version we may want to use, relying on AtlasSetup to do this for us. (Since CMake treats nvcc the same as any C++ compiler these days, this does make sense from our side.)
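To illustrate what "by hand" amounts to, here is a minimal sketch. It assumes the setup boils down to exporting compiler environment variables before CMake runs; the `CUDA_ROOT` location and the default compiler names are hypothetical, and this is not meant as a description of what AtlasSetup literally does. CMake reads `CUDACXX` for the CUDA compiler on the first configuration, the same way it reads `CC` and `CXX` for the host compilers.

```shell
# Hypothetical CUDA install location; in practice the setup scripts would provide this.
CUDA_ROOT="${CUDA_ROOT:-/usr/local/cuda}"

# CMake reads CC/CXX for the host compilers and CUDACXX for the CUDA compiler.
# Existing values are kept; the defaults here are only placeholders.
export CC="${CC:-gcc}"
export CXX="${CXX:-g++}"
export CUDACXX="${CUDACXX:-${CUDA_ROOT}/bin/nvcc}"

# Report what a subsequent "cmake" configuration would see.
echo "CC=${CC} CXX=${CXX} CUDACXX=${CUDACXX}"
```

Nothing in this depends on the LCG version, which is the point: the compilers come from the environment, not from the LCG release.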

However, as I had to realise, SFT provides cuDNN, and also CUDA, as "regular packages".

The latter is only provided in this way, which also makes perfect sense.

Now... as you may see from those webpages, these two are only provided in the "CUDA LCG releases" at the moment. So to build onnxruntime with CUDA support turned on, I was using LCG_101cuda. Like:

cmake -DCMAKE_BUILD_TYPE=Release -DLCG_VERSION_NUMBER=101 -DLCG_VERSION_POSTFIX=cuda -DCTEST_USE_LAUNCHERS=TRUE ../atlasexternals/Projects/AthenaExternals/

With the updates that I now added to the MR, the build does succeed like this. But of course in our regular nightly cuDNN will not be available at the moment. Not unless we ask for its inclusion into the ATLAS layers...

@elmsheus, @emoyse, what do you think? Would it be outrageous to ask for the CUDA and cuDNN packages to be included into let's say LCG_101_ATLAS_3? Once they are, we may very well want to also re-think how AtlasSetup would handle CUDA. But that will be a separate discussion...

Hi @akraszna,

Asking for these 2 packages to be added to a new layer, LCG_101_ATLAS_3, sounds fine, though I don't know whether there are technical reasons on the SFT side not to include them. Will you open an SPI Jira ticket? N.B. valgrind for gcc11 should also be added to this new layer, as discussed in https://sft.its.cern.ch/jira/browse/SPI-1992

Cheers, Johannes

AE Build SUCCESS
Build logfiles are available at Jenkins [AE-MERGE-REQUEST-CC7 #541]

Here it is: https://sft.its.cern.ch/jira/browse/SPI-1996

added 24 commits

1a90f7a4...f2bbee2e - 20 commits from branch master
83198b50 - allow cuda compilation for ort
8730cb03 - remove debug line
31a52746 - Added FindCUDAToolkit.cmake and FindcuDNN.cmake to AtlasLCG.
a48bdd97 - Updated onnxruntime to be able to build against LCG_101cuda, with CUDA support turned on.

added 1 commit

f07ff00d - Updated onnxruntime to be able to build against LCG_101cuda, with CUDA support turned on.

Hi @akraszna ,

Thanks a lot.

Debo.

Unfortunately onnxruntime takes bloody forever to build with CUDA support turned on. It churns for O(10 minutes) building the CUDA code in a single thread.

Could you guys check whether any newer version addresses this? There are many versions newer than 1.5.1 by now.

For the nightly this is not necessarily a dealbreaker. But since on my own machine I can build AthenaExternals in <5 minutes, this is very noticeable.

Hi @akraszna ,

Will do asap and let you know here.

Thanks, Debo.

AE Build SUCCESS
Build logfiles are available at Jenkins [AE-MERGE-REQUEST-CC7 #554]

AE Build SUCCESS
Build logfiles are available at Jenkins [AE-MERGE-REQUEST-CC7 #555]

Hi @akraszna ,

I don't see any existing issue about this, so I just created one, https://github.com/microsoft/onnxruntime/issues/9627, and passed it on to some ORT developers I know. Meanwhile, I will check whether the issue persists after updating ORT.

Thanks, Debo.

Hi @akraszna

Can we set a CMAKE_BUILD_PARALLEL_LEVEL here https://gitlab.cern.ch/atlas/atlasexternals/-/blob/master/External/onnxruntime/CMakeLists.txt#L23 ?
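For reference, a minimal sketch of how this setting works in general. `CMAKE_BUILD_PARALLEL_LEVEL` is an environment variable read by `cmake --build` (CMake 3.12 and later), so it has to be present in the environment of the build step rather than set as a CMake variable. The `BUILD_DIR` name below is hypothetical, and whether this actually speeds up the CUDA part of the onnxruntime build is not established here.

```shell
# Use all available cores; fall back to 1 if nproc is not available.
JOBS="$(nproc 2>/dev/null || echo 1)"

# "cmake --build" uses this as its default parallel level
# when no -j/--parallel option is given on the command line.
export CMAKE_BUILD_PARALLEL_LEVEL="${JOBS}"
echo "CMAKE_BUILD_PARALLEL_LEVEL=${CMAKE_BUILD_PARALLEL_LEVEL}"

# Hypothetical build directory; only build if it has already been configured.
BUILD_DIR="${BUILD_DIR:-./build}"
if [ -f "${BUILD_DIR}/CMakeCache.txt" ]; then
    cmake --build "${BUILD_DIR}"
fi
```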

Thanks, Debo

Okay, let's just go ahead with this one. See my comments in https://github.com/microsoft/onnxruntime/issues/9627 about the not-completely-ideal build performance...

mentioned in commit 1f303f75

merged

mentioned in merge request !885 (merged)

Merged by Attila Krasznahorkay (Nov 1, 2021 2:03pm UTC)
