Draft: Prototype of hybrid MP/MT setup
Prototype of hybrid MP/MT setup. The main addition is a new --threadsPerProcess
argument that should be set together with --multithreaded
.
This MR also merges MT and MP handling to avoid too much duplicated code and potential inconsistencies. That's why I'd get this into main first and if we need it, we can backport to 24.0.
The main inconsistency that no longer works is using --multithreaded
and --athenaopts='--nprocs=N'
together successfully. Now always if --nprocs
or --threads
is used, the --multithreaded
and --multiprocess
flags are not used.
Also I fixed when all
was passed after some substep arguments. While this would probably never happen in production it may avoid confusion when running locally.
I also extended the tests significantly.
TODO:
- Better handle disabled MT or MP.
- Error when
--threadsPerProcess
is used without--multithreaded
. - Error when
--multithreaded
is used with--nprocs
and the other way around. - Tests for concurrent events setting.
This MR may again be too ambitious but I did not like just hacking the MT parsing.
Tagging @amete, @fwinkl, @tsulaia, @akraszna, @jchapman, @jcatmore, @elmsheus.
Merge request reports
Activity
added full-unit-tests label
added 858 commits
-
5aa739d0...3d24a451 - 857 commits from branch
atlas:main
- 9001e121 - Prototype of hybrid MP/MT setup
-
5aa739d0...3d24a451 - 857 commits from branch
added Overlay Simulation Tools main labels
CI Result FAILURE (hash 9001e121)Athena AthSimulation AthGeneration externals cmake make tests Full details available on this CI monitor view. Check the JIRA CI status board for known problems
Athena: number of compilation errors 0, warnings 0
AthSimulation: number of compilation errors 0, warnings 0
AthGeneration: number of compilation errors 0, warnings 0
For experts only: Jenkins output (remote access info)- Resolved by Frank Winklmeier
Thanks a lot @tadej. I'm still not 100% sure why we cannot directly pass the number of threads/processes through the job transform to the underlying job configuration. After all, we need two integers. With your proposal one is kept implicit (as it is now), through
ATHENA_CORE_NUMBER
and--multiprocess
, and the other explicit,--threadsPerProcess
. Wouldn't it be clearer if one would introduce something along the lines of:MyTransform.py --nprocesses "Step1:0, Step2:1, Step3: 2" --nthreads "Step1:2, Step2: 1, Step3: 0" [...]
In which case
Step1
would beAthenaMT
,Step2
would beHybrid
, andStep3
would beAthenaMP
. Then, one could pass whatever environment variables to these as they wish. We can provide backward compatibility for--multiprocess
/--multithreaded
for a while but eventually phase them out.Since we're opening the can of worms, why don't we do this properly? I understand it's a bit more work from the get go but I really believe it'll pay off in the long run.
This syntax for setting a hybrid job is super confusing:
[...] --multithreaded --threadsPerProcess Overlay:4 [...]
In what I propose, you can achieve the same thing without hardcoding an integer literal like this:
[...] --nprocesses "Overlay:${ATHENA_CORE_NUMBER}" --nthreads "Overlay:4"
Actually, ideally we can remove the ambiguity on
ATHENA_CORE_NUMBER
, get rid of it and reintroduceATHENA_PROC_NUMBER
and introduceATHENA_THREAD_NUMBER
. Former can be used for the number of processes and the latter for the number of threads. We can even fold in the step information in these variables so that different number of core counts are support for chained workflows.If you only want to run Overlay in MT/MP hybrid and all the rest in MT, I would imagine something along the lines of:
[...] --nprocesses "Overlay:${ATHENA_CORE_NUMBER}" --nthreads "Overlay:4,default:${ATHENA_CORE_NUMBER}"
would work (assuming
--nprocesses
is defaulted to0
and you're OK using the same value for both MP workers in Overlay and MT threads elsewhere).If you want to assign separate values for MP workers and MT threads, you can introduce a new environment variable as I mentioned above, e.g.,
ATHENA_PROC_NUMBER
(like we used to, and on the hindsight should probably have kept) andATHENA_THREAD_NUMBER
(new).If you want to go even deeper, you can change
ATHENA_PROC_NUMBER
andATHENA_THREAD_NUMBER
from simple number literals to something that also encodes the steps, e.g.,export ATHENA_PROC_NUMBER="Step1:0,Step2:8"
but that's probably an overkill.
added 13068 commits
-
9001e121...4d21f5d0 - 13067 commits from branch
atlas:main
- a99e1629 - Prototype of hybrid MP/MT setup
-
9001e121...4d21f5d0 - 13067 commits from branch
CI Result SUCCESS (hash a99e1629)Athena AthSimulation AthGeneration externals cmake make tests Full details available on this CI monitor view. Check the JIRA CI status board for known problems
Athena: number of compilation errors 0, warnings 0
AthSimulation: number of compilation errors 0, warnings 0
AthGeneration: number of compilation errors 0, warnings 0
For experts only: Jenkins output (remote access info)