Skip to content
Snippets Groups Projects

Draft: Prototype of hybrid MP/MT setup

Prototype of hybrid MP/MT setup. The main addition is a new --threadsPerProcess argument that should be set together with --multithreaded.

This MR also merges MT and MP handling to avoid too much duplicated code and potential inconsistencies. That's why I'd get this into main first and if we need it, we can backport to 24.0.

The main inconsistency that no longer works is using --multithreaded and --athenaopts='--nprocs=N' together successfully. Now always if --nprocs or --threads is used, the --multithreaded and --multiprocess flags are not used.

Also I fixed when all was passed after some substep arguments. While this would probably never happen in production it may avoid confusion when running locally.

I also extended the tests significantly.

TODO:

  • Better handle disabled MT or MP.
  • Error when --threadsPerProcess is used without --multithreaded.
  • Error when --multithreaded is used with --nprocs and the other way around.
  • Tests for concurrent events setting.

This MR may again be too ambitious but I did not like just hacking the MT parsing.

Tagging @amete, @fwinkl, @tsulaia, @akraszna, @jchapman, @jcatmore, @elmsheus.

Edited by Tadej Novak

Merge request reports

Members who can merge are allowed to add commits.

Pipeline #12936545 passed

Pipeline passed for a99e1629 on tadej:hybrid-mt-mp

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Frank Winklmeier resolved all threads

    resolved all threads

    • Thanks a lot @tadej. I'm still not 100% sure why we cannot directly pass the number of threads/processes through the job transform to the underlying job configuration. After all, we need two integers. With your proposal one is kept implicit (as it is now), through ATHENA_CORE_NUMBER and --multiprocess, and the other explicit, --threadsPerProcess. Wouldn't it be clearer if one would introduce something along the lines of:

      MyTransform.py --nprocesses "Step1:0, Step2:1, Step3: 2" --nthreads "Step1:2, Step2: 1, Step3: 0" [...]

      In which case Step1 would be AthenaMT, Step2 would be Hybrid, and Step3 would be AthenaMP. Then, one could pass whatever environment variables to these as they wish. We can provide backward compatibility for --multiprocess/--multithreaded for a while but eventually phase them out.

    • Author Developer

      You can not hardcode both of the parameters as you do not know what site you will run on in advance and sites may be different for the same production configuration.

    • Since we're opening the can of worms, why don't we do this properly? I understand it's a bit more work from the get go but I really believe it'll pay off in the long run.

      This syntax for setting a hybrid job is super confusing:

      [...] --multithreaded --threadsPerProcess Overlay:4 [...]

      In what I propose, you can achieve the same thing without hardcoding an integer literal like this:

      [...] --nprocesses "Overlay:${ATHENA_CORE_NUMBER}" --nthreads "Overlay:4"

      Actually, ideally we can remove the ambiguity on ATHENA_CORE_NUMBER, get rid of it and reintroduce ATHENA_PROC_NUMBER and introduce ATHENA_THREAD_NUMBER. Former can be used for the number of processes and the latter for the number of threads. We can even fold in the step information in these variables so that different number of core counts are support for chained workflows.

    • Author Developer

      How do you then set threading for reco here?

    • If you only want to run Overlay in MT/MP hybrid and all the rest in MT, I would imagine something along the lines of:

      [...] --nprocesses "Overlay:${ATHENA_CORE_NUMBER}" --nthreads "Overlay:4,default:${ATHENA_CORE_NUMBER}"

      would work (assuming --nprocesses is defaulted to 0 and you're OK using the same value for both MP workers in Overlay and MT threads elsewhere).

      If you want to assign separate values for MP workers and MT threads, you can introduce a new environment variable as I mentioned above, e.g., ATHENA_PROC_NUMBER (like we used to, and on the hindsight should probably have kept) and ATHENA_THREAD_NUMBER (new).

      If you want to go even deeper, you can change ATHENA_PROC_NUMBER and ATHENA_THREAD_NUMBER from simple number literals to something that also encodes the steps, e.g., export ATHENA_PROC_NUMBER="Step1:0,Step2:8" but that's probably an overkill.

    • Please register or sign in to reply
  • Tadej Novak added 13068 commits

    added 13068 commits

    Compare with previous version

  • Author Developer

    Jenkins please retry a build

  • This merge request affects 2 packages:

    • Simulation/Tests/OverlayTestsMT
    • Tools/PyJobTransforms

    Affected files list will not be printed in this case

    Adding @tkharlam ,@ahaas ,@jchapman ,@tadej as watchers

  • :warning: WARNING: big files (>100K) are found in the changeset

    :pencil: 128K in file Tools/PyJobTransforms/python/trfArgClasses.py

    :pencil: 116K in file Tools/PyJobTransforms/python/trfExe.py

  • :white_check_mark: CI Result SUCCESS (hash a99e1629)

    Athena AthSimulation AthGeneration
    externals :white_check_mark: :white_check_mark: :white_check_mark:
    cmake :white_check_mark: :white_check_mark: :white_check_mark:
    make :white_check_mark: :white_check_mark: :white_check_mark:
    tests :white_check_mark: :white_check_mark: :white_check_mark:

    Full details available on this CI monitor view. Check the JIRA CI status board for known problems
    :white_check_mark: Athena: number of compilation errors 0, warnings 0
    :white_check_mark: AthSimulation: number of compilation errors 0, warnings 0
    :white_check_mark: AthGeneration: number of compilation errors 0, warnings 0
    :pencil: For experts only: Jenkins output (remote access info)

  • Please register or sign in to reply
    Loading