
[ATR-23427] Do not process STREAM and GROUP ChainGroups twice on config change

Merged: Tim Martin requested to merge tamartin/athena:tdt_tests_2 into master
All threads resolved!

Further optimisation to minimise CPU time in the first event of high-multithreading jobs, which need to run this setup once per slot.

Prevents the built-in STREAM and GROUP ChainGroups from being computed twice on a configuration update. The second pass matched them with regular expressions, which is especially wasteful for these groups; a sketch of the idea follows below.
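To illustrate the pattern, here is a minimal C++ sketch (all names are illustrative assumptions, not the actual CacheGlobalMemory API): on a configuration update, only user-defined chain groups are re-matched against the menu via regular expressions, while the built-in STREAM_/GROUP_ groups, already filled by exact name when the menu was parsed, are skipped.

```cpp
// Minimal sketch of the optimisation. All class, member and key
// names here are hypothetical stand-ins, not the real TDT API.
#include <map>
#include <regex>
#include <string>
#include <vector>

struct ChainGroup {
  std::vector<std::string> chains;  // menu chains matched by this group
};

class ChainGroupCache {
public:
  void handleConfigUpdate(const std::vector<std::string>& menuChains) {
    for (auto& [key, group] : m_groups) {
      // Built-in groups (keyed "STREAM_..." / "GROUP_...") were already
      // populated by direct name lookup while parsing the menu, so skip
      // the expensive regex pass for them.
      if (key.rfind("STREAM_", 0) == 0 || key.rfind("GROUP_", 0) == 0) {
        continue;
      }
      // User-defined groups are treated as patterns and re-matched.
      group.chains.clear();
      const std::regex pattern(key);
      for (const auto& chain : menuChains) {
        if (std::regex_match(chain, pattern)) {
          group.chains.push_back(chain);
        }
      }
    }
  }

private:
  std::map<std::string, ChainGroup> m_groups;
};
```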

Merge request reports

Pipeline #2619716 passed for a96883ff on tamartin:tdt_tests_2

Approval is optional

Merged by Edward Moyse on May 21, 2021 at 9:31am UTC

Activity

  • This merge request affects 1 package:

    • Trigger/TrigAnalysis/TrigDecisionTool

    This merge request affects 1 file:

    • Trigger/TrigAnalysis/TrigDecisionTool/Root/CacheGlobalMemory.cxx

    Adding @tamartin as watcher

  • :white_check_mark: CI Result SUCCESS (hash a96883ff)

    |                | Athena | AthSimulation | AthGeneration | AnalysisBase | AthAnalysis | DetCommon |
    |----------------|--------|---------------|---------------|--------------|-------------|-----------|
    | externals      | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
    | cmake          | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
    | make           | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
    | required tests | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
    | optional tests | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |

    Full details available on this CI monitor view
    :white_check_mark: Athena: number of compilation errors 0, warnings 0
    :white_check_mark: AthSimulation: number of compilation errors 0, warnings 0
    :white_check_mark: AthGeneration: number of compilation errors 0, warnings 0
    :white_check_mark: AnalysisBase: number of compilation errors 0, warnings 0
    :white_check_mark: AthAnalysis: number of compilation errors 0, warnings 0
    :white_check_mark: DetCommon: number of compilation errors 0, warnings 0
    :pencil: For experts only: Jenkins output [CI-MERGE-REQUEST-CC7 33769]

  • I tested the change this morning. It does improve the situation quite a bit! :tada:

    [Attached image: TDT-hostspots3]

    The Boost regular expression code is still pretty dominant at the beginning of my 64-thread test job, but not as much as it was without this update. So I'm fully on board. :thumbsup:

  • Author Developer

    OK, so beginEvent on the first event goes from around 4 minutes with all 64 cores at 100% to around 70 seconds with mixed periods of work and idle. Certainly more sane.

    The TDT still seems to be the largest user here, though in a longer job it becomes less significant.

    @bwynne was having a look at improving the performance of the existing function a bit more with some further optimisations.

    More serious optimisations would make sense to me as part of a larger TDT backend rewrite, something I have been trying to avoid but which may need to happen at some point.

    Edited by Tim Martin
    • Resolved by Attila Krasznahorkay

      I do think that there's something about this code that we don't understand though. :thinking:

      With fewer threads the same per-thread initialisation takes less time. (I'll make some tests for this, actually; I've not tested <64-thread jobs in quite a while.) This is why I assume(d) that there's memory contention between the threads all doing the same memory-intensive operation at the same time.

      But as I wrote in ATR-23427 yesterday, when I forced these operations to execute one by one, they still took a very long time, even individually, which in practice led to the job not even exiting this initialisation phase during the time I had set aside for the profile collection.

      So I suspect that with many threads we somehow increase the number of operations as well; it's not just a matter of threads stepping on each other's toes. But I don't know the TDT code well enough to guess how that could be... :frowning2:
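
      To make that hypothesis testable, a toy harness along these lines (entirely made up for illustration; this is not TDT code) could time the same memory-intensive initialisation per thread while varying the thread count. If the mean per-thread time grows with the thread count, the threads are contending for memory bandwidth or allocator locks; if it stays flat, the extra wall time must come from extra work being done.

      ```cpp
      // Toy contention probe (illustrative only, not TDT code): every
      // thread runs the same allocation-heavy "initialisation" and we
      // report the mean per-thread duration as the thread count grows.
      #include <chrono>
      #include <cstdio>
      #include <thread>
      #include <vector>

      static double timedInit() {
        const auto start = std::chrono::steady_clock::now();
        // Stand-in for the real setup: many small allocations.
        std::vector<std::vector<int>> data;
        for (int i = 0; i < 20000; ++i) {
          data.emplace_back(256, i);
        }
        return std::chrono::duration<double>(
                   std::chrono::steady_clock::now() - start).count();
      }

      int main() {
        for (unsigned n : {1u, 8u, 64u}) {
          std::vector<std::thread> pool;
          std::vector<double> times(n);
          for (unsigned t = 0; t < n; ++t) {
            pool.emplace_back([&times, t] { times[t] = timedInit(); });
          }
          for (auto& th : pool) th.join();
          double sum = 0.0;
          for (double d : times) sum += d;
          std::printf("%2u threads: mean per-thread init %.3f s\n", n, sum / n);
        }
        return 0;
      }
      ```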

  • Attila Krasznahorkay resolved all threads

  • mentioned in merge request !43664 (closed)

  • Let me ping @emoyse on it, just to get things moving. :stuck_out_tongue:

    As a reminder, without this update the amount of time spent in setting up the TDT's internal data structures during the first event is like this:

    [Attached image: TDT-hostspots5]

    I.e. it takes almost an order of magnitude more CPU time. (The difference makes my 500-second test job go from processing just 50 events with the current version of master to processing almost 500 events in the same amount of time!)

  • merged

  • Edward Moyse mentioned in commit 79a0b101
