[ATR-23427] Do not process STREAM and GROUP ChainGroups twice on config change
Further optimisation to minimise CPU time spent in the first event of high-MT jobs (which need to run this initialisation once per slot).
Prevents the STREAM and GROUP built-in ChainGroups from being computed twice on a configuration update; the second call was also matching via RegEx, which is particularly slow for these groups.
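For clarity, a minimal sketch of the idea (hypothetical types and names, not the actual TrigDecisionTool API): the built-in STREAM_/GROUP_ chain groups are skipped when groups are re-resolved on a configuration update, so they are neither computed a second time nor pushed through the expensive regex path.

```cpp
// Minimal sketch, assuming hypothetical names (ChainGroupCache, updateConfig):
// built-in STREAM_/GROUP_ groups are filled once from the menu and skipped when
// the remaining user-defined groups are re-resolved via regex on a config update.
#include <map>
#include <regex>
#include <set>
#include <string>
#include <vector>

struct ChainGroup {
  std::vector<std::string> patterns;  // user-supplied trigger patterns
  std::set<std::string> chains;       // resolved chain names
};

class ChainGroupCache {
public:
  void updateConfig(const std::vector<std::string>& allChains) {
    for (auto& [name, group] : m_groups) {
      // Built-in STREAM_/GROUP_ groups were already filled from the menu's
      // stream/group tags; re-running the regex match for them both doubles
      // the work and is needlessly slow, so skip them here.
      if (isBuiltin(name)) continue;
      resolveWithRegex(group, allChains);
    }
  }

  ChainGroup& group(const std::string& name) { return m_groups[name]; }

private:
  static bool isBuiltin(const std::string& name) {
    return name.rfind("STREAM_", 0) == 0 || name.rfind("GROUP_", 0) == 0;
  }

  static void resolveWithRegex(ChainGroup& group,
                               const std::vector<std::string>& allChains) {
    group.chains.clear();
    for (const auto& pattern : group.patterns) {
      const std::regex re(pattern);
      for (const auto& chain : allChains) {
        if (std::regex_match(chain, re)) group.chains.insert(chain);
      }
    }
  }

  std::map<std::string, ChainGroup> m_groups;
};
```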
This merge request affects 1 package:
- Trigger/TrigAnalysis/TrigDecisionTool
This merge request affects 1 file:
- Trigger/TrigAnalysis/TrigDecisionTool/Root/CacheGlobalMemory.cxx
Adding @tamartin as watcher
added Trigger analysis-review-required master review-pending-level-1 labels
CI Result SUCCESS (hash a96883ff)
Projects built and tested (externals, cmake, make, required tests, optional tests): Athena, AthSimulation, AthGeneration, AnalysisBase, AthAnalysis, DetCommon
Full details available on this CI monitor view
Athena: number of compilation errors 0, warnings 0
AthSimulation: number of compilation errors 0, warnings 0
AthGeneration: number of compilation errors 0, warnings 0
AnalysisBase: number of compilation errors 0, warnings 0
AthAnalysis: number of compilation errors 0, warnings 0
DetCommon: number of compilation errors 0, warnings 0
For experts only: Jenkins output [CI-MERGE-REQUEST-CC7 33769]
added review-approved label and removed review-pending-level-1 label
OK - so beginEvent on the 1st event goes from around 4 minutes of 64 cores at 100% to around 70 s with mixed periods of work and idle. Certainly more sane. The TDT still seems to be the largest user here, though in a longer job it becomes less significant.
@bwynne was having a look at improving the performance of the existing function a bit more with some further optimisations.
More serious optimisations would make sense to me as part of a larger TDT backend rewrite, something which I have been trying to avoid but which may need to happen at some point.
Edited by Tim Martin
Resolved by Attila Krasznahorkay
I do think that there's something about this code that we don't understand though.
With fewer threads the same per-thread initialisation takes less time. (I'll make some tests of this actually; I've not tested <64-thread jobs in quite a while.) This is why I assume(d) that there is memory contention between the threads all doing the same memory-intensive operation at the same time.
But as I wrote in ATR-23427 yesterday, when I forced these operations to execute one by one, they still took a very long time, even individually. This practically led to the job not even exiting this initialisation phase during the time that I had set for the profile collection.
So I suspect that with many threads we somehow increase the number of operations as well; it's not just a matter of threads stepping on each other's toes. But I don't know the TDT code well enough to be able to guess how that could happen...
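For illustration, a rough sketch of the kind of serialisation test described above (hypothetical names, not actual Athena/TDT code): the per-slot first-event initialisation is forced to run one slot at a time behind a shared mutex, so any remaining slowness cannot come from threads doing the same memory-intensive work simultaneously.

```cpp
// Sketch only, assuming a hypothetical PerSlotState with a beginEvent hook:
// serialise the expensive one-off per-slot setup across all slots so that
// each slot's initialisation time can be measured without cross-thread
// memory contention.
#include <mutex>

class PerSlotState {
public:
  void beginEvent() {
    // All slots share this mutex, so the one-off work runs one slot at a time.
    static std::mutex initMutex;
    std::lock_guard<std::mutex> lock(initMutex);
    if (!m_initialised) {
      buildChainGroups();  // the memory-intensive first-event setup
      m_initialised = true;
    }
  }

private:
  void buildChainGroups() { /* expensive per-slot setup */ }
  bool m_initialised = false;
};
```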
mentioned in merge request !43664 (closed)
Let me ping @emoyse on it, just to get things moving.
As a reminder, without this update the amount of time spent setting up the TDT's internal data structures during the first event is like this:
I.e. it takes almost an order of magnitude more CPU time to do it... (The difference makes my 500-second test job go from being able to process just 50 events with the current version of master, to processing almost 500 in the same amount of time!)
mentioned in commit 79a0b101
added sweep:ignore label