[ATR-23427] Do not process STREAM and GROUP ChainGroups twice on config change
Further optimisation to minimise CPU time spent in the first event of high-MT jobs (which need to run this initialisation once per slot).
Prevents the STREAM and GROUP built-in ChainGroups from being computed twice on a configuration update; the second call was also matching via RegEx, which is particularly slow for these groups.
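For clarity, a minimal sketch of the idea (hypothetical types and names, not the actual TrigDecisionTool API): the built-in STREAM_/GROUP_ chain groups are skipped when groups are re-resolved on a configuration update, so they are neither computed a second time nor pushed through the expensive regex path.

```cpp
// Minimal sketch, assuming hypothetical names (ChainGroupCache, updateConfig):
// built-in STREAM_/GROUP_ groups are filled once from the menu and skipped when
// the remaining user-defined groups are re-resolved via regex on a config update.
#include <map>
#include <regex>
#include <set>
#include <string>
#include <vector>

struct ChainGroup {
  std::vector<std::string> patterns;  // user-supplied trigger patterns
  std::set<std::string> chains;       // resolved chain names
};

class ChainGroupCache {
public:
  void updateConfig(const std::vector<std::string>& allChains) {
    for (auto& [name, group] : m_groups) {
      // Built-in STREAM_/GROUP_ groups were already filled from the menu's
      // stream/group tags; re-running the regex match for them both doubles
      // the work and is needlessly slow, so skip them here.
      if (isBuiltin(name)) continue;
      resolveWithRegex(group, allChains);
    }
  }

  ChainGroup& group(const std::string& name) { return m_groups[name]; }

private:
  static bool isBuiltin(const std::string& name) {
    return name.rfind("STREAM_", 0) == 0 || name.rfind("GROUP_", 0) == 0;
  }

  static void resolveWithRegex(ChainGroup& group,
                               const std::vector<std::string>& allChains) {
    group.chains.clear();
    for (const auto& pattern : group.patterns) {
      const std::regex re(pattern);
      for (const auto& chain : allChains) {
        if (std::regex_match(chain, re)) group.chains.insert(chain);
      }
    }
  }

  std::map<std::string, ChainGroup> m_groups;
};
```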
This merge request affects 1 package:
- Trigger/TrigAnalysis/TrigDecisionTool
This merge request affects 1 file:
- Trigger/TrigAnalysis/TrigDecisionTool/Root/CacheGlobalMemory.cxx
Adding @tamartin as watcher
added Trigger analysis-review-required master review-pending-level-1 labels
CI Result SUCCESS (hash a96883ff)
Projects built and tested (externals, cmake, make, required tests, optional tests): Athena, AthSimulation, AthGeneration, AnalysisBase, AthAnalysis, DetCommon
Full details available on this CI monitor view
Athena: number of compilation errors 0, warnings 0
AthSimulation: number of compilation errors 0, warnings 0
AthGeneration: number of compilation errors 0, warnings 0
AnalysisBase: number of compilation errors 0, warnings 0
AthAnalysis: number of compilation errors 0, warnings 0
DetCommon: number of compilation errors 0, warnings 0
For experts only: Jenkins output [CI-MERGE-REQUEST-CC7 33769]
added review-approved label and removed review-pending-level-1 label
OK - so beginEvent on the 1st event goes from around 4 minutes of 64 cores at 100% to around 70 s with mixed periods of work and idle. Certainly more sane. The TDT still seems to be the largest user here, though in a longer job it becomes less significant.
@bwynne was having a look at improving the performance of the existing function a bit more with some further optimisations.
More serious optimisations would make sense to me as part of a larger TDT backend rewrite, something which I have been trying to avoid but which may need to happen at some point.
Edited by Tim Martin
Resolved by Attila Krasznahorkay
I do think that there's something about this code that we don't understand though.
With fewer threads the same per-thread initialisation takes less time. (I'll make some tests of this actually; I've not tested <64-thread jobs in quite a while.) This is why I assume(d) that there is memory contention between the threads all doing the same memory-intensive operation at the same time.
But as I wrote in ATR-23427 yesterday, when I forced these operations to execute one by one, they still took a very long time, even individually. This practically led to the job not even exiting this initialisation phase during the time that I had set for the profile collection.
So I suspect that with many threads we somehow increase the number of operations as well; it's not just a matter of threads stepping on each other's toes. But I don't know the TDT code well enough to be able to guess how that could happen...
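For illustration, a rough sketch of the kind of serialisation test described above (hypothetical names, not actual Athena/TDT code): the per-slot first-event initialisation is forced to run one slot at a time behind a shared mutex, so any remaining slowness cannot come from threads doing the same memory-intensive work simultaneously.

```cpp
// Sketch only, assuming a hypothetical PerSlotState with a beginEvent hook:
// serialise the expensive one-off per-slot setup across all slots so that
// each slot's initialisation time can be measured without cross-thread
// memory contention.
#include <mutex>

class PerSlotState {
public:
  void beginEvent() {
    // All slots share this mutex, so the one-off work runs one slot at a time.
    static std::mutex initMutex;
    std::lock_guard<std::mutex> lock(initMutex);
    if (!m_initialised) {
      buildChainGroups();  // the memory-intensive first-event setup
      m_initialised = true;
    }
  }

private:
  void buildChainGroups() { /* expensive per-slot setup */ }
  bool m_initialised = false;
};
```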
mentioned in merge request !43664 (closed)
Let me ping @emoyse on it, just to get things moving.
As a reminder, without this update the amount of time spent setting up the TDT's internal data structures during the first event is like this:
I.e. it takes almost an order of magnitude more CPU time to do it... (The difference makes my 500-second test job go from being able to process just 50 events with the current version of master, to processing almost 500 in the same amount of time!)
mentioned in commit 79a0b101
added sweep:ignore label