AthenaMonitoringKernel: Performance improvement for 1D histograms

This merge request affects 1 package:

Control/AthenaMonitoringKernel

Adding @ssnyder as watcher

added Core DQ master review-pending-level-1 labels

Changes look ok.

Are you already positive that this is due to memory allocations?

With Piotr we had idea at some point to avoid the conversion to vector and replace it by iterator pairs (no memory). But, it seemed to require custom iterators in some kind of hierarchy and virtual call in each dereference. Back then we agreed that i will likely be slower.

It's definitely very significant (~30-40% for a 1D fill). Yes, I was looking myself at an iterator-based solution. I started prototyping something but as you say, it gets very complicated quickly. Still we may have to give it a try. We probably also need to revisit your !28409 (merged) and in general there is quite some memory allocation happening when creating the fillers. Maybe we need a pool of fillers that we can re-use.

CI Result SUCCESS (hash 2123ec60)

	Athena	AthSimulation	AnalysisBase
externals
cmake
make
required tests
optional tests

Full details available on this CI monitor view
Athena: number of compilation errors 0, warnings 0
AthSimulation: number of compilation errors 0, warnings 0
AnalysisBase: number of compilation errors 0, warnings 0
For experts only: Jenkins output [CI-MERGE-REQUEST-CC7 11764]

resolved all threads

Hi Frank,

Thanks for taking a look at this. I spent a fair amount of time on optimization using VTune a while ago in the context of offline monitoring (where in the end I stopped because the runtime was completely dominated by, e.g., T/P conversion, not by histogram fills ...) but for your case it may matter. (Also agreed that it would be better to directly use the backing data instead of making additional vectors ... though it wasn't obvious from my tests that this was super-important, it seemed that we were more limited by branch prediction thrashing). Do you have profiling results that you can share?

Best, Peter

Indeed, the hotspots might be quite different in trigger and offline DQ. I attached a callgrind profile of the micro benchmark in this MR to ATR-21210 (we can continue the discussion there). This micro-benchmark is certainly not representative for a DQ application. But it's coming close to the trivial monitoring that Rafal was trying to do in the description of this MR and that started this work.