AthenaMonitoringKernel: Performance improvement for 1D histograms
A first set of fixes to improve the performance of the monitoring framework. There is currently significant overhead (factor 10 or more depending on the use-case). See ATR-21202 for an example as reported by @rbielski.
This MR is mainly dealing with reducing the overhead on 1D histograms for the case when no cut mask and/or weight is being used. Use lambda's wherever possible instead of function pointers to take advantage of compiler optimizations. However, much more fundamental changes will be required to get the number of memory allocations reduced.
In addition:
- Add
GenericMonPerf_test.exe
test executable that can be used to compare basic filling operations with ROOT (see--help
for options). It can also be used to profile the code with valgrind/callgrind. See the header of the source file for details.
Merge request reports
Activity
added Core DQ master review-pending-level-1 labels
Changes look ok.
Are you already positive that this is due to memory allocations?
With Piotr we had idea at some point to avoid the conversion to vector and replace it by iterator pairs (no memory). But, it seemed to require custom iterators in some kind of hierarchy and virtual call in each dereference. Back then we agreed that i will likely be slower.
It's definitely very significant (~30-40% for a 1D fill). Yes, I was looking myself at an iterator-based solution. I started prototyping something but as you say, it gets very complicated quickly. Still we may have to give it a try. We probably also need to revisit your !28409 (merged) and in general there is quite some memory allocation happening when creating the fillers. Maybe we need a pool of fillers that we can re-use.
CI Result SUCCESS (hash 2123ec60)Athena AthSimulation AnalysisBase externals cmake make required tests optional tests Full details available on this CI monitor view
Athena: number of compilation errors 0, warnings 0
AthSimulation: number of compilation errors 0, warnings 0
AnalysisBase: number of compilation errors 0, warnings 0
For experts only: Jenkins output [CI-MERGE-REQUEST-CC7 11764]- Resolved by Peter Onyisi
Hi Frank,
Thanks for taking a look at this. I spent a fair amount of time on optimization using VTune a while ago in the context of offline monitoring (where in the end I stopped because the runtime was completely dominated by, e.g., T/P conversion, not by histogram fills ...) but for your case it may matter. (Also agreed that it would be better to directly use the backing data instead of making additional vectors ... though it wasn't obvious from my tests that this was super-important, it seemed that we were more limited by branch prediction thrashing). Do you have profiling results that you can share?
Best, Peter
Indeed, the hotspots might be quite different in trigger and offline DQ. I attached a callgrind profile of the micro benchmark in this MR to ATR-21210 (we can continue the discussion there). This micro-benchmark is certainly not representative for a DQ application. But it's coming close to the trivial monitoring that Rafal was trying to do in the description of this MR and that started this work.
- Resolved by Frank Winklmeier
- Resolved by Frank Winklmeier
added review-user-action-required label and removed review-pending-level-1 label
added review-approved review-pending-level-1 labels
removed review-approved review-user-action-required labels
added review-pending-level-2 label and removed review-pending-level-1 label
added review-approved label and removed review-pending-level-2 label
mentioned in commit 1fcded18
added sweep:ignore label