CaloRecGPU Offline and Online Configuration and Clean-Up
Following from the great efforts in updating the CaloRecGPU configuration to a flags-based style (namely !68106 (merged) and !68546 (merged)), we are now able to provide a proper configuration for both offline and online, with extraneous options cleaned up and new ones added as needed, providing what I hope is a sensible interface that encourages code reuse. In particular, we try to leverage the `ActiveConfig` convention to unify the offline and online instantiations, which I think is a pretty powerful mechanism that may improve the clarity of the configuration here and elsewhere.
My personal guidelines while creating this configuration:
1. Flags enable configuring every possible set of behaviours we might want to change (or even just test) in the tools/algorithms to be instantiated.
2. `...Cfg` functions can receive `**kwargs` to potentially override other options directly (using a loop with `setattr`, NOT by simply forwarding them to the component factory!), to allow one-off instantiations with particular changes in very specific cases if they are needed. (This is essentially equivalent to setting them directly after the `popToolsAndMerge`, but may be more ergonomic.)
3. Other arguments may be added to the `...Cfg` functions on a case-by-case basis to configure specific things that are more related to how we are instantiating everything (e.g. the name for the component, whether or not we want trigger-style monitoring, etc.) than to how we want the algorithm to behave.
4. The `ActiveConfig` infrastructure is leveraged to accommodate different needs without repeating the instantiating code (as any different behaviours should be specifiable by changing the appropriate flags and, in particular cases, the arguments to the `...Cfg`).
5. The flag hierarchy should be consistent (see `Control/AthenaConfiguration/python/AllConfigFlags.py`). Full consensus and enforcement is indispensable, but my personal preference would be:
   - `flags.<System>.<Stuff>` for most things (with potential subcategories when different options for different purposes are required, e.g. the different instantiations we now see of the CPU CaloClusterMaker);
   - `flags.Trig.<System>.<Stuff>` in the rare cases where there is a specific trigger setting that relates to said `<System>.<Stuff>`, or a trigger-specific algorithm, or the like, that has no offline equivalent;
   - trigger-specific configuration of the tools/algorithms should be handled through separate sets of flags, e.g. `flags.<System>.<Stuff>.Trigger` (or the equivalent with multiple categories).
Point 5 is the one I admit may cause more contention here, and, as with everything that essentially comes down to organization and semantics, having a full consensus is more important than anything else. To make the intent of points 2 to 5 more concrete, a minimal sketch follows below.
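The following is a minimal sketch of the pattern described above, not the code from this MR: the tool class and the flag/property names (`HypotheticalGPUClusterGrowingTool`, `SeedThreshold`, `CutClustersInAbsEt`, `MeasureTimes`, the `CaloRecGPU.Default`/`CaloRecGPU.Trigger` categories) are placeholders I made up for illustration; only the standard `ComponentAccumulator`/`CompFactory`/`cloneAndReplace` machinery is assumed.

```python
from AthenaConfiguration.AthConfigFlags import AthConfigFlags
from AthenaConfiguration.ComponentAccumulator import ComponentAccumulator
from AthenaConfiguration.ComponentFactory import CompFactory


def createHypotheticalCaloRecGPUFlags():
    """Point 5: flags live under flags.<System>.<Stuff>, with per-purpose
    subcategories that ActiveConfig can later point at (names are made up)."""
    flags = AthConfigFlags()
    flags.addFlag("CaloRecGPU.Default.SeedThreshold", 4.0)
    flags.addFlag("CaloRecGPU.Default.CutClustersInAbsEt", True)
    flags.addFlag("CaloRecGPU.Trigger.SeedThreshold", 4.0)
    flags.addFlag("CaloRecGPU.Trigger.CutClustersInAbsEt", True)
    return flags


def HypotheticalGPUClusterGrowingCfg(flags, name="GPUClusterGrowing", **kwargs):
    """Points 2-4: behaviour comes from the active flag category; **kwargs are
    applied with setattr to allow one-off overrides of any other property."""
    acc = ComponentAccumulator()

    tool = CompFactory.getComp("HypotheticalGPUClusterGrowingTool")(
        name,
        SeedThreshold=flags.CaloRecGPU.ActiveConfig.SeedThreshold,
        CutClustersInAbsEt=flags.CaloRecGPU.ActiveConfig.CutClustersInAbsEt,
    )

    # Overrides are applied on the configured component, NOT forwarded to the factory.
    for key, value in kwargs.items():
        setattr(tool, key, value)

    acc.setPrivateTools(tool)
    return acc


def HypotheticalTriggerInstantiation(flags):
    """Offline vs. online: the same ...Cfg function is reused,
    only the category that ActiveConfig points at changes."""
    flags = flags.cloneAndReplace("CaloRecGPU.ActiveConfig", "CaloRecGPU.Trigger")
    acc = ComponentAccumulator()
    tool = acc.popToolsAndMerge(
        HypotheticalGPUClusterGrowingCfg(flags, MeasureTimes=True)  # one-off override
    )
    # ... the tool would then be attached to the relevant algorithm ...
    return acc
```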
In an effort to make the instantiation of the CaloRecGPU tools and algorithms work in perfect agreement with what CaloRec does, a few changes that lie outside the narrow definition of this configuration are needed:
- Prevent `CaloClusterSnapshot` from `dynamic_cast`ing its parent to a `CaloClusterMaker` to find out the final cluster container name.
  This basically meant we could not instantiate a `CaloClusterSnapshot` as a tool of our `CaloGPUHybridClusterProcessor`, to match what the CPU equivalent does in CaloRec. A possible solution would be making both `CaloClusterMaker` and `CaloGPUHybridClusterProcessor` inherit from a common (virtual) base that would provide the necessary interface, but, given that there are few places where a `CaloClusterSnapshot` is instantiated, it is much cleaner to add an option to the class and assign the correct value to it at the instantiation site. This leads to (localized) changes in `CaloClusterTopoGetter.py` and `CaloTopoClusterConfig.py`.
- Add an additional option to the CPU topological cluster growing tool to decide whether the cluster cuts are applied to the transverse energy or to its absolute value.
  This is motivated in the first place by the fact that, on the GPU, the cluster cuts are applied by a different tool than the one that does the cluster growing, so it has always been a distinct option there (especially since floating point accuracy issues meant that studying this cut was necessary at some point), and we are interested in keeping it as such for potential future tests.
  Secondly, the CPU (and by extension the GPU) cluster growing already has three distinct options to decide whether the seed, growing and terminal cells are classified using the signal-to-noise ratio or its absolute value; these options are independent and, even though it might make little physical sense, having distinct settings is a theoretical possibility of the current configuration (and something that might be interesting to consider, e.g. for debugging purposes). However, rather in contrast with this increased freedom of configuration, the option that decides whether the seed cells are classified in absolute value was recycled to also decide whether or not to apply the cluster cut in absolute value, which seems rather inconsistent to me.
  Thirdly, and following from the last point, the cluster cut in transverse energy takes into account all the cells that are part of the cluster. Conceptually, it is an entirely different operation from the seed cell classification, which only takes into account the energy and noise of a single cell (and it is theoretically possible for a cluster's overall transverse energy to have a different sign than the energy measured at its seed cell). Of course, for most purposes we are interested in using the absolute value for both the cell classification and the transverse energy cut, but in that case the three different options for the three kinds of cells should also be unified into a single "use absolute value" option. Being a more fundamental change to the algorithm, I think that is outside the scope of this MR, but it may be worth discussing in the future.
  Tagging @menke per Denis' suggestion.
  However, the addition of this option does result in quite a few changes outside the immediate scope of this MR, in the places where the option that controls the seed cell classification in absolute value is set, as the new option that controls the cluster cuts in transverse energy must agree with it. These are:
  - `Calorimeter/CaloRec/python/CaloClusterTopoEMGetters.py`
  - `DataQuality/DataQualityTools/share/HadTopoClusterMaker.py`
  - `Reconstruction/HeavyIonRec/HIJetRec/python/SubtractedCaloClusterTopoGetter.py`
  - `Reconstruction/tauRec/python/TauToolHolder.py`
  - `TileCalorimeter/TileMonitoring/python/TileTopoClusterConfig.py`
  - `TileCalorimeter/TileMonitoring/share/TileMonTopoCluster_jobOptions.py`
  I do realize some of those may be on the way to deprecation, but I am unsure whether or not to keep them up to date with our changes to prevent undesirable behaviour regressions. (They also provide a relevant argument for having the `ActiveConfig` manage all the different possible configurations instead of having tools instantiated in just slightly different ways across multiple files...) A sketch of how the new option is expected to be set alongside the existing one follows after this list.
- Add a CPU flag that allows instantiating the fully CPU configuration with GPU criteria.
  This would basically make it more convenient to compare the full configuration and instantiation between CPU and GPU in the future without any changes to the CPU code (barring the setting and use of this flag), instead of the approach used so far, which was to instantiate only the CPU tools with the appropriate options in the GPU-related code. A minimal sketch of the flag idea is also given below.
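For concreteness, here is a minimal sketch of what the touched instantiation sites are expected to look like. `SeedCutsInAbsE` and `ClusterEtorAbsEtCut` are, to the best of my reading, the existing `CaloTopoClusterMaker` properties; the name of the new option (`ClusterCutsInAbsEt` below) is a placeholder and may differ from what is actually introduced here.

```python
from AthenaConfiguration.ComponentFactory import CompFactory

# Only the properties relevant to this discussion are shown.
TopoMaker = CompFactory.CaloTopoClusterMaker(
    "TopoMaker",
    SeedCutsInAbsE=True,        # existing option: classify seed cells in |E|/noise
    ClusterEtorAbsEtCut=0.0,    # existing option: value of the cluster E_T cut (illustrative)
    ClusterCutsInAbsEt=True,    # new option (name is a guess): apply the cluster cut in |E_T|;
                                # must agree with SeedCutsInAbsE at every site listed above
)
```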
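And a minimal sketch of the CPU flag idea; the flag name, its location in the hierarchy and the affected default are placeholders, not necessarily what this MR adds.

```python
from AthenaConfiguration.AthConfigFlags import AthConfigFlags


def createHypotheticalFlags():
    flags = AthConfigFlags()
    # Hypothetical flag: when True, the plain CPU chain adopts the GPU-style criteria,
    # so CPU and GPU configurations can be compared one-to-one without touching
    # the GPU-related code.
    flags.addFlag("CaloRecGPU.UseGPUCriteriaForCPU", False)
    return flags


def adjustCPUDefaults(flags, kwargs):
    """Sketch of how a CPU ...Cfg function could react to the flag."""
    if flags.CaloRecGPU.UseGPUCriteriaForCPU:
        kwargs.setdefault("ClusterCutsInAbsEt", True)  # hypothetical GPU-style default
    return kwargs
```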
Some cleanup of older code was also undertaken:
- Fixed some errors in compilation paths that are usually not taken (related to CUDA and C++ versions).
- Removed the deprecated prototype of the GPU (non-automaton) splitting.