Calo Challenge setup in LHCb (!1001) · Merge requests · LHCb / Gauss

Michal Mazurek requested to merge mimazure-calo-challenge-in-lhcb into master Sep 11, 2023

@gcorti @azaborow @dasalama @witoldp @landerli @admorris @mkmiec

This MR introduces a general setup for running ML inference and for producing training datasets for fast simulation of the LHCb calorimeters, compatible with the CaloChallenge setup.

CaloChallenge is an ML competition focusing on fast calorimeter simulation using generative models. The idea is to represent each particle shower by virtual hits forming concentric cylinders, with particles propagating along the z-axis. More information on the CaloChallenge: https://calochallenge.github.io/homepage/.
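As a rough illustration of the cylindrical representation, the sketch below maps a point in the shower frame to a (radial, azimuthal, longitudinal) cell index. The function name, the uniform binning and all parameters are assumptions made for this example; the actual segmentation used by CaloChallenge and Gaussino may differ.

```python
import math


def cylinder_cell_index(x, y, z, *, r_max, z_max, n_r, n_phi, n_z):
    """Map a point in the shower frame to (i_r, i_phi, i_z).

    Hypothetical helper: assumes a cylinder aligned with the z-axis,
    starting at z = 0, with uniform bins in r, phi and z.
    Returns None if the point lies outside the cylinder.
    """
    r = math.hypot(x, y)
    if r >= r_max or not (0.0 <= z < z_max):
        return None
    phi = math.atan2(y, x) % (2.0 * math.pi)  # fold the angle into [0, 2*pi)
    i_r = int(r / r_max * n_r)
    # The modulo guards against phi rounding up to exactly 2*pi.
    i_phi = int(phi / (2.0 * math.pi) * n_phi) % n_phi
    i_z = int(z / z_max * n_z)
    return i_r, i_phi, i_z


binning = dict(r_max=10.0, z_max=10.0, n_r=10, n_phi=4, n_z=10)
print(cylinder_cell_index(1.5, 0.0, 0.5, **binning))
print(cylinder_cell_index(20.0, 0.0, 0.5, **binning))  # outside the cylinder
```

Each virtual hit then carries the energy deposited in one such cell, so a shower becomes a fixed-size array regardless of the underlying detector geometry.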

The experiment-independent approach for cylindrical and planar calorimeters was introduced in Gaussino in Gaussino/Gaussino!131 (merged) and Gaussino/Gaussino!146 (merged) respectively. This MR extends that work by adapting the configuration to the calorimeters in LHCb.

Problems

The CaloChallenge setup in Gauss is considerably more complex than the examples used in Gaussino. The following problems were encountered when implementing the configuration:

DetDesc / DD4hep: the setup must remain agnostic to the actual geometry description used in LHCb, e.g. the weights of the models should not depend on whether DetDesc or DD4hep is used.
Non-uniformity of the calorimeters: the geometry of the calorimeters is very complex. The detectors themselves are slightly tilted and shifted with respect to the LHCb coordinate system. Moreover, some of the subregions of the calorimeters are shifted with respect to each other. There is also a beam hole in the middle of the calorimeter. See images below.
Gauss must accept virtual hits: the hits produced by the CaloChallenge inference must be converted to the native data container used in LHCb: LHCb::MCCaloHits (the work on this was done by @mkmiec in !981 (merged)).
Energy corrections: the models must be trained on energy values that include all the energy corrections applied in the native sensitive detector class.
Timing resolution: the energy output should be split into 25 ns time slots (as is done in Gauss in the detailed simulation). The timing resolution is ignored for now in the setup in this MR.
ML backends: the setup should be as backend-agnostic as possible. This became possible with the interfaces in Gaussino to PyTorch (Gaussino/Gaussino!55 (merged)) and ONNXRuntime (Gaussino/Gaussino!145 (merged)).
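The timing split mentioned above is not part of this MR, but the idea can be sketched as follows. This is a minimal illustration only: the function name, the (time, energy) tuple layout and the choice of slot 0 starting at t = 0 are assumptions, and any offsets or time-of-flight corrections applied in Gauss are not modelled.

```python
import math
from collections import defaultdict

SLOT_WIDTH_NS = 25.0  # slot width quoted in the description above


def split_into_time_slots(deposits, slot_width_ns=SLOT_WIDTH_NS):
    """Sum energy deposits per time slot.

    `deposits` is an iterable of (time_ns, energy) pairs (hypothetical layout).
    Slot k covers [k * slot_width_ns, (k + 1) * slot_width_ns).
    """
    slots = defaultdict(float)
    for time_ns, energy in deposits:
        slots[math.floor(time_ns / slot_width_ns)] += energy
    return dict(slots)


deposits = [(3.0, 1.0), (24.9, 2.0), (25.0, 0.5), (-1.0, 0.25)]
print(split_into_time_slots(deposits))
```

Negative slot indices correspond to deposits arriving before t = 0 under the convention assumed here.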

Solution

The configuration changes depending on whether we run in the inference or the training dataset production mode. Both setups make extensive use of the packages in Gaussino: CustomSimulations, ParallelGeometry and ExternalDetector.

Inference mode:

Kill all the particles of interest traveling through a very thin plane, a CollectorPlane, which collects all the necessary information about these particles to feed the ML models.
Fetch the G4 info using Gsino::CaloChallenge::GetCollectorHitsAlg and trigger the ML inference with either PyTorch, ONNXRuntime or any other available backend using Gsino::CaloChallenge::GetMLCaloHitsAlg at the beginning of the calorimeter sensitive area (marked in the setup as the Trigger Plane).
Monitor the performance of the inference with the low-level Gsino::CaloChallenge::DetailedAndFastSimMonitoring algorithm.
Convert, merge and optionally split the generic CaloHits used in the inference into LHCb::MCCaloHits.
Monitor and store the LHCb::MCCaloHits as in the standard simulation.
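The "convert and merge" step above can be pictured with a toy example that sums the energy of generic hits falling into the same calorimeter cell. This is not the Gauss implementation: the function name, the uniform square grid and the square beam hole are all simplifying assumptions, whereas the real calorimeters have non-uniform, shifted subregions.

```python
import math
from collections import defaultdict


def merge_hits_into_cells(hits, cell_size, hole_half_size):
    """Sum hit energies per cell on a toy uniform grid.

    `hits` is an iterable of (x, y, energy) in a calorimeter-local frame
    (hypothetical layout). Hits inside the square beam hole are dropped.
    """
    cells = defaultdict(float)
    for x, y, energy in hits:
        if abs(x) < hole_half_size and abs(y) < hole_half_size:
            continue  # no sensitive material in the (toy) beam hole
        cell_id = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[cell_id] += energy
    return dict(cells)


hits = [(10.0, 10.0, 1.0), (15.0, 12.0, 2.0), (0.5, 0.5, 5.0), (-30.0, 45.0, 0.5)]
print(merge_hits_into_cells(hits, cell_size=20.0, hole_half_size=5.0))
```

In the actual setup the merged result is stored as LHCb::MCCaloHits, so the downstream monitoring and storage are the same as in the standard simulation.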

Training dataset production:

Keep track of the particles passing through the CollectorPlane, but do not kill them; let them participate in the detailed simulation.
Both virtual hits and native hits are produced. The native sensitive detector class of the calorimeters used in LHCb triggers the sensitive detector class in CaloChallenge to produce the virtual hits. This allows us to propagate all the energy corrections to the virtual hits.
Use Gsino::CaloChallenge::TrainingDataCollector and Gauss::CaloCollector to produce training datasets for the virtual and native hits respectively.
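To make the shape of a training sample concrete, the sketch below pairs the incident particle energy recorded at the CollectorPlane with the virtual-hit energies accumulated into a flat voxel array. The function name, the dictionary keys and the flattening order are assumptions for illustration; the format actually written by Gsino::CaloChallenge::TrainingDataCollector may be different.

```python
def build_training_sample(incident_energy, voxel_hits, n_r, n_phi, n_z):
    """Build one training record from virtual hits.

    `voxel_hits` is an iterable of ((i_r, i_phi, i_z), energy) pairs
    (hypothetical layout). The flat index order (z, then phi, then r)
    is an arbitrary choice made for this example.
    """
    shower = [0.0] * (n_r * n_phi * n_z)
    for (i_r, i_phi, i_z), energy in voxel_hits:
        flat = (i_z * n_phi + i_phi) * n_r + i_r
        shower[flat] += energy  # several hits may land in the same voxel
    return {"incident_energy": incident_energy, "shower": shower}


voxel_hits = [((0, 0, 0), 1.0), ((1, 0, 0), 2.0), ((1, 1, 1), 0.5), ((1, 0, 0), 0.25)]
sample = build_training_sample(50.0, voxel_hits, n_r=2, n_phi=2, n_z=2)
print(sample["shower"])
```

Because the virtual hits are produced by the native sensitive detector class, the energies accumulated this way already include the energy corrections.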