Fix modelling of tracking inefficiencies related to multiple scattering (!860) · Merge requests · LHCb / Gauss

Lucio Anderlini requested to merge landerli_lamarr_trkeff_fix into master Jun 10, 2022

See also

Parallel merge request on LamarrData (lhcb-datapkg/LamarrData!5 (merged))

Summary checklist before merge

Validate suggested solution
Document the validated solution
lhcb-datapkg/LamarrData!5 (merged) must be merged and tagged as v3r0
CMakeLists.txt must point to the LamarrData v3r* --> !875 (merged) GaussDependencies must have LamarrData 3.0

Discussion

At the SimDev meeting on May 11 we reported a first comparison of the simulation obtained with Gauss and Lamarr on the same DecayFile, using either Pythia or ParticleGun as a generator. One of the findings was that a dip in the track distribution at $\eta = 4.25$ is visible in both real data and Geant-based simulation, but not in Lamarr.

The missing dip can be ascribed to a mismodelling of either the geometrical acceptance or the tracking efficiency. Studying the Brunel-produced dataset used to train the models, we observe no trace of a dip around $\eta = 4.2$ in the acceptance:

In contrast, a clear horizontal stripe is visible in the $p_\perp$-$\eta$ plane describing the long-track reconstruction efficiency:

We conclude that the missing dip is related to a mismodelling of the long-track reconstruction efficiency.

This is somewhat surprising because, during the validation of the model, performed on a sample statistically equivalent to and independent of the training dataset, we observed the horizontal stripe perfectly reproduced.

Studying the dip in more detail, we observe that the dip in the long-track reconstruction efficiency corresponds to a peak in the Velo-track reconstruction efficiency (track pseudorapidity on the x axis).

This is easily interpreted as an effect of multiple scattering: the track is correctly reconstructed in the Velo, but cannot be upgraded to an "upstream track" because it gets deflected between the end of the Velo and the TT.

This is confirmed by the observation that the track states at "EndVelo" and at "AtT" are distributed along the bending direction in a very different way, despite the similar momentum distribution.

The effect also appears localized in terms of the $\phi$ angle, with dips at $\phi \sim k\pi$ for every integer $k$.

This might be related to RF foils (see for example this presentation) or other structures immediately after the Velo.

Independently of the exact structure causing the efficiency drop, the mechanism seems related to multiple scattering in some detector material, which produces significant kinks in the track trajectory between the Velo segment and the downstream segment.
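To give a sense of scale for such kinks, the width of the multiple-scattering angle can be estimated with the standard Highland parametrization. The sketch below is illustrative only: the function name, the momentum, the material thickness and the lever arm are assumptions, not values taken from the description of the Velo, the RF foils or the TT.

```python
import math

def highland_theta0(p_gev, x_over_x0, beta=1.0, charge=1.0):
    """Width (rad) of the projected multiple-scattering angle, Highland form.

    p_gev:      momentum in GeV/c
    x_over_x0:  traversed material in units of radiation length
    """
    return (0.0136 / (beta * p_gev)) * abs(charge) * math.sqrt(x_over_x0) * (
        1.0 + 0.038 * math.log(x_over_x0)
    )

# Hypothetical example: a 10 GeV/c track crossing 1% of a radiation length
theta0 = highland_theta0(p_gev=10.0, x_over_x0=0.01)

# Transverse displacement (mm) accumulated over an assumed 1.5 m lever arm
displacement_mm = theta0 * 1500.0
```

The angle scales as 1/p, so low-momentum tracks are the ones most likely to fail the matching between the Velo segment and the downstream segment.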

Now, the model currently used in Lamarr for the acceptance and efficiency parametrization takes as input the position and the momentum of the track states at "EndVelo" and "AtT". The idea behind this choice is that, physically, it is the position where the track crosses the detectors that should determine at least the acceptance and, to maintain a symmetric description, also the efficiency. The assumption works very well on samples processed with Geant, which encodes in the position of the hits across the detector the information on the multiple scattering that the particle underwent. Unfortunately, with a parametric propagation as in Lamarr, the acceptance and efficiency models behave very poorly: all the tracks have perfectly aligned states, which the model interprets as perfectly reconstructible, so it declares them all as reconstructed.

This results in an overestimation of the tracking efficiency in Lamarr, because we neglect the efficiency drops due to multiple scattering.

We considered two options:

rewrite the track propagator to take into account multiple scattering;
model the acceptance and the efficiency as a function of the ClosestToBeam state, which is the most relevant in terms of physics analysis.

The former option would require a significant amount of work and would mean giving up the extremely simple and effective idea of propagating particles with a pT-kick approach, with an algorithm tuned analytically with a fit to the "bending z" position. The benefit of that work is not clear: it is probably negligible for physics analysis, while it may be relevant for detector studies interested in a more realistic trajectory of the charged particles. It is possibly a future upgrade.
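For context, the pT-kick idea can be sketched as two straight segments joined at an effective bending plane, where the slope in the bending plane changes by an amount inversely proportional to the momentum. This is a generic illustration and not the Lamarr propagator: the function name, the kick magnitude, the bending-plane position and the sign convention are placeholder assumptions.

```python
def propagate_pt_kick(x0, tx0, z0, z_target, p_gev, charge,
                      z_bend=5.0, pt_kick_gev=1.2):
    """Propagate the bending-plane coordinate x along z (metres).

    The track starts at (x0, z0) with slope tx0 = dx/dz, assuming z0 <= z_bend.
    z_bend and pt_kick_gev are illustrative placeholders, not tuned values;
    the sign of the kick depends on the magnet polarity.
    """
    if z_target <= z_bend:
        # Still upstream of the bending plane: plain straight line
        return x0 + tx0 * (z_target - z0), tx0
    # Straight line up to the bending plane
    x_bend = x0 + tx0 * (z_bend - z0)
    # Single kink: slope change ~ charge * pT_kick / p (small-angle approximation)
    tx1 = tx0 + charge * pt_kick_gev / p_gev
    # Straight line downstream of the bending plane
    return x_bend + tx1 * (z_target - z_bend), tx1

# Hypothetical example: a 12 GeV/c positive track starting on the beam axis
x_end, tx_end = propagate_pt_kick(0.0, 0.0, 0.0, 9.0, 12.0, +1)
```

Because every track follows exactly this idealized path, states at different z positions are always perfectly consistent with each other, which is the situation described above.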

We are opening this MR to keep track of the implementation and validation of the second option.

Update 2022-07-18

We implemented the acceptance and efficiency models as a function of the ClosestToBeam state to avoid missing inefficiency effects due to multiple scattering.

At the end of the Velo, before the effect of scattering in detector material becomes relevant, we observe good agreement between the simulated efficiency and the model.

Clearly, further from the interaction point, for example at the end of the T stations, the model becomes poorer. In particular, the model predicts that particles passing completely outside of the detector area are actually reconstructed. However, the reason they pass there is multiple scattering in the Velo, which is not modelled in Lamarr (at least not so far).

The model for the efficiency as a function of $p$ and $\eta$ is also accurate and has been simplified to make the inference faster.

The models and their integration in Lamarr have been validated using a particle gun and compared to the Geant simulation (shown in light blue in the figure); the inefficiencies due to multiple scattering are well reproduced.

Update 2022-07-21

Measuring the CPU performance of the various parametrizations for the presentation at ICHEP, we noticed that, using Pythia, the time spent computing acceptance and efficiency explodes. This is due to the much larger number of particles for which the BDT must be evaluated. To reach this conclusion we developed a simple profiler that keeps track of the time spent evaluating the various parametrizations. The profiler is included in this Merge Request because it turned out to be particularly relevant for tuning the acceptance and efficiency models.

`SimpleProfiler`

We developed a simple class named SimpleProfiler that provides the following methods:

start(<code block title>) to initialize a timing measurement
stop() to terminate the most recently started measurement
json_dump() to return a string in json format with all the timing measurements
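Since stop() takes no argument and closes the most recently started measurement, the interface behaves like a stack, which is what allows nested code blocks to be timed. A minimal Python analogue of that behaviour is sketched below. The real class is written in C++ and its internals are not shown in this description, so the class name, the storage layout and the json structure used here are assumptions.

```python
import json
import time

class SimpleProfilerSketch:
    """Hypothetical Python analogue of the stack-like start/stop interface."""

    def __init__(self):
        self._open = []          # stack of (title, start time) for running blocks
        self._measurements = {}  # title -> list of durations in seconds

    def start(self, title):
        # Push a new running measurement on the stack
        self._open.append((title, time.perf_counter()))

    def stop(self):
        # Pop the most recently started measurement and record its duration
        title, t0 = self._open.pop()
        elapsed = time.perf_counter() - t0
        self._measurements.setdefault(title, []).append(elapsed)

    def json_dump(self):
        # Assumed layout: one list of durations per code-block title
        return json.dumps(self._measurements)

profiler = SimpleProfilerSketch()
profiler.start("My Feature")
for _ in range(3):
    profiler.start("One step")
    profiler.stop()   # closes "One step"
profiler.stop()       # closes "My Feature"
dumped = json.loads(profiler.json_dump())
```

In this sketch, calling stop() without a matching start() raises an IndexError; how the C++ class handles that case is not stated here.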

Together with the SimpleProfiler class, we provide the script scripts/read_profile.py to read the json files and print statistics to standard output.

Example

Consider the implementation of a generic Lamarr parametrization defined in a LamarrSomeFeature class. Then one will have a header file LamarrSomeFeature.h including:

#include "SimpleProfiler.h"
...
class LamarrSomeFeature: public GaudiAlgorithm
{
  public:
    virtual StatusCode finalize () override;
  ...
  private:
    SimpleProfiler m_profiler;
};

Then in the implementation file, LamarrSomeFeature.cpp,

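void LamarrSomeFeature::compute_my_feature ()
{
  // Outer measurement: one entry per call of this function
  m_profiler.start("My Feature");
  for (int i = 0; i < 100; ++i)
  {
    // Nested measurement: one entry per iteration
    m_profiler.start("One step");
    // do something
    m_profiler.stop();  // closes "One step"
  }
  m_profiler.stop();  // closes "My Feature"
}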

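StatusCode LamarrSomeFeature::finalize()
{
  // Write all timing measurements to a json file (requires #include <fstream>)
  std::ofstream profile_file;
  profile_file.open("LamarrSomeFeature.profile.json");
  profile_file << m_profiler.json_dump();
  profile_file.close();
  return GaudiAlgorithm::finalize();
}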

Executing the algorithm LamarrSomeFeature on a single event will result in a single measurement of the time spent in "My Feature" and 100 measurements of the time spent in "One step". All the measurements are made available in the json file LamarrSomeFeature.profile.json to ease future correlation analyses.

To print some statistics, one can issue the command

python3 Sim/LbLamarr/scripts/read_profile.py *.profile.json

and read the table which will look like:

LamarrSomeFeature ===========================
  My Feature      42.0 s  1.2    1
    One step       0.4 s  1.0  100

where the three numbers in a row represent:

the total amount of time spent in the code block, in seconds
the ratio between the maximal time spent and the average (as a measure of the skewness of the distribution)
the number of timing measurements
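The three columns can be reproduced from a list of raw durations as sketched below. The function name and the assumed input (a plain list of durations in seconds for one code block) are illustrative; the actual layout of the json files and the internals of read_profile.py are not described here.

```python
def summarize(durations):
    """Return (total seconds, max/mean ratio, number of measurements).

    Assumes a non-empty list of durations in seconds.
    """
    count = len(durations)
    total = sum(durations)
    mean = total / count
    # A ratio close to 1 means all measurements took a similar time;
    # a large ratio flags a few measurements much slower than average.
    return total, max(durations) / mean, count

# Hypothetical measurements for one code block
total, max_over_mean, count = summarize([0.5, 0.25, 0.25, 1.0])
```

With these hypothetical inputs the total is 2.0 s over 4 measurements and the slowest measurement is twice the average.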

Integration within existing parametrizations

Timing measurements are defined for the algorithms:

LamarrPropagator
LamarrParticleId
LamarrRecoSummary

The profiler is disabled by default, but can be enabled through the Lamarr option system:

from Configurables import LbLamarr
LbLamarr().ProfileParametrizations = True

Edited Aug 03, 2022 by Lucio Anderlini
