Cleanup btagging configuration (!51419) · atlas / athena

Dan Guest requested to merge dguest/athena:update-reco-btag into master Mar 20, 2022

This consolidates the configuration functions we use in several places:

- trigger,
- reconstruction,
- derivations, and
- retagging (which doesn't happen in the Athena repo).

Overview

Pictures are cool. Here is the diagram of the b-tagging chain, as Mermaid source:

graph TD;
subgraph BTagAlgs [Configured by One Function]

assoc[JetParticleAssociation] --> jettag & finder

assocm[JetParticleAssociation muons] --> jettag

finder[JetSecVtxFinding] --> vx[JetSecVertexing]
vx --> jettag

jettag[JetBTagging] ==> muaug[BTagMuonAugmenter] & jetaug[BTagJetAugmenter]
muaug & jetaug ==> dl[Machine Learning Algorithms]
end

db[Calibration Database] --> jettag
tr(tracks) -.-> aug & vx & jetaug & dl
mu(Muons) -.-> assocm & muaug
aug[BTagTrackAugmenter] --> assoc
jet(jets) -.-> assoc & assocm & finder & vx & jettag
pv(Primary Vertex) -.-> finder & vx & jettag
json(List of NN Files) -.-> dl
dl ==> btag(BTagging Object)

This is a rough sketch of how information flows through the b-tagging code. Square boxes are algorithms, rounded nodes are inputs from outside the b-tagging code, and dotted lines show where that outside information has to be passed in. Solid lines indicate objects that are created internally, so the name of each such object has to be kept in sync between the algorithm that writes it and the algorithms that read it. The BTagging object follows the thick black line.
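The solid-line constraint amounts to a dependency graph: an algorithm can only run once the objects it reads have been written. As a toy illustration (not Athena code, and not how the Athena scheduler works), the sketch below encodes the solid edges from the diagram and asks the Python standard-library `graphlib` module (Python 3.9+) for one valid execution order. Only the node names come from the diagram; treating the Calibration Database box as an ordinary producer is a simplification.

```python
from graphlib import TopologicalSorter

# Solid edges from the diagram as (producer, consumer) pairs. In the real
# configuration each edge is an object whose name must match on both sides;
# here it is only an ordering constraint.
EDGES = [
    ("BTagTrackAugmenter", "JetParticleAssociation"),
    ("JetParticleAssociation", "JetBTagging"),
    ("JetParticleAssociation", "JetSecVtxFinding"),
    ("JetParticleAssociation muons", "JetBTagging"),
    ("JetSecVtxFinding", "JetSecVertexing"),
    ("JetSecVertexing", "JetBTagging"),
    ("Calibration Database", "JetBTagging"),
    ("JetBTagging", "BTagMuonAugmenter"),
    ("JetBTagging", "BTagJetAugmenter"),
    ("BTagMuonAugmenter", "Machine Learning Algorithms"),
    ("BTagJetAugmenter", "Machine Learning Algorithms"),
]


def execution_order(edges):
    """Return one order in which every producer precedes all of its consumers."""
    sorter = TopologicalSorter()
    for producer, consumer in edges:
        sorter.add(consumer, producer)  # consumer depends on producer
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(sorter.static_order())


if __name__ == "__main__":
    for step, alg in enumerate(execution_order(EDGES), start=1):
        print(step, alg)
```

Renaming an internally created object means updating both ends of its edge, which is why scattering these edges over four configuration implementations is fragile.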

The key point here is that this logic is implemented in four different places, one for each of the use cases listed above.

The problem

Modular stuff is good, but in this case the four implementations are all supposed to do the same thing. The underlying code is also a bit crufty, and some of the boxes should probably be split or merged, which is really hard to do coherently in four places at once.

So for now it's probably better to merge everything. The big gray box holds the algorithms we configure with one function, and we want to put everything in the gray box! Once we've done that, we can move things around to make it easier to pop new boxes in and take old boxes out.

Implementation

The idea was to move everything into the box, but that turned out to be more difficult than expected. Instead, most of the tagging configurations call three functions:

- a calibration database setup,
- track augmentation, and
- the top-level b-tagging function (the gray box).
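A toy sketch of the resulting call pattern, assuming a minimal stand-in for a configuration accumulator. Everything here is hypothetical: `ToyAccumulator`, `calibration_db_cfg`, `track_augmentation_cfg`, `btag_algs_cfg`, and `tagging_cfg` are illustrative names, and in Athena these steps would be `...Cfg` functions returning ComponentAccumulator objects built from ConfigFlags. The point is only the order: conditions first, then track augmentation, then the gray box.

```python
class ToyAccumulator:
    """Hypothetical stand-in for a configuration accumulator: an ordered list of algorithm names."""

    def __init__(self, algs=()):
        self.algs = list(algs)

    def merge(self, other):
        """Append another accumulator's algorithms, preserving their order."""
        self.algs.extend(other.algs)
        return self


def calibration_db_cfg(flags):
    # Step 1: calibration (conditions) database setup.
    return ToyAccumulator(["CalibrationDatabase"])


def track_augmentation_cfg(flags, tracks):
    # Step 2: decorate the input tracks before anything downstream reads them.
    return ToyAccumulator([f"BTagTrackAugmenter/{tracks}"])


def btag_algs_cfg(flags, jets, nn_list):
    # Step 3: the "gray box", configured by a single function (abbreviated).
    algs = [
        f"JetParticleAssociation/{jets}",
        f"JetSecVtxFinding/{jets}",
        f"JetSecVertexing/{jets}",
        f"JetBTagging/{jets}",
        f"BTagMuonAugmenter/{jets}",
        f"BTagJetAugmenter/{jets}",
    ]
    algs += [f"NN/{jets}/{nn}" for nn in nn_list]
    return ToyAccumulator(algs)


def tagging_cfg(flags, jets, tracks, nn_list):
    """Compose the three configuration calls in the required order."""
    acc = ToyAccumulator()
    acc.merge(calibration_db_cfg(flags))
    acc.merge(track_augmentation_cfg(flags, tracks))
    acc.merge(btag_algs_cfg(flags, jets, nn_list))
    return acc
```

For example, `tagging_cfg(None, "AntiKt4EMPFlowJets", "InDetTrackParticles", ["nn_a.json"])` yields the calibration setup, then the track augmenter, then the gray-box algorithms; the NN file name is a placeholder.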

I wasn't able to merge the first two into the last because:

- The conditions database setup function used in derivations can't be replaced by JetTagCalibCfg, for reasons I haven't figured out: when I try, it seems to break the muon conditions algorithm.
- Track augmentation has to be separate from the rest of tagging, because the retagging code uses a view container that is the union of two other containers. Both of those have to be augmented before they are merged.
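For the retagging case, a toy Python sketch of the required ordering, assuming the two input track containers are disjoint so that their union is a plain concatenation. `Track`, `augment`, and `view_union` are hypothetical names; the "view" holds references to the original tracks rather than copies, mimicking a view container. The sketch shows the order of operations only; it does not model why augmenting after the merge fails in Athena.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare tracks by identity, like pointers held by a view
class Track:
    """Toy track: a name plus the decorations added by augmentation."""
    name: str
    decorations: dict = field(default_factory=dict)


def augment(tracks):
    """Hypothetical stand-in for track augmentation: decorate every track in place."""
    for trk in tracks:
        trk.decorations["augmented"] = True
    return tracks


def view_union(*containers):
    """Build a view over several containers: references to the originals, no copies."""
    return [trk for container in containers for trk in container]


# Retagging order: augment each input container first, then merge them.
first = augment([Track("a"), Track("b")])
second = augment([Track("c")])
merged = view_union(first, second)
```

Since the view only refers to existing tracks, everything downstream of the merge sees the decorations that were written on the two original containers.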

In the process of implementing this, I made some other improvements:

- ATLASRECTS-6635: Some small progress on cleaning up CompFactory calls made at import time
- ATLASRECTS-6172: Add soft muon scalars to the BTagging object
- Move configuration functions that call FlavorTagDiscriminants into FlavorTagDiscriminants
- Add or clean up some ConfigFlags:
  - Merged run2TaggersList and Run2TrigTaggers into taggerList. They are the same taggers, and if they ever diverge, each context can set its own list, since the ConfigFlags aren't shared between reconstruction, trigger, and derivations anyway.
  - Made taggerList depend on whether RunFlipTaggers is enabled
  - Moved calibrationChannelAliases to ConfigFlags and cleaned it up considerably
  - Added a forcedCalibrationChannel option, which tells every tagger to use a specific calibration channel
- Update the "retagging" StoreGate renaming functions to be the ones we actually use for retagging
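A minimal sketch of how a forced channel can short-circuit the alias lookup. The alias format (a dict from jet collection to an ordered list of candidate channels), the function name `resolve_calibration_channel`, and the "first available candidate wins" rule are assumptions for illustration; the actual calibrationChannelAliases flag may be structured differently.

```python
def resolve_calibration_channel(jet_collection, aliases, available, forced=None):
    """Pick the calibration channel a tagger should use for `jet_collection`.

    Hypothetical rules: a forced channel overrides everything; otherwise try the
    aliases for this collection in order (falling back to the collection's own
    name) and return the first one present in the calibration database.
    """
    if forced:
        return forced
    for candidate in aliases.get(jet_collection, [jet_collection]):
        if candidate in available:
            return candidate
    raise KeyError(f"no calibration channel available for {jet_collection}")


# Illustrative values only, not the real flag contents.
ALIASES = {"AntiKt4EMPFlow": ["AntiKt4EMPFlow", "AntiKt4EMTopo"]}
AVAILABLE = {"AntiKt4EMTopo", "AntiKtVR30Rmax4Rmin02Track"}
```

With `forced` set, the alias table can be empty and every collection maps to one channel, which is the simplification floated in the To Do list.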

I also deleted or simplified a lot of unused code.

Validation

This causes no changes to existing physics outputs (I checked trigger and derivations). I tested on DAOD_PHYS and DAOD_FTAG1; over a few hundred events, the only change in FTAG1 was the addition of the softMuon variables.

Built on nightly 2022-03-21T2101

Implications for developers

A few things have moved around, so here is a short guide to where they are now. Everything that runs the main tagging chain now has a call like

BTagAlgsCfg(cfgFlags, JetCollection, nnList)

This function is defined in BTagging/BTagRun3Config.py. The nnList argument is the list of all DIPS and DL1 taggers for that specific jet collection. There are also optional arguments for trackCollection, muons, and primaryVertices; by default these are the standard offline collections.

Many more options have also moved to BTagging/BTaggingConfigFlags.py. These include calibrationChannelAliases and taggerList, the latter of which has a default that depends on whether the flip taggers are enabled.
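The "default that depends on another flag" pattern can be sketched with a toy flag container whose defaults may be callables evaluated against the other flags, similar in spirit to the lambda defaults that Athena's AthConfigFlags accepts. The `ToyFlags` class, the flag paths, and the tagger names are placeholders, not the contents of BTaggingConfigFlags.py.

```python
class ToyFlags:
    """Toy flag container: a default may be a callable that reads other flags."""

    def __init__(self):
        self._values = {}

    def add_flag(self, name, default):
        self._values[name] = default

    def set(self, name, value):
        self._values[name] = value

    def get(self, name):
        value = self._values[name]
        # Callables are resolved on read, so they see the current values of other flags.
        return value(self) if callable(value) else value


# Placeholder tagger names, for illustration only.
BASE_TAGGERS = ["TaggerA", "TaggerB"]
FLIP_TAGGERS = ["TaggerAFlip", "TaggerBFlip"]

flags = ToyFlags()
flags.add_flag("BTagging.RunFlipTaggers", False)
flags.add_flag(
    "BTagging.taggerList",
    lambda f: BASE_TAGGERS + (FLIP_TAGGERS if f.get("BTagging.RunFlipTaggers") else []),
)

default_list = flags.get("BTagging.taggerList")  # flip taggers disabled
flags.set("BTagging.RunFlipTaggers", True)
flip_list = flags.get("BTagging.taggerList")  # default recomputed with flip taggers
```

In this toy, calling `set` on taggerList replaces the callable, so an explicit list always takes precedence over the computed default.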

To Do

I left out a few things that should be discussed with the flavor tagging group, or that might depend on external developments:

- Figure out why I can't use the same calibration database setup function in derivations.
- Do track collection merging as part of this function.
- Consider using forcedCalibrationChannel in more places. We might have PFlow-specific trainings for taggers that use the calibration database, and I'm pretty sure we don't have anything specific for variable-radius track jets. If the trainings are all identical, we could replace the channel aliases with an empty list and map everything to one jet collection.
- Enable muon information in reconstruction jobs. Right now the data dependencies for BTagMuonAugmenter aren't correct, which leads to random crashes in Athena MT. Derivations are single-threaded for now, so the muon algorithms still run there.
- Disable MV2c10 in the trigger code. See ATR-25239.
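Multithreaded Athena schedules algorithms from their declared data dependencies, so an algorithm that reads something it never declared can end up running before the producer of that object. A toy checker for that kind of mismatch might look like the sketch below; all algorithm and key names are hypothetical, and this is not how BTagMuonAugmenter is actually configured.

```python
# Each toy algorithm lists what it declares (all the scheduler sees) and what it
# actually reads at run time. All names below are hypothetical.
ALGS = {
    "ToyMuonDecorator": {
        "declared_inputs": {"Muons"},
        "outputs": {"Muons.toyDecoration"},
        "reads": {"Muons"},
    },
    "ToyMuonAugmenter": {
        "declared_inputs": {"Muons", "BTagging"},
        "outputs": {"BTagging.softMuon"},
        "reads": {"Muons", "BTagging", "Muons.toyDecoration"},
    },
}


def undeclared_reads(algs):
    """Return sorted (algorithm, key, producer) triples for reads the scheduler can't see."""
    producers = {key: name for name, alg in algs.items() for key in alg["outputs"]}
    problems = []
    for name, alg in algs.items():
        for key in alg["reads"] - alg["declared_inputs"]:
            problems.append((name, key, producers.get(key)))
    return sorted(problems)
```

Here the checker flags that ToyMuonAugmenter reads `Muons.toyDecoration` without declaring it, so nothing forces ToyMuonDecorator to run first; an empty result would mean every run-time read is visible to the scheduler.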

Edited Mar 25, 2022 by Dan Guest
