Draft: Refactor JVT selection interface
Currently there is a duplication of functionality between the NNJVT tool, which both computes the NNJVT discriminant and applies cuts on it, and and the JVT selection tools, which apply cuts to select jets. This is suboptimal because it is often desirable to do this things in two different places (e.g. computing the variable when making a derivation, then applying the desired cuts at the analysis stage). This MR:
- Removes the application of any cuts and validity ranges from the tool which computes NNJVT, along with associated config variables
- Updates the JVT selection tools to also implement the
IJetDecorator
interface, making it easier to compute selection decision flags on the jets - Adjusts the jet trigger configuration to use this new structure (previously it has been relying on the pass decision made at variable computation time).
Merge request reports
Activity
@khoo, since you made some relevant code changes recently: Is the Athena CI sufficient to check that this preserves the intended trigger behavior, or is any additional validation needed (if so, how can this be tested)?
16 16 UseTrkAugNN = True, 17 17 NNConfigDir = "JetPileupTag/NNJvt/HLT-2025-02-05", 18 18 NNParamFile = "TrkAugNNJVT.Network.graph.HLT.json", 19 NNCutFile = "TrkAugNNJVT.Cuts.HLT.json", I guess this will conflict with !78757 (merged).
CI Result FAILURE (hash 584b1c84)Athena AnalysisBase AthAnalysis externals cmake make tests Full details available on this CI monitor view. Check the JIRA CI status board for known problems
Athena: number of compilation errors 1, warnings 1
AnalysisBase: number of compilation errors 1, warnings 2
AthAnalysis: number of compilation errors 0, warnings 1
For experts only: Jenkins output (remote access info)added review-user-action-required label and removed review-pending-level-1 label
added analysis-review-approved label and removed analysis-review-required label
Investigating the CI issues, I've realized something that seems a bit strange: Instead of taking as input the jet eta and phi, the NN appears to instead take as input features bin indices for eta and phi, which are dependent on the cut configuration. This would appear to defeat the entire purpose of this MR, because it forces the tool evaluating NNJVT to be aware of the cuts.
Is there some reason why it was done this way in the past? We can change it for the consolidated recommendations, but to retain backward compatibility we'll have to at least keep the option to pass in a cuts config file to the NNJVT tool.
Tagging @gotero, @cmorenom, @fballi, @jdandoy for any comments they might have.
Yes, the choice to discretise eta/pt for the first iteration was due to limited stats, i.e. that we didn't want to induce overtraining or spurious correlations due to sampling a handful of events in particular eta/pt bins.
We have solved the pipeline issues that limited using a larger training sample, but obviously this is baked into the original version. That said, you could very well move/copy the cut information into the model json, and read it from there just for the older offline version, and leave a more elegant solution for the next iteration.
OK, good to know. I think it's probably best for now to avoid messing with the model json (we'd really need to release a new version with this structure, which seems more trouble than it's worth) so maybe we're better off pausing this MR until the consolidated recommendations are ready to go out, which can use pt/eta directly instead of needing this binning config. I'll mark it as draft for now and then come back in a few months when that's ready.
removed review-user-action-required label