# ParticleNetStudiesRun2

## Installation recipe:

``` sh
cmsrel CMSSW_13_0_4
cd CMSSW_13_0_4/src
cmsenv
git cms-init
git clone ssh://git@gitlab.cern.ch:7999/friti/particlenetstudiesrun2.git ParticleNetStudiesRun2 -b softditau
scram b -j 4
```

## Training Ntuples for ParticleNet AK4
Inside the ParticleNetStudiesRun2 repository, the first sub-module, `TrainingNtupleMakerAK4`, contains all the code needed for:
* Processing miniAOD MC events as the input data tier and producing an event-level ntuple containing all the object collections needed for PNET; `CRAB` is used to parallelize the jobs.
* Skimming the event-based ntuples to produce the jet-based files used for the training; the skim runs either on local machines with multiple threads or on `HTCondor`.
* A recipe describing which version of `weaver` needs to be downloaded and installed in order to run the trainings.
* A recipe describing how to submit training jobs to the `kubernetes PRP` cluster.
* Macros to plot the performance of the training outputs produced by weaver.
* A CMSSW-based workflow to infer ParticleNet on AK4 jets once the training model is exported in `onnx` format.
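The local multi-threaded skim step can be sketched as follows. This is a minimal illustration, not the repository's actual skimming code: the event layout, the `pt`/`eta` cuts, and the `select_jets`/`skim_file` helpers are hypothetical stand-ins for the real ROOT-based ntuple processing.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def select_jets(event):
    """Hypothetical jet pre-selection; the real skim applies the training cuts."""
    return [j for j in event["jets"] if j["pt"] > 20.0 and abs(j["eta"]) < 2.5]


def skim_file(in_path, out_path):
    """Turn one event-level file into a jet-level file; return the jet count."""
    events = json.loads(Path(in_path).read_text())
    jets = [j for ev in events for j in select_jets(ev)]
    Path(out_path).write_text(json.dumps(jets))
    return len(jets)


def skim_all(pairs, workers=4):
    """Skim many (input, output) file pairs in parallel threads."""
    ins, outs = zip(*pairs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(skim_file, ins, outs))
```

The same per-file function can be wrapped in an `HTCondor` job script instead, with one file pair per job, which is the batch alternative mentioned above.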
## Training Ntuples for ParticleNet AK8
Inside the ParticleNetStudiesRun2 repository, a second sub-module, `TrainingNtupleMakerAK8`, contains all the code needed for:
* Processing miniAOD MC events as the input data tier and producing an event-level ntuple containing all the object collections needed for PNET; `CRAB` is used to parallelize the jobs.
* A recipe describing how to generate the MC events needed for the training, following the UL production chain in `McM`; the production is submitted with `CRAB`, and the output files are published in `DBS` and stored at `T2_US_UCSD`.
* Skimming the event-based ntuples to produce the jet-based files used for the training; the skim runs either on local machines with multiple threads or on `HTCondor`.
* A recipe describing which version of `weaver` needs to be downloaded and installed in order to run the trainings.
* A recipe describing how to submit training jobs to the `kubernetes PRP` cluster.
* Macros to plot the performance of the training outputs produced by weaver.
* A CMSSW-based workflow to infer ParticleNet on AK8 jets once the training model is exported in `onnx` format.
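For the CRAB-driven production and publication step above, a configuration along these lines could be used. This is a sketch only: the request name, the CMSSW pset, and the dataset are placeholders, not the repository's actual configuration.

```python
# crab_ntuples.py -- sketch of a CRAB configuration for the ntuple production.
from CRABClient.UserUtilities import config

config = config()

config.General.requestName = 'PNetAK8_Ntuples_v1'  # hypothetical request name
config.General.workArea = 'crab_projects'
config.General.transferOutputs = True

config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'ntuple_cfg.py'          # hypothetical CMSSW config

config.Data.inputDataset = '/QCD_.../MINIAODSIM'   # placeholder dataset name
config.Data.splitting = 'FileBased'
config.Data.unitsPerJob = 2
config.Data.publication = True                     # publish the output in DBS

config.Site.storageSite = 'T2_US_UCSD'
```

With a `crab submit -c crab_ntuples.py`, CRAB splits the input dataset into jobs, and with `publication = True` the output files end up registered in DBS at the chosen storage site, as described above.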
## Data vs MC comparisons for Run2 2018 UL
Inside the ParticleNetStudiesRun2 repository, a third sub-module, `AnalysisNtupleMaker`, contains all the code needed to:
* Process data and MC UL 2018 events from the `miniAOD` data tier, producing a flat output tree.
* The output trees already contain all the per-object corrections that need to be applied for a proper data-MC comparison.
* While producing the ntuples, four basic skims are available to pre-select events in the relevant control regions:
  * $`\mathrm{t\overline{t}} \to \mathrm{e}\mu`$ : pre-select events asking for one tight muon and one tight electron with opposite charge. Natural control region for b-jets.
  * $`\mathrm{Z} \to \mu\mu`$ : pre-select events asking for one tight and one medium muon (with loose iso) + opposite charge + invariant mass compatible with Z. Natural control region for light jets. However, the Z pT balance in Z+bjets can be used for verifying the b-jet energy regression response.
  * $`\mathrm{Z} \to  \mathrm{ee}`$: pre-select events asking for two tight electrons + opposite charge + invariant mass compatible with Z. Natural control region for light jets. However, the Z pT balance in Z+bjets can be used for verifying the b-jet energy regression response.
  * $`\mathrm{Z} \to  \tau_{h}\mu`$: pre-select events with one tight muon and one hadronic tau passing basic HPS+DeepTau requirements (VVLoose WP). Additional conditions to increase the purity of the sample will be applied in later stages of the analysis, following what is done in TAU-20-001.
* The code located in the `macros` directory is instead used to make specific data-MC comparisons on the UL 2018 datasets.
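The core of the $`\mathrm{Z} \to \mu\mu`$ / $`\mathrm{Z} \to \mathrm{ee}`$ skims — opposite charge plus a dilepton invariant mass compatible with the Z — can be sketched as below. The mass window and the kinematic values are illustrative choices, not the exact cuts used in the ntuple maker, and the identification/isolation requirements are omitted.

```python
import math

M_MU = 0.10566  # muon mass in GeV


def four_vector(pt, eta, phi, mass):
    """Build (E, px, py, pz) from collider kinematics."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    e = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return e, px, py, pz


def invariant_mass(p4a, p4b):
    """Invariant mass of the sum of two four-vectors."""
    e = p4a[0] + p4b[0]
    px = p4a[1] + p4b[1]
    py = p4a[2] + p4b[2]
    pz = p4a[3] + p4b[3]
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def passes_zmumu_skim(mu1, mu2, m_lo=70.0, m_hi=110.0):
    """Opposite charge + dimuon mass compatible with the Z (window is illustrative)."""
    if mu1["charge"] * mu2["charge"] >= 0:
        return False
    m = invariant_mass(
        four_vector(mu1["pt"], mu1["eta"], mu1["phi"], M_MU),
        four_vector(mu2["pt"], mu2["eta"], mu2["phi"], M_MU),
    )
    return m_lo < m < m_hi
```

For example, two back-to-back 45 GeV muons of opposite charge give a dimuon mass of about 90 GeV and pass the window, while a same-sign pair is rejected regardless of mass.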