Common classifiers for ttH analysis
This branch is for analyses using CMSSW_10_2_X
Setup
cd $CMSSW_BASE/src
cmsenv
git clone ssh://git@gitlab.cern.ch:7999/ttH/CommonClassifier.git TTH/CommonClassifier -b 10_2_X
git clone https://gitlab.cern.ch/Zurich_ttH/MEIntegratorStandalone.git -b 10_2_X TTH/MEIntegratorStandalone
git clone https://gitlab.cern.ch/kit-cn-cms-public/RecoLikelihoodReconstruction.git TTH/RecoLikelihoodReconstruction
mkdir -p $CMSSW_BASE/lib/$SCRAM_ARCH/
cp -R TTH/MEIntegratorStandalone/libs/* $CMSSW_BASE/lib/$SCRAM_ARCH/
scram setup lhapdf
scram setup TTH/MEIntegratorStandalone/deps/gsl.xml
scram b
mem_test
mem_test_dl
bdt_test
dnn_test
Usage
- Objects have to fulfill the cuts described here: https://twiki.cern.ch/twiki/bin/view/CMS/TTbarHbbRun2ReferenceAnalysis
- Loose jets are defined by the same cuts as the standard ak4-jets except for the p_T-cut which is p_T > 20 GeV instead of p_T > 30 GeV. The loose jet collection is inclusive and also contains the standard jets.
- The BDTs are trained and optimized on odd-numbered Events, to avoid bias they should only be evaluated on even-numbered events (edm::Event.id().event()%2==0)
Running the CommonClassifier industrially
As discussed, we will make a common infrastructure to run the MEM on group-specific ntuples in a common way on either a local cluster or the CMS grid (CRAB3). The following diagram explains the principle:
ntuplization chain:
group ntuple -> CommonClassifier input ntuple -> CommonClassifier on the cluster/grid -> CommonClassifier output ntuple
analysis of ntuples:
|--------------------------------|
| group-specific ntuple |
| via (run, lumi, event) | -> histograms with BDT, MEM, ...
| CommonClassifier output ntuple |
|--------------------------------|
Technically, the access of the CommonClassifier output ntuple from the histogramming code can be done using the (run, lumi, event)
lookup database at https://github.com/kit-cn-cms/MEMDataBase
CommonClassifier input ntuple
In order to run the CommonClassifier using the gridding infractructure, you must export your private ntuples to a TTree with exactly this structure:
long run
long lumi
long event
int systematic //keep track of systematically variated events, according to enum MEMClassifier::Systematics
int hypothesis //MEM hypothesis type (e.g. 0 for SL, 1 for DL etc), set to -1 now if you want the MEM to be calculated, -2 if you don't want the MEM to be calculated (e.g. 2-tag event)
int numcalls //MEM integration points, set to -1 now
int nleps //varying number of leptons
//the following arrays should be of varying length, with buffer sizes of 2
float lep_pt[nleps]
float lep_eta[nleps]
float lep_phi[nleps]
float lep_mass[nleps]
float lep_charge[nleps]
int njets //varying number of jets
//the following arrays should be of varying length, with buffer sizes of 10
float jet_pt[njets]
float jet_eta[njets]
float jet_phi[njets]
float jet_mass[njets]
float jet_csv[njets]
float jet_cmva[njets] // fill with -1 in case you don't have it
int jet_type[njets] //if jet is resolved, boosted, according to MEMClassifier::JetType enum
float jet_corr[njets] // nominal JES correction
float jet_corr_JER[njets] // nominal JER correction
float jet_x[njets] // For all the factorized JES sources that are considered
int nloose_jets //varying number of loose jets, i.e. jets with 20<pt<30 NOT in jets collection, set to 0 if you don't use loose jets (SL BDT will be incorrect)
//the following arrays should be of varying length, with buffer sizes of 5
float loose_jet_pt[njets]
float loose_jet_eta[njets]
float loose_jet_phi[njets]
float loose_jet_mass[njets]
float loose_jet_csv[njets]
float loose_jet_cmva[njets] // fill with -1 in case you don't have it
float met_pt
float met_phi
An example tree can be found here: https://gitlab.cern.ch/ttH/CommonClassifier/blob/master/test/exampleTree.root, it's suggested to use this class in order to reduce errors from re-implementing this TTree.
CommonClassifier output ntuple
The structure should be the following
long run
long lumi
long event
int systematic
int hypothesis
double bdt
double mem_p
double mem_p_sig
double mem_p_bkg
double mem_x_p // MEM discriminant for the factorized JES sources