Forked from Sebastien Wertz / ttbbRun2Bamboo
19 commits behind the upstream repository.
Kyle Cormier authored
Add plotter to retrieve normalization effect of theory systematics at ttB level

See merge request swertz/ttbbRun2Bamboo!89
d845c705

Setup

  • Only possible on CC7
  • Set up a virtual environment and install bamboo following the instructions here: https://cp3.irmp.ucl.ac.be/~pdavid/bamboo/install.html Important notes:
    • Better to follow the instructions for "install from a local clone"
    • Use the LCG_101 environment (see path for setup script below)
    • Also install plotIt as described
    • Also install the latest version of correctionlib: pip install git+https://github.com/cms-nanoAOD/correctionlib.git@master
    • And the jetMET calculators: pip install git+https://gitlab.cern.ch/cp3-cms/CMSJMECalculators.git
    • You can ignore everything related to "SAMADhi"
  • Install CP3SlurmUtils inside the virtual environment: pip install CP3SlurmUtils
  • Install python plotIt: pip install git+https://gitlab.cern.ch/cp3-cms/pyplotit.git
  • Clone this repository
  • Make the configuration files available:
cd ttbbRun2Bamboo
mkdir -p ~/.config/CP3SlurmUtils; ln -s $(realpath ./config/defaults.cfg) ~/.config/CP3SlurmUtils
ln -s $(realpath ./config/bamboo.ini) ~/.config/bamboorc
  • The following needs to be run in every new shell before using Bamboo. The first two lines are already part of the install procedure above, so they do not need to be re-run in the shell where you just completed the installation:
source /cvmfs/sft.cern.ch/lcg/views/LCG_101/x86_64-centos7-gcc11-opt/setup.sh
source (abs path to)/bamboovenv/bin/activate
export PYTHONPATH=(path to)/ttbbRun2Bamboo/python:$PYTHONPATH
  • If you want to use Rucio, you'll need to run the following (for convenience, a setup script to be sourced is available under scripts/setup_rucio.sh):
source /cvmfs/cms.cern.ch/cmsset_default.sh
source /cvmfs/cms.cern.ch/rucio/setup-py3.sh
voms-proxy-init -voms cms -rfc -valid 192:00
export RUCIO_ACCOUNT=`whoami`
  • Before running jobs on the cluster, the JEC/JER cache should be updated: see below.
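
The setup steps above can be sanity-checked with a few lines of Python. This is a minimal sketch, not part of the repository: the function name check_setup is hypothetical, and it only verifies that the two configuration links created above exist and that a ttbbRun2Bamboo/python directory is listed in PYTHONPATH.

```python
import os
from pathlib import Path


def check_setup(home=None, pythonpath=None):
    """Return a list of problems found; an empty list means nothing was flagged.

    Hypothetical helper: checks only the links and the PYTHONPATH entry
    described in the Setup section, nothing about bamboo itself.
    """
    home = Path(home) if home is not None else Path.home()
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    problems = []
    # Links created in the "Make the configuration files available" step
    expected = [
        home / ".config" / "CP3SlurmUtils" / "defaults.cfg",
        home / ".config" / "bamboorc",
    ]
    for path in expected:
        if not path.exists():  # exists() is also False for a dangling symlink
            problems.append(f"missing or broken: {path}")
    # The analysis modules live under ttbbRun2Bamboo/python
    entries = [p for p in pythonpath.split(os.pathsep) if p]
    if not any(Path(p).parts[-2:] == ("ttbbRun2Bamboo", "python") for p in entries):
        problems.append("ttbbRun2Bamboo/python not found in PYTHONPATH")
    return problems
```

Calling check_setup() without arguments inspects the current user's home directory and environment; any returned string points to the setup step that should be repeated.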

Scale factors

We use the common json format for scale factors, hence the requirement to install correctionlib.

The scale factors themselves can be found in:

  • For most of them, the central POG repository, synced once a day with CVMFS: /cvmfs/cms.cern.ch/rsync/cms-nanoAOD/jsonpog-integration. A summary of their content can be found here
  • For the electron trigger SFs, clone this repository in the parent directory of ttbbRun2Bamboo.

The JEC and JER corrections and uncertainties are handled using CMSJMECalculators. The files containing the corrections are cached locally in ~/.cache/CMSJME/. Worker jobs on the cluster can only read this cache; it can only be updated in interactive (non-distributed) mode. Hence, the first time you install this repository, and whenever you change the version of the corrections used, you must run a plotter once with the --onlyprepare argument. This does not produce any plots; it only checks that all corrections can be loaded and configured, and updates the cache if needed.
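
Before submitting jobs, it can be handy to confirm that the cache is not empty. The snippet below is a rough sketch under assumptions: the helper name jme_cache_is_populated is hypothetical, and the internal layout of ~/.cache/CMSJME/ is not described here, so the check only tests that at least one file is present. It cannot tell whether the cached files match the correction versions you configured; running a plotter with --onlyprepare remains the actual check.

```python
from pathlib import Path


def jme_cache_is_populated(cache_dir="~/.cache/CMSJME"):
    """Return True if the cache directory exists and holds at least one file.

    Hypothetical helper: it does not validate the cached corrections.
    """
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        return False
    # Look for any regular file, at any depth below the cache directory
    return any(p.is_file() for p in root.rglob("*"))
```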

Upgrading

If you want to upgrade your bamboo installation, run the following inside your local bamboo clone:

git pull upstream master
pip install --upgrade .

If you want to upgrade the LCG release used (e.g. to profit from a more recent ROOT version), in a clean terminal:

  • Source the corresponding LCG setup file
  • Remove (or rename) your existing virtualenv directory (typically bamboovenv)
  • Redo the installation procedure (but no need to clone bamboo again), i.e.:
python -m venv bamboovenv
source (path to)/bamboovenv/bin/activate
pip install (path to)/bamboo
pip install CP3SlurmUtils
  • Re-install plotIt:
cd (path to)/plotIt/build-plotit
rm CMakeCache.txt
cmake -DCMAKE_INSTALL_PREFIX=$VIRTUAL_ENV ..
make -j4 install

Use

Obtaining the list of input files

The list of input files is specified using a "sample template" file, such as config/samples_template.yml. For testing, you may sometimes want to use another list containing only a subset of those samples.

In order to have local copies of the nanoAOD datasets available at the computing sites where we run bamboo, we use Rucio to request replicas. Typically, we create a Rucio container containing all the samples we need (for a given version of nanoAOD, say). Then, we create a Rucio rule that (when accepted) will trigger the replication of those samples to the given site. A script for managing the replica rules and syncing a Rucio container with the sample list config is available under scripts/manageRucio.py.

Some example usage:

$ ./manageRucio.py --list-container --container user.kcormier:/Analyses/ttbbUL20NanoV9/USER
/TTToSemiLeptonic_TuneCP5_13TeV-powheg-pythia8/RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v1/NANOAODSIM
$ ./manageRucio.py --rule-status --rule ca2df7adc0154576a7934d102810cf4d
Id:                         ca2df7adc0154576a7934d102810cf4d
Account:                    kcormier
Scope:                      user.kcormier
Name:                       /Analyses/ttbbUL20NanoV9/USER
RSE Expression:             T3_CH_PSI
Copies:                     1
State:                      OK
Locks OK/REPLICATING/STUCK: 391/0/0
Comment:                    Local copy at T3 of UL nanoAOD samples for UZH ttbb analysis
...
$ ./manageRucio.py --sync --samples ../config/samples_template.yml --container user.kcormier:/Analyses/ttbbUL20NanoV9/USER

Note that a container can only be modified by the user it belongs to.
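
When waiting for a replication rule to complete, the lock counts in the --rule-status output are the quantity to watch. As an illustration only, the sketch below parses that line from captured output. It assumes the text format shown in the example above; the function names parse_locks and replication_complete are hypothetical and are not part of manageRucio.py.

```python
import re


def parse_locks(status_text):
    """Extract the (ok, replicating, stuck) lock counts from rule-status text."""
    match = re.search(
        r"^Locks OK/REPLICATING/STUCK:\s*(\d+)/(\d+)/(\d+)\s*$",
        status_text,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("no 'Locks OK/REPLICATING/STUCK' line found")
    return tuple(int(n) for n in match.groups())


def replication_complete(status_text):
    """True when no lock is still replicating or stuck."""
    _ok, replicating, stuck = parse_locks(status_text)
    return replicating == 0 and stuck == 0
```

For the rule shown above (391/0/0), replication_complete would report that nothing is pending.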

A script scripts/manageSampleList.py can be used to validate the list of files, or to upgrade the sample versions (e.g. going from nanoAOD v8 to v9). Some examples:

$ ./manageSampleList.py --samples sample_template.yml --check --check-version --version RunIINanoAODv9
The following samples have unexpected campaign strings:
#######################################################
/SingleMuon/Run2016B-ver2_HIPM_UL2016_MiniAODv1_NanoAODv2-v1/NANOAOD             --> should be HIPM_UL2016_MiniAODv2_NanoAODv9
/SingleElectron/Run2016B-ver2_HIPM_UL2016_MiniAODv1_NanoAODv2-v1/NANOAOD         --> should be HIPM_UL2016_MiniAODv2_NanoAODv9

The following samples could not be found in DAS:
################################################
/TTToThisIsWrong_TuneCP5_13TeV-powheg-pythia8/RunIISummer20UL17NanoAODv9-106X_mc2017_realistic_v9-v2/NANOAODSIM

The following samples have a wrong version ('-vX'):
###################################################
/TTToSemiLeptonic_TuneCP5_13TeV-powheg-pythia8/RunIISummer20UL17NanoAODv9-106X_mc2017_realistic_v9-v2/NANOAODSIM --> should be v1

Running:

$ ./manageSampleList.py --version RunIINanoAODv8 --change-version-to RunIINanoAODv9 > upgrade_v8_v9.sed
$ sed -f upgrade_v8_v9.sed sample_template.yml > new_template.yml

will yield a new sample template with the new sample names. However, this will not always work: because of inconsistencies in the sample naming, some manual editing may still be necessary. Running a check as above is therefore always required.
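
To make the checks reported above more concrete, here is a minimal sketch of how a dataset name can be split into its parts and tested for a campaign tag and a '-vX' version. It is not the logic of manageSampleList.py: the names DATASET_RE, parse_dataset and has_campaign are hypothetical, and how the script maps its --version argument to the expected campaign string is not reproduced here.

```python
import re

# /<primary>/<processed>-v<N>/<tier>, as in the dataset names shown above
DATASET_RE = re.compile(
    r"^/(?P<primary>[^/]+)/(?P<processed>[^/]+)-v(?P<version>\d+)/(?P<tier>[^/]+)$"
)


def parse_dataset(name):
    """Split a dataset name into primary, processed, version (int) and tier."""
    match = DATASET_RE.match(name)
    if match is None:
        raise ValueError(f"not a valid dataset name: {name}")
    parts = match.groupdict()
    parts["version"] = int(parts["version"])
    return parts


def has_campaign(name, campaign):
    """True if the processed-dataset string contains the given campaign tag."""
    return campaign in parse_dataset(name)["processed"]
```

Applied to the first SingleMuon sample listed above, has_campaign(name, "NanoAODv9") is False, which corresponds to the "unexpected campaign strings" report.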

Producing histograms and plots

Histograms and/or plots are produced by running small modules that inherit from the NanoAODHistoModule base class defined in bamboo. For instance, to produce control plots, use controlPlotter.py as shown below. That module defines the selection and the plots, using definitions (for objects, etc.) from python/definitions.py and python/controlPlotDefinition.py.

To run on slurm, move to the python directory and run:

bambooRun -m controlPlotter.py ../config/analysis.yml -o ../test/myPlots --samples ../config/samples_template.yml --distributed driver

For a quick test while developing (i.e., checking that your code runs fine before launching many jobs), simply remove --distributed driver and use --test instead. To run with systematics, add -s (see the available options with --help). To redo only the plots from already produced histograms, use --onlypost.

Job failures

If only a few jobs fail (e.g. they run over the time limit or hit network issues), bamboo will not run the postprocessing (merging output files and producing plots). After all the other jobs are done, it will print a slurm command that resubmits only the failed jobs. Run it, then wait until these jobs finish successfully; use squeue -u ${USER} or sacct to check their status. When they are finished, re-run the same bamboo command, replacing --distributed driver with --distributed finalize. This will finalize the merging and run the postprocessing step.
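
Because the driver, finalize and test invocations differ only in their last arguments, it can help to build the command line in one place so that the finalize call is guaranteed to match the original submission. The helper below is a sketch under assumptions: the function name bamboo_command is hypothetical, and it only assembles the options quoted in this README (-m, -o, --samples, --distributed, --test); any other flag, such as -s or --onlypost, is passed through via extra without being checked.

```python
import shlex


def bamboo_command(module, analysis_cfg, output, samples, mode="driver", extra=()):
    """Assemble a bambooRun argument list for one of the documented modes."""
    mode_args = {
        "driver": ["--distributed", "driver"],      # submit jobs to slurm
        "finalize": ["--distributed", "finalize"],  # merge after resubmission
        "test": ["--test"],                         # quick local test
    }
    if mode not in mode_args:
        raise ValueError(f"unknown mode: {mode}")
    cmd = ["bambooRun", "-m", module, analysis_cfg, "-o", output, "--samples", samples]
    return cmd + mode_args[mode] + list(extra)


# Same inputs, different final step
common = ("controlPlotter.py", "../config/analysis.yml", "../test/myPlots",
          "../config/samples_template.yml")
print(shlex.join(bamboo_command(*common, mode="driver")))
print(shlex.join(bamboo_command(*common, mode="finalize")))
```

The returned list can be passed to subprocess.run from the python directory; keeping the arguments as a list avoids shell-quoting problems with paths.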

List of current plotters/skimmers

To avoid duplicating code and to reduce the possibility of errors, we should try to keep definitions shared between modules.

The following files define the base classes from which the plotters and skimmers inherit (these include command-line options, object definitions, selections, etc.):

  • baseTtbbPlotter.py: Base class for all plotters
  • genBaseTtbbPlotter.py: Defines gen-level stuff, derives from baseTtbbPlotter. Gen-level studies only need to inherit from this class.
  • recoBaseTtbbPlotter.py: Defines reco-level stuff, derives from baseTtbbPlotter. Pure reco-level plots only need to inherit from this. Anything combining gen- and reco-level information (e.g. for unfolding) needs to derive from both classes.
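
The inheritance pattern described in this list can be sketched as follows. Everything here is illustrative: the class names are hypothetical (only the file names are given above), NanoAODHistoModule is replaced by an empty stand-in so that the snippet runs without bamboo, the levels method is invented for the example, and the sketch assumes that the gen and reco base classes both derive from the common base class.

```python
class NanoAODHistoModule:
    """Stand-in for the bamboo base class, so this sketch runs without bamboo."""


class BaseTtbbPlotter(NanoAODHistoModule):      # would live in baseTtbbPlotter.py
    def levels(self):
        # Hypothetical hook: each layer adds the definitions it provides
        return []


class GenBaseTtbbPlotter(BaseTtbbPlotter):      # genBaseTtbbPlotter.py
    def levels(self):
        return super().levels() + ["gen"]


class RecoBaseTtbbPlotter(BaseTtbbPlotter):     # recoBaseTtbbPlotter.py
    def levels(self):
        return super().levels() + ["reco"]


class UnfoldingPlotter(GenBaseTtbbPlotter, RecoBaseTtbbPlotter):
    """Combines gen- and reco-level information by deriving from both."""


# The shared base class appears only once in the method resolution order
print([cls.__name__ for cls in UnfoldingPlotter.__mro__])
print(UnfoldingPlotter().levels())
```

Using super() in every layer lets the combined class pick up both sets of definitions while the common base is initialised only once.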

The following files are used for common definitions:

For bamboo development

(For bamboo "experts") If you plan to work on the bamboo code instead of only the analysis code, installing bamboo in editable mode can be useful to avoid re-running pip install --upgrade each time there is a change: pip install -e ./path/to/bamboo.