Setup
- Only possible on CC7
- Set up a virtual environment and install bamboo following the instructions here: https://cp3.irmp.ucl.ac.be/~pdavid/bamboo/install.html
Important notes:
- Better to follow the instructions for "install from a local clone"
- Use the LCG_101 environment (see path for setup script below)
- Also install plotIt as described
- Also install the latest version of correctionlib:
pip install git+https://github.com/cms-nanoAOD/correctionlib.git@master
- And the jetMET calculators:
pip install git+https://gitlab.cern.ch/cp3-cms/CMSJMECalculators.git
- You can ignore everything related to "SAMADhi"
- Install CP3SlurmUtils inside the virtual environment:
pip install CP3SlurmUtils
- Install python plotIt:
pip install git+https://gitlab.cern.ch/cp3-cms/pyplotit.git
- Clone this repository
- Make the configuration files available:
cd ttbbRun2Bamboo
mkdir -p ~/.config/CP3SlurmUtils; ln -s $(realpath ./config/defaults.cfg) ~/.config/CP3SlurmUtils
ln -s $(realpath ./config/bamboo.ini) ~/.config/bamboorc
- The following needs to be run each time you want to use Bamboo, assuming you've followed the install procedure above (which already includes the first two lines below, so there is no need to re-run them in the shell where you did the installation):
source /cvmfs/sft.cern.ch/lcg/views/LCG_101/x86_64-centos7-gcc11-opt/setup.sh
source (abs path to)/bamboovenv/bin/activate
export PYTHONPATH=(path to)/ttbbRun2Bamboo/python:$PYTHONPATH
- If you want to use Rucio, you'll need to run the following (for convenience, a setup script to be sourced is available under scripts/setup_rucio.sh):
source /cvmfs/cms.cern.ch/cmsset_default.sh
source /cvmfs/cms.cern.ch/rucio/setup-py3.sh
voms-proxy-init -voms cms -rfc -valid 192:00
export RUCIO_ACCOUNT=`whoami`
- Before running jobs on the cluster, the JEC/JER cache should be updated: see below.
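The setup steps above leave a few things that are easy to forget in a fresh shell. A minimal sketch of a sanity check, assuming only what is described above (an active virtual environment, a PYTHONPATH entry ending in ttbbRun2Bamboo/python, and the two linked configuration files); the function name `check_setup` is hypothetical and not part of this repository:

```python
import os
from pathlib import Path


def check_setup(env, home):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Activating the virtual environment sets VIRTUAL_ENV.
    if not env.get("VIRTUAL_ENV"):
        problems.append("no virtual environment is active (VIRTUAL_ENV is unset)")
    # The analysis modules are found through PYTHONPATH.
    entries = [e for e in env.get("PYTHONPATH", "").split(os.pathsep) if e]
    if not any(Path(e).parts[-2:] == ("ttbbRun2Bamboo", "python") for e in entries):
        problems.append("PYTHONPATH has no entry ending in ttbbRun2Bamboo/python")
    # Configuration files linked during the setup.
    for rel in (".config/bamboorc", ".config/CP3SlurmUtils/defaults.cfg"):
        if not (Path(home) / rel).exists():
            problems.append(f"missing configuration file: ~/{rel}")
    return problems


if __name__ == "__main__":
    for problem in check_setup(os.environ, Path.home()):
        print("WARNING:", problem)
```

This only checks that the pieces are in place; it does not verify that the LCG release or the bamboo installation itself is usable.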
Scale factors
We use the common json format for scale factors, hence the requirement to install correctionlib.
The scale factors themselves can be found in:
- For most of them, the central POG repository, synced once a day with CVMFS: /cvmfs/cms.cern.ch/rsync/cms-nanoAOD/jsonpog-integration. A summary of their content can be found here
- For the electron trigger SFs, clone this repository in the parent directory of ttbbRun2Bamboo.
The JEC and JER corrections and uncertainties are handled using CMSJMECalculators.
The files containing the corrections are cached locally in ~/.cache/CMSJME/.
This cache can only be read from worker jobs on the cluster, and can only be updated in interactive (non-distributed) mode.
Hence, when changing the version of the corrections used, or when installing this repository for the first time, it is necessary to run a plotter once using the --onlyprepare argument.
This will not produce any plots, but will only check that all corrections can be loaded and configured (and update the cache if needed).
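Before submitting jobs it can help to confirm that the cache has actually been filled. A minimal sketch that simply lists the files under the cache directory; the internal layout of ~/.cache/CMSJME/ is not described here, so nothing about it is assumed, and the function name `list_cached_files` is hypothetical:

```python
from pathlib import Path


def list_cached_files(cache_dir):
    """Return the relative paths of all files under cache_dir, sorted.

    An empty list is returned if the directory does not exist.
    """
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    files = list_cached_files("~/.cache/CMSJME")
    if not files:
        print("JEC/JER cache is empty: run a plotter once with --onlyprepare first")
    else:
        print(f"{len(files)} cached file(s) found")
```

A non-empty cache does not guarantee that it matches the correction versions currently configured; running with --onlyprepare remains the reliable way to update it.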
Upgrading
If you want to upgrade your bamboo installation, do (inside the bamboo install):
git pull upstream master
pip install --upgrade .
If you want to upgrade the LCG release used (e.g. to profit from a more recent ROOT version), in a clean terminal:
- Source the corresponding LCG setup file
- Remove (or rename) your existing virtualenv directory (typically bamboovenv)
- Redo the installation procedure (but no need to clone bamboo again), i.e.:
python -m venv bamboovenv
source (path to)/bamboovenv/bin/activate
pip install (path to)/bamboo
pip install CP3SlurmUtils
- Re-install plotIt:
cd (path to)/plotIt/build-plotit
rm CMakeCache.txt
cmake -DCMAKE_INSTALL_PREFIX=$VIRTUAL_ENV ..
make -j4 install
Use
Obtaining the list of input files
The list of input files is specified using a "sample template" file, such as config/samples_template.yml.
You may sometimes want to use other lists containing only a subset of those samples for testing.
In order to have local copies of the nanoAOD datasets available at the computing sites where we run bamboo, we use Rucio to request replicas.
Typically, we create a Rucio container containing all the samples we need (for a given version of nanoAOD, say).
Then, we create a Rucio rule that (when accepted) will trigger the replication of those samples to the given site.
A script for managing the replica rules and syncing a Rucio container with the sample list config is available under scripts/manageRucio.py.
Some example usage:
$ ./manageRucio.py --list-container --container user.kcormier:/Analyses/ttbbUL20NanoV9/USER
/TTToSemiLeptonic_TuneCP5_13TeV-powheg-pythia8/RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v1/NANOAODSIM
$ ./manageRucio.py --rule-status --rule ca2df7adc0154576a7934d102810cf4d
Id: ca2df7adc0154576a7934d102810cf4d
Account: kcormier
Scope: user.kcormier
Name: /Analyses/ttbbUL20NanoV9/USER
RSE Expression: T3_CH_PSI
Copies: 1
State: OK
Locks OK/REPLICATING/STUCK: 391/0/0
Comment: Local copy at T3 of UL nanoAOD samples for UZH ttbb analysis
...
$ ./manageRucio.py --sync --samples ../config/samples_template.yml --container user.kcormier:/Analyses/ttbbUL20NanoV9/USER
Note that a container can only be modified by the user it belongs to.
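To decide whether a replication rule is complete, the "Locks OK/REPLICATING/STUCK" line of the rule status is the relevant one. A minimal sketch that parses it, assuming the output has the same textual form as the example above; the function names are hypothetical and not part of manageRucio.py:

```python
import re

# Matches e.g. "Locks OK/REPLICATING/STUCK: 391/0/0" on a line of its own.
LOCKS_RE = re.compile(
    r"^[ \t]*Locks OK/REPLICATING/STUCK:[ \t]*(\d+)/(\d+)/(\d+)[ \t]*$",
    re.MULTILINE,
)


def parse_locks(status_text):
    """Extract (ok, replicating, stuck) lock counts, or None if the line is absent."""
    match = LOCKS_RE.search(status_text)
    if match is None:
        return None
    return tuple(int(n) for n in match.groups())


def replication_complete(status_text):
    """True only if the lock line was found and nothing is replicating or stuck."""
    locks = parse_locks(status_text)
    return locks is not None and locks[1] == 0 and locks[2] == 0


if __name__ == "__main__":
    example = "State: OK\nLocks OK/REPLICATING/STUCK: 391/0/0\n"
    print(parse_locks(example))           # (391, 0, 0)
    print(replication_complete(example))  # True
```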
A script scripts/manageSampleList.py can be used to validate the list of files, or to upgrade the sample versions (e.g. going from nanoAOD v8 to v9). Some examples:
$ ./manageSampleList.py --samples sample_template.yml --check --check-version --version RunIINanoAODv9
The following samples have unexpected campaign strings:
#######################################################
/SingleMuon/Run2016B-ver2_HIPM_UL2016_MiniAODv1_NanoAODv2-v1/NANOAOD --> should be HIPM_UL2016_MiniAODv2_NanoAODv9
/SingleElectron/Run2016B-ver2_HIPM_UL2016_MiniAODv1_NanoAODv2-v1/NANOAOD --> should be HIPM_UL2016_MiniAODv2_NanoAODv9
The following samples could not be found in DAS:
################################################
/TTToThisIsWrong_TuneCP5_13TeV-powheg-pythia8/RunIISummer20UL17NanoAODv9-106X_mc2017_realistic_v9-v2/NANOAODSIM
The following samples have a wrong version ('-vX'):
###################################################
/TTToSemiLeptonic_TuneCP5_13TeV-powheg-pythia8/RunIISummer20UL17NanoAODv9-106X_mc2017_realistic_v9-v2/NANOAODSIM --> should be v1
Running:
$ ./manageSampleList.py --version RunIINanoAODv8 --change-version-to RunIINanoAODv9 > upgrade_v8_v9.sed
$ sed -f upgrade_v8_v9.sed sample_template.yml > new_template.yml
will yield a new sample template with the new sample names. However, this will not always work, and some manual editing will still be necessary due to inconsistencies in the sample naming. Running a check as above is therefore always required.
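When editing sample names by hand it helps to see how a dataset name is structured. A minimal sketch that splits a name of the form /primary/processed/tier and extracts the trailing '-vX' version reported by the check above; it does not query DAS and is independent of manageSampleList.py, and the function names are hypothetical:

```python
import re


def split_dataset_name(name):
    """Split '/primary/processed/tier' into its three parts."""
    parts = name.strip().split("/")
    if len(parts) != 4 or parts[0] != "" or not all(parts[1:]):
        raise ValueError(f"not a /primary/processed/tier dataset name: {name!r}")
    return tuple(parts[1:])


def dataset_version(name):
    """Return the integer X of the trailing '-vX' in the processed part, or None."""
    _, processed, _ = split_dataset_name(name)
    match = re.search(r"-v(\d+)$", processed)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    name = (
        "/TTToSemiLeptonic_TuneCP5_13TeV-powheg-pythia8/"
        "RunIISummer20UL17NanoAODv9-106X_mc2017_realistic_v9-v2/NANOAODSIM"
    )
    print(split_dataset_name(name)[2])  # NANOAODSIM
    print(dataset_version(name))        # 2
```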
Producing histograms and plots
Histograms and/or plots are produced by running small modules that inherit from the NanoAODHistoModule base class defined in bamboo. For instance, to produce control plots, use controlPlotter.py as below. The selection and the plots are defined there, using definitions (for objects, etc.) included in python/definitions.py and python/controlPlotDefinition.py.
To run on slurm, move to the python directory and run:
bambooRun -m controlPlotter.py ../config/analysis.yml -o ../test/myPlots --samples ../config/samples_template.yml --distributed driver
For a one-time test when developing (i.e., testing that your code runs fine before launching many jobs), simply remove --distributed driver and use --test instead. To run with systematics, add -s (have a look at the available options using --help). To redo only the plots starting from already produced histograms, use --onlypost.
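If you launch plotters from a script, the choice between a slurm submission and a local test can be captured in a small helper. A minimal sketch that only assembles the argument list from the options mentioned above (-m, -o, --samples, --distributed driver, --test, -s); the helper `bamboo_command` is hypothetical and does not run anything by itself:

```python
import shlex


def bamboo_command(module, analysis_cfg, output_dir, samples, test=False, systematics=False):
    """Build a bambooRun argument list for a slurm submission or a local test."""
    cmd = ["bambooRun", "-m", module, analysis_cfg, "-o", output_dir, "--samples", samples]
    if test:
        # One-time local test: --test replaces --distributed driver.
        cmd.append("--test")
    else:
        cmd += ["--distributed", "driver"]
    if systematics:
        cmd.append("-s")
    return cmd


if __name__ == "__main__":
    cmd = bamboo_command(
        "controlPlotter.py",
        "../config/analysis.yml",
        "../test/myPlots",
        "../config/samples_template.yml",
        test=True,
    )
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` from within the python directory once the environment has been set up.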
Job failures
If only a few jobs fail (e.g. they go over time or have network issues), bamboo will not run the postprocessing (merging output files and producing plots).
After all the other jobs are done, it will print a slurm command to run to resubmit only the failed jobs. You can then wait until these finish successfully; use squeue -u ${USER} or sacct to check.
When they are finished, re-run the same bamboo command but replace --distributed driver with --distributed finalize. This will finalize the merging and run the postprocessing step.
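Retyping a long command to change one word is error-prone. A minimal sketch that rewrites a previously used command line for the finalize step; it handles only the space-separated form `--distributed driver` used in this README, and the function name `to_finalize` is hypothetical:

```python
import shlex


def to_finalize(command):
    """Return the command with '--distributed driver' replaced by '--distributed finalize'."""
    tokens = shlex.split(command)
    for i, token in enumerate(tokens[:-1]):
        if token == "--distributed" and tokens[i + 1] == "driver":
            tokens[i + 1] = "finalize"
            return shlex.join(tokens)
    # Refuse to guess if the expected option is not there.
    raise ValueError("command does not contain '--distributed driver'")


if __name__ == "__main__":
    original = (
        "bambooRun -m controlPlotter.py ../config/analysis.yml -o ../test/myPlots "
        "--samples ../config/samples_template.yml --distributed driver"
    )
    print(to_finalize(original))
```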
List of current plotters/skimmers
To avoid duplicating code and reduce the possibility of errors we should try to keep definitions shared between modules.
- controlPlotter.py: For basic control plots at various stages of the selection
- genTtbbPlotter.py: Only to run on tt/ttbb samples: make gen-level plots and gen/reco-level comparisons
- bTagPlotter.py: To study the effect of b-tagging scale factors, compute b-tagging efficiencies, ...
- unfoldingPlotter.py: To produce all the histograms needed for unfolding
- genMatchingStudies.py: For signal studies using the origin of b jets
- syncSkimmer.py: simple skimmer for event-level synchronization with other groups
The following files define classes from which the above inherit (includes command-line options, object definitions, selections...):
- baseTtbbPlotter.py: Base class for all plotters
- genBaseTtbbPlotter.py: Defines gen-level stuff, derives from baseTtbbPlotter. Gen-level studies only need to inherit from this class.
- recoBaseTtbbPlotter.py: Defines reco-level stuff, derives from baseTtbbPlotter. Pure reco-level plots only need to inherit from this. Anything combining gen- and reco-level information (e.g. for unfolding) needs to derive from both classes.
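Deriving from both the gen-level and reco-level classes relies on Python's cooperative multiple inheritance. A minimal sketch of the pattern with plain stand-in classes, assuming both intermediate classes derive from the common base; these stand-ins do not use bamboo, and the method `plot_names` is hypothetical:

```python
class BaseTtbbPlotter:
    """Stand-in for the common base class."""

    def plot_names(self):
        return []


class GenBaseTtbbPlotter(BaseTtbbPlotter):
    """Stand-in for the gen-level base class."""

    def plot_names(self):
        # super() forwards to the next class in the method resolution order.
        return super().plot_names() + ["gen-level plot"]


class RecoBaseTtbbPlotter(BaseTtbbPlotter):
    """Stand-in for the reco-level base class."""

    def plot_names(self):
        return super().plot_names() + ["reco-level plot"]


class CombinedPlotter(RecoBaseTtbbPlotter, GenBaseTtbbPlotter):
    """Stand-in for a module needing both gen- and reco-level information."""

    def plot_names(self):
        return super().plot_names() + ["response matrix"]


if __name__ == "__main__":
    print([cls.__name__ for cls in CombinedPlotter.__mro__[:-1]])
    print(CombinedPlotter().plot_names())
    # ['gen-level plot', 'reco-level plot', 'response matrix']
```

Because every class calls `super()`, the common base is visited only once even though it is reachable through both parents.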
The following files are used for common definitions:
- definitions.py: all reco-level object definitions, basic selections, scale factors etc.
- genDefinitions.py: gen-level object definitions, basic selections
- controlPlotDefinition.py: list of all control plots used by controlPlotter.py
- unfoldingDefinitions.py: list of all plots for unfolding, used by unfoldingPlotter.py. Includes reco-level observable, corresponding gen-level observable, and response matrices linking the two.
For bamboo development
(For bamboo "experts") If you plan to work on the bamboo code instead of only the analysis code, installing bamboo in editable mode can be useful to avoid re-running pip install --upgrade each time there is a change: pip install -e ./path/to/bamboo.