Setup
- Only possible on CC7
- Follow the instructions here: https://cp3.irmp.ucl.ac.be/~pdavid/bamboo/install.html
Important notes:
- It is better to follow the instructions for "install from a local clone"
- Use the LCG_98python3 environment (see below)
- Use python -m pip instead of pip
- Also install plotIt as described (inside the same virtual environment)
- You can ignore everything related to "SAMADhi"
- Install CP3SlurmUtils inside the virtual environment:
python -m pip install CP3SlurmUtils
- Also install python plotIt:
git clone git@github.com:pieterdavid/mplbplot.git -b py3compat
python -m pip install -e ./mplbplot
- Make the configuration files available:
mkdir -p ~/.config/CP3SlurmUtils
ln -s (abs path to)/ttbbRun2Bamboo/config/defaults.cfg ~/.config/CP3SlurmUtils
ln -s (abs path to)/ttbbRun2Bamboo/config/bamboo.ini ~/.config/bamboorc
- Clone this repository and move into it:
cd ttbbRun2Bamboo
- The following needs to be run each time you want to use bamboo (i.e. in every new terminal session), assuming you have followed the install procedure above. The first two lines are part of that procedure, so they do not need to be re-run in the session where you just did the installation:
source /cvmfs/sft.cern.ch/lcg/views/LCG_98python3/x86_64-centos7-gcc10-opt/setup.sh
source (abs path to)/bamboovenv/bin/activate
export PYTHONPATH=(path to)/ttbbRun2Bamboo/python:$PYTHONPATH
Depending on how you read the data (if using xrootd or DAS), you might also need to obtain a new proxy regularly (voms...): this has to be done in a "clean" terminal session where the above has NOT been run. Currently this is not needed, as we read the data through dcap.
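Before launching anything, it can save time to check that the per-session commands and the configuration symlinks are in place. The following is a minimal sketch, not part of the repository: the helper check_setup is hypothetical, and it assumes that sourcing the LCG setup.sh puts directories of /cvmfs/sft.cern.ch/lcg/views/LCG_98python3 on PATH, that the repository was cloned into a directory named ttbbRun2Bamboo, and that the symlinks were created exactly as in the commands above.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

# Assumption: sourcing the LCG setup.sh puts directories of this view on PATH
LCG_VIEW = "/cvmfs/sft.cern.ch/lcg/views/LCG_98python3"


def check_setup(env: Mapping[str, str], home: Optional[Path] = None) -> List[str]:
    """Return the problems found with the bamboo session setup (empty list if none).

    Hypothetical helper: inspects the variables set by the three per-session
    commands and the two configuration symlinks from the install procedure.
    """
    if home is None:
        home = Path(env.get("HOME", "~")).expanduser()
    problems = []

    path_entries = env.get("PATH", "").split(os.pathsep)
    if not any(entry.startswith(LCG_VIEW) for entry in path_entries):
        problems.append("LCG_98python3 view not sourced")

    # 'source .../bamboovenv/bin/activate' sets VIRTUAL_ENV
    if not env.get("VIRTUAL_ENV"):
        problems.append("virtual environment not activated")

    # Assumption: the repository was cloned into a directory named ttbbRun2Bamboo
    entries = [p.rstrip("/") for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    if not any(p.endswith("ttbbRun2Bamboo/python") for p in entries):
        problems.append("ttbbRun2Bamboo/python not in PYTHONPATH")

    # Path.exists() follows symlinks, so a link pointing to a wrong path is reported too
    for link in (home / ".config" / "CP3SlurmUtils" / "defaults.cfg",
                 home / ".config" / "bamboorc"):
        if not link.exists():
            problems.append(f"missing or broken config link: {link}")
    return problems


if __name__ == "__main__":
    for problem in check_setup(os.environ):
        print("WARNING:", problem)
```

Saved as a script, it can be run right after the three per-session commands; an empty output means none of these checks failed.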
Upgrading
If you want to upgrade your bamboo installation, run (inside your bamboo clone):
git pull upstream master
python -m pip install --upgrade .
If you want to upgrade the LCG release used (e.g. to benefit from a more recent ROOT version), do the following in a clean terminal:
- Source the corresponding LCG setup file
- Remove your existing virtualenv directory (typically bamboovenv)
- Redo the installation procedure (there is no need to clone bamboo again), i.e.:
python -m venv bamboovenv
source (path to)/bamboovenv/bin/activate
python -m pip install (path to)/bamboo
python -m pip install CP3SlurmUtils
- Re-install plotIt:
cd (path to)/plotIt/build-plotit
rm CMakeCache.txt
cmake -DCMAKE_INSTALL_PREFIX=$VIRTUAL_ENV ..
make -j4 install
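Since a virtual environment is tied to the interpreter that created it, it can help to check which LCG release an existing bamboovenv was built with before deciding to recreate it. python -m venv writes a pyvenv.cfg file at the top of the environment whose home key records the directory of that interpreter. The sketch below assumes that, for an environment created after sourcing the LCG view, this path contains the release name (e.g. LCG_98python3); if it does not, the function returns None. The helper venv_lcg_release is hypothetical.

```python
import re
from pathlib import Path
from typing import Optional


def venv_lcg_release(venv_dir: str) -> Optional[str]:
    """Return the LCG release name a venv was created from, or None if not recognised."""
    cfg = Path(venv_dir) / "pyvenv.cfg"
    for line in cfg.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "home":
            # e.g. home = /cvmfs/sft.cern.ch/lcg/views/LCG_98python3/x86_64-centos7-gcc10-opt/bin
            # (a trailing "/" is appended so a path ending with the release name also matches)
            match = re.search(r"/(LCG_[^/]+)/", value.strip() + "/")
            return match.group(1) if match else None
    return None
```

For example, print(venv_lcg_release("(path to)/bamboovenv")) should report the release to compare with the one you intend to source.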
Use
Obtaining the list of input files
First insert the list of crab output files into the samples configuration:
scripts/insertSamples.py -i config/samples_template.yml -o config/samples.yml (path to all your json files created by runPostCrab.py)
The "path to all json files" can be retrieved from here: https://gitlab.cern.ch/swertz/ttbbRun2Bamboo/wikis/Productions
Warning: be careful when modifying the resulting output file by hand, since it will be overwritten every time insertSamples.py is run! Always edit samples_template.yml for persistent changes.
An alternative is to specify the path as an environment variable:
export SAMPLE_JSONS=(path to the folder containing all the json files)
It is then sufficient, when running e.g. controlPlotter.py, to specify one or more sample template files with the --samples argument.
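To make sure SAMPLE_JSONS points to the right folder before starting a long run, one can list and parse the JSON files it contains. A minimal sketch, assuming the runPostCrab.py outputs sit directly in that folder as *.json files (not in subfolders); it makes no assumption about their content, which is interpreted by the analysis code, not by this snippet. The helper list_sample_jsons is hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Dict, Mapping


def list_sample_jsons(env: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Parse every *.json file in the folder given by SAMPLE_JSONS, keyed by file name."""
    folder = env.get("SAMPLE_JSONS")
    if not folder:
        raise RuntimeError("SAMPLE_JSONS is not set; run 'export SAMPLE_JSONS=...' first")
    path = Path(folder)
    if not path.is_dir():
        raise NotADirectoryError(f"SAMPLE_JSONS does not point to a folder: {folder}")
    # json.loads raises on a truncated or corrupted file, so bad files show up early
    return {f.name: json.loads(f.read_text()) for f in sorted(path.glob("*.json"))}
```

For example, print(sorted(list_sample_jsons())) shows which files will be picked up.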
Producing histograms and plots
Histograms and/or plots are produced by running small modules that inherit from the NanoAODHistoModule base class defined in bamboo. For instance, to produce control plots, use controlPlotter.py as shown below. The selection and the plots are defined in that module, using the definitions (for objects, etc.) in python/definitions.py and python/controlPlotDefinition.py.
To run on slurm, move to the python directory and run:
bambooRun -m controlPlotter.py ../config/analysis.yml -o ../test/myPlots --samples ../config/samples_template.yml --distributed driver
For a one-time test when developing (i.e., checking that your code runs fine before launching many jobs), simply remove --distributed driver and use --test instead. To run with systematics, add -s (have a look at the available options using --help). To redo only the plots starting from already produced histograms, use --onlypost.
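The ways of invoking bambooRun above differ only by a few options, so when scripting them it can help to build the command line in one place. A minimal sketch: the helper bamboo_run_command and its keyword arguments are hypothetical, the options it emits are only those quoted in this README, and the paths are relative to the python directory, as in the command above. It does not check which option combinations bambooRun actually accepts.

```python
import shlex
from typing import List


def bamboo_run_command(module: str, output: str,
                       samples: str = "../config/samples_template.yml",
                       analysis_cfg: str = "../config/analysis.yml",
                       mode: str = "driver", systematics: bool = False,
                       only_post: bool = False) -> List[str]:
    """Build a bambooRun argument list (hypothetical helper).

    mode: "driver" (submit to slurm), "finalize" (merge and postprocess after
    resubmitted jobs finished) or "test" (one-time local test).
    """
    if mode not in ("driver", "finalize", "test"):
        raise ValueError(f"unknown mode: {mode}")
    cmd = ["bambooRun", "-m", module, analysis_cfg, "-o", output, "--samples", samples]
    # --test replaces --distributed driver for a quick check while developing
    cmd += ["--test"] if mode == "test" else ["--distributed", mode]
    if systematics:
        cmd.append("-s")
    if only_post:
        cmd.append("--onlypost")
    return cmd


if __name__ == "__main__":
    cmd = bamboo_run_command("controlPlotter.py", "../test/myPlots", mode="test")
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To launch it from the python directory: subprocess.run(cmd, check=True)
```

Returning a list rather than a string avoids quoting problems when the command is passed to subprocess.run.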
Job failures
If a few jobs fail (e.g. they go over time or have network issues), bamboo will not run the postprocessing (merging the output files and producing the plots).
Once all the other jobs are done, bamboo will print a slurm command that resubmits only the failed jobs. Run it, then wait until these jobs finish (successfully); use squeue -u ${USER} or sacct to check.
When they are finished, re-run the same bamboo command but replace --distributed driver with --distributed finalize. This will finalize the merging and run the postprocessing step.
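Waiting for the resubmitted jobs can be automated by polling squeue. A minimal sketch, assuming it is acceptable to wait until none of your jobs are left in the queue (so it also waits for unrelated jobs of yours). It relies only on the standard squeue options -u (user) and -h (no header line). It does not check whether the jobs succeeded, which still has to be verified (e.g. with sacct) before running with --distributed finalize. The helpers count_my_jobs, wait_for_jobs and finalize_command and the 5-minute poll interval are hypothetical choices.

```python
import getpass
import subprocess
import time
from typing import Callable, List


def count_my_jobs() -> int:
    """Number of slurm jobs of the current user still pending or running."""
    out = subprocess.run(["squeue", "-h", "-u", getpass.getuser()],
                         capture_output=True, text=True, check=True).stdout
    return sum(1 for line in out.splitlines() if line.strip())


def wait_for_jobs(count_jobs: Callable[[], int] = count_my_jobs,
                  poll_seconds: float = 300.0,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Block until count_jobs() returns 0; return how many times the queue was queried."""
    polls = 1
    while count_jobs() > 0:
        sleep(poll_seconds)
        polls += 1
    return polls


def finalize_command(driver_cmd: List[str]) -> List[str]:
    """Replace the value after --distributed (e.g. 'driver') with 'finalize'."""
    i = driver_cmd.index("--distributed")
    return driver_cmd[:i + 1] + ["finalize"] + driver_cmd[i + 2:]
```

A typical use would be wait_for_jobs(), a manual check with sacct, then subprocess.run(finalize_command(cmd), check=True) with the same cmd as the original driver run. The count and sleep functions are parameters so the loop can be exercised without a slurm cluster.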
List of current plotters/skimmers
To avoid duplicating code and reduce the possibility of errors, we should try to keep definitions shared between modules.
- controlPlotter.py: For basic control plots at various stages of the selection
- genTtbbPlotter.py: Only to run on tt/ttbb samples: make gen-level plots and gen/reco-level comparisons
- bTagPlotter.py: To study the effect of b-tagging scale factors, compute b-tagging efficiencies, ...
- unfoldingPlotter.py: To produce all the histograms needed for unfolding
- genMatchingStudies.py: For signal studies using the origin of b jets
- syncSkimmer.py: Simple skimmer for event-level synchronization with other groups
The following files define classes from which the above inherit (includes command-line options, object definitions, selections...):
- baseTtbbPlotter.py: Base class for all plotters
- genBaseTtbbPlotter.py: Defines gen-level stuff, derives from baseTtbbPlotter. Gen-level studies only need to inherit from this class.
- recoBaseTtbbPlotter.py: Defines reco-level stuff, derives from baseTtbbPlotter. Pure reco-level plots only need to inherit from this class. Anything combining gen- and reco-level information (e.g. for unfolding) needs to derive from both classes.
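The inheritance layout described above can be sketched with plain Python classes. This is only an illustration of the structure, not the repository's code: the class names simply mirror the file names (the actual class names may differ), the method define_objects is hypothetical, and the real base class ultimately derives from bamboo's NanoAODHistoModule, omitted here so the sketch runs without bamboo. With cooperative super() calls, a class deriving from both the gen- and reco-level bases runs the shared base code exactly once, in Python's method resolution order.

```python
class baseTtbbPlotter:
    """Shared definitions (stands in for the class deriving from NanoAODHistoModule)."""
    def define_objects(self, log):
        log.append("base")
        return log


class genBaseTtbbPlotter(baseTtbbPlotter):
    """Adds gen-level definitions on top of the shared ones."""
    def define_objects(self, log):
        super().define_objects(log)
        log.append("gen")
        return log


class recoBaseTtbbPlotter(baseTtbbPlotter):
    """Adds reco-level definitions on top of the shared ones."""
    def define_objects(self, log):
        super().define_objects(log)
        log.append("reco")
        return log


class unfoldingPlotter(genBaseTtbbPlotter, recoBaseTtbbPlotter):
    """Combines gen- and reco-level information, so it derives from both."""
    pass


if __name__ == "__main__":
    print([cls.__name__ for cls in unfoldingPlotter.__mro__])
    print(unfoldingPlotter().define_objects([]))
```

Because every class calls super() instead of naming its parent directly, the shared code in baseTtbbPlotter is not duplicated when both branches are combined.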
The following files are used for common definitions:
- definitions.py: all reco-level object definitions, basic selections, scale factors etc.
- genDefinitions.py: gen-level object definitions, basic selections
- controlPlotDefinition.py: list of all control plots used by controlPlotter.py
- unfoldingDefinitions.py: list of all plots for unfolding, used by unfoldingPlotter.py. Includes reco-level observable, corresponding gen-level observable, and response matrices linking the two.
For bamboo development
(For bamboo "experts") If you plan to work on the bamboo code and not only on the analysis code, this alternative installation procedure for bamboo can be useful, as it avoids re-installing with pip each time there is a change:
source /cvmfs/sft.cern.ch/lcg/views/LCG_98python3/x86_64-centos7-gcc10-opt/setup.sh
python -m venv bamboovenv
source bamboovenv/bin/activate
python -m pip install -e ./bamboo
cd bamboo/
python setup.py build # needs to be re-run if the C++ parts change!
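After an editable install, importing bamboo should pick up the code in your clone rather than a copy in the virtual environment's site-packages. A minimal sketch to check this, using only the standard importlib machinery. The helpers imported_from and is_from_checkout are hypothetical, and the usage line assumes the layout created by the commands above (a bamboo clone next to bamboovenv).

```python
import importlib.util
from pathlib import Path
from typing import Optional


def imported_from(module_name: str) -> Optional[Path]:
    """Directory a top-level module or package would be imported from (None if unknown)."""
    spec = importlib.util.find_spec(module_name)
    # Built-in modules and namespace packages have no file location
    if spec is None or not spec.has_location or spec.origin is None:
        return None
    return Path(spec.origin).resolve().parent


def is_from_checkout(module_name: str, checkout: str) -> bool:
    """True if module_name is imported from inside the given source checkout."""
    location = imported_from(module_name)
    root = Path(checkout).resolve()
    return location is not None and (location == root or root in location.parents)
```

With the virtual environment activated and from the directory containing the clone, print(is_from_checkout("bamboo", "bamboo")) should then print True.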