Commit 644367de authored by Zhuoran Feng
# Umami for c-tagging development
The default Umami documentation is available here: [Umami docs]()

For the Umami framework, clone the original repository and make your changes there; I don't know why, but everything goes wrong when I use my fork.

Sample preparation needs the [Training dataset dumper](); if that doesn't work, use the [original training-dataset-dumper framework]() and remember to check the decorator files.
## Step 1: Convert root file to hdf5 file
### Setup
```
git clone ssh://
source training-dataset-dumper/
mkdir build
cd build
cmake ../training-dataset-dumper
source build/x*/
```
The next time you want to use the utility, run the following from the project directory:

```
source training-dataset-dumper/
source build/x*/
```
### Add new variables
New variables are defined in [DecoratorExample.hh]() and [DecoratorExample.cxx](); for example, for a new variable named `xxx_decorator` (prefix `xxx_` plus the fixed suffix `decorator`):

```cpp
// the constructor just builds the decorator
DecoratorExample::DecoratorExample(const std::string& prefix):
  m_deco(prefix + "decorator")
{
}
```
After constructing the new variable, edit [line 121 of dump-single-btag.cxx]() to make sure the variable is added to the dumper:

```cpp
// this is just an example augmenter, it doesn't do anything important
// TODO: check here to modify the decorator
DecoratorExample example_decorator("ctag_");
```
With the prefix `ctag_`, the variable is finally named **ctag_decorator**. Then make sure it's contained in the `floats` list of [single-btag-variables.json]():

```json
"floats": [
    ...
    "ctag_decorator",
    ...
]
```
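Since the variable list is plain JSON, a quick sanity check in Python confirms the new name is really listed. This is a minimal sketch; the inline string (and the other variable names in it) are a hypothetical stand-in for a local copy of `single-btag-variables.json`:

```python
import json

# hypothetical stand-in for the contents of single-btag-variables.json
config_text = '{"floats": ["pt_uncalib", "eta_uncalib", "ctag_decorator"]}'

variables = json.loads(config_text)
missing = [v for v in ["ctag_decorator"] if v not in variables["floats"]]
print("missing:", missing)  # an empty list means the variable is listed
```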
### Run on the grid
From the directory where you checked out the package, after running the `` script above, run the following:
```
source training-dataset-dumper/grid/
```
Then edit the [input dataset in]() and run the script.
## Step 2: Sample preparation
Use the Umami framework. The [manual setup]() is a bit annoying, so I used the Docker image:

```
singularity exec docker:// bash
```
The full preprocessing stage includes preparation and preprocessing: [details]()
First add the new variable to the variable dictionary, [Umami_Variables.yaml]():

```yaml
- ctag_decorator
```
Set up [PFlow-Preprocessing.yaml]() with the file paths and so on, then run:

```
# preparation --config examples/PFlow-Preprocessing.yaml --sample training_ttbar_bjets --tracks --prepare --config examples/PFlow-Preprocessing.yaml --sample training_ttbar_bjets --tracks --merge --config examples/PFlow-Preprocessing.yaml --sample training_ttbar_cjets --tracks --prepare --config examples/PFlow-Preprocessing.yaml --sample training_ttbar_cjets --tracks --merge --config examples/PFlow-Preprocessing.yaml --sample training_ttbar_ujets --tracks --prepare --config examples/PFlow-Preprocessing.yaml --sample training_ttbar_ujets --tracks --merge --config examples/PFlow-Preprocessing.yaml --sample testing_ttbar --tracks --prepare

# preprocessing
# To exclude the Zprime sample, I used my own changed;
# otherwise you can simply use the default one.
python umami/ --config examples/PFlow-Preprocessing.yaml --var_dict umami/configs/Umami_Variables.yaml --undersampling --tracks
python umami/ --config examples/PFlow-Preprocessing.yaml --var_dict umami/configs/Umami_Variables.yaml --scaling --tracks
python umami/ --config examples/PFlow-Preprocessing.yaml --var_dict umami/configs/Umami_Variables.yaml --apply_scales --tracks
python umami/ --config examples/PFlow-Preprocessing.yaml --var_dict umami/configs/Umami_Variables.yaml --write --tracks
```
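For intuition, the `--undersampling` and `--scaling`/`--apply_scales` steps roughly do the following. This is a pure-Python sketch of the idea, not Umami's actual implementation, and all numbers are made up:

```python
import random
import statistics

random.seed(0)

# toy per-flavour jet collections (lists of indices stand in for real jets)
jets = {"bjets": list(range(500)), "cjets": list(range(300)), "ujets": list(range(800))}

# --undersampling: randomly downsample every flavour to the smallest class,
# so the training sample has equal numbers of b, c, and light jets
n_min = min(len(v) for v in jets.values())
balanced = {flav: random.sample(idx, n_min) for flav, idx in jets.items()}

# --scaling / --apply_scales: compute mean and standard deviation per variable,
# then standardise the inputs to zero mean and unit variance
pt = [random.gauss(50.0, 10.0) for _ in range(1000)]  # toy pT spectrum
mean, std = statistics.mean(pt), statistics.stdev(pt)
pt_scaled = [(x - mean) / std for x in pt]
```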
The training file should be in the `/preprocessed/` directory and named like `PFlow-hybrid-preprocessed_shuffled.h5`. Testing and validation files are in the `/hybrid/` directory, named `MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_1.h5`.
## Step 3: Training DL1r
For training and evaluation, copy the Python scripts from my [fork]().
A sample training config can be found here: [DL1r-PFlow-Training-config_batch500_nodropout_simple2.yaml]() To train without the added variable, put the new variable into `exclude`:

```yaml
exclude: ["ctag_decorator"]
```
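Conceptually, the `exclude` list just filters the named variables out of the training inputs. A toy sketch (not Umami's code; the variable names besides `ctag_decorator` are made up):

```python
# toy variable list; in Umami the full list comes from the variable dictionary
variables = ["pt_uncalib", "eta_uncalib", "ctag_decorator"]
exclude = ["ctag_decorator"]

# training proceeds on everything not listed in exclude
train_variables = [v for v in variables if v not in exclude]
print(train_variables)  # the decorator is dropped from the inputs
```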
For c-tagging, the Python modules need to be changed: take `NN_tools`, `NN_tools_bveto`, and `Plotting` from my [train_tools]( directory and import them into the training code `umami/` and `umami/` (for training with a b-veto, import `NN_tools_bveto` into `` and `` instead).
A quick check of the performance vs epochs:
```
python umami/ -c examples/DL1r-PFlow-Training-config_batch500_nodropout_simple2.yaml --dl1
```
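What that check boils down to is scanning the per-epoch validation metrics for the best epoch. A toy sketch with made-up losses (Umami reads these from the training history instead):

```python
# hypothetical validation loss per epoch
val_loss = {80: 0.512, 85: 0.498, 89: 0.471, 95: 0.480}

# the epoch with the lowest validation loss is the one to evaluate
best_epoch = min(val_loss, key=val_loss.get)
print(best_epoch)  # → 89, the value you would pass to via -e
```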
## Step 4: Evaluation
Same as in training, import the Python modules from my fork's [train_tools](), and use the Python scripts from my fork as well.
To evaluate a specific model, for example from epoch 89:

```
python umami/ -c examples/DL1r-PFlow-Training-config_batch500_nodropout_simple2.yaml -e 89 --dl1
```
This produces two h5 files in a directory named after the model name in the config: `batch500_nodropout_simple2/results/results-89.h5` and `batch500_nodropout_simple2/results/results-rej_per_eff-89.h5`.
The training sample can also be evaluated: use [](
To plot the outputs, edit the plotting_config: [plotting_umami_config_DL1r_forC_noadd.yaml](, then run:
```
python umami/ -c examples/plotting_umami_config_DL1r_forC_noadd.yaml
```
For now, the only plotting types that support c-tagging are `scores`, `score_comparison`, `confusion_matrix`, and `ROC`. Other plotting types can be found in [Umami docs/Plotting](), but they are b-tagging-specific.
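The `ROC` plots show background rejection (one over background efficiency) versus c-jet efficiency. The underlying calculation is roughly the following; this is a toy sketch with made-up Gaussian scores, not the actual plotting code:

```python
import random

random.seed(1)

# toy discriminant scores: c-jets (signal) peak above b-jets (background)
c_scores = [random.gauss(1.0, 1.0) for _ in range(10000)]
b_scores = [random.gauss(0.0, 1.0) for _ in range(10000)]

# pick the cut that keeps 50% of c-jets, then measure the b-jet rejection there
cut = sorted(c_scores)[len(c_scores) // 2]
b_eff = sum(s > cut for s in b_scores) / len(b_scores)
b_rejection = 1.0 / b_eff  # one point on the rejection-vs-efficiency curve
```

Sweeping the cut over the full score range traces out the whole ROC curve.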