# Umami for c-tagging development
The default Umami documentation is available here:
[![Umami docs](https://img.shields.io/badge/info-documentation-informational)](https://umami-docs.web.cern.ch/umami-docs/). For the umami framework, clone the original repository and make changes there; for reasons I haven't tracked down, everything goes wrong when I use my fork.
The sample preparation needs the [Training dataset dumper](https://gitlab.cern.ch/zfeng/training-dataset-dumper); if this doesn't work, use the [original training dataset dumper framework](https://gitlab.cern.ch/atlas-flavor-tagging-tools/training-dataset-dumper), but remember to check the decorator files.
## Step 1: Convert root file to hdf5 file
### Setup
```bash
git clone ssh://git@gitlab.cern.ch:7999/atlas-flavor-tagging-tools/training-dataset-dumper.git
source training-dataset-dumper/setup-analysisbase.sh
mkdir build
cd build
cmake ../training-dataset-dumper
make
cd ..  # back to the project directory before sourcing the build setup
source build/x*/setup.sh
```
The next time you want to use the utility, run the following from the project directory:
```bash
source training-dataset-dumper/setup-analysisbase.sh
source build/x*/setup.sh
```
### Add new variables
New variables are defined using [DecoratorExample.hh](https://gitlab.cern.ch/zfeng/training-dataset-dumper/-/blob/master/BTagTrainingPreprocessing/src/DecoratorExample.hh) and [DecoratorExample.cxx](https://gitlab.cern.ch/zfeng/training-dataset-dumper/-/blob/master/BTagTrainingPreprocessing/src/DecoratorExample.cxx). For example, to create a new variable named "xxx_decorator":
```c++
// the constructor just builds the decorator
DecoratorExample::DecoratorExample(const std::string& prefix):
  m_deco(prefix + "decorator")
{
}
```
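For context, the matching header declares both the constructor and a `decorate` method that writes the variable onto each jet. This is a sketch based on the example file, so details there may differ:
```c++
// DecoratorExample.hh (sketch): one float decorator, applied jet by jet
#include "AthContainers/AuxElement.h"
#include "xAODJet/Jet.h"
#include <string>

class DecoratorExample
{
public:
  DecoratorExample(const std::string& prefix);
  // writes the new variable onto the given jet
  void decorate(const xAOD::Jet& jet) const;
private:
  SG::AuxElement::Decorator<float> m_deco;
};
```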
After constructing the new variable, edit [line 121 of dump-single-btag.cxx](https://gitlab.cern.ch/zfeng/training-dataset-dumper/-/blob/master/BTagTrainingPreprocessing/util/dump-single-btag.cxx#L121) to make sure the variable is added to the dumper:
```c++
// this is just an example augmenter, it doesn't do anything important
// TODO: check here to modify the decorator
DecoratorExample example_decorator("ctag_");
```
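The decorator then has to be applied to every jet that gets dumped; this is a single call inside the jet loop. A sketch, assuming `jets` is the selected jet container (the surrounding loop in `dump-single-btag.cxx` may look different):
```c++
// inside the jet loop of dump-single-btag.cxx
for (const xAOD::Jet* jet : *jets) {
  example_decorator.decorate(*jet);
}
```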
The variable is finally named **ctag_decorator**. Then make sure it's contained in [single-btag-variables.json](https://gitlab.cern.ch/zfeng/training-dataset-dumper/-/blob/master/configs/single-b-tag/single-btag-variables.json#L70):
```json
"floats": [
...
"ctag_decorator"
]
```
### Run on the grid
From the directory where you checked out the package, after running the `setup.sh` script above, run the following:
```bash
source training-dataset-dumper/grid/setup.sh
```
Then edit the [input dataset in submit.sh](https://gitlab.cern.ch/zfeng/training-dataset-dumper/-/blob/master/grid/submit.sh#L38) and run:
```bash
./training-dataset-dumper/grid/submit.sh
```
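For reference, the part of `submit.sh` to change is the input dataset list. The sketch below assumes the script still uses an `INPUT_DATASETS` array; the container name is only a placeholder:
```bash
# grid/submit.sh (sketch): list the DAOD containers to run over
INPUT_DATASETS=(
    mc16_13TeV.<dsid>.<sample>.deriv.DAOD_FTAG1.<tags>
)
```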
## Step 2: Sample preparation
Use the umami framework. The [manual setup](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami#manual-setup) is a bit annoying, so I used the docker image:
```bash
singularity exec docker://gitlab-registry.cern.ch/atlas-flavor-tagging-tools/algorithms/umami:latest bash
```
The full preprocessing chain includes preparation and preprocessing: [details](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/docs/preprocessing.md).
First, add the new variable to the variable dictionary [Umami_Variables.yaml](https://gitlab.cern.ch/zfeng/umami/-/blob/master/umami/configs/Umami_Variables.yaml#L30):
```yaml
train_variables:
JetFitterSecondaryVertex:
- ctag_decorator
```
Set up the [PFlow-Preprocessing.yaml](https://gitlab.cern.ch/zfeng/umami/-/blob/master/examples/PFlow-Preprocessing.yaml) for file paths and so on, then run:
```bash
# preparation
preprocessing.py --config examples/PFlow-Preprocessing.yaml --sample training_ttbar_bjets --tracks --prepare
preprocessing.py --config examples/PFlow-Preprocessing.yaml --sample training_ttbar_bjets --tracks --merge
preprocessing.py --config examples/PFlow-Preprocessing.yaml --sample training_ttbar_cjets --tracks --prepare
preprocessing.py --config examples/PFlow-Preprocessing.yaml --sample training_ttbar_cjets --tracks --merge
preprocessing.py --config examples/PFlow-Preprocessing.yaml --sample training_ttbar_ujets --tracks --prepare
preprocessing.py --config examples/PFlow-Preprocessing.yaml --sample training_ttbar_ujets --tracks --merge
preprocessing.py --config examples/PFlow-Preprocessing.yaml --sample testing_ttbar --tracks --prepare
# preprocessing
# To exclude Zprime sample, I used my own changed preprocessing.py
# Otherwise can simply use the default executable one
python umami/preprocessing.py --config examples/PFlow-Preprocessing.yaml --var_dict umami/configs/Umami_Variables.yaml --undersampling --tracks
python umami/preprocessing.py --config examples/PFlow-Preprocessing.yaml --var_dict umami/configs/Umami_Variables.yaml --scaling --tracks
python umami/preprocessing.py --config examples/PFlow-Preprocessing.yaml --var_dict umami/configs/Umami_Variables.yaml --apply_scales --tracks
python umami/preprocessing.py --config examples/PFlow-Preprocessing.yaml --var_dict umami/configs/Umami_Variables.yaml --write --tracks
```
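For reference, the path settings at the top of `PFlow-Preprocessing.yaml` look roughly like the sketch below; the key names are taken from the umami example config and may differ between versions:
```yaml
# sketch of the path block in examples/PFlow-Preprocessing.yaml
parameters:
  # ntuples produced by the training-dataset-dumper
  ntuple_path: &ntuple_path /path/to/dumper/output
  # prepared samples written by the --prepare/--merge steps
  sample_path: &sample_path /path/to/hybrid
  # final preprocessed training files
  file_path: &file_path /path/to/preprocessed
```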
The training file should end up in the `/preprocessed/` directory with a name like `PFlow-hybrid-preprocessed_shuffled.h5`. Testing and validation files are in the `/hybrid/` directory, with names like `MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_1.h5`.
## Step 3: Training DL1r
For training and evaluation, copy the python scripts from my [fork](https://gitlab.cern.ch/zfeng/umami/-/tree/master/umami).
A sample of the training config can be found here: [DL1r-PFlow-Training-config_batch500_nodropout_simple2.yaml](https://gitlab.cern.ch/zfeng/umami/-/blob/master/examples/DL1r-PFlow-Training-config_batch500_nodropout_simple2.yaml). For training without the added variable, put the new variable into `exclude`:
```yaml
exclude: ["ctag_decorator"]
```
For c-tagging, the python modules need to be changed: use `NN_tools`, `NN_tools_bveto` and `Plotting` from my [train_tools](https://gitlab.cern.ch/zfeng/umami/-/tree/master/umami/train_tools) directory and import them into the training code `/umami/train_DL1.py` and `/umami/train_DL1_noadd.py` (or, for training with a b-veto, import `NN_tools_bveto` into `train_DL1_bveto.py` and `train_DL1_noadd_bveto.py`).
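The training itself can then be launched with these scripts; this assumes `train_DL1.py` takes the same `-c` config flag as the other umami scripts in this README:
```bash
# training with the added variable; use train_DL1_noadd.py for the baseline
python umami/train_DL1.py -c examples/DL1r-PFlow-Training-config_batch500_nodropout_simple2.yaml
```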
A quick check of the performance vs epochs:
```bash
python umami/plotting_epoch_performance.py -c examples/DL1r-PFlow-Training-config_batch500_nodropout_simple2.yaml --dl1
```
## Step 4: Evaluation
As for training, import the python modules from my fork's [train_tools](https://gitlab.cern.ch/zfeng/umami/-/tree/master/umami/train_tools) directory, and use the python scripts from my fork.
To evaluate a specific model, e.g. from epoch 89:
```bash
python umami/evaluate_model.py -c examples/DL1r-PFlow-Training-config_batch500_nodropout_simple2.yaml -e 89 --dl1
```
The outputs of this process are two h5 files in a directory named after the model name in the config: `batch500_nodropout_simple2/results/results-89.h5` and `batch500_nodropout_simple2/results/results-rej_per_eff-89.h5`.
It is also possible to evaluate the training sample: use [evaluate_model_train.py](https://gitlab.cern.ch/zfeng/umami/-/blob/master/umami/evaluate_model_train.py).
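Assuming it takes the same `-c`/`-e` arguments as `evaluate_model.py`:
```bash
python umami/evaluate_model_train.py -c examples/DL1r-PFlow-Training-config_batch500_nodropout_simple2.yaml -e 89 --dl1
```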
To plot the outputs, edit the plotting_config: [plotting_umami_config_DL1r_forC_noadd.yaml](https://gitlab.cern.ch/zfeng/umami/-/blob/master/examples/plotting_umami_config_DL1r_forC_noadd.yaml), then run:
```bash
python umami/plotting_umami.py -c examples/plotting_umami_config_DL1r_forC_noadd.yaml
```
For now, the only plotting types supporting c-tagging are `scores`, `score_comparison`, `confusion_matrix` and `ROC`. Other plotting types can be found in [Umami docs/Plotting](https://umami-docs.web.cern.ch/umami-docs/plotting_umami/), but they are written for b-tagging.
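For orientation, a single `scores` entry in the plotting config might look like the sketch below; the key names are assumptions based on the Umami plotting docs and may not match the actual config exactly:
```yaml
# sketch of one plot entry in plotting_umami_config_DL1r_forC_noadd.yaml
DL1r_scores:
  type: "scores"
  data_set_name: "ttbar"
  prediction_labels: ["dl1r_pb", "dl1r_pc", "dl1r_pu"]
```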