Commit c9f7f34d authored by Schröer Tomke

solve conflict

parents d07e19e3 d8daf8a3
Pipeline #3154393 failed with stages
in 17 seconds
......@@ -6,8 +6,6 @@ The Umami documentation is available here:
[![Umami docs](https://img.shields.io/badge/info-documentation-informational)](https://umami-docs.web.cern.ch/umami-docs/)
Below is a brief summary of how to get started quickly.
## Installation
......@@ -24,6 +22,20 @@ besides the CPU image, there is also a GPU image available which is especially u
singularity exec --nv docker://gitlab-registry.cern.ch/atlas-flavor-tagging-tools/algorithms/umami:latest-gpu bash
```
This image has TensorFlow installed for training the taggers. A PyTorch image is also available and can be used with:
```bash
singularity exec --nv docker://gitlab-registry.cern.ch/atlas-flavor-tagging-tools/algorithms/umamibase:latest-pytorch-gpu bash
```
If you want to change something in the code (outside of config files), you need to run
```bash
source run_setup.sh
```
which sources [run_setup.sh](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/run_setup.sh). Otherwise, the version of Umami already installed in the image is used.
### Manual setup
Alternatively, you can check out this repository via `git clone` and then run
......@@ -58,16 +70,17 @@ The test suite can be run via
pytest ./umami/tests/ -v
```
If you want to only run unit tests, this can be done via
If you want to only run, for example, the unit tests for the evaluation tools, this can be done via
```bash
pytest ./umami/tests/unit/ -v
pytest ./umami/tests/unit/evaluation_tools/ -v
```
and the integration test similarly via
To run the integration tests, you need to run them in the correct order: preprocessing, training, plotting.
Otherwise, you will get an error that some files are missing. You can run those via
```bash
pytest ./umami/tests/integration/ -v
pytest ./umami/tests/integration/test_preprocessing.py -v
```
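The ordering requirement can be made explicit with a small driver. The following is a minimal sketch and not part of the repository: only `test_preprocessing.py` appears above, so the file names for the training and plotting stages are an assumption that follows the same `test_<stage>.py` pattern, and `build_commands` and `run_in_order` are hypothetical helper names.

```python
import subprocess
import sys
from typing import List, Sequence

# Stage order as stated in the text; the file-name pattern is an assumption.
STAGES = ("preprocessing", "training", "plotting")
TEST_DIR = "./umami/tests/integration"


def build_commands(
    stages: Sequence[str] = STAGES, test_dir: str = TEST_DIR
) -> List[List[str]]:
    """Return one pytest command per stage, in the order given."""
    return [
        [sys.executable, "-m", "pytest", f"{test_dir}/test_{stage}.py", "-v"]
        for stage in stages
    ]


def run_in_order(commands: Sequence[Sequence[str]]) -> int:
    """Run the commands one by one and stop at the first failure."""
    for command in commands:
        result = subprocess.run(list(command))
        if result.returncode != 0:
            # Later stages depend on files written by earlier ones.
            return result.returncode
    return 0
```

Calling `run_in_order(build_commands())` would then run the three stages in sequence and return the exit code of the first failing stage, or 0 if all pass.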
In order to run the code style checker `flake8` use the following command
......@@ -76,6 +89,12 @@ In order to run the code style checker `flake8` use the following command
flake8 ./umami
```
## Preprocessing
Training Umami uses the ntuples specified in the section [MC Samples](#mc-samples).
The ntuples need to be preprocessed following the [preprocessing instructions](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/docs/preprocessing.md).
## DL1r instructions
If you want to train or evaluate DL1r please follow the [DL1r-instructions](docs/DL1r-instructions.md).
......@@ -83,9 +102,3 @@ If you want to train or evaluate DL1r please follow the [DL1r-instructions](docs
## DIPS instructions
If you want to train or evaluate DIPS please follow the [DIPS-instructions](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/docs/Dips-instructions.md)
## Preprocessing
Training Umami uses the ntuples specified in the section [MC Samples](#mc-samples).
The ntuples need to be preprocessed following the [preprocessing instructions](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/docs/preprocessing.md).
......@@ -19,13 +19,13 @@ The FTAG1 derivations and the most recent ntuples for PFlow with the new RNNIP,
## Release 22 Samples with Muons
The round 2 release 22 samples with RNNIP, DL1* and DIPS. Muon information is added (softMuon).
The round 2 release 22 samples with RNNIP, DL1* and DIPS. Muon information is added (softMuon). Information for GNN training is added.
| Sample | h5 ntuples | h5 ntuples (looser track selection) | DAOD_PHYSVAL derivations| AOD |
| ------------- | ---------------- | ---------------- | ---------------- | ---------------- |
| ttbar | user.alfroch.410470.btagTraining.e6337_e5984_s3126_r12629_p4724.EMPFlow.2021-09-20-T161046-R30966_output.h5 | user.alfroch.410470.btagTraining.e6337_e5984_s3126_r12629_p4724.EMPFlow_loose.2021-09-20-T165329-R29738_output.h5 | mc16_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_PHYSVAL.e6337_e5984_s3126_r12629_p4724 | mc16_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.recon.AOD.e6337_e5984_s3126_r12629 |
| Z' Extended (With QSP, Yes shower weights) | user.alfroch.800030.btagTraining.e7954_s3672_r12629_r12636_p4724.EMPFlow.2021-09-20-T161046-R30966_output.h5 | user.alfroch.800030.btagTraining.e7954_s3672_r12629_r12636_p4724.EMPFlow_loose.2021-09-20-T165329-R29738_output.h5 | mc16_13TeV.800030.Py8EG_A14NNPDF23LO_flatpT_Zprime_Extended.deriv.DAOD_PHYSVAL.e7954_s3672_r12629_r12636_p4724 | |
| Z' | user.alfroch.500567.btagTraining.e7954_e7400_s3672_r12629_r12636_p4724.EMPFlow.2021-09-20-T161046-R30966_output.h5 | user.alfroch.500567.btagTraining.e7954_e7400_s3672_r12629_r12636_p4724.EMPFlow_loose.2021-09-20-T165329-R29738_output.h5 | mc16_13TeV.500567.MGH7EG_NNPDF23ME_Zprime.deriv.DAOD_PHYSVAL.e7954_e7400_s3672_r12629_r12636_p4724 | mc16_13TeV.500567.MGH7EG_NNPDF23ME_Zprime.merge.AOD.e7954_e7400_s3672_r12629_r12636 |
| ttbar | user.alfroch.410470.btagTraining.e6337_e5984_s3126_r12629_p4724.EMPFlow.2021-10-18-T142151-R1757_output.h5 | user.alfroch.410470.btagTraining.e6337_e5984_s3126_r12629_p4724.EMPFlow_loose.2021-10-18-T142255-R29407_output.h5 | mc16_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_PHYSVAL.e6337_e5984_s3126_r12629_p4724 | mc16_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.recon.AOD.e6337_e5984_s3126_r12629 |
| Z' Extended (With QSP, Yes shower weights) | user.alfroch.800030.btagTraining.e7954_s3672_r12629_r12636_p4724.EMPFlow.2021-10-18-T142151-R1757_output.h5 | user.alfroch.800030.btagTraining.e7954_s3672_r12629_r12636_p4724.EMPFlow_loose.2021-10-18-T142255-R29407_output.h5 | mc16_13TeV.800030.Py8EG_A14NNPDF23LO_flatpT_Zprime_Extended.deriv.DAOD_PHYSVAL.e7954_s3672_r12629_r12636_p4724 | |
| Z' | user.alfroch.500567.btagTraining.e7954_e7400_s3672_r12629_r12636_p4724.EMPFlow.2021-10-18-T142151-R1757_output.h5 | user.alfroch.500567.btagTraining.e7954_e7400_s3672_r12629_r12636_p4724.EMPFlow_loose.2021-10-18-T142255-R29407_output.h5 | mc16_13TeV.500567.MGH7EG_NNPDF23ME_Zprime.deriv.DAOD_PHYSVAL.e7954_e7400_s3672_r12629_r12636_p4724 | mc16_13TeV.500567.MGH7EG_NNPDF23ME_Zprime.merge.AOD.e7954_e7400_s3672_r12629_r12636 |
## Release 22 Samples
......
......@@ -19,24 +19,20 @@ add_validation_file: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids/MC16
ttbar_test_files:
ttbar_r21:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_1.h5
data_set_name: "ttbar"
data_set_name: "ttbar_r21"
ttbar_r22:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids_r22/MC16d_hybrid-r22_odd_100_PFlow-no_pTcuts-file_1.h5
data_set_name: "ttbar_comparison"
data_set_name: "ttbar_r22"
zpext_test_files:
zpext_r21:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_1.h5
data_set_name: "zpext"
data_set_name: "zpext_r21"
zpext_r22:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids_r22/MC16d_hybrid-r22-ext_odd_0_PFlow-no_pTcuts-file_1.h5
data_set_name: "zpext_comparison"
zpext_r22_no_QSP:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids_r22/MC16d_hybrid-r22-ext_odd_0_PFlow-no_pTcuts_No_QSPI-file_1.h5
data_set_name: "zpext_comparison_no_QSP"
data_set_name: "zpext_r22"
# Path to Variable dict used in preprocessing
var_dict: umami/configs/DL1r_Variables.yaml
......@@ -101,6 +97,34 @@ Eval_parameters_validation:
"ujets": 0.982,
}
# Cuts which are applied to the different datasets used for evaluation
variable_cuts: {
"ttbar_r21": {
"pt_btagJes": {
"operator": "<=",
"condition": 250000,
}
},
"ttbar_r22": {
"pt_btagJes": {
"operator": "<=",
"condition": 250000,
}
},
"zpext_r21": {
"pt_btagJes": {
"operator": ">",
"condition": 250000,
}
},
"zpext_r22": {
"pt_btagJes": {
"operator": ">",
"condition": 250000,
}
},
}
# A list to add available variables to the evaluation files
add_variables_eval: ["actualInteractionsPerCrossing"]
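To illustrate what a `variable_cuts` entry expresses, here is a minimal sketch of turning one dataset's cut dictionary into a per-jet mask. It is an illustration under stated assumptions, not Umami's implementation: `apply_variable_cuts` and `OPERATORS` are hypothetical names, and plain Python lists stand in for whatever array type the evaluation code actually uses.

```python
import operator
from typing import Dict, List

# Map the operator strings used in the config to Python comparison functions.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def apply_variable_cuts(
    jets: Dict[str, List[float]], cuts: Dict[str, dict]
) -> List[bool]:
    """Return a per-jet mask that is True where every cut is satisfied."""
    n_jets = len(next(iter(jets.values())))
    mask = [True] * n_jets
    for variable, cut in cuts.items():
        compare = OPERATORS[cut["operator"]]
        threshold = cut["condition"]
        mask = [
            keep and compare(value, threshold)
            for keep, value in zip(mask, jets[variable])
        ]
    return mask


# Hypothetical jets, in the same units as the 250000 threshold in the config.
jets = {"pt_btagJes": [100000.0, 250000.0, 400000.0]}
ttbar_cut = {"pt_btagJes": {"operator": "<=", "condition": 250000}}
zpext_cut = {"pt_btagJes": {"operator": ">", "condition": 250000}}

print(apply_variable_cuts(jets, ttbar_cut))  # [True, True, False]
print(apply_variable_cuts(jets, zpext_cut))  # [False, False, True]
```

With `<=` for the ttbar datasets and `>` for the Z' extended datasets at the same threshold, the two selections are complementary: each jet passes exactly one of them.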
......
......@@ -19,24 +19,20 @@ add_validation_file: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids/MC16
ttbar_test_files:
ttbar_r21:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_1.h5
data_set_name: "ttbar"
data_set_name: "ttbar_r21"
ttbar_r22:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids_r22/MC16d_hybrid-r22_odd_100_PFlow-no_pTcuts-file_1.h5
data_set_name: "ttbar_comparison"
data_set_name: "ttbar_r22"
zpext_test_files:
zpext_r21:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_1.h5
data_set_name: "zpext"
data_set_name: "zpext_r21"
zpext_r22:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids_r22/MC16d_hybrid-r22-ext_odd_0_PFlow-no_pTcuts-file_1.h5
data_set_name: "zpext_comparison"
zpext_r22_no_QSP:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids_r22/MC16d_hybrid-r22-ext_odd_0_PFlow-no_pTcuts_No_QSPI-file_1.h5
data_set_name: "zpext_comparison_no_QSP"
data_set_name: "zpext_r22"
# Path to Variable dict used in preprocessing
var_dict: /home/fr/fr_fr/fr_af1100/b-Tagging/packages/umami/umami/configs/Dips_Variables.yaml
......@@ -100,6 +96,34 @@ Eval_parameters_validation:
"ujets": 0.982,
}
# Cuts which are applied to the different datasets used for evaluation
variable_cuts: {
"ttbar_r21": {
"pt_btagJes": {
"operator": "<=",
"condition": 250000,
}
},
"ttbar_r22": {
"pt_btagJes": {
"operator": "<=",
"condition": 250000,
}
},
"zpext_r21": {
"pt_btagJes": {
"operator": ">",
"condition": 250000,
}
},
"zpext_r22": {
"pt_btagJes": {
"operator": ">",
"condition": 250000,
}
},
}
# Working point used in the evaluation
WP: 0.77
......
parameters:
# Path where the ntuples are saved
ntuple_path: &ntuple_path /nfs/dust/atlas/user/pgadow/ftag/data/ntuple_links/
# Path where the hybrid samples will be saved
sample_path: &sample_path /nfs/dust/atlas/user/pgadow/ftag/data/processed/20210525-defaulttracks/hybrids/
# Path where the merged and ready-to-train samples are saved
file_path: &file_path /nfs/dust/atlas/user/pgadow/ftag/data/processed/20210525-defaulttracks/preprocessed/
preparation:
ntuples:
ttbar:
path: *ntuple_path
file_pattern: user.mguth.410470.btagTraining.e6337_s3126_r10201_p3985.EMPFlow.2020-02-14-T232210-R26303_output.h5/*.h5
zprime:
path: *ntuple_path
file_pattern: user.mguth.427081.btagTraining.e6928_e5984_s3126_r10201_r10210_p3985.EMPFlow.2020-02-15-T225316-R8334_output.h5/*.h5
samples:
training_ttbar_bjets:
type: ttbar
category: bjets
n_jets: 10e6
n_split: 10
cuts:
eventNumber:
operator: mod_2_==
condition: 0
pt_cut:
operator: <=
condition: 2.5e5
HadronConeExclTruthLabelID:
operator: ==
condition: 5
f_output:
path: *sample_path
file: MC16d_hybrid-bjets_even_1_PFlow-merged.h5
merge_output: f_tt_bjets
training_ttbar_cjets:
type: ttbar
category: cjets
# Number of c jets available in MC16d
n_jets: 12745953
n_split: 13
cuts:
eventNumber:
operator: mod_2_==
condition: 0
pt_cut:
operator: <=
condition: 2.5e5
HadronConeExclTruthLabelID:
operator: ==
condition: 4
f_output:
path: *sample_path
file: MC16d_hybrid-cjets_even_1_PFlow-merged.h5
merge_output: f_tt_cjets
training_ttbar_ujets:
type: ttbar
category: ujets
n_jets: 20e6
n_split: 20
cuts:
eventNumber:
operator: mod_2_==
condition: 0
pt_cut:
operator: <=
condition: 2.5e5
HadronConeExclTruthLabelID:
operator: ==
condition: 0
f_output:
path: *sample_path
file: MC16d_hybrid-ujets_even_1_PFlow-merged.h5
merge_output: f_tt_ujets
training_ttbar_taujets:
type: ttbar
category: taujets
n_jets: 12745953
n_split: 5
cuts:
eventNumber:
operator: mod_2_==
condition: 0
pt_cut:
operator: <=
condition: 2.5e5
HadronConeExclTruthLabelID:
operator: ==
condition: 15
f_output:
path: *sample_path
file: MC16d_hybrid-taujets_even_1_PFlow-merged.h5
merge_output: f_tt_taujets
training_zprime:
type: zprime
n_jets: 9593092
n_split: 2
cuts:
eventNumber:
operator: mod_2_==
condition: 0
pt_cut:
operator: ">"
condition: 2.5e5
f_output:
path: *sample_path
file: MC16d_hybrid-ext_even_0_PFlow-merged.h5
merge_output: f_z
testing_ttbar:
type: ttbar
n_jets: 4e6
n_split: 2
cuts:
eventNumber:
operator: mod_2_==
condition: 1
f_output:
path: *sample_path
file: MC16d_hybrid_odd_100_PFlow-no_pTcuts.h5
testing_zprime:
type: zprime
n_jets: 4e6
n_split: 2
cuts:
eventNumber:
operator: mod_2_==
condition: 1
f_output:
path: *sample_path
file: MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts.h5
# amount of b-jets from ttbar which are used in the pre-resampling,
# can change after applying resampling in the hybrid sample creation
njets: 5.5e6
# fraction of ttbar jets wrt. Z'
# can change after applying resampling in the hybrid sample creation
ttbar_frac: 0.65
# Whether or not to enforce the ttbar fraction above
enforce_ttbar_frac: False
# outputfiles are split into 5
# iterations: 1
iterations: 5
# pT cut for hybrid creation (for light and c-jets)
pTcut: 2.5e5
# pT cut for b-jets
bhad_pTcut: 2.5e5
# upper pT limit for all jets
pT_max: False
# set to true if taus are to be included in preprocessing
bool_process_taus: False
# set to true if extended flavour labelling scheme is used in preprocessing
bool_extended_labelling: False
# Define undersampling method used. Valid are "count", "weight",
# "count_bcl_weight_tau", "template_b" and "template_b_count"
# count_bcl_weight_tau is a hybrid of count and weight to deal with taus.
# template_b uses the b as the target distribution, but does not guarantee
# same fractions. template_b_count will guarantee same fractions.
# The "template_b" and "template_b_count" methods do not work well with taus as of now.
# Additionally, when computing the target distribution for "template_b" and "template_b_count",
# pT_max (set above) will be used to compute the probability ratios or PDFs.
# Default is "count".
# See RunUndersampling in preprocessing for more info
sampling_method: count
# Name of the output files for the different jets
# ZPrime (Mixed)
f_z:
path: *file_path
file: MC16d_hybrid-ext_even_0_PFlow-merged.h5
# ttbar (b)
f_tt_bjets:
path: *file_path
file: MC16d_hybrid-bjets_even_1_PFlow-merged.h5
# ttbar (c)
f_tt_cjets:
path: *file_path
file: MC16d_hybrid-cjets_even_1_PFlow-merged.h5
# ttbar (u)
f_tt_ujets:
path: *file_path
file: MC16d_hybrid-ujets_even_1_PFlow-merged.h5
# ttbar (tau)
f_tt_taujets:
path: *file_path
file: MC16d_hybrid-taujets_even_1_PFlow-merged.h5
# Name of the output file from the preprocessing
outfile_name: /nfs/dust/atlas/user/pgadow/ftag/data/processed/20210525-defaulttracks/output/PFlow-hybrid_70-test.h5
plot_name: PFlow_ext-hybrid
# Dictfile for the scaling and shifting (json)
dict_file: "/nfs/dust/atlas/user/pgadow/ftag/data/processed/20210525-defaulttracks/scale_dicts/PFlow-scale_dict-22M.json"
# cut definitions to be applied to remove outliers
# possible operators: <, ==, >, >=, <=
cuts:
JetFitterSecondaryVertex_mass:
operator: <
condition: 25000
NaNcheck: True
JetFitterSecondaryVertex_energy:
operator: <
condition: 1e8
NaNcheck: True
JetFitter_deltaR:
operator: <
condition: 0.6
NaNcheck: True
softMuon_pt:
operator: <
condition: 0.5e9
NaNcheck: True
softMuon_momentumBalanceSignificance:
operator: <
condition: 50
NaNcheck: True
softMuon_scatteringNeighbourSignificance:
operator: <
condition: 600
NaNcheck: True
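The sample cuts above combine plain comparison operators with strings such as `mod_2_==`. A natural reading, assumed here and not confirmed by this diff, is that `mod_N_<op>` means `(value % N) <op> condition`, which would make `condition: 0` select even and `condition: 1` select odd event numbers. A minimal sketch under that assumption, with `make_cut` as a hypothetical helper:

```python
import operator
from typing import Callable

COMPARISONS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def make_cut(op_string: str, condition: float) -> Callable[[float], bool]:
    """Build a predicate from an operator string such as "<=" or "mod_2_==".

    Assumed reading: "mod_N_<op>" means (value % N) <op> condition.
    """
    if op_string.startswith("mod_"):
        # "mod_2_==" splits into ["mod", "2", "=="].
        _, modulus, comparison = op_string.split("_", 2)
        compare = COMPARISONS[comparison]
        divisor = int(modulus)
        return lambda value: compare(value % divisor, condition)
    compare = COMPARISONS[op_string]
    return lambda value: compare(value, condition)


is_training = make_cut("mod_2_==", 0)  # even event numbers
is_testing = make_cut("mod_2_==", 1)  # odd event numbers

event_numbers = [10, 11, 12, 13]
print([n for n in event_numbers if is_training(n)])  # [10, 12]
print([n for n in event_numbers if is_testing(n)])  # [11, 13]
```

Splitting on the event number in this way keeps the training and testing samples disjoint, which matches the `even` and `odd` labels in the output file names.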
<<<<<<< HEAD
parameters: !include Preprocessing-settings-Geneva.yaml
=======
parameters: !include Preprocessing-parameters.yaml
>>>>>>> d8daf8a34ee4fc11e5f8ed9944c7db33a435d2fa
# Defining yaml anchors to be used later, avoiding duplication
.cuts_template_ttbar_train: &cuts_template_ttbar_train
......@@ -7,7 +11,7 @@ parameters: !include Preprocessing-settings-Geneva.yaml
operator: mod_6_<=
condition: 3
- pt_btagJes:
operator: <=
operator: "<="
condition: 2.5e5
.cuts_template_zprime_train: &cuts_template_zprime_train
......@@ -16,7 +20,7 @@ parameters: !include Preprocessing-settings-Geneva.yaml
operator: mod_6_<=
condition: 3
- pt_btagJes:
operator: <=
operator: ">="
condition: 2.5e5
.cuts_template_validation: &cuts_template_validation
......
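The first hunk of this file still shows the lines `<<<<<<< HEAD`, `=======` and `>>>>>>> d8daf8a3...`, which normally mark an unresolved merge conflict. Whether this is what made the pipeline fail is not stated on this page. A check along the following lines could flag such leftovers before committing; it is a minimal sketch, and `find_conflict_markers` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Lines that open, separate, or close a Git conflict region.
CONFLICT_MARKER = re.compile(r"^(<{7} |={7}$|>{7} )")


def find_conflict_markers(text: str) -> List[Tuple[int, str]]:
    """Return (line number, line) for every conflict marker in the text."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if CONFLICT_MARKER.match(line)
    ]


sample = """<<<<<<< HEAD
parameters: !include Preprocessing-settings-Geneva.yaml
=======
parameters: !include Preprocessing-parameters.yaml
>>>>>>> d8daf8a34ee4fc11e5f8ed9944c7db33a435d2fa
"""

for number, line in find_conflict_markers(sample):
    print(number, line)
```

Git itself offers a similar safeguard: `git diff --check` warns when changes introduce conflict markers.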
# Path where the ntuples are saved
ntuple_path: &ntuple_path /srv/beegfs/scratch/groups/dpnc/atlas/FTag/samples/r21/Loose
ntuple_path: &ntuple_path /srv/beegfs/scratch/groups/dpnc/atlas/FTag/samples/r21/Loose
# Path where the hybrid samples will be saved
sample_path: &sample_path /srv/beegfs/scratch/users/s/schroert/Internship_geneva/Hybrids
......
......@@ -122,11 +122,10 @@ confusion_matrix_Dips_ttbar:
Dips_saliency_b_WP77_passed_ttbar:
type: "saliency"
data_set_name: "ttbar"
plot_settings:
data_set_name: "ttbar"
title: "Saliency map for $b$ jets from \n $t\\bar{t}$ who passed WP = 77% \n with exactly 8 tracks"
target_beff: 0.77
# u=0, c=1, b=2
jet_flavour: "cjets"
PassBool: True
FlipAxis: True
......
......@@ -99,6 +99,22 @@ Eval_parameters_validation:
},
}
# Cuts which are applied to the different datasets used for evaluation
variable_cuts: {
"ttbar": {
"pt_btagJes": {
"operator": "<=",
"condition": 250000,
}
},
"zpext": {
"pt_btagJes": {
"operator": ">",
"condition": 250000,
}
},
}
# Working point used in the evaluation
WP: 0.77
......
......@@ -10,6 +10,7 @@ unittest:
- pytest ./umami/tests/unit/evaluation_tools -v
- pytest ./umami/tests/unit/train_tools -v
- pytest ./umami/tests/unit/input_vars_tools -v
- pytest ./umami/tests/unit/tf_tools -v
rules:
- if: $CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH && $CI_PIPELINE_SOURCE != "merge_request_event"
......@@ -58,3 +59,9 @@ unittest_input_vars_tools:
script:
- pytest --cov=./ --cov-report= ./umami/tests/unit/input_vars_tools/ -v --junitxml=report.xml
- cp .coverage coverage_files/.coverage.unittest_input_vars_tools
unittest_tf_tools:
<<: *unittest_template
script:
- pytest --cov=./ --cov-report= ./umami/tests/unit/tf_tools/ -v --junitxml=report.xml
- cp .coverage coverage_files/.coverage.unittest_tf_tools
......@@ -26,7 +26,7 @@ flavour_categories:
label_var: HadronConeExclTruthLabelID
label_value: 0
colour: "#2ca02c"
legend_label: light-jets
legend_label: light-flavour jets
prob_var_name: "pu"
taujets:
label_var: HadronConeExclTruthLabelID
......
......@@ -87,6 +87,12 @@ def EvaluateModel(
class_labels = train_config.NN_structure["class_labels"]
main_class = train_config.NN_structure["main_class"]
frac_values_comp = Eval_params["frac_values_comp"]
var_cuts = (
Eval_params["variable_cuts"][f"{data_set_name}"]
if "variable_cuts" in Eval_params
and Eval_params["variable_cuts"] is not None
else None
)
# Init the placeholder lists for tagger_names
tagger_names = []
......@@ -132,6 +138,7 @@ def EvaluateModel(
class_labels=class_labels,
nJets=nJets,
exclude=exclude,
cut_vars_dict=var_cuts,
)
# Load the model for evaluation. Note: The Sum is needed here!
......@@ -172,6 +179,7 @@ def EvaluateModel(
class_labels=class_labels,
nJets=nJets,
variables=variables,
cut_vars_dict=var_cuts,
)
# Get the discriminant values and probabilities of each tagger
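The `var_cuts` expression added in this commit falls back to `None` when `variable_cuts` is missing or `None`. If the block exists but has no entry for the current `data_set_name`, the indexing would raise a `KeyError`. Whether that should be an error is a design decision for the authors; the following is only a sketch of a more tolerant lookup, with `get_variable_cuts` as a hypothetical helper name.

```python
from typing import Optional


def get_variable_cuts(eval_params: dict, data_set_name: str) -> Optional[dict]:
    """Return the cuts configured for one dataset, or None if there are none."""
    # Treat a missing key and an explicit None the same way.
    all_cuts = eval_params.get("variable_cuts") or {}
    return all_cuts.get(data_set_name)


eval_params = {
    "variable_cuts": {
        "ttbar_r21": {"pt_btagJes": {"operator": "<=", "condition": 250000}},
    }
}

print(get_variable_cuts(eval_params, "ttbar_r21"))
print(get_variable_cuts(eval_params, "zpext_r21"))  # None: no entry for this dataset
print(get_variable_cuts({"variable_cuts": None}, "ttbar_r21"))  # None
print(get_variable_cuts({}, "ttbar_r21"))  # None
```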
......@@ -262,6 +270,12 @@ def EvaluateModelDips(
class_labels = train_config.NN_structure["class_labels"]
main_class = train_config.NN_structure["main_class"]
frac_values_comp = Eval_params["frac_values_comp"]
var_cuts = (
Eval_params["variable_cuts"][f"{data_set_name}"]
if "variable_cuts" in Eval_params
and Eval_params["variable_cuts"] is not None
else None
)
# Set number of nJets for testing
nJets = int(Eval_params["n_jets"]) if not args.nJets else args.nJets
......@@ -284,6 +298,7 @@ def EvaluateModelDips(
preprocess_config=preprocess_config,
class_labels=class_labels,