Commit f7ab7394 authored by Victor Hugo Ruelas Rivera

Merge branch 'master' into viruelas-add-preprocessed-hybrid-validation-option

parents dd473cd9 3a6f25f9
Pipeline #3366876 failed with stages in 4 minutes and 31 seconds
......@@ -22,3 +22,5 @@ env/*
.vscode/*
!umami/tests/unit/**/*.png
python_install/
# ignoring institute-dependent parameter files
Preprocessing-parameters-*.yaml
\ No newline at end of file
......@@ -19,7 +19,7 @@ Note, that for running DL1 no tracks have to be stored in the output hybrid samp
Note that the training Variables for DL1r R21 and DL1r R22 are defined in [DL1r_Variables.yaml](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/umami/configs/DL1r_Variables.yaml) and [DL1r_Variables_R22.yaml](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/umami/configs/DL1r_Variables_R22.yaml).
Important note about the preprocessing: when using taus, it is not advised to use the `count` sampling method. The tau statistics are much lower, so `count` would throw away far too many jets. Instead, use the `pdf` or the `probability_ratio` sampling methods to have an exact match of the jet flavours and a proportional distribution of taus.
If you don't want to process some samples yourself, you can use the already preprocessed samples uploaded to rucio in the datasets `user.mdraguet.dl1r.R21.PFlowJetsDemoSamples` for DL1r or `user.mdraguet.dl1d.R21.PFlowJetsDemoSamples` for DL1d (RNNIP replaced by DIPS). These datasets do not include taus. Note that you need to download both the datasets and the associated dictionary with scaling factors (plus the dictionary of variables). There are two test samples available: a hybrid (ttbar + Z'-ext) and a Z'-ext-only sample. Each should be manually split in two to get a test and a validation file. The data comes from:
- mc16_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_FTAG1.e6337_s3126_r10201_p4060
......@@ -69,7 +69,7 @@ zpext_test_files:
# Path to Variable dict used in preprocessing
var_dict: <path>/<to>/<variables>/DL1r_Variables.yaml
# Variables or Variable Headers to exclude from the tagger
# but contained in <path>/<to>/<variables>/DL1r_Variables.yaml
exclude: null
......@@ -118,6 +118,9 @@ Validation_metrics_settings:
# Define which taggers should also be plotted
taggers_from_file: ["RNNIP", "DL1r"]
# Label for the freshly trained tagger
tagger_label: "DL1r"
# Enable/Disable atlas tag
UseAtlasTag: True
......@@ -176,7 +179,7 @@ Eval_parameters_validation:
- pt_btagJes:
operator: ">"
condition: 250000
validation_file:
- pt_btagJes:
operator: "<="
......@@ -233,13 +236,14 @@ The different options are briefly explained here:
| `LRR_min_lr` | Float | Optional | Lower bound on the learning rate. Default: 0.000001 |
| `Validation_metrics_settings` | None | Necessary | Plotting settings for the validation plots which are produced by the [plotting_epoch_performance.py](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/umami/plotting_epoch_performance.py) script. |
| `taggers_from_file` | List | Optional | List of taggers that are available in the .h5 samples. The taggers listed here are plotted as reference lines in the rejection per epoch plots. |
| `tagger_label` | String | Optional | Name for the legend of the freshly trained tagger for the rejection per epoch plots. |
| `trained_taggers` | Dict | Optional | A dict with locally trained taggers to be plotted in the rejection per epoch plots. You need to provide a dict with a `path` and a `label`. The `path` is the path to the validation metrics .json file, where the rejections per epoch are saved. The `label` is the label which will be shown in the legend in the rejection per epoch plots. The `dipsReference` in the example here is only an internal name. It will not be shown anywhere. |
| `UseAtlasTag` | Bool | Optional | Decide whether the ATLAS tag is printed at the top left of the plot. |
| `AtlasTag` | String | Optional | Main ATLAS tag, printed to the right of "ATLAS". |
| `SecondTag` | String | Optional | Second line below the ATLAS tag |
| `plot_datatype` | String | Necessary | Datatype of the plots that are produced using the [plotting_epoch_performance.py](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/umami/plotting_epoch_performance.py) script. |
| `Eval_parameters_validation` | None | Necessary | A dict in which all important information for the training is defined. |
| `n_jets` | Int | Necessary | Number of jets used for evaluation. This should not be too high, because the Callback function also uses this number of jets after each epoch for validation. |
| `tagger` | List | Necessary | List of taggers used for comparison. This needs to be a list of strings or a single string. The names of the taggers must be the same as in the evaluation file. For example, if the DL1d probabilities in the test samples are called `DL1dLoose20210607_pb`, the name you need to add to the list is `DL1dLoose20210607`. |
| `frac_values_comp` | Dict | Necessary | Dict with the fraction values for the comparison taggers. For all flavours (except the main flavour), you need to add values here which add up to one. |
| `frac_values` | Dict | Necessary | Dict with the fraction values for the freshly trained tagger. For all flavours (except the main flavour), you need to add values here which add up to one. |
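
The requirement that the fraction values add up to one can be checked before a training is started. The following is a minimal sketch, assuming a b-tagging setup in which `bjets` is the main flavour; the helper `check_fractions` and the numbers are hypothetical and not part of umami:

```python
import math


def check_fractions(frac_values, tol=1e-9):
    """Return True if the background-flavour fractions add up to one."""
    return math.isclose(sum(frac_values.values()), 1.0, abs_tol=tol)


# Hypothetical values for a b-tagging setup (main flavour: bjets)
frac_values = {"cjets": 0.018, "ujets": 0.982}
frac_values_comp = {"cjets": 0.018, "ujets": 0.982}

for name, fracs in (
    ("frac_values", frac_values),
    ("frac_values_comp", frac_values_comp),
):
    if not check_fractions(fracs):
        raise ValueError(
            f"{name} must add up to one, got {sum(fracs.values())}"
        )
```

A small tolerance is used instead of an exact comparison because the fractions are floating-point numbers.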
......@@ -255,7 +259,7 @@ Before starting the training, you need to set some paths for the umami package t
python setup.py install
```
Note that with the `install` setup, changes made to the scripts after setup are not included! For development, and to use changes without rerunning the setup, use
```bash
source run_setup.sh
......@@ -269,7 +273,7 @@ After that, you can switch to the folder `umami/umami` and run the training, usi
train.py -c ${EXAMPLES}/DL1r-PFlow-Training-config.yaml
```
The results after each epoch will be saved to the `umami/umami/MODELNAME/` folder. The modelname is the name defined in the [DL1r-PFlow-Training-config.yaml](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/examples/DL1r-PFlow-Training-config.yaml).
If you want instant performance checks of the model after each epoch during the training, you can use
......@@ -277,7 +281,7 @@ If you want instant performance checks of the model after each epoch during the
plotting_epoch_performance.py -c ${EXAMPLES}/DL1r-PFlow-Training-config.yaml
```
which will write out plots for the light- and c-rejection, accuracy and loss per epoch to `umami/umami/MODELNAME/plots/`. In this form, the performance measurements, like light- and c-rejection, will be recalculated using the working point, the `frac_values` value and the number of validation jets defined in the [DL1r-PFlow-Training-config.yaml](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/examples/DL1r-PFlow-Training-config.yaml). If taus are used in the training too, they will be included in these plots.
If you don't want to recalculate it, you can give the path to the existing dict with the option `--dict`. For example:
......
......@@ -62,7 +62,7 @@ zpext_test_files:
# Path to Variable dict used in preprocessing
var_dict: <path>/<to>/<variables>/Dips_Variables.yaml
exclude: []
exclude: null
# Values for the neural network
NN_structure:
......@@ -105,6 +105,9 @@ Validation_metrics_settings:
# Define which taggers should also be plotted
taggers_from_file: ["rnnip", "DL1r"]
# Label for the freshly trained tagger
tagger_label: "DIPS"
# Define which freshly trained taggers should be plotted
trained_taggers:
dipsReference:
......@@ -216,13 +219,14 @@ The different options are briefly explained here:
| `LRR_min_lr` | Float | Optional | Lower bound on the learning rate. Default: 0.000001 |
| `Validation_metrics_settings` | None | Necessary | Plotting settings for the validation plots which are produced by the [plotting_epoch_performance.py](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/umami/plotting_epoch_performance.py) script. |
| `taggers_from_file` | List | Optional | List of taggers that are available in the .h5 samples. The taggers listed here are plotted as reference lines in the rejection per epoch plots. |
| `tagger_label` | String | Optional | Name for the legend of the freshly trained tagger for the rejection per epoch plots. |
| `trained_taggers` | Dict | Optional | A dict with locally trained taggers to be plotted in the rejection per epoch plots. You need to provide a dict with a `path` and a `label`. The `path` is the path to the validation metrics .json file, where the rejections per epoch are saved. The `label` is the label which will be shown in the legend in the rejection per epoch plots. The `dipsReference` in the example here is only an internal name. It will not be shown anywhere. |
| `UseAtlasTag` | Bool | Optional | Decide whether the ATLAS tag is printed at the top left of the plot. |
| `AtlasTag` | String | Optional | Main ATLAS tag, printed to the right of "ATLAS". |
| `SecondTag` | String | Optional | Second line below the ATLAS tag |
| `plot_datatype` | String | Necessary | Datatype of the plots that are produced using the [plotting_epoch_performance.py](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/umami/plotting_epoch_performance.py) script. |
| `Eval_parameters_validation` | None | Necessary | A dict in which all important information for the training is defined. |
| `n_jets` | Int | Necessary | Number of jets used for evaluation. This should not be too high, because the Callback function also uses this number of jets after each epoch for validation. |
| `tagger` | List | Necessary | List of taggers used for comparison. This needs to be a list of strings or a single string. The names of the taggers must be the same as in the evaluation file. For example, if the DL1d probabilities in the test samples are called `DL1dLoose20210607_pb`, the name you need to add to the list is `DL1dLoose20210607`. |
| `frac_values_comp` | Dict | Necessary | Dict with the fraction values for the comparison taggers. For all flavours (except the main flavour), you need to add values here which add up to one. |
| `frac_values` | Dict | Necessary | Dict with the fraction values for the freshly trained tagger. For all flavours (except the main flavour), you need to add values here which add up to one. |
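
The naming rule for the `tagger` option (probability column `DL1dLoose20210607_pb` gives tagger name `DL1dLoose20210607`) can be sketched as follows. The helper `tagger_names` and the list of column names are hypothetical and only illustrate the rule; reading the actual column names from a test sample is not shown:

```python
def tagger_names(columns, suffix="_pb"):
    """Derive tagger names from probability columns ending in `suffix`."""
    return sorted(
        col[: -len(suffix)] for col in columns if col.endswith(suffix)
    )


# Hypothetical column names as they might appear in a test sample
columns = [
    "DL1dLoose20210607_pb",
    "DL1dLoose20210607_pc",
    "DL1dLoose20210607_pu",
    "rnnip_pb",
    "pt_btagJes",
]

print(tagger_names(columns))  # ['DL1dLoose20210607', 'rnnip']
```

Only the b-probability columns are used to find the names, so each tagger appears once.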
......@@ -238,7 +242,7 @@ Before starting the training, you need to set some paths for the umami package t
python setup.py install
```
Note that with the `install` setup, changes made to the scripts after setup are not included! For development, and to use changes without rerunning the setup, use
```bash
source run_setup.sh
......@@ -252,7 +256,7 @@ After that, you can switch to the folder `umami/umami` and run the training, usi
train.py -c ${EXAMPLES}/Dips-PFlow-Training-config.yaml
```
The results after each epoch will be saved to the `umami/umami/MODELNAME/` folder. The modelname is the name defined in the [Dips-PFlow-Training-config.yaml](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/examples/Dips-PFlow-Training-config.yaml).
If you want instant performance checks of the model after each epoch during the training, you can use
......
......@@ -14,14 +14,6 @@ If you want to only run unit tests, this can be done via
pytest ./umami/tests/unit/ -v
```
???+ warning "local execution of unit tests"
the unit tests are currently failing when executing them all together, also documented in [issue #94](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/issues/94)
as a workaround please execute the different unit test sub-directories separately
e.g.
```
pytest ./umami/tests/unit/evaluation_tools -v
```
and the integration test similarly via
```bash
......
......@@ -87,6 +87,7 @@ Dips_prob_pb:
| `x_label` | String | Optional | Set the x-axis label. Default is "DNN Output" |
| `yAxisAtlasTag` | Float | Optional | y-axis position of the ATLAS Tag in parts of the y-axis (0: lower left corner, 1: upper left corner). |
| `yAxisIncrease` | Float | Optional | Increase the y-axis by a given factor. Mainly used to fit in the ATLAS Tag without cutting the lines of the plot. |
| `dpi` | Int | Optional | Set the DPI value for the plot. Default is 400 |
#### Probability Comparison
Plotting the DNN probability output for different models. For example:
......@@ -137,6 +138,7 @@ Dips_prob_comparison_pb:
| `SecondTag` | String | Optional | Second line (if it starts with `\n`) of text right below the 'ATLAS' and the AtlasTag. |
| `yAxisAtlasTag` | Float | Optional | y-axis position of the ATLAS Tag in parts of the y-axis (0: lower left corner, 1: upper left corner). |
| `Ratio_Cut` | List | Optional | Two element list that gives the lower (first element) and upper (second element) y axis bound of the ratio plot below the main plot. |
| `dpi` | Int | Optional | Set the DPI value for the plot. Default is 400 |
#### Scores
Plotting the b-tagging discriminant scores for the different jet flavors. For example:
......@@ -170,6 +172,7 @@ scores_Dips_ttbar:
| `AtlasTag` | String | Optional | The first line of text right behind the 'ATLAS'. |
| `SecondTag` | String | Optional | Second line (if it starts with `\n`) of text right below the 'ATLAS' and the AtlasTag. |
| `yAxisAtlasTag` | Float | Optional | y-axis position of the ATLAS Tag in parts of the y-axis (0: lower left corner, 1: upper left corner). |
| `dpi` | Int | Optional | Set the DPI value for the plot. Default is 400 |
#### Scores Comparison
Plotting the b-tagging discriminant scores for the different jet flavors for different models in the same plot. For example:
......@@ -217,6 +220,7 @@ scores_Dips_ttbar_comparison:
| `SecondTag` | String | Optional | Second line (if it starts with `\n`) of text right below the 'ATLAS' and the AtlasTag. |
| `yAxisAtlasTag` | Float | Optional | y-axis position of the ATLAS Tag in parts of the y-axis (0: lower left corner, 1: upper left corner). |
| `Ratio_Cut` | List | Optional | Two element list that gives the lower (first element) and upper (second element) y axis bound of the ratio plot below the main plot. |
| `dpi` | Int | Optional | Set the DPI value for the plot. Default is 400 |
#### ROC Curves
Plotting the ROC Curves of the rejection rates against the b-tagging efficiency. For example:
......@@ -264,6 +268,7 @@ Dips_light_flavour_ttbar:
| `AtlasTag` | String | Optional | The first line of text right behind the 'ATLAS'. |
| `SecondTag` | String | Optional | Second line (if it starts with `\n`) of text right below the 'ATLAS' and the AtlasTag. |
| `yAxisAtlasTag` | Float | Optional | y-axis position of the ATLAS Tag in parts of the y-axis (0: lower left corner, 1: upper left corner). |
| `dpi` | Int | Optional | Set the DPI value for the plot. Default is 400 |
#### Comparison ROC Curves (Double Rejection ROC)
Plotting the ROC Curves of two rejection rates against an efficiency. You need to define a model for each model/rejection pair. For example:
......@@ -315,12 +320,13 @@ Dips_Comparison_flavour_ttbar:
| `xmin` | Float | Optional | Set the minimum b efficiency in the plot (which is the xmin limit). |
| `ymax` | Float | Optional | The maximum y axis. |
| `WorkingPoints` | List | Optional | The specified WPs are calculated and at the calculated b-tagging discriminant there will be a vertical line with a small label on top which prints the WP. |
| `yAxisIncrease` | Float | Optional | Increase the y-axis by a given factor. Mainly used to fit in the ATLAS Tag without cutting the lines of the plot. |
| `figsize` | List | Optional | A list of the width and height of the plot. |
| `UseAtlasTag` | Bool | Optional | Decide if the ATLAS Tag is printed in the upper left corner of the plot or not. |
| `AtlasTag` | String | Optional | The first line of text right behind the 'ATLAS'. |
| `SecondTag` | String | Optional | Second line (if it starts with `\n`) of text right below the 'ATLAS' and the AtlasTag. |
| `yAxisAtlasTag` | Float | Optional | y-axis position of the ATLAS Tag in parts of the y-axis (0: lower left corner, 1: upper left corner). |
| `dpi` | Int | Optional | Set the DPI value for the plot. Default is 400 |
#### Saliency Maps
Plotting the Saliency Map of the model. For example:
......@@ -352,6 +358,7 @@ Dips_saliency_b_WP77_passed_ttbar:
| `AtlasTag` | String | Optional | The first line of text right behind the 'ATLAS'. |
| `SecondTag` | String | Optional | Second line (if it starts with `\n`) of text right below the 'ATLAS' and the AtlasTag. |
| `yAxisAtlasTag` | Float | Optional | y-axis position of the ATLAS Tag in parts of the y-axis (0: lower left corner, 1: upper left corner). |
| `dpi` | Int | Optional | Set the DPI value for the plot. Default is 400 |
#### pT vs Efficiency
Plot the b efficiency/c-rejection/light-rejection against the pT. For example:
......@@ -381,6 +388,8 @@ Dips_pT_vs_beff:
SecondTag: "\n$\\sqrt{s}=13$ TeV, PFlow Jets,\n$t\\bar{t}$ Test Sample"
yAxisAtlasTag: 0.9
yAxisIncrease: 1.3
labelFontSize: 12
legFontSize: 12
```
| Options | Data Type | Necessary/Optional | Explanation |
......@@ -404,10 +413,14 @@ Dips_pT_vs_beff:
| `AtlasTag` | String | Optional | The first line of text right behind the 'ATLAS'. |
| `SecondTag` | String | Optional | Second line (if it starts with `\n`) of text right below the 'ATLAS' and the AtlasTag. Don't add the fc value here! It is added automatically, as is the WP. |
| `yAxisAtlasTag` | Float | Optional | y-axis position of the ATLAS Tag in parts of the y-axis (0: lower left corner, 1: upper left corner). |
| `yAxisIncrease` | Float | Optional | Increase the y-axis by a given factor. Mainly used to fit in the ATLAS Tag without cutting the lines of the plot. |
| `ymin` | Float | Optional | Set the y axis minimum. Leave empty (=None) for automatically set border. |
| `ymax` | Float | Optional | Set the y axis maximum. Leave empty (=None) for automatically set border. |
| `alpha` | Float | Optional | The Alpha value of the plots. |
| `labelFontSize` | Int | Optional | Set the fontsize of the axis labels and ticks. |
| `legFontSize` | Int | Optional | Set the fontsize of the legend. |
| `Ratio_Cut` | List | Optional | List with the lower and upper y-limit which is to be set for the ratio plot. |
| `dpi` | Int | Optional | Set the DPI value for the plot. Default is 400 |
#### Variable vs Efficiency
Plot the efficiencies of all flavours versus any variable (not just pT). The variables must be included in the results h5 files from the evaluation step.
......@@ -473,6 +486,7 @@ eff_vs_pt:
| `SecondTag` | String | Optional | Second line of text right below the 'ATLAS' and the AtlasTag. Don't add fc value nor efficiency here! They are automatically added to the third tag. |
| `ThirdTag` | String | Optional | Write this text on the upper left corner. Usually meant to indicate efficiency format (global or flat) and the tagger used (DIPS, DL1r, ...). The fc value and the b-jet efficiency are automatically added to this tag. |
| `Log` | bool | Optional | Whether to put the y-axis in log-scale. |
| `dpi` | Int | Optional | Set the DPI value for the plot. Default is 400 |
#### Variable vs Efficiency Comparison
Plot the efficiencies of each flavour versus any variable (not just pT) for all listed models. The variables must be included in the results h5 files from the evaluation step.
......@@ -551,6 +565,7 @@ eff_vs_pt_small:
| `SecondTag` | String | Optional | Second line of text right below the 'ATLAS' and the AtlasTag. Don't add fc value nor efficiency here! They are automatically added to the third tag. |
| `ThirdTag` | String | Optional | Write this text on the upper left corner. Usually meant to indicate efficiency format (global or flat) and the tagger used (DIPS, DL1r, ...). The fc value and the b-jet efficiency are automatically added to this tag. |
| `Log` | bool | Optional | Whether to put the y-axis in log-scale. |
| `dpi` | Int | Optional | Set the DPI value for the plot. Default is 400 |
#### Scanning fractions - DEPRECATED
DEPRECATED: For DL1 with taus, the evaluation step of `evaluate.py` generated an extra h5 file giving the c/b, light, and tau rejection as a function of the c/b-fraction and the tau fraction (this evaluation is no longer performed). To produce the plot associated to this information (2d heatmap of rejection for the two flavour fractions), add (for example) this to the plotting config:
......
......@@ -7,31 +7,31 @@ preprocess_config: examples/PFlow-Preprocessing.yaml
model_file:
# Add training file
train_file: /work/ws/nemo/fr_af1100-Training-Simulations-0/preprocessed/PFlow-hybrid-preprocessed_shuffled.h5
train_file: <path_palce_holder>/PFlow-hybrid-preprocessed_shuffled.h5
# Add validation files
# ttbar val
validation_file: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_0.h5
validation_file: <path_palce_holder>/hybrids/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_0.h5
# zprime val
add_validation_file: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_0.h5
add_validation_file: <path_palce_holder>/hybrids/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_0.h5
ttbar_test_files:
ttbar_r21:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_1.h5
Path: <path_palce_holder>/hybrids/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_1.h5
data_set_name: "ttbar_r21"
ttbar_r22:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids_r22/MC16d_hybrid-r22_odd_100_PFlow-no_pTcuts-file_1.h5
Path: <path_palce_holder>/hybrids_r22/MC16d_hybrid-r22_odd_100_PFlow-no_pTcuts-file_1.h5
data_set_name: "ttbar_r22"
zpext_test_files:
zpext_r21:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_1.h5
Path: <path_palce_holder>/hybrids/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_1.h5
data_set_name: "zpext_r21"
zpext_r22:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids_r22/MC16d_hybrid-r22-ext_odd_0_PFlow-no_pTcuts-file_1.h5
Path: <path_palce_holder>/hybrids_r22/MC16d_hybrid-r22-ext_odd_0_PFlow-no_pTcuts-file_1.h5
data_set_name: "zpext_r22"
# Path to Variable dict used in preprocessing
......@@ -82,6 +82,9 @@ Validation_metrics_settings:
# Define which taggers should also be plotted
taggers_from_file: ["DL1r"]
# Label for the freshly trained tagger
tagger_label: "DL1r"
# Enable/Disable atlas tag
UseAtlasTag: True
......
# Set modelname and path to Pflow preprocessing config file
model_name: dips_conditional_attention_r21
preprocess_config: /home/fr/fr_fr/fr_af1100/b-Tagging/packages/umami/examples/PFlow-Preprocessing.yaml
preprocess_config: <path_palce_holder>/umami/examples/PFlow-Preprocessing.yaml
# Add here a pretrained model to start with.
# Leave empty for a fresh start
model_file:
# Add training file
train_file: /work/ws/nemo/fr_af1100-Training-Simulations-0/preprocessed/PFlow-hybrid-preprocessed_shuffled.h5
train_file: <path_palce_holder>/PFlow-hybrid-preprocessed_shuffled.h5
# Add validation files
# ttbar val
validation_file: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_0.h5
validation_file: <path_palce_holder>/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_0.h5
# zprime val
add_validation_file: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_0.h5
add_validation_file: <path_palce_holder>/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_0.h5
ttbar_test_files:
ttbar_r21:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_1.h5
Path: <path_palce_holder>/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_1.h5
data_set_name: "ttbar_r21"
ttbar_r22:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids_r22/MC16d_hybrid-r22_odd_100_PFlow-no_pTcuts-file_1.h5
Path: <path_palce_holder>/MC16d_hybrid-r22_odd_100_PFlow-no_pTcuts-file_1.h5
data_set_name: "ttbar_r22"
zpext_test_files:
zpext_r21:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_1.h5
Path: <path_palce_holder>/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_1.h5
data_set_name: "zpext_r21"
zpext_r22:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids_r22/MC16d_hybrid-r22-ext_odd_0_PFlow-no_pTcuts-file_1.h5
Path: <path_palce_holder>/MC16d_hybrid-r22-ext_odd_0_PFlow-no_pTcuts-file_1.h5
data_set_name: "zpext_r22"
# Path to Variable dict used in preprocessing
var_dict: /home/fr/fr_fr/fr_af1100/b-Tagging/packages/umami/umami/configs/Dips_Variables.yaml
var_dict: <path_palce_holder>/umami/umami/configs/Dips_Variables.yaml
exclude: null
......@@ -101,6 +101,9 @@ Validation_metrics_settings:
# Define which taggers should also be plotted
taggers_from_file: ["rnnip", "DL1r"]
# Label for the freshly trained tagger
tagger_label: "CADS"
# Enable/Disable atlas tag
UseAtlasTag: True
......
# Set modelname and path to Pflow preprocessing config file
model_name: dips_lr_0.001_bs_15000_epoch_200_nTrainJets_Full
preprocess_config: /home/fr/fr_fr/fr_af1100/b-Tagging/packages/umami/examples/PFlow-Preprocessing.yaml
preprocess_config: <path_palce_holder>/umami/examples/PFlow-Preprocessing.yaml
# Add here a pretrained model to start with.
# Leave empty for a fresh start
model_file:
# Add training file
train_file: /work/ws/nemo/fr_af1100-Training-Simulations-0/preprocessed/PFlow-hybrid-preprocessed_shuffled.h5
train_file: <path_palce_holder>/PFlow-hybrid-preprocessed_shuffled.h5
# Add validation files
# ttbar val
validation_file: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_0.h5
validation_file: <path_palce_holder>/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_0.h5
# zprime val
add_validation_file: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_0.h5
add_validation_file: <path_palce_holder>/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_0.h5
ttbar_test_files:
ttbar_r21:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_1.h5
Path: <path_palce_holder>/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_1.h5
data_set_name: "ttbar_r21"
ttbar_r22:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids_r22/MC16d_hybrid-r22_odd_100_PFlow-no_pTcuts-file_1.h5
Path: <path_palce_holder>/MC16d_hybrid-r22_odd_100_PFlow-no_pTcuts-file_1.h5
data_set_name: "ttbar_r22"
zpext_test_files:
zpext_r21:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_1.h5
Path: <path_palce_holder>/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_1.h5
data_set_name: "zpext_r21"
zpext_r22:
Path: /work/ws/nemo/fr_af1100-Training-Simulations-0/hybrids_r22/MC16d_hybrid-r22-ext_odd_0_PFlow-no_pTcuts-file_1.h5
Path: <path_palce_holder>/MC16d_hybrid-r22-ext_odd_0_PFlow-no_pTcuts-file_1.h5
data_set_name: "zpext_r22"
# Path to Variable dict used in preprocessing
var_dict: /home/fr/fr_fr/fr_af1100/b-Tagging/packages/umami/umami/configs/Dips_Variables.yaml
var_dict: <path_palce_holder>/umami/umami/configs/Dips_Variables.yaml
exclude: null
......@@ -83,6 +83,9 @@ Validation_metrics_settings:
# Define which taggers should also be plotted
taggers_from_file: ["rnnip", "DL1r"]
# Label for the freshly trained tagger
tagger_label: "DIPS"
# Enable/Disable atlas tag
UseAtlasTag: True
......
parameters: !include Preprocessing-parameters-UCL-hits.yaml
.outlier_cuts: &outlier_cuts
- JetFitterSecondaryVertex_mass:
operator: <
condition: 25000
NaNcheck: True
- JetFitterSecondaryVertex_energy:
operator: <
condition: 1e8
NaNcheck: True
- JetFitter_deltaR:
operator: <
condition: 0.6
NaNcheck: True
# Defining yaml anchors to be used later, avoiding duplication
.cuts_template_ttbar_train: &cuts_template_ttbar_train
cuts:
- eventNumber:
operator: mod_6_<=
condition: 3
- pt_btagJes:
operator: "<="
condition: 2.5e5
- *outlier_cuts
.cuts_template_zprime_train: &cuts_template_zprime_train
cuts:
- eventNumber:
operator: mod_6_<=
condition: 3
- pt_btagJes:
operator: ">="
condition: 2.5e5
- *outlier_cuts
.cuts_template_validation: &cuts_template_validation
cuts:
- eventNumber:
operator: mod_6_==
condition: 4
- *outlier_cuts
.cuts_template_test: &cuts_template_test
cuts:
- eventNumber:
operator: mod_6_==
condition: 5
- *outlier_cuts
preparation:
ntuples:
zprime:
path: *ntuple_path
file_pattern: fullStats_dR0p04_withHitPos/*.h5
class_labels: [ujets, cjets, bjets]
samples:
training_zprime_bjets:
type: zprime
category: bjets
n_jets: 10e5
<<: *cuts_template_zprime_train
f_output:
path: *sample_path
file: MC16d-bjets_training_zprime_PFlow.h5
training_zprime_cjets:
type: zprime
category: cjets
# Number of c jets available in MC16d
n_jets: 10e5
<<: *cuts_template_zprime_train
f_output:
path: *sample_path
file: MC16d-cjets_training_zprime_PFlow.h5
training_zprime_ujets:
type: zprime
category: ujets
n_jets: 10e5
<<: *cuts_template_zprime_train
f_output:
path: *sample_path
file: MC16d-ujets_training_zprime_PFlow.h5
training_zprime_taujets:
type: zprime
category: taujets
n_jets: 10e5
<<: *cuts_template_zprime_train
f_output:
path: *sample_path
file: MC16d-taujets_training_zprime_PFlow.h5
validation_zprime:
type: zprime
category: inclusive
n_jets: 4e5
<<: *cuts_template_validation
f_output:
path: *sample_path
file: MC16d-inclusive_validation_zprime_PFlow.h5
testing_zprime:
type: zprime
category: inclusive
n_jets: 4e5
<<: *cuts_template_test
f_output:
path: *sample_path
file: MC16d-inclusive_testing_zprime_PFlow.h5
sampling:
method: count
# The options depend on the sampling method
options:
sampling_variables:
- pt_btagJes:
# bins take either a list containing the np.linspace arguments
# or a list of them
# For PDF sampling: must be the np.linspace arguments.
# - list of lists, one list for each category (in samples)
# - define the region of each category.
bins: [[0, 600000, 351], [650000, 6000000, 84]]
- absEta_btagJes:
# For PDF sampling: same structure as in pt_btagJes.
bins: [0, 2.5, 10]
samples:
zprime:
- training_zprime_bjets
- training_zprime_cjets
- training_zprime_ujets
# this optional option allows to specify the jets which should be used per sample
custom_njets_initial:
# these are empiric values ensuring a smooth hybrid sample
training_ttbar_bjets: 5.5e5
training_ttbar_cjets: 11.5e5
training_ttbar_ujets: 13.5e5
fractions:
zprime: 1.0
# For PDF sampling, this is the maximum upsampling rate (important to limit tau upsampling)
# Files are referred to by their key (as in custom_njets_initial)
max_upsampling_ratio:
training_zprime_cjets: 5
# number of training jets
# For PDF sampling: this is the number of target jets to be taken (through all categories).
# If set to -1: max out to target numbers (limited by fractions ratio)
njets: 25e5
save_tracks: True
tracks_name: "hits"
# this stores the indices per sample into an intermediate file
intermediate_index_file: indices.h5
# outputfiles are split into 5 -> needs to be implemented
iterations: 5
# Name of the output file from the preprocessing
outfile_name: *outfile_name
plot_name: PFlow_ext-hybrid
# Variable dict which is used for scaling and shifting
var_file: *var_file
# Dictfile for the scaling and shifting (json)
dict_file: *dict_file
# compression for final output files (null/gzip)
compression: null
# save final output files with specified precision
precision: float16
# Path where the ntuples are saved
ntuple_path: &ntuple_path /nfs/dust/atlas/user/pgadow/ftag/data/ntuple_links/
# Path where the hybrid samples will be saved
sample_path: &sample_path /nfs/dust/atlas/user/pgadow/ftag/data/processed/20210525-defaulttracks/hybrids/
# Path where the merged and ready-to-train samples are saved
file_path: &file_path /nfs/dust/atlas/user/pgadow/ftag/data/processed/20210525-defaulttracks/preprocessed/
# Name of the output file from the preprocessing
.outfile_name: &outfile_name /nfs/dust/atlas/user/pgadow/ftag/data/processed/20210525-defaulttracks/output/PFlow-hybrid_70-test.h5
# Dictfile for the scaling and shifting (json)
.dict_file: &dict_file "/nfs/dust/atlas/user/pgadow/ftag/data/processed/20210525-defaulttracks/scale_dicts/PFlow-scale_dict-22M.json"
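
The cut templates in the preprocessing config above separate training, validation and test jets by `eventNumber` modulo 6. The following is a minimal sketch of that split, assuming that `mod_6_<=` with condition 3 means `eventNumber % 6 <= 3` and that `mod_6_==` compares the remainder for equality. The function `split_for_event` is hypothetical and not part of umami:

```python
from collections import Counter


def split_for_event(event_number):
    """Assign an event to a split based on eventNumber modulo 6."""
    remainder = event_number % 6
    if remainder <= 3:  # operator: mod_6_<=, condition: 3
        return "train"
    if remainder == 4:  # operator: mod_6_==, condition: 4
        return "validation"
    return "test"  # operator: mod_6_==, condition: 5


# Count how 600 consecutive event numbers are distributed over the splits
counts = Counter(split_for_event(n) for n in range(600))
print(counts["train"], counts["validation"], counts["test"])  # 400 100 100
```

Under this reading, four out of six remainders go to training and one each to validation and testing, so the three samples never share an event.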