Commit ca9bd75b authored by Manuel Guth

Merge branch birk-generalise-validation-files with refs/heads/112-flexible-validation-test-file-definition into refs/merge-requests/349/train
parents 6c9cfbef 157654c7
Pipeline #3488120 passed with stages in 8 minutes and 48 seconds
@@ -28,5 +28,4 @@ Preprocessing-parameters-*.yaml
# ignoring preprocessing integration test folders
preprocessing_*/
test_train_*/
# ignoring any test directory
test-*/
\ No newline at end of file
test_*_model*/
\ No newline at end of file
@@ -42,11 +42,22 @@ model_file:
train_file: <path>/<to>/<train>/<samples>/train_file.h5
# Add validation files
# ttbar val
validation_file: <path>/<to>/<validation>/<samples>/ttbar_r21_validation_file.h5
validation_files:
  ttbar_r21_val:
    path: <path_place_holder>/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_0.h5
    label: "$t\\bar{t}$ Release 21"
    variable_cuts:
      - pt_btagJes:
          operator: "<="
          condition: 250000
# zprime val
add_validation_file: <path>/<to>/<validation>/<samples>/zpext_r21_validation_file.h5
  zprime_r21_val:
    path: <path_place_holder>/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_0.h5
    label: "$Z'$ Release 21"
    variable_cuts:
      - pt_btagJes:
          operator: ">"
          condition: 250000
test_files:
  ttbar_r21:
@@ -172,18 +183,6 @@ Eval_parameters_validation:
"ujets": 0.982,
}
# Cuts which are applied to the different datasets used for evaluation
variable_cuts:
validation_file:
- pt_btagJes:
operator: "<="
condition: 250000
add_validation_file:
- pt_btagJes:
operator: ">"
condition: 250000
# A list to add available variables to the evaluation files
add_variables_eval: ["actualInteractionsPerCrossing"]
@@ -229,8 +228,7 @@ The different options are briefly explained here:
| `preprocess_config` | String | Necessary | Path to the `preprocess_config` which was used to produce the training samples. |
| `model_file` | String | Optional | If you already have a model and want to continue its training, you can give the path to this model here. This model will be loaded and used instead of initialising a new one. |
| `train_file` | String | Necessary | Path to the training sample, which is produced by the `preprocessing` step of Umami. |
| `validation_file` | String | Necessary | Path to the validation sample (ttbar). This is given by the `preprocessing` step of Umami |
| `add_validation_file` | String | Necessary | Path to the validation sample (zpext). This is given by the `preprocessing` step of Umami |
| `validation_files` | Dict | Optional | Here you can define the validation samples that are used during the training and by the `plotting_epoch_performance.py` script. The samples need to be defined in the dict structure shown in the example. The key of each entry is the unique identifier for the sample and must not be used more than once. `path` gives the path to the file; a short sketch of how these entries can be consumed follows this table. |
| `test_files` | Dict | Optional | Here you can define the test samples that are used in [`evaluate_model.py`](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/umami/evaluate_model.py). The samples need to be defined in the dict structure shown in the example. The key of each entry is the unique identifier in the results file produced by `evaluate_model.py`. `path` gives the path to the file. For test samples, all samples from the training-dataset-dumper can be used without preprocessing, although the Umami preprocessing produces test samples to ensure orthogonality of the jets with respect to the training sample. |
| `var_dict` | String | Necessary | Path to the variable dict used in the `preprocess_config` to produce the train sample. |
| `exclude` | List | Necessary | List of variables that are excluded from training. Only compatible with DL1r training. To include all, just give an empty list. |
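Since `validation_files` replaces the old flat `validation_file`/`add_validation_file` keys, downstream code now iterates the dict and applies each sample's `variable_cuts` itself. Below is a minimal sketch of that pattern, assuming a YAML config as in the example above; the helper names are illustrative and not part of the Umami API.

```python
import operator

import numpy as np
import yaml

# Map the YAML cut operators onto Python comparison functions
OPERATORS = {"<=": operator.le, "<": operator.lt, ">=": operator.ge, ">": operator.gt}


def load_validation_samples(config_path):
    """Return the validation sample definitions from a training config."""
    with open(config_path) as file:
        config = yaml.safe_load(file)
    # Keys are the unique sample identifiers, e.g. "ttbar_r21_val"
    return config.get("validation_files", {})


def cut_mask(jets, variable_cuts):
    """Build a boolean mask for cuts like pt_btagJes <= 250000 (MeV)."""
    mask = np.ones(len(jets), dtype=bool)
    for cut in variable_cuts or []:
        for variable, spec in cut.items():
            mask &= OPERATORS[spec["operator"]](jets[variable], spec["condition"])
    return mask
```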
@@ -35,11 +35,22 @@ model_file:
train_file: <path>/<to>/<train>/<samples>/train_file.h5
# Add validation files
# ttbar val
validation_file: <path>/<to>/<validation>/<samples>/ttbar_r21_validation_file.h5
validation_files:
  ttbar_r21_val:
    path: <path_place_holder>/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_0.h5
    label: "$t\\bar{t}$ Release 21"
    variable_cuts:
      - pt_btagJes:
          operator: "<="
          condition: 250000
# zprime val
add_validation_file: <path>/<to>/<validation>/<samples>/zpext_r21_validation_file.h5
  zprime_r21_val:
    path: <path_place_holder>/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_0.h5
    label: "$Z'$ Release 21"
    variable_cuts:
      - pt_btagJes:
          operator: ">"
          condition: 250000
test_files:
  ttbar_r21:
@@ -157,18 +168,6 @@ Eval_parameters_validation:
  # Charm fraction value used for evaluation of the trained model
  frac_values: {"cjets": 0.018, "ujets": 0.982}
  # Cuts which are applied to the different datasets used for evaluation
  variable_cuts:
    validation_file:
      - pt_btagJes:
          operator: "<="
          condition: 250000
    add_validation_file:
      - pt_btagJes:
          operator: ">"
          condition: 250000
  # Working point used in the evaluation
  WP: 0.77
@@ -186,8 +185,7 @@ The different options are briefly explained here:
| `preprocess_config` | String | Necessary | Path to the `preprocess_config` which was used to produce the training samples. |
| `model_file` | String | Optional | If you already have a model and want to continue its training, you can give the path to this model here. This model will be loaded and used instead of initialising a new one. |
| `train_file` | String | Necessary | Path to the training sample, which is produced by the `preprocessing` step of Umami. |
| `validation_file` | String | Necessary | Path to the validation sample (ttbar). This is given by the `preprocessing` step of Umami |
| `add_validation_file` | String | Necessary | Path to the validation sample (zpext). This is given by the `preprocessing` step of Umami |
| `validation_files` | Dict | Optional | Here you can define the validation samples that are used during the training and by the `plotting_epoch_performance.py` script. The samples need to be defined in the dict structure shown in the example. The key of each entry is the unique identifier for the sample and must not be used more than once. `path` gives the path to the file. |
| `test_files` | Dict | Optional | Here you can define the test samples that are used in [`evaluate_model.py`](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/umami/evaluate_model.py). The samples need to be defined in the dict structure shown in the example. The key of each entry is the unique identifier in the results file produced by `evaluate_model.py`. `path` gives the path to the file. For test samples, all samples from the training-dataset-dumper can be used without preprocessing, although the Umami preprocessing produces test samples to ensure orthogonality of the jets with respect to the training sample. |
| `var_dict` | String | Necessary | Path to the variable dict used in the `preprocess_config` to produce the train sample. |
| `exclude` | List | Necessary | List of variables that are excluded from training. Only compatible with DL1r training. To include all, just give an empty list. |
@@ -10,11 +10,22 @@ model_file:
train_file: <path_place_holder>/PFlow-hybrid-preprocessed_shuffled.h5
# Add validation files
# ttbar val
validation_file: <path_place_holder>/hybrids/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_0.h5
validation_files:
  ttbar_r21_val:
    path: <path_place_holder>/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_0.h5
    label: "$t\\bar{t}$ Release 21"
    variable_cuts:
      - pt_btagJes:
          operator: "<="
          condition: 250000
# zprime val
add_validation_file: <path_place_holder>/hybrids/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_0.h5
  zprime_r21_val:
    path: <path_place_holder>/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_0.h5
    label: "$Z'$ Release 21"
    variable_cuts:
      - pt_btagJes:
          operator: ">"
          condition: 250000
test_files:
  ttbar_r21:
@@ -10,11 +10,22 @@ model_file:
train_file: <path_place_holder>/PFlow-hybrid-preprocessed_shuffled.h5
# Add validation files
# ttbar val
validation_file: <path_place_holder>/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_0.h5
validation_files:
  ttbar_r21_val:
    path: <path_place_holder>/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_0.h5
    label: "$t\\bar{t}$ Release 21"
    variable_cuts:
      - pt_btagJes:
          operator: "<="
          condition: 250000
# zprime val
add_validation_file: <path_place_holder>/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_0.h5
  zprime_r21_val:
    path: <path_place_holder>/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_0.h5
    label: "$Z'$ Release 21"
    variable_cuts:
      - pt_btagJes:
          operator: ">"
          condition: 250000
test_files:
  ttbar_r21:
@@ -10,11 +10,22 @@ model_file:
train_file: <path_place_holder>/PFlow-hybrid-preprocessed_shuffled.h5
# Add validation files
# ttbar val
validation_file: <path_place_holder>/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_0.h5
validation_files:
  ttbar_r21_val:
    path: <path_place_holder>/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_0.h5
    label: "$t\\bar{t}$ Release 21"
    variable_cuts:
      - pt_btagJes:
          operator: "<="
          condition: 250000
# zprime val
add_validation_file: <path_place_holder>/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_0.h5
  zprime_r21_val:
    path: <path_place_holder>/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_0.h5
    label: "$Z'$ Release 21"
    variable_cuts:
      - pt_btagJes:
          operator: ">"
          condition: 250000
test_files:
  ttbar_r21:
@@ -126,18 +137,6 @@ Eval_parameters_validation:
  # Charm fraction value used for evaluation of the trained model
  frac_values: {"cjets": 0.018, "ujets": 0.982}
  # Cuts which are applied to the different datasets used for evaluation
  variable_cuts:
    validation_file:
      - pt_btagJes:
          operator: "<="
          condition: 250000
    add_validation_file:
      - pt_btagJes:
          operator: ">"
          condition: 250000
  # Working point used in the evaluation
  WP: 0.77
@@ -10,11 +10,22 @@ model_file:
train_file: <path_place_holder>/PFlow-hybrid_70-test-preprocessed_shuffled.h5
# Add validation files
# ttbar val
validation_file: <path_place_holder>/MC16d_hybrid_odd_1_PFlow-no_pTcuts-file_0.h5
validation_files:
  ttbar_r21_val:
    path: <path_place_holder>/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_0.h5
    label: "$t\\bar{t}$ Release 21"
    variable_cuts:
      - pt_btagJes:
          operator: "<="
          condition: 250000
# zprime val
add_validation_file: <path_place_holder>/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_0.h5
  zprime_r21_val:
    path: <path_place_holder>/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_0.h5
    label: "$Z'$ Release 21"
    variable_cuts:
      - pt_btagJes:
          operator: ">"
          condition: 250000
test_files:
  ttbar_r21:
@@ -145,17 +156,5 @@ Eval_parameters_validation:
    },
  }
  # Cuts which are applied to the different datasets used for evaluation
  variable_cuts:
    validation_file:
      - pt_btagJes:
          operator: "<="
          condition: 250000
    add_validation_file:
      - pt_btagJes:
          operator: ">"
          condition: 250000
  # Working point used in the evaluation
  WP: 0.77
@@ -293,6 +293,8 @@ def GetRejection(
    main_class: str,
    frac_dict: dict,
    target_eff: float,
    unique_identifier: str = None,
    subtagger: str = None,
):
"""
Calculates the rejections for a specific WP for all provided jets
@@ -317,6 +319,12 @@ def GetRejection(
        except main_class.
    target_eff : float
        WP which is used for discriminant calculation.
    unique_identifier : str
        Unique identifier of the dataset used (e.g. ttbar_r21).
    subtagger : str
        Name of the subtagger for which the rejection is calculated, in case
        several are involved. The provided string is added to the key in the
        dict, e.g. ujets_rej_<subtagger>_<unique_identifier>.
    Returns
    -------
@@ -441,7 +449,14 @@ def GetRejection(
    # Calculate the rejections (1 / efficiency) for the background classes
    for iter_main_class in class_labels_wo_main:
        try:
            rej_dict[f"{iter_main_class}_rej"] = 1 / (
            if unique_identifier is None:
                dict_key = f"{iter_main_class}_rej"
            elif subtagger is None:
                dict_key = f"{iter_main_class}_rej_{unique_identifier}"
            else:
                dict_key = f"{iter_main_class}_rej_{subtagger}_{unique_identifier}"
            rej_dict[dict_key] = 1 / (
                len(
                    jets_dict[iter_main_class][
                        CalcDiscValues(
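For reference, here is the new key construction isolated as a standalone helper. This is an illustrative re-implementation of the branching above, not a function that exists in the codebase; the value stored under each key is the rejection, i.e. the inverse efficiency of that background class at the chosen working point.

```python
def build_rej_key(class_label, unique_identifier=None, subtagger=None):
    """Mirror the dict-key branching introduced in GetRejection above."""
    if unique_identifier is None:
        return f"{class_label}_rej"
    if subtagger is None:
        return f"{class_label}_rej_{unique_identifier}"
    return f"{class_label}_rej_{subtagger}_{unique_identifier}"


assert build_rej_key("ujets") == "ujets_rej"
assert build_rej_key("ujets", "ttbar_r21") == "ujets_rej_ttbar_r21"
assert build_rej_key("ujets", "ttbar_r21", "dips") == "ujets_rej_dips_ttbar_r21"
```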
@@ -248,7 +248,8 @@ def TrainLargeFile(args, train_config, preprocess_config):
    history = model.fit(
        x=train_dataset,
        epochs=nEpochs,
        validation_data=(val_data_dict["X_valid"], val_data_dict["Y_valid"]),
        # TODO: Add a representative validation dataset for training (shown in stdout)
        # validation_data=(val_data_dict["X_valid"], val_data_dict["Y_valid"]),
        callbacks=[dl1_mChkPt, reduce_lr, my_callback],
        steps_per_epoch=nJets / NN_structure["batch_size"],
        use_multiprocessing=True,
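A side note on the surrounding call, which this merge request does not change: `nJets / NN_structure["batch_size"]` is a float under Python 3 true division, while Keras documents `steps_per_epoch` as an integer. A defensive variant would cast explicitly:

```python
steps_per_epoch=int(nJets / NN_structure["batch_size"]),
```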
@@ -299,10 +299,9 @@ def Dips(args, train_config, preprocess_config):
    history = dips.fit(
        train_dataset,
        epochs=nEpochs,
        validation_data=(val_data_dict["X_valid"], val_data_dict["Y_valid"]),
        # TODO: Add a representative validation dataset for training (shown in stdout)
        # validation_data=(val_data_dict["X_valid"], val_data_dict["Y_valid"]),
        callbacks=[dips_mChkPt, reduce_lr, my_callback],
        # callbacks=[reduce_lr, my_callback],
        # callbacks=[my_callback],
        steps_per_epoch=nJets / NN_structure["batch_size"],
        use_multiprocessing=True,
        workers=8,
@@ -179,11 +179,13 @@ def DipsCondAtt(args, train_config, preprocess_config):
            convert_to_tensor=True,
        )
        # TODO: Add a representative validation dataset for training (shown in
        # stdout)
        # Create the validation data tuple for the fit function
        validation_data = (
            val_data_dict["X_valid"],
            val_data_dict["Y_valid"],
        )
        # validation_data = (
        #     val_data_dict["X_valid"],
        #     val_data_dict["Y_valid"],
        # )
    else:
        val_data_dict = utt.load_validation_data_umami(
@@ -194,14 +196,16 @@ def DipsCondAtt(args, train_config, preprocess_config):
            jets_var_list=["absEta_btagJes", "pt_btagJes"],
        )
        # TODO: Add a representative validation dataset for training (shown in
        # stdout)
        # Create the validation data tuple for the fit function
        validation_data = (
            [
                val_data_dict["X_valid_trk"],
                val_data_dict["X_valid"],
            ],
            val_data_dict["Y_valid"],
        )
        # validation_data = (
        #     [
        #         val_data_dict["X_valid_trk"],
        #         val_data_dict["X_valid"],
        #     ],
        #     val_data_dict["Y_valid"],
        # )
    # Set my_callback as callback. Writes history information
    # to json file.
@@ -223,7 +227,8 @@ def DipsCondAtt(args, train_config, preprocess_config):
    history = dips.fit(
        train_dataset,
        epochs=nEpochs,
        validation_data=validation_data,
        # TODO: Add a representative validation dataset for training (shown in stdout)
        # validation_data=validation_data,
        callbacks=[dips_mChkPt, reduce_lr, my_callback],
        steps_per_epoch=nJets / NN_structure["batch_size"],
        use_multiprocessing=True,
@@ -361,13 +361,14 @@ def Umami(args, train_config, preprocess_config):
    history = umami.fit(
        train_dataset,
        epochs=nEpochs,
        validation_data=(
            [
                val_data_dict["X_valid_trk"],
                val_data_dict["X_valid"],
            ],
            val_data_dict["Y_valid"],
        ),
        # TODO: Add a representative validation dataset for training (shown in stdout)
        # validation_data=(
        #     [
        #         val_data_dict["X_valid_trk"],
        #         val_data_dict["X_valid"],
        #     ],
        #     val_data_dict["Y_valid"],
        # ),
        callbacks=[umami_mChkPt, reduce_lr, my_callback],
        steps_per_epoch=nJets / NN_structure["batch_size"],
        use_multiprocessing=True,
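The recurring TODO asks for a representative validation dataset in the Keras per-epoch stdout metrics. One possible approach, sketched under the assumption that the arrays in `val_data_dict` fit in memory and that a random subset is representative enough; none of this is part of the merge request.

```python
import numpy as np

# Pass a small random subset of the validation arrays to fit() so Keras
# prints val_loss per epoch without paying the full memory cost.
n_show = 10_000  # assumption: enough jets for a stable printed metric
rng = np.random.default_rng(42)
idx = rng.choice(len(val_data_dict["Y_valid"]), size=n_show, replace=False)

history = umami.fit(
    train_dataset,
    epochs=nEpochs,
    validation_data=(
        [val_data_dict["X_valid_trk"][idx], val_data_dict["X_valid"][idx]],
        val_data_dict["Y_valid"][idx],
    ),
    callbacks=[umami_mChkPt, reduce_lr, my_callback],
    steps_per_epoch=int(nJets / NN_structure["batch_size"]),
    use_multiprocessing=True,
)
```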
@@ -137,8 +137,6 @@ def prepareConfig(
    config_file["model_name"] = data[f"test_{tagger}"]["model_name"]
    config_file["preprocess_config"] = f"{preprocessing_config}"
    config_file["train_file"] = f"{train_file}"
    config_file["validation_file"] = f"{test_file_ttbar}"
    config_file["add_validation_file"] = f"{test_file_zprime}"
    # Erase all not used test files
    del config_file["test_files"]
@@ -165,10 +163,31 @@ def prepareConfig(
config_file["NN_structure"]["nJets_train"] = 100
config_file["Eval_parameters_validation"]["n_jets"] = 4000
config_file["Eval_parameters_validation"]["eff_min"] = 0.60
config_file["Eval_parameters_validation"]["variable_cuts"] = {
"validation_file": [{"pt_btagJes": {"operator": "<=", "condition": 250000}}],
"add_validation_file": [{"pt_btagJes": {"operator": ">", "condition": 250000}}],
}
# Add some validation files for testing
config_file.update(
{
"validation_files": {
"ttbar_r21_val": {
"path": (
f"{test_dir}/MC16d_hybrid_odd_100_PFlow-no_pTcuts-file_0.h5"
),
"label": "$t\\bar{t}$ Release 21",
"variable_cuts": [
{"pt_btagJes": {"operator": "<=", "condition": 250000}}
],
},
"zprime_r21_val": {
"path": (
f"{test_dir}/MC16d_hybrid-ext_odd_0_PFlow-no_pTcuts-file_0.h5"
),
"label": "$Z'$ Release 21",
"variable_cuts": [
{"pt_btagJes": {"operator": ">", "condition": 250000}}
],
},
}
}
)
    if useTFRecords is True:
        config_file["train_file"] = os.path.join(
@@ -176,7 +195,6 @@ def prepareConfig(
"PFlow-hybrid_70-test-resampled_scaled_shuffled",
)
config_file["model_name"] = data["test_dips"]["model_name"] + "_tfrecords"
config_file["add_validation_file"] = None
config = config[:].replace(".yaml", "") + "_tfrecords.yaml"
@@ -8,12 +8,14 @@ train_file: dummy.h5
# Add model file
model_file: dummy.h5
# Add validation files
# ttbar val
validation_file: dummy.h5
validation_files:
  ttbar_r21_val:
    path: dummy.h5
    label: "$t\\bar{t}$ Release 21"
# zprime val
add_validation_file: dummy.h5
  zprime_r21_val:
    path: dummy.h5
    label: "$Z'$ Release 21"
test_files:
  ttbar_r21:
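Taken together, migrating an existing config to the new schema means replacing the two flat keys with a `validation_files` dict and moving the cuts out of `Eval_parameters_validation`. A small helper could automate this; the following is an illustrative sketch, not part of Umami, and the sample identifiers and labels are assumptions.

```python
def migrate_validation_config(config):
    """Convert old validation_file/add_validation_file keys into the new
    validation_files dict, moving the per-sample cuts along with them."""
    old_cuts = config.get("Eval_parameters_validation", {}).pop("variable_cuts", {})
    config["validation_files"] = {
        "ttbar_r21_val": {
            "path": config.pop("validation_file", None),
            "label": "$t\\bar{t}$ Release 21",
            "variable_cuts": old_cuts.get("validation_file"),
        },
        "zprime_r21_val": {
            "path": config.pop("add_validation_file", None),
            "label": "$Z'$ Release 21",
            "variable_cuts": old_cuts.get("add_validation_file"),
        },
    }
    return config
```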