Umami merge requests
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests
2021-12-15T15:24:11+01:00

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/310
replacing .format with fstrings
2021-12-15T15:24:11+01:00
Manuel Guth

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/309
moving functions into separate histogram tools
2021-12-15T13:42:00+01:00
Manuel Guth
adding separate histogram tools in new module `umami.helper_tools`
extending testing of these functions

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/307
Adding plotting upgrades
2021-12-15T14:16:41+01:00
Alexander Froch
This MR adds the following:
- New options for the `plotting_umami.py` plots:
  - `dpi`: This sets the DPI value of the plots. Can be given in the config file in the `plotting_settings`. Documentation is added.
  - `labelFontSize` for the `pT_vs plots`: Setting the fontsize for the axes labels and ticks.
  - `legFontSize` for the `pT_vs plots`: Setting the fontsize for the legend.
- Adding the option `tagger_label` to the `Validation_metrics_settings` of the train config. This is the legend label of the freshly trained model in the rejection vs epoch plots. If it is not given or None, the `tagger` from `NN_structure` is used (for backward compatibility). Documentation is added. Unit tests are adapted.
- Removing double loop function over files from the `input_vars_tools`. This feature is deprecated because it is already supported now by the `LoadTrksFromFile` and `LoadJetsFromFile` functions which are used here for loading.
- Adding variable `fontsize` to `makeATLAStag`.
Alexander Froch

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/290
running black on umami
2021-12-06T14:12:51+01:00
Frederic Renner

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/285
Multiple Tracks datasets in preprocessing stage
2022-02-01T11:41:54+01:00
Stefano Franchellucci
Implementation of the option for storing more than one track dataset in the preprocessed samples.
This could save processing time and disk space.
This required some reworks at preprocessing, training and evaluation stages.
##### Preprocessing
The option `tracks_name` in the config files is renamed to `tracks_names`. It can now be either a string or a list, but it is treated as a list throughout the preprocessing chain. Every step that uses tracks now loops over all tracks collections in `tracks_names`.
At the scaling step, the `scale_dict` now has one key for each tracks collection. For tracks, the input variable lists in the `.yaml` file are now read as follows: `track_train_variables` => `{tracks_name}_train_variables`.
When tracks are used, the final `.h5` file now has additional datasets (one per tracks collection), and the naming changes: `X_trk_train` => `X_{tracks_name}_train`.
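The naming scheme described above can be illustrated with a minimal sketch. It is not the umami implementation: the helper names `as_tracks_list` and `derived_names` and the collection name `tracks_loose` are hypothetical, and only the patterns `X_{tracks_name}_train` and `{tracks_name}_train_variables` come from the MR text.

```python
from typing import Dict, List, Union


def as_tracks_list(tracks_names: Union[str, List[str]]) -> List[str]:
    """Return the configured tracks collection name(s) as a list."""
    if isinstance(tracks_names, str):
        return [tracks_names]
    return list(tracks_names)


def derived_names(tracks_names: Union[str, List[str]]) -> Dict[str, Dict[str, str]]:
    """Map each tracks collection to its .h5 dataset name and variable-list key."""
    return {
        name: {
            "h5_dataset": f"X_{name}_train",
            "variables_key": f"{name}_train_variables",
        }
        for name in as_tracks_list(tracks_names)
    }


# A single string and a list are handled the same way.
print(derived_names("tracks"))
print(derived_names(["tracks", "tracks_loose"]))  # "tracks_loose" is a made-up name
```

Normalising to a list once, at the point where the config is read, keeps every later step a plain loop over collections.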
##### Training and Evaluation
Most of the changes follow from the naming updates:
`X_trk_train` => `X_{tracks_name}_train` and `track_train_variables` => `{tracks_name}_train_variables`.
An additional option, `tracks_name`, is added to the training config to select the tracks datasets to use for training/evaluation.
Stefano Franchellucci

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/279
plotting adaptions
2021-11-30T13:33:09+01:00
Manuel Guth
some small adaptions: adding a context manager and removing obsolete parameters

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/274
Add track origin selection for input variable plotting
2021-11-29T10:50:30+01:00
Sebastien Rettie
This MR provides the functionality to require a specific track origin when plotting input variables. The default behaviour of `plot_input_variables.py` does not change, except for appending a suffix `_All` to the plot name (so the reference image names have also been updated). If desired, the user can now specify a list of `track_origins` in the configuration file. This will produce a set of plots, one for each track origin specified. Note that the `truthOriginLabel` variable must be present in the `.h5` files used for plotting for this option to work.
The unit tests are currently failing because the x-axis labels differ from those of the reference plots (they now include the track origin in the x-axis title).
Sebastien Rettie

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/271
Multiple tagger in Rejection per Epoch plot
2021-12-01T14:59:18+01:00
Alexander Froch
This MR adds the following:
- Fixing an artifact in the DIPS documentation (still a `--tracks` flag).
- Renaming the `Plotting_settings` in the train config files to `Validation_metrics_settings`. These settings are only used for the validation metrics plots (`plotting_epoch_performance.py`).
- Inside the new `Validation_metrics_settings` there are two new options:
  - `taggers_from_file`: As in `Eval_parameters_validation`, these are the names of the taggers that are available in the .h5 files. All taggers in this list are drawn in the rejection per epoch plots as horizontal lines.
  - `trained_taggers`: Dict of taggers that are not in the .h5 files but whose validation .json file is available (e.g. for a locally trained tagger). Define them here and give the `path` to the .json file and a `label` for the legend. The taggers defined here are also drawn in the rejection per epoch plots.
- Rejection per epoch plots are now also available with one rejection per plot. Those are automatically produced when running `plotting_epoch_performance.py`.
- Unit tests and documentation are provided for all changes.
- A bit of make-the-code-nice-again things.
Alexander Froch

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/265
Fix nJets_loaded in input vars plotting
2021-11-24T11:55:55+01:00
Alexander Froch
This MR fixes a bug in the loading of the jets in the `input_vars` plotting. Also, some duplications are removed from the generators.
Alexander Froch

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/264
Fixed bug in plotting script. Updated docs
2021-11-23T17:07:32+01:00
Joschka Birk
- Updated docs (fixed some broken links and removed/updated deprecated parts)
- Fixed small bug in plotting script which occurred when requesting more jets than are available.

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/262
Merge train executables and make the LR Reducer configurable
2021-11-24T11:42:07+01:00
Alexander Froch
This MR adds the following:
- Removed the executable part of the train scripts and created `train.py` which is now the executable.
- Added a flag in the `train_config` which holds the tagger name. With this, the scripts no longer rely on a command-line argument to know which tagger was used. The command-line option still exists, but instead of `--dips` or `--dl1` it is now `-t` or `--tagger`.
- Changing the `--dips` etc. to `--tagger dips` for `plotting_epoch_performance` and `evaluate_model.py`.
- Added another flag in the `train_config` to make the Learning Rate Reducer configurable. You can set all the parameters in the `NN_structure` if you want to. Otherwise the current hard-coded values are used as default.
- Renamed and moved the old `train_*.py` with the models and all inside to a folder called `models`. There you can change everything you like (just to clean up and sort everything a bit).
- Adapted the unit/integration tests.
- Added the changes to the docs.
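
The command-line and config behaviour listed above can be sketched as follows. This is only an illustration under stated assumptions, not the code of `train.py`: the helper names, the reducer keys `factor`, `patience` and `min_lr`, their default values, and the precedence of the config value over the command-line value are all hypothetical. Only the `-t`/`--tagger` option and the `NN_structure` block are taken from the MR descriptions.

```python
import argparse
from typing import Optional

# Hypothetical keys and defaults; the MR text does not list the real hard-coded values.
DEFAULT_LRR = {"factor": 0.8, "patience": 3, "min_lr": 1e-6}


def build_parser() -> argparse.ArgumentParser:
    """Parser with one tagger option instead of one flag per tagger."""
    parser = argparse.ArgumentParser(description="Illustrative training entry point")
    parser.add_argument(
        "-t",
        "--tagger",
        default=None,
        help="Tagger name, e.g. dips or dl1 (optional if set in the train config)",
    )
    return parser


def resolve_tagger(cli_tagger: Optional[str], train_config: dict) -> str:
    """Use the tagger from the train config, else the CLI value (assumed order)."""
    tagger = train_config.get("NN_structure", {}).get("tagger") or cli_tagger
    if tagger is None:
        raise ValueError("No tagger given in the train config or on the command line")
    return tagger


def lrr_settings(train_config: dict) -> dict:
    """Overlay user-provided reducer parameters on the defaults."""
    nn_structure = train_config.get("NN_structure", {})
    overrides = {k: v for k, v in nn_structure.items() if k in DEFAULT_LRR}
    return {**DEFAULT_LRR, **overrides}


config = {"NN_structure": {"tagger": "dips", "patience": 5}}
args = build_parser().parse_args(["--tagger", "dl1"])
print(resolve_tagger(args.tagger, config))
print(lrr_settings(config))
```

Merging overrides onto a defaults dict means a config that sets none of the reducer parameters keeps the previous behaviour.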
closes https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/issues/81
Alexander Froch

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/254
Adding conditional attention
2021-11-24T09:52:39+01:00
Alexander Froch
This MR adds the following:
- Adding new train script for DIPS Conditional Attention + example train config.
- Adding support for new DIPS Conditional Attention model in the validation/evaluation chain.
- Adding integration test for train/validation/evaluation of DIPS Conditional Attention.
- Fixing some issues in the `tf_tools/models` with the masking (also adapted the unit test).
- Adding new generator for DIPS Conditional Attention.
- Adding some flexibility to the loading of the `loading_validation_data` functions.
- Adding compatibility of the `evaluate_model.py` for DIPS and DIPS Conditional Attention.
- Adding DIPS Conditional Attention to possible models for `plotting_epoch_performance.py`
- Adding DIPS Conditional Attention train script to the `setup.py`.
- Make the calculation of the Saliency maps steerable in the train config of the DIPS models.
Alexander Froch

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/229
Update tests and general bug fixes
2021-11-11T10:09:02+01:00
Alexander Froch
This MR adds the following things:
- Adding `-s` option to integration tests, so the stdout of the tests is shown while they are running.
- Removing `Configuration` of the `evaluation_tools` -> Never used.
- Removing not-implemented function from `evaluation_tools` (is covered by Maxence MR !190).
- Created new integration tests for the pdf sampling method. It runs in the `test_preprocessing` stage and is not further used till now -> Maybe add more integration tests for the taggers with these preprocessed samples later.
- Fixed long runtime of the `test_train_dl1r` -> Shapley still had default values which were way too big for testing purposes.
- Added unit tests for `tf_tools` and a small one for `evaluation_tools`.
- Moving the transformation of the `np.arrays` to `tf.tensors` to the `NN_tools` -> The `tf.tensors` fixed the memory issue for DIPS and Umami.
- Fixing the number of jets used in training (the `nJets_train` from the config was overwritten by the full size of the training file). This was one of the reasons the integration tests took so long. The integration tests for training the taggers should be much faster now (from 17 minutes to 3 minutes).
- Overall improvement of the time needed for the whole pipeline: 30 minutes -> 10 minutes.
Alexander Froch

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/228
Fixing bugs in NN_tools and preprocessing.py
2021-11-05T17:11:34+01:00
Alexander Froch
This MR adds the following:
For the `NN_tools.py`:
- Fixes the bug when loading dips in the `plotting_epoch_performance.py`. You need to define the CustomObjectScope to load the model correctly.
- Adds a small work-around if jets are only loaded from one file. If you request 300k jets, this is the number which is loaded before the cutting. To ensure you load enough jets before cutting, increase the number of requested jets by 15%.
- Cleaning up some function definitions and adding doc-strings, comments etc.
- Standardize some function input variable names.
- Adding the `variable_cuts` from the training config also for the validation (was only added for evaluation up till now).
- Move the loading of the files outside of the loop of the `calculate_metrics` function (Loading is very time consuming).
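
The 15% margin mentioned in the work-around above amounts to simple arithmetic, sketched here. The function name `jets_to_request` and its integer-percent parameter are hypothetical, and the MR text does not say whether the margin is applied by the code or by the user, so this only shows the calculation.

```python
def jets_to_request(n_jets: int, margin_percent: int = 15) -> int:
    """Number of jets to load so that roughly n_jets remain after cuts.

    Integer arithmetic is used so the result is rounded up without
    floating-point surprises.
    """
    if n_jets < 0:
        raise ValueError("n_jets must be non-negative")
    return (n_jets * (100 + margin_percent) + 99) // 100


# Requesting 300k jets after cuts means loading 345k jets before cuts.
print(jets_to_request(300_000))
```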
For the `preprocessing.py`:
- Adding some comments.
- Adding a small workaround so that all samples in the config are prepared if `--sample` is not given when calling the preparation step.
- Adding a small fix to docs.
For the unit tests:
- Updated unit test control plots.
- Fixing some naming issues.
Alexander Froch

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/226
Correcting eff vs var
2021-11-05T00:29:37+01:00
Maxence Draguet
Modification to the eff vs var plot. It is now compatible with the non-hard-coded labels and is capable of comparing several taggers thanks to a new comparison plotter.
Maxence Draguet

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/204
Fixing Error messages + Docs
2021-10-27T14:30:29+02:00
Alexander Froch
This MR adds:
1: Update for Docs to new preprocessing etc. (DIPS, Evaluate without trained model and plotting_umami).
2: Moving the confusion matrix plotting to the plot function and cleaning up the plotting configs a bit.
3: Update the unit test for `LoadJetsFromFile` and `LoadTrksFromFile`.
4: Improve the error message that is shown when file/files could not be found in the evaluation process.
Alexander Froch

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/194
Merging Preprocessing-Remake in Master
2021-10-19T14:05:21+02:00
Alexander Froch
Preprocessing rewrite
Alexander Froch

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/191
Adding Comments from Scaling and updating documentation
2021-10-21T11:19:51+02:00
Alexander Froch
This MR adds the following:
1: follow-ups from #68 and #69. Closes #68 and #69.
2: Adding the possibility to define cuts based on a variable (like in preprocessing) for the evaluation. When the samples (test samples) are loaded for evaluation, the cut is applied to the jets (only working with jet variables).
3: Update of the preprocessing documentation (preprocessing.md).
4: Fixes in the plotting code (Caps of labels, linewidth, etc.)
Alexander Froch

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/190
Fraction Scan Plot Correction
2021-11-10T09:59:45+01:00
Maxence Draguet
Cells corresponding to fc + flight (+ftau) > 1 are now displayed in grey (and equivalently for the c-tagging case).
This also corrects the model definition for DL1 (error introduced in last master updates) and removes the logging of the loading of training data by the generator.
Maxence Draguet

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/170
Removing hard-coded labels from evaluation tools
2021-09-15T16:27:48+02:00
Alexander Froch
This MR removes the hard-coded labels from the evaluation tools and cleans up the code concerning the evaluation.
Unit tests for the new/rewritten functions are provided.
Also, I ran the integration tests for the preprocessing and the training, and updated the affected test files according to the new behavior of the functions and calls.
In addition to the callbacks, the training metrics are now saved in an extra json file in the corresponding model folder with the name `history.json`. The handling of this file, when it comes to plotting, is provided.
Preprocessing rewrite
Alexander Froch
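
The `history.json` file described in the last merge request (170) can be sketched as a plain JSON round trip. This assumes the training metrics are available as a dict of per-epoch lists; the function names `save_history` and `load_history` and the metric keys are hypothetical, and only the file name `history.json` and its location in the model folder come from the MR text.

```python
import json
import tempfile
from pathlib import Path


def save_history(history: dict, model_dir) -> Path:
    """Write the training metrics to <model_dir>/history.json."""
    out_path = Path(model_dir) / "history.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w") as f:
        json.dump(history, f, indent=2)
    return out_path


def load_history(model_dir) -> dict:
    """Read the training metrics back, e.g. for plotting."""
    with (Path(model_dir) / "history.json").open() as f:
        return json.load(f)


# Made-up metric names and values, one entry per epoch.
example_history = {"loss": [0.9, 0.7, 0.6], "val_loss": [1.0, 0.8, 0.75]}

with tempfile.TemporaryDirectory() as tmp_dir:
    save_history(example_history, tmp_dir)
    restored = load_history(tmp_dir)

print(restored == example_history)
```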