Umami merge requests
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests
2021-07-29T10:53:53+02:00

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/139
added configurable name of tracks dataset
2021-07-29T10:53:53+02:00
Stefano Franchellucci
Edited `preprocessing.py` and `preprocessing_tools/Preparation.py` in order to make the tracks dataset name, previously hardcoded as "tracks", configurable. The idea was to add an optional `tracks_name` argument to the parser in order to select the dataset name.

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/136
Undersampling analysis
2021-08-04T19:02:57+02:00
Victor Hugo Ruelas Rivera
Creates a new UnderSamplingTemplate class that is used to prepare the training dataset. It makes sure that the distributions of all flavours have the same shape as the b distribution.

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/149
Implement PDF sampling class + tests
2021-08-10T13:45:46+02:00
Alexander Froch
This MR adds the PDF Sampling class with the functions needed for the resampling. Unit tests for the class are also provided.
This MR is related to &1
Preprocessing rewrite
Alexander Froch

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/150
Updated preprocessing: sample preparation using new PrepareSample class
2021-08-25T10:37:01+02:00
Manuel Guth
This MR implements the first version of the new `PrepareSamples` class, making everything independent of hardcoded flavours.
Furthermore, it removes the merging step. The output h5 files of the sample preparation are now written out incrementally.
Finally, the MR carries over the latest changes in the master branch.
Related to &1
Preprocessing rewrite
Philipp Gadow
Manuel Guth

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/151
Adding resampling class
2021-08-25T15:33:25+02:00
Manuel Guth
Reimplementation of the resampling.
Uses a base class that can be shared by all sampling methods.
Related to &1
Closes https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/issues/16
Introduces a new yaml tool whose loader can deal with file includes.
After this MR there are some follow-ups which need to be done (this was made available first so that people can start with other implementations of resampling methods): #60 #61 #62
Preprocessing rewrite

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/172
Refactor UnderSamplingTemplate
2021-09-28T09:53:11+02:00
Victor Hugo Ruelas Rivera
- Refactors UnderSamplingTemplate to use the base class Resampling
- Renames UnderSamplingTemplate to ProbabilityRatioUnderSampling
- Updates docs
Preprocessing rewrite
Victor Hugo Ruelas Rivera

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/177
Adding scaling to preprocessing and re-activate CI
2021-10-13T15:49:06+02:00
Alexander Froch
This MR adds the Scaling and Write classes for the preprocessing. The iterations are still not supported. I will open an issue for that.
Also, the integration tests are adapted and activated.
Merge after !170
Closes #66
Closes #58
Preprocessing rewrite
Alexander Froch

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/194
Merging Preprocessing-Remake in Master
2021-10-19T14:05:21+02:00
Alexander Froch
Preprocessing rewrite
Alexander Froch

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/215
Fixing RAM issues while calculating scale dict
2021-10-29T11:46:18+02:00
Alexander Froch
This MR adds the generator for the calculation of the scale dicts. Unit tests are also added for the helper functions of the `Scaling` class.
Alexander Froch

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/221
Adding fix for Scaling
2021-10-29T15:23:22+02:00
Alexander Froch
This MR adds a fix to the calculation of the combined scale dict.
Alexander Froch

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/206
PDF sampling
2021-11-01T14:49:28+01:00
Maxence Draguet
This MR introduces the PDF sampling in the chain of preprocessing methods and slightly modifies the generators and the dataset creation/appending of the scaling and writing methods.
The PDF sampling approach is now fully iterative (reading/writing in chunk steps), though the methods needed to run it with single files fully loaded in memory are still accessible (by manually changing the PDF sampling parameter `iterator` in the `Run` function to `False`; this is kept for debugging).
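The two modes described above can be pictured with a small generator. This is only a sketch under assumptions: `iter_chunks`, `run`, and `chunk_size` are hypothetical names rather than umami functions, plain Python sequences stand in for the h5 datasets that would be read and written chunk by chunk, and doubling each value is a placeholder for the real per-jet work.

```python
from typing import Iterator, List, Sequence


def iter_chunks(values: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of at most `chunk_size` entries (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def run(values: Sequence[float], chunk_size: int = 4, iterator: bool = True) -> List[float]:
    """Double every value, either chunk by chunk or in a single pass."""
    if not iterator:
        # Debug path: handle the whole input as one block.
        return [2.0 * v for v in values]

    output: List[float] = []
    for chunk in iter_chunks(values, chunk_size):
        # Only one chunk is processed per step before its result is appended.
        output.extend(2.0 * v for v in chunk)
    return output
```

Both paths are expected to give the same result, which is what makes the non-iterative path useful as a debugging reference.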
closes #62
Maxence Draguet

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/222
Fixing shaping in h5 in ApplyScaling
2021-11-02T11:01:43+01:00
Alexander Froch
This MR fixes the maxshape of the output file from the `ApplyScales` function.
Alexander Froch

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/223
Scaling Bugfix
2021-11-02T12:17:04+01:00
Samuel Van Stroud
Fixes:
```
Traceback (most recent call last):
  File "/unix/atlastracking/svanstroud/miniconda3/envs/umami/bin/preprocessing.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/unix/atlastracking/svanstroud/gnn-tagger/umami_r22/umami/umami/preprocessing.py", line 111, in <module>
    Scaling.ApplyScales()
  File "/unix/atlastracking/svanstroud/gnn-tagger/umami_r22/umami/umami/preprocessing_tools/Scaling.py", line 782, in ApplyScales
    jets, tracks, labels, track_labels = next(scale_generator)
  File "/unix/atlastracking/svanstroud/gnn-tagger/umami_r22/umami/umami/preprocessing_tools/Scaling.py", line 680, in scale_generator
    x - tracks_scale_dict[var]["shift"],
UnboundLocalError: local variable 'x' referenced before assignment
```
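For readers unfamiliar with this error class, the sketch below shows one common way an `UnboundLocalError` of this kind arises and how it is typically avoided. It is not the code from `Scaling.py`: the function names and the `apply_offset` flag are hypothetical, and the only assumption is that `x` was bound on some code paths but not on others.

```python
def scale_broken(values, shift, apply_offset):
    """Hypothetical reproduction: `x` is bound only inside the branch."""
    if apply_offset:
        x = [v + 1.0 for v in values]
    # Raises UnboundLocalError when apply_offset is False, because `x`
    # was never assigned on that path.
    return [v - shift for v in x]


def scale_fixed(values, shift, apply_offset):
    """Same logic, but `x` is bound on every path before it is used."""
    x = list(values)
    if apply_offset:
        x = [v + 1.0 for v in x]
    return [v - shift for v in x]
```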
Tagging @alfroch @mdraguet

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/230
Correcting PDF sampling
2021-11-05T14:14:47+01:00
Maxence Draguet
Correcting PDF resampling: the indices selection in the iterator approach was bugged because the weights were not normalised before selection. This MR fixes this and slightly modifies the indices selection per chunk: the number to sample per chunk is proportional to the weight of the chunk relative to the full weight distribution (over all chunks).
Maxence Draguet

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/228
Fixing bugs in NN_tools and preprocessing.py
2021-11-05T17:11:34+01:00
Alexander Froch
This MR adds the following:
For the `NN_tools.py`:
- Fixes the bug when loading DIPS in `plotting_epoch_performance.py`: the `CustomObjectScope` needs to be defined to load the model correctly.
- Adds a small workaround for the case where jets are loaded from only one file. If you request 300k jets, this is the number that is loaded before the cuts are applied. To ensure you load enough jets before cutting, increase the number of requested jets by 15%.
- Cleaning up some function definitions and adding doc-strings, comments etc.
- Standardize some function input variable names.
- Adding the `variable_cuts` from the training config also for the validation (until now they were only added for the evaluation).
- Move the loading of the files outside of the loop of the `calculate_metrics` function (loading is very time consuming).
For the `preprocessing.py`:
- Adding some comments.
- Adding a small workaround so that all samples in the config are prepared if `--sample` is not given when calling the preparation step.
- Adding a small fix to docs.
For the unit tests:
- Updated unit test control plots.
- Fixing some naming issues.
Alexander Froch

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/233
Correcting PDF resampling
2021-11-05T21:16:19+01:00
Maxence Draguet
Correcting PDF resampling: Correcting custom jets selection for all resampling.
Maxence Draguet

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/205
Include generator
2021-11-08T14:54:25+01:00
Tomke Schroer
Include a generator for batches of jets/tracks during preprocessing, to handle datasets that are too large to be loaded at once.
Tomke Schroer

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/234
fixing shuffling of indices
2021-11-08T16:46:25+01:00
Manuel Guth
There was a bug spotted by @svanstro in the preprocessing: two arrays were not shuffled in the same way when the shuffling was called on them one after the other.
The implementation is not optimal, but should work.

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/237
minor improvement in Writing files
2021-11-08T20:25:30+01:00
Manuel Guth
There is no need to load the full file, and additionally convert it into a data frame, just to get the length of the array in the file.

https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/236
Fixing random seed
2021-11-09T13:53:15+01:00
Alexander Froch
Alexander Froch
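
The shuffling bug and the random seed fix in the last entries both concern keeping parallel arrays aligned and reproducible. The sketch below shows one way to do this, by drawing a single index permutation from a seeded generator and applying it to both arrays. It is an illustration only, not the umami implementation: `shuffle_together` and its `seed` parameter are hypothetical, and plain lists stand in for the jet and label arrays.

```python
import random
from typing import List, Sequence, Tuple


def shuffle_together(
    jets: Sequence, labels: Sequence, seed: int = 42
) -> Tuple[List, List]:
    """Shuffle two equally long sequences with one shared permutation."""
    if len(jets) != len(labels):
        raise ValueError("jets and labels must have the same length")

    # A dedicated, seeded generator keeps the result reproducible and
    # independent of any other use of the global random state.
    rng = random.Random(seed)
    indices = list(range(len(jets)))
    rng.shuffle(indices)

    # Applying the same indices to both sequences keeps each jet
    # paired with its own label.
    return [jets[i] for i in indices], [labels[i] for i in indices]
```

Shuffling each array with a separate call would advance the generator state between calls, so the two arrays would generally end up in different orders; sharing one permutation avoids that.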