Umami merge requests
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests
2022-03-09T13:43:25+01:00

update 3d significance sv variable
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/451
2022-03-09T13:43:25+01:00, Manuel Guth

## Summary
This MR introduces the following changes
* Updating jet training variable from `SV1_significance3d` to `SV1_correctSignificance3d` for r22
* Adding a check to `replaceLineInFile` that the leading spaces stay the same; if they do not, a warning is raised
* Allowing samples without `cuts` in the preprocessing step
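
The leading-space check in the second bullet could look roughly like the sketch below. The helper names `leading_spaces` and `check_leading_spaces` are hypothetical and are not taken from the umami code base; the actual signature of `replaceLineInFile` is not shown in this MR description.

```python
import warnings


def leading_spaces(line: str) -> int:
    """Return the number of leading space characters in a line."""
    return len(line) - len(line.lstrip(" "))


def check_leading_spaces(old_line: str, new_line: str) -> bool:
    """Warn if a replacement line changes the indentation of the original.

    Hypothetical helper for illustration; returns True if indentation matches.
    """
    same = leading_spaces(old_line) == leading_spaces(new_line)
    if not same:
        warnings.warn(
            f"Leading spaces changed from {leading_spaces(old_line)} "
            f"to {leading_spaces(new_line)}."
        )
    return same
```

Such a check matters mostly for indentation-sensitive files such as YAML configs, where a changed indent silently moves a key to a different level.
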
Relates to the following issues
* closes #141
## Conformity
- [x] [Changelog entry](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/changelog.md)
- [x] [Documentation](https://umami-docs.web.cern.ch)
- [x] [Development guidelines](https://umami-docs.web.cern.ch/setup/development/)
- [x] [Style guides](https://umami-docs.web.cern.ch/setup/development/good-practices/)

Organising ToDos in sections
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/450
2022-02-28T16:30:00+01:00, Manuel Guth

This MR organises the automated TODO issue in sections, divided into `general` TODOs and the TODOs related to newly introduced features in Python 3.9 and 3.10.
The TODO related to `vr_overlap` is outdated since this can be realised using the Cuts in preprocessing or evaluation.

Splitting up resampling in separate files
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/445
2022-02-24T11:30:42+01:00, Manuel Guth

New combine flavour method for PDF sampling (with shuffling)
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/442
2022-02-23T18:46:18+01:00, Alexander Froch

This MR adds a new method of combining the resampled flavours in the PDF sampling method.

Fix scale dict combination
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/441
2022-02-23T13:45:09+01:00, Joschka Birk

## Bug description
There is a small bug in the calculation of the shifting and scaling factors in the preprocessing.
The constantly updated `scale_dict` is always given the same weight in the combination of std and mean.
This means that even though its information (the mean and std of the variables) represents more jets with each iteration, it is always combined with the same weight as before (which is then 50/50, I think).
This is fixed by increasing the number of jets represented in the `scale_dict` each iteration.
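
To illustrate the idea, here is a minimal sketch of a size-weighted combination of mean and std over chunks. It is not the umami implementation: the function names are hypothetical, a single variable is treated as a plain list of floats, and the population standard deviation is assumed.

```python
import math


def combine_mean_std(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Combine mean and (population) std of two samples, weighted by size."""
    n = n_a + n_b
    mean = (n_a * mean_a + n_b * mean_b) / n
    # Each sample contributes its own variance plus the squared shift
    # of its mean with respect to the combined mean.
    var = (
        n_a * (std_a**2 + (mean_a - mean) ** 2)
        + n_b * (std_b**2 + (mean_b - mean) ** 2)
    ) / n
    return mean, math.sqrt(var), n


def running_scale(chunks):
    """Fold all chunks into one running (mean, std, n_jets) triple."""
    mean, std, n_jets = 0.0, 0.0, 0
    for chunk in chunks:
        c_mean = sum(chunk) / len(chunk)
        c_std = math.sqrt(sum((x - c_mean) ** 2 for x in chunk) / len(chunk))
        # n_jets grows every iteration, so the running values gain weight
        # relative to each newly added chunk.
        mean, std, n_jets = combine_mean_std(
            mean, std, n_jets, c_mean, c_std, len(chunk)
        )
    return mean, std, n_jets
```

With the weights tracked this way, the result agrees with the mean and std computed on all jets at once, independent of how the jets are split into chunks.
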
## Consequences
### When using the count method
It seems like this is not really a problem when using the count resampling method, since the chunks contain equal amounts of jets from all used classes (shuffling happens already in resampling).
### When using the pdf-resampling method
Here you can end up with a last chunk that is dominated by jets from one class. As a result, the final scale dict deviates from the actual values.
## Note
The scaling and shifting of the variables is just there to ensure that the different input variables have the same order of magnitude. So even if a training was performed with preprocessed files that were not perfectly normalised, it's all fine as long as the corresponding scaling is applied correctly when evaluating the NN.
Tagging @mguth @alfroch

Adding variable plots for preprocessing stages
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/440
2022-02-28T23:47:01+01:00, Alexander Froch

This MR adds the following:
* New variable plots for the different preprocessing stages (resampling, scaling, final file)
* Unit/Integration tests are added.
* Documentation is updated.
Closes #1, #10, #61

Fixing all issues of darglint and removing unused functions
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/430
2022-02-16T16:50:23+01:00, Alexander Froch

This MR fixes all `darglint` issues and removes the outdated `MakePlots` and `Plot_vars` functions. Related to #61 and #145
The file `umami/train_tools/Plotting.py` is now excluded from darglint due to issues of darglint with the sphinx documentation style in the docstrings
closes https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/issues/97 #145

Updating packaging and small issues
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/419
2022-02-15T12:55:12+01:00, Manuel Guth

This MR fixes some small packaging issues and switches to `pip` instead of `python setup.py install/develop`.
closes #131

Adding increased code coverage
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/417
2022-02-16T11:04:39+01:00, Jackson Barr

fixing binarizer for 2 class labels
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/409
2022-04-19T17:02:19+02:00, Maggie Chen

Fixing the issue with `label_binarize` from sklearn. When there are only 2 classes for classification, `label_binarize` returns a single column vector instead of one column per class. The fix adds a dummy class label, i.e. -1, and removes the last column after the binarizer.

Fixing parallel processing of categories in pdf sampling
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/408
2022-02-09T16:22:24+01:00, Alexander Froch

This MR fixes the parallel running of the pdf sampling method.

Update docstrings in for resampling and add inline comments
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/406
2022-02-09T16:28:02+01:00, Alexander Froch

This MR updates the docstrings (not all) for the resampling methods.
Also, a lot of inline comments for code explanation are added.

Adding docstring updates and inline comments
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/404
2022-02-08T16:25:44+01:00, Alexander Froch

This MR adds a bit more docstring updates and inline comments for the resampling.

Copy config bugfix
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/401
2022-02-07T15:56:36+01:00, Samuel Van Stroud

Fix copy path (go up one dir) and ensure out dir exists before attempting write.
By default, overwrite existing configs, but warn the user about this.

Preparation cleanups
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/393
2022-02-03T15:20:14+01:00, Manuel Guth

- Cleaning up doc strings
- adding debug option printing all loaded input files
- printing only once which sample is written out and not after each batch

allowing preprocess config without !include
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/392
2022-02-02T18:10:34+01:00, Manuel Guth

related to !386

Fix track masking in the scaling
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/391
2022-02-04T10:49:20+01:00, Samuel Van Stroud

Masking previously happened in a few different places, in a few different ways.
- Checking the first track variable for `NaN`, and building a mask from this. Using the first variable was unstable, as we are relying on it being `float`, which is not always the case. This meant the masking broke for the new samples in some cases.
- Elsewhere, we used a `tracks == 0` check, after running `np.nan_to_num`. This is also broken as there are some possible `int` vars (e.g. `JFVertexIndex`, `leptonID`) for which `0` is a valid value, and the default for padded tracks is `-1`.
This implements a solution for both problems, using the `valid` flag (which is designed for this use) where possible, and, if it is not available, defaulting to finding the first `float` variable and using a `NaN` check.
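
The described fallback logic can be sketched as follows. This is an illustration only, assuming the tracks are stored as a NumPy structured array and that padded entries are `NaN` in float variables; the function name `get_track_mask` is hypothetical and not necessarily the one used in umami.

```python
import numpy as np


def get_track_mask(tracks: np.ndarray) -> np.ndarray:
    """Return a boolean mask that is True for real (non-padded) tracks."""
    names = tracks.dtype.names
    # Preferred: use the dedicated `valid` flag if the sample provides it.
    if "valid" in names:
        return tracks["valid"].astype(bool)
    # Fallback: take the first float variable and treat NaN as padding.
    for name in names:
        if np.issubdtype(tracks.dtype[name], np.floating):
            return ~np.isnan(tracks[name])
    raise ValueError("No 'valid' flag or float variable found to build a mask")
```

Searching for the first `float` field, rather than taking the first field blindly, avoids building the mask from an `int` variable, for which a `NaN` check is meaningless.
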
@mguth @alfroch

Fixing bug, missing tracks labels in preprocessed sample
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/387
2022-02-01T13:41:50+01:00, Stefano Franchellucci

This MR is addressing issue #132.
As mentioned in the issue, the problem was a hard-coded call to `"track_labels"`, whereas since MR [!285](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/285) this should be specific to the track collection, thus `"track_labels"` -> `f"{tracks_name}_labels"`.
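
A minimal sketch of the renamed key, assuming one label dataset per track collection; the helper `labels_key` and the collection names are hypothetical and only serve as an illustration.

```python
def labels_key(tracks_name: str) -> str:
    """Build the label dataset name for a given track collection."""
    return f"{tracks_name}_labels"


# Hypothetical collection names, for illustration only
keys = [labels_key(name) for name in ("tracks", "tracks_loose")]
```
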
Closes #132
~"bug fix"

Copy config files during pp
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/386
2022-02-03T12:39:29+01:00, Samuel Van Stroud

So that the user can understand which settings were used to produce a training sample, we copy the configs as they are used during preprocessing to the output destination.
Closes #133

doc string improvements
https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/merge_requests/385
2022-02-02T18:58:14+01:00, Manuel Guth