atlas-flavor-tagging-tools / algorithms / Umami / Merge requests / !285

Multiple Tracks datasets in preprocessing stage

Merged Stefano Franchellucci requested to merge sfranche/umami:track-selection-prep into master Dec 01, 2021

This MR implements an option to store more than one tracks dataset in the preprocessed samples, which can save processing time and disk space. It required reworking the preprocessing, training and evaluation stages.

Preprocessing

The config option tracks_name is renamed to tracks_names. It can be either a string or a list, but it is treated as a list throughout the preprocessing chain. Every step that uses tracks now loops over all the tracks collections listed in tracks_names.
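The string-or-list handling could look roughly like the sketch below. The helper name normalize_tracks_names and the collection names are hypothetical and used only for illustration; they are not taken from the Umami code.

```python
from typing import List, Union


def normalize_tracks_names(tracks_names: Union[str, List[str]]) -> List[str]:
    """Return tracks_names as a list, accepting a single string or a list."""
    if isinstance(tracks_names, str):
        return [tracks_names]
    return list(tracks_names)


# Hypothetical collection names, for illustration only.
for tracks_name in normalize_tracks_names(["tracks", "tracks_loose"]):
    print(f"processing tracks collection: {tracks_name}")
```

Normalizing once, right after reading the config, means the later steps only ever need to handle the list case.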

At the scaling step, the scale_dict now has one key for each tracks collection. The input variable lists for tracks in the .yaml file are now read as follows: track_train_variables => {tracks_name}_train_variables
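As a minimal sketch of the per-collection layout: the snippet below builds one scale_dict entry per tracks collection and looks up its variables under {tracks_name}_train_variables. The function name, the input layout and the "shift"/"scale" fields are assumptions made for this example, not the structure actually used in Umami.

```python
from statistics import mean, pstdev
from typing import Dict, List


def build_tracks_scale_dict(
    values: Dict[str, Dict[str, List[float]]],
    var_config: Dict[str, List[str]],
) -> Dict[str, Dict[str, Dict[str, float]]]:
    """Build one scale_dict entry per tracks collection (hypothetical layout)."""
    scale_dict = {}
    for tracks_name, columns in values.items():
        # New naming: the variable list is read per tracks collection.
        variables = var_config[f"{tracks_name}_train_variables"]
        scale_dict[tracks_name] = {
            var: {"shift": mean(columns[var]), "scale": pstdev(columns[var])}
            for var in variables
        }
    return scale_dict


# Hypothetical collection and variable names, for illustration only.
values = {"tracks": {"d0": [1.0, 3.0]}, "tracks_loose": {"d0": [0.0, 4.0]}}
var_config = {
    "tracks_train_variables": ["d0"],
    "tracks_loose_train_variables": ["d0"],
}
print(build_tracks_scale_dict(values, var_config))
```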

When tracks are used, the final .h5 file now contains additional datasets, one per tracks collection. The dataset naming changes accordingly: X_trk_train => X_{tracks_name}_train

Training and Evaluation

Most of the changes here follow from the naming updates:
X_trk_train => X_{tracks_name}_train and track_train_variables => {tracks_name}_train_variables.

An additional option, tracks_name, is added to the training config to select the tracks dataset to use for training/evaluation.
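To illustrate how the tracks_name option ties into the new naming, here is a small sketch that derives the dataset name and the variable-list key from a training config. The function resolve_tracks_keys, the plain-dict config and the error handling are assumptions for this example, not the actual Umami implementation.

```python
from typing import Dict, Iterable


def resolve_tracks_keys(
    train_config: Dict[str, str],
    available_datasets: Iterable[str],
) -> Dict[str, str]:
    """Derive the dataset and variable-list names for the selected tracks."""
    tracks_name = train_config["tracks_name"]
    dataset = f"X_{tracks_name}_train"
    # Fail early if the preprocessed file has no dataset for this collection.
    if dataset not in set(available_datasets):
        raise KeyError(f"{dataset} not found in the preprocessed file")
    return {
        "dataset": dataset,
        "variables_key": f"{tracks_name}_train_variables",
    }


# Hypothetical dataset and collection names, for illustration only.
datasets_in_file = ["X_train", "X_tracks_train", "X_tracks_loose_train"]
print(resolve_tracks_keys({"tracks_name": "tracks_loose"}, datasets_in_file))
```

Checking the dataset name up front gives a clear error when the training config asks for a tracks collection that was not included at preprocessing.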

Edited Dec 17, 2021 by Stefano Franchellucci
Source branch: track-selection-prep