Multiple tracks datasets in the preprocessing stage
This adds the option of storing more than one tracks dataset in the preprocessed samples, which could save processing time and disk space. Supporting it required some rework of the preprocessing, training and evaluation stages.
Preprocessing
The config option tracks_name has been renamed: tracks_name => tracks_names. It can now be either a string or a list, but it is treated as a list throughout the preprocessing chain. Whenever tracks are used, every step now loops over all tracks collections listed in tracks_names.
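A minimal sketch of the string-or-list handling described above, assuming a plain Python helper. The names as_tracks_list and process_step, and the collection names tracks and tracks_loose, are hypothetical illustrations, not functions or defaults from the code base.

```python
from typing import List, Union


def as_tracks_list(tracks_names: Union[str, List[str]]) -> List[str]:
    """Return tracks_names as a list, wrapping a single string (hypothetical helper)."""
    if isinstance(tracks_names, str):
        return [tracks_names]
    return list(tracks_names)


def process_step(tracks_names: Union[str, List[str]]) -> List[str]:
    """Hypothetical preprocessing step that handles every tracks collection."""
    processed = []
    for tracks_name in as_tracks_list(tracks_names):
        # Placeholder for the per-collection work (loading, cutting, scaling, ...)
        processed.append(tracks_name)
    return processed


# Both config forms end up in the same loop
print(process_step("tracks"))  # ['tracks']
print(process_step(["tracks", "tracks_loose"]))  # ['tracks', 'tracks_loose']
```

Normalising once at the start means each step only has to deal with lists.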
At the scaling step, the scale_dict now has one key for each tracks collection. For tracks, the input variable lists in the .yaml file are now read from a per-collection key: track_train_variables => {tracks_name}_train_variables
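The sketch below illustrates a scale_dict with one entry per tracks collection, with each collection's variables read from its {tracks_name}_train_variables key. It assumes mean/standard-deviation scaling stored under hypothetical "shift" and "scale" keys; the actual scaling method and dictionary layout in the framework may differ.

```python
import math
from typing import Dict, List


def train_variables_key(tracks_name: str) -> str:
    """Config key holding the input variables of one tracks collection."""
    # Replaces the single legacy key "track_train_variables"
    return f"{tracks_name}_train_variables"


def build_scale_dict(
    var_config: Dict[str, List[str]],
    tracks_values: Dict[str, Dict[str, List[float]]],
) -> Dict[str, Dict[str, Dict[str, float]]]:
    """Compute a shift/scale entry per variable, keyed by tracks collection (assumed layout)."""
    scale_dict = {}
    for tracks_name, values in tracks_values.items():
        variables = var_config[train_variables_key(tracks_name)]
        scale_dict[tracks_name] = {}
        for var in variables:
            data = values[var]
            mean = sum(data) / len(data)
            std = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))
            scale_dict[tracks_name][var] = {"shift": mean, "scale": std}
    return scale_dict


var_config = {
    "tracks_train_variables": ["d0"],
    "tracks_loose_train_variables": ["d0", "z0"],
}
values = {
    "tracks": {"d0": [1.0, 3.0]},
    "tracks_loose": {"d0": [0.0, 2.0], "z0": [5.0, 5.0]},
}
scale_dict = build_scale_dict(var_config, values)
print(scale_dict["tracks"]["d0"])  # {'shift': 2.0, 'scale': 1.0}
```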
When tracks are used, the final .h5 file now contains additional datasets, one per tracks collection, and their naming changes: X_trk_train => X_{tracks_name}_train
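As an illustration of the naming scheme, this sketch maps each collection's array to its output dataset name. The helper names are hypothetical. Writing to disk is left out so the example stays dependency-free; with h5py, each entry could be written with File.create_dataset(name, data=array).

```python
from typing import Dict


def tracks_dataset_name(tracks_name: str) -> str:
    """Output dataset name for one tracks collection."""
    # One dataset per collection; replaces the legacy "X_trk_train"
    return f"X_{tracks_name}_train"


def collect_tracks_datasets(tracks_arrays: Dict[str, list]) -> Dict[str, list]:
    """Map each tracks collection's array to its output dataset name (hypothetical helper)."""
    return {tracks_dataset_name(name): array for name, array in tracks_arrays.items()}


datasets = collect_tracks_datasets({"tracks": [1, 2], "tracks_loose": [3]})
print(sorted(datasets))  # ['X_tracks_loose_train', 'X_tracks_train']
```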
Training and Evaluation
The changes here are mostly due to the naming updates: X_trk_train => X_{tracks_name}_train and track_train_variables => {tracks_name}_train_variables.
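For configs or scripts written against the old names, a translation along these lines could be used. This migrate_legacy_keys helper is a hypothetical sketch of the two renames listed above, not something the change itself provides.

```python
def migrate_legacy_keys(config: dict, tracks_name: str) -> dict:
    """Return a copy of config with the legacy track keys renamed for one collection."""
    renames = {
        "X_trk_train": f"X_{tracks_name}_train",
        "track_train_variables": f"{tracks_name}_train_variables",
    }
    # Keys without a legacy counterpart are kept unchanged
    return {renames.get(key, key): value for key, value in config.items()}


old = {"track_train_variables": ["d0", "z0"], "batch_size": 32}
print(migrate_legacy_keys(old, "tracks"))
# {'tracks_train_variables': ['d0', 'z0'], 'batch_size': 32}
```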
An additional option, tracks_name, has been added to the training config to select which tracks datasets are used for training and evaluation.
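A sketch of how the training-side tracks_name option might pick the matching X_{tracks_name}_train datasets. It assumes the option accepts either a single name or a list (the description only says it selects "the tracks datasets") and that the file contents are available as a name-to-array mapping. select_tracks_inputs is a hypothetical helper.

```python
from typing import Dict, List, Union


def select_tracks_inputs(
    datasets: Dict[str, list], tracks_name: Union[str, List[str]]
) -> Dict[str, list]:
    """Pick the X_{tracks_name}_train datasets requested in the training config."""
    names = [tracks_name] if isinstance(tracks_name, str) else list(tracks_name)
    selected = {}
    for name in names:
        key = f"X_{name}_train"
        if key not in datasets:
            # Fail early with the available names instead of a bare KeyError
            raise KeyError(f"Dataset {key} not found; available: {sorted(datasets)}")
        selected[name] = datasets[key]
    return selected


file_contents = {"X_tracks_train": [1], "X_tracks_loose_train": [2]}
print(select_tracks_inputs(file_contents, "tracks_loose"))  # {'tracks_loose': [2]}
```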