Switch to new training file format
We have discussed moving to a "new" training file format, which is actually just the TDD output format we know and love. The reasons for doing this are:
- easier to work with the training files (e.g. running trainings with different variables from a single preprocessed file)
- consistency of files (e.g. easy to plot variables in final training files, and train and test loops use the same dataloader)
- storage size improvements (due to typed storage)
- dataloader read performance improvements (a consequence of the smaller, typed storage)
I am planning to make the switch soon in salt. The idea is to use the existing umami `*-hybrid-resampled.h5` file, rather than `*-hybrid-resampled_scaled_shuffled.h5`. As far as I can tell, the resampled files are shuffled. Variable normalisation will be handled on the fly in the dataloaders, which has a negligible impact on speed. This update should then have full backward compatibility.
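To illustrate the idea, here is a minimal sketch of on-the-fly normalisation over a structured (named, typed) array. It is not the salt implementation: the variable names, the layout of the `norm` dictionary and the `load_batch` helper are all hypothetical, and a small in-memory NumPy array stands in for the dataset that would be read from the resampled file.

```python
import numpy as np

# Hypothetical stand-in for a typed, structured dataset as stored in the
# resampled file: one named field per variable, each with its own dtype.
jets = np.array(
    [(50.0, 0.5, 1), (150.0, -1.5, 0), (100.0, 1.0, 1)],
    dtype=[("pt", "f4"), ("eta", "f4"), ("flavour_label", "i4")],
)

# Hypothetical normalisation parameters (mean and standard deviation per variable).
norm = {
    "pt": {"mean": 100.0, "std": 50.0},
    "eta": {"mean": 0.0, "std": 1.5},
}


def load_batch(data, variables, norm, start, stop):
    """Read rows [start, stop), select the requested variables and normalise them."""
    batch = data[start:stop]
    cols = [
        (batch[v].astype(np.float32) - norm[v]["mean"]) / norm[v]["std"]
        for v in variables
    ]
    # Stack the selected variables into a (n_rows, n_variables) input array.
    return np.stack(cols, axis=-1)


# The same file can serve trainings with different input variables.
x_full = load_batch(jets, ["pt", "eta"], norm, 0, 3)
x_pt_only = load_batch(jets, ["pt"], norm, 0, 3)
```

Because the variables are selected by name at load time, changing the inputs of a training only means changing the `variables` list, not re-running the preprocessing.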
Tagging @pgadow @alfroch in case they have any comments or concerns.
Edited by Samuel Van Stroud