Switch to new training file format
Same as in salt. Citing @svanstro:
We have discussed moving to a "new" training file format, which is actually just the TDD output format we know and love. The reasons for doing this are:
- easier to work with the training files (e.g. running trainings with different variables from a single preprocessed file)
- consistency of files (e.g. easy to plot variables in final training files, and train and test loops use the same dataloader)
- storage size improvements (due to typed storage)
- dataloader read performance improvements (due to above)
I am planning to make the switch soon in salt. The idea is to use the existing umami `*-hybrid-resampled.h5` file, rather than `*-hybrid-resampled_scaled_shuffled.h5`. As far as I can tell, the resampled files are already shuffled. Variable normalisation will be handled on the fly in the dataloaders, which has a negligible impact on speed. This update should then have full backward compatibility.
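As a rough illustration of what on-the-fly normalisation could look like (a minimal sketch with hypothetical names; the per-variable `means` and `stds` are assumed to come from the scale dict written during preprocessing):

```python
import h5py
import numpy as np

# Minimal sketch (hypothetical): read unscaled jets from the resampled file
# and normalise each batch on the fly. `means` and `stds` are per-variable
# float arrays, assumed to come from the preprocessing scale dict.
def batch_generator(fname, variables, means, stds, batch_size=10_000):
    with h5py.File(fname, "r") as f:
        jets = f["jets"]  # structured (typed) dataset, one field per variable
        for start in range(0, len(jets), batch_size):
            batch = jets[start : start + batch_size]
            # stack the requested fields into a plain float32 array
            x = np.stack([batch[v].astype(np.float32) for v in variables], axis=-1)
            # on-the-fly normalisation is a single vectorised op per batch,
            # negligible next to the HDF5 read itself
            yield (x - means) / stds
```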
Apart from preprocessing itself, the changes will mainly affect `tf_tool/generators.py`, with minor changes to the configs (adding the scaling information to the training config).

TODO:
- create `tf_tool/tddgenerators.py` to perform scaling and organise the variables correctly on the fly, so that training data can be read from the TDD-like file format (see the sketch below)
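A possible shape for that generator, as a hedged sketch only: the class name, dataset and field names, and the `"shift"`/`"scale"` keys of the scale dict are all assumptions here, and the real `tf_tool/tddgenerators.py` would need to match the existing umami generator interface:

```python
import h5py
import numpy as np
from tensorflow.keras.utils import Sequence

class TDDGenerator(Sequence):
    """Hedged sketch of a generator reading a TDD-like file directly.

    Everything here (dataset name "jets", label field "flavour_label",
    scale dict keys "shift"/"scale") is an assumption, not a final API.
    """

    def __init__(self, fname, variables, scale_dict, batch_size=10_000):
        self.variables = variables
        self.batch_size = batch_size
        # per-variable shift/scale, assumed layout of the scale dict
        self.means = np.array([scale_dict[v]["shift"] for v in variables], dtype=np.float32)
        self.stds = np.array([scale_dict[v]["scale"] for v in variables], dtype=np.float32)
        self.file = h5py.File(fname, "r")
        self.jets = self.file["jets"]  # structured (typed) dataset
        self.labels = self.jets["flavour_label"][:]  # hypothetical label field

    def __len__(self):
        return len(self.jets) // self.batch_size

    def __getitem__(self, idx):
        start = idx * self.batch_size
        batch = self.jets[start : start + self.batch_size]
        # organise the variables into the training order and scale on the fly
        x = np.stack([batch[v].astype(np.float32) for v in self.variables], axis=-1)
        return (x - self.means) / self.stds, self.labels[start : start + self.batch_size]
```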