Switch to new training file format
Same as in salt. Citing @svanstro:
We have discussed moving to a "new" training file format, which is actually just the TDD output format we know and love. The reasons for doing this are:
- easier to work with the training files (e.g. running trainings with different variables from a single preprocessed file)
- consistency of files (e.g. easy to plot variables in final training files, and train and test loops use the same dataloader)
- storage size improvements (due to typed storage)
- dataloader read performance improvements (due to above)
I am planning to make the switch soon in salt. The idea is to use the existing umami `*-hybrid-resampled.h5` file, rather than `*-hybrid-resampled_scaled_shuffled.h5`. As far as I can tell, the resampled files are already shuffled. Variable normalisation will be handled on the fly in the dataloaders, which has a negligible impact on speed. This update should then have full backward compatibility.
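As a rough illustration of what on-the-fly normalisation could look like (a minimal sketch with hypothetical names; the per-variable `means` and `stds` are assumed to come from the scale dict written during preprocessing):

```python
import h5py
import numpy as np

# Minimal sketch (hypothetical): read unscaled jets from the resampled file
# and normalise each batch on the fly. `means` and `stds` are per-variable
# float arrays, assumed to come from the preprocessing scale dict.
def batch_generator(fname, variables, means, stds, batch_size=10_000):
    with h5py.File(fname, "r") as f:
        jets = f["jets"]  # structured (typed) dataset, one field per variable
        for start in range(0, len(jets), batch_size):
            batch = jets[start : start + batch_size]
            # stack the requested fields into a plain float32 array
            x = np.stack([batch[v].astype(np.float32) for v in variables], axis=-1)
            # on-the-fly normalisation is a single vectorised op per batch,
            # negligible next to the HDF5 read itself
            yield (x - means) / stds
```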
Apart from preprocessing itself, the changes will mainly affect `tf_tool/generators.py`, with minor changes to the configs (adding the scaling information to the training config).

TODO:
- create `tf_tool/tddgenerators.py` to perform scaling and organise the variables correctly on the fly, so that training data can be read from the TDD-like file format (see the sketch below)
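A possible shape for that generator, as a hedged sketch only: the class name, dataset and field names, and the `"shift"`/`"scale"` keys of the scale dict are all assumptions here, and the real `tf_tool/tddgenerators.py` would need to match the existing umami generator interface:

```python
import h5py
import numpy as np
from tensorflow.keras.utils import Sequence

class TDDGenerator(Sequence):
    """Hedged sketch of a generator reading a TDD-like file directly.

    Everything here (dataset name "jets", label field "flavour_label",
    scale dict keys "shift"/"scale") is an assumption, not a final API.
    """

    def __init__(self, fname, variables, scale_dict, batch_size=10_000):
        self.variables = variables
        self.batch_size = batch_size
        # per-variable shift/scale, assumed layout of the scale dict
        self.means = np.array([scale_dict[v]["shift"] for v in variables], dtype=np.float32)
        self.stds = np.array([scale_dict[v]["scale"] for v in variables], dtype=np.float32)
        self.file = h5py.File(fname, "r")
        self.jets = self.file["jets"]  # structured (typed) dataset
        self.labels = self.jets["flavour_label"][:]  # hypothetical label field

    def __len__(self):
        return len(self.jets) // self.batch_size

    def __getitem__(self, idx):
        start = idx * self.batch_size
        batch = self.jets[start : start + self.batch_size]
        # organise the variables into the training order and scale on the fly
        x = np.stack([batch[v].astype(np.float32) for v in self.variables], axis=-1)
        return (x - self.means) / self.stds, self.labels[start : start + self.batch_size]
```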