This MR removes the hard-coded labels from the evaluation tools and cleans up the evaluation code.
Unit tests for the new/rewritten functions are provided.
I also tried running the integration tests for the preprocessing and the training, and updated the affected test files to match the new behavior of the functions and calls.
In addition to the callbacks, the training metrics are now saved to a separate JSON file named history.json in the corresponding model folder.
The handling of this file, when it comes to plotting, is provided
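
For reviewers, here is a minimal sketch of how the metrics could be written to and read back from history.json. It is not the code in this MR: the helper names `save_history` and `load_history` are hypothetical, and it assumes the metrics arrive as a dict mapping each metric name to a list of per-epoch values.

```python
import json
from pathlib import Path


def save_history(history: dict, model_dir: str) -> Path:
    """Write per-epoch metrics to <model_dir>/history.json (hypothetical helper)."""
    out_dir = Path(model_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Cast values to plain floats so that NumPy scalars do not break json.dump.
    serializable = {
        name: [float(value) for value in values]
        for name, values in history.items()
    }
    out_path = out_dir / "history.json"
    with out_path.open("w") as f:
        json.dump(serializable, f, indent=2)
    return out_path


def load_history(model_dir: str) -> dict:
    """Read the metrics back, e.g. before plotting them (hypothetical helper)."""
    with (Path(model_dir) / "history.json").open() as f:
        return json.load(f)
```

Storing the metrics as plain JSON keeps the file readable without the training framework installed, so plotting can run independently of the model code.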