Tests for plots/training results are too easy to pass (or missing)
While updating the plotting code for !349 (merged), I realised that the plots in the unit tests for plotting currently look like this:
I suspect that plotting several lines on top of each other makes this test easy to pass even when quite a few things have changed.
I'm aware that this test only checks whether the function runs at all, but I propose adding a further test (probably at the end of the training integration test) which compares the plots resulting from a training run against reference plots. That way the results of the training are checked as well.
So we would have to make the training reproducible by setting a random seed, and then add a test which compares the plots produced by plotting_epoch_performance.py against the reference plots.
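A minimal sketch of what such a comparison test could look like, using matplotlib's `compare_images` helper. The `make_performance_plot` function here is hypothetical and just stands in for whatever `plotting_epoch_performance.py` actually produces; the point is that with a fixed seed the output is deterministic, so a pixel-wise comparison against a stored reference image becomes meaningful:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the test runs in CI
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.testing.compare import compare_images

def make_performance_plot(path, seed):
    # Hypothetical stand-in for the plot from plotting_epoch_performance.py:
    # a fake, seeded "loss over epochs" curve.
    rng = np.random.default_rng(seed)
    losses = 20 + np.cumsum(-rng.random(20))
    fig, ax = plt.subplots()
    ax.plot(losses, label="training loss")
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    ax.legend()
    fig.savefig(path, dpi=100)
    plt.close(fig)

# The reference image would normally be committed to the repo; here we
# generate both with the same seed to show the comparison mechanics.
make_performance_plot("reference.png", seed=42)
make_performance_plot("result.png", seed=42)

# compare_images returns None when the images agree within `tol`,
# otherwise a description of the mismatch.
result = compare_images("reference.png", "result.png", tol=0.1)
print("match" if result is None else result)
```

A small `tol` greater than zero is usually sensible so that harmless rendering differences (fonts, backend versions) don't break the test, while genuinely changed training results still do.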
What do you think?