Fix aux outputs by ensuring tensors are moved to cpu
It seems !152 (merged) broke the writing of default auxiliary tasks, in the case where training is on GPU. This MR adds a few .cpu() casts to ensure everything works as expected.
The pipeline tests should cover this code, but I guess the mock trainings are only ever done on CPU, so we don't encounter a problem.