The dips training integration test is fairly slow compared to the other tagger integration tests.
E.g. here the dips training takes more than 6 min, while the cond. attention dips training here takes only 1 min 45 s.