
Loosen throughput check for MI100

Rosen Matev requested to merge rm-ci-improvements into master

The MI100 throughput is now allowed to decrease by 7.5%/0.5 = 15% w.r.t. the reference, and MI100 contributes with a weight of 0.5 to the average. This addresses the issue that MI100 measurements seem to be unstable, to the point that the throughput test is failing even on master.
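For illustration, the weighting can be sketched as follows. This is a minimal sketch and not the actual CI script: the function name `check_throughput`, the dictionary layout, and the device names `gpu_a` and `gpu_b` are hypothetical. It assumes the per-device tolerance is divided by the device weight (so 7.5%/0.5 = 15% for MI100) and that the same weight enters the weighted average, which is checked against the 2.5% tolerance.

```python
def check_throughput(current, reference, weights=None,
                     device_tol=0.075, average_tol=0.025):
    """Return the names of the checks that fail (empty list means success).

    current, reference: dicts mapping device name -> throughput.
    weights: optional dict mapping device name -> weight (default 1.0).
    All names and the exact formula are assumptions made for this sketch.
    """
    weights = weights or {}
    failures = []
    weighted_sum = 0.0
    total_weight = 0.0
    for device, ref in reference.items():
        weight = weights.get(device, 1.0)
        # Relative change w.r.t. the reference (negative means slower).
        change = current[device] / ref - 1.0
        # A lower weight loosens the per-device tolerance: 7.5% / 0.5 = 15%.
        if change < -device_tol / weight:
            failures.append(device)
        weighted_sum += weight * change
        total_weight += weight
    # The weighted average over all devices must stay within average_tol.
    if weighted_sum / total_weight < -average_tol:
        failures.append("average")
    return failures


reference = {"gpu_a": 100.0, "gpu_b": 100.0, "MI100": 100.0}
current = {"gpu_a": 100.0, "gpu_b": 100.0, "MI100": 90.0}

# With weight 0.5, a 10% drop on MI100 is within the 15% tolerance.
print(check_throughput(current, reference, weights={"MI100": 0.5}))
# With the default weight of 1.0, the same drop exceeds 7.5%.
print(check_throughput(current, reference))
```

Under these assumptions, a 10% drop on MI100 alone passes with the reduced weight, while the same drop with the default weight fails both the per-device and the averaged check.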

Also includes some small cleanups:

  • Reduce verbosity when posting to telegraph fails.
  • Small cleanup of throughput publish job on the way.
  • Remove some unused files.

The bigger underlying issue, which is not solved here at all, is that the throughput measurements are not very stable. (This is also the reason for the comparatively large tolerances of 7.5% per device and 2.5% averaged over devices.) If this cannot be easily improved, we should think about committing the throughput references instead of taking them from the last (successful) job on master.

/cc @roneil @raaij @dovombru @dcampora

Edited by Rosen Matev
