Throughput decrease checker; submit build $OPTIONS to grafana
Fail publish throughput jobs and add a hlt1-throughput-decreased label if:
- The averaged throughput change goes below a certain value (see .gitlab-ci.yml, where this variable is set, currently -2.5%)
- The throughput change for any device goes below a (looser) value of -7.5%
The throughput percentage change is calculated from the speedup using this formula:
change = (speedup - 1.0) * 100.0
i.e. a speedup of 0.96x translates to a -4% change.
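
As a rough illustration of the two conditions and the formula above, here is a minimal Python sketch. It is not the code from this merge request: the function names, the constant names, and the dictionary of per-device speedups are hypothetical, and only the formula and the -2.5% / -7.5% values come from the description. It also assumes "goes below" means a strict comparison.

```python
# Hypothetical constants; the values are the ones quoted in the description.
AVG_THRESHOLD = -2.5     # percent, applied to the device-averaged change
DEVICE_THRESHOLD = -7.5  # percent, looser limit applied to each device


def percent_change(speedup: float) -> float:
    """Convert a speedup factor into a percentage change."""
    return (speedup - 1.0) * 100.0


def throughput_decreased(speedups: dict[str, float]) -> bool:
    """Return True if either condition for failing the job is met.

    `speedups` maps a device name to its speedup relative to the reference.
    """
    if not speedups:
        raise ValueError("no speedups given")
    changes = [percent_change(s) for s in speedups.values()]
    average = sum(changes) / len(changes)
    # Fail on a drop in the average, or on a larger drop for any one device.
    return average < AVG_THRESHOLD or min(changes) < DEVICE_THRESHOLD
```

With this sketch, a set of devices whose average change is mildly positive would still be flagged if a single device drops by more than 7.5%, which is the point of the second, per-device condition.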
cc @dovombru
Activity
added only GitLab CI label
assigned to @roneil
- Resolved by Ryunosuke O'Neil
- Resolved by Dorothea Vom Bruch
I remember we already had a discussion as to whether individual GPU throughput should also be checked. Was there a reason not to check for that in addition to the average? I think this could be useful, as sometimes a decrease is more pronounced in some architectures / GPU types.
added 2 commits
added 1 commit
- d73e4019 - make threshold less sensitive at -7.5%. Catch throughput alarm properly
- Resolved by Rosen Matev
Thanks for adding this! The warning in the Mattermost channel is good. But what exactly do the percentages mean? "Device averaged speed-up (% change): 0.99 (0.78%)". I.e. what is the 0.99 and what is the 0.78?
Could we also add a warning on the MR itself? This will make it easier for the shifter and maintainer to spot the decrease. I think in Moore an automatic label is added to the MR if the throughput decreases. Maybe we can do something similar?
assigned to @lpica
added 2 commits