Use CTest fixtures and --repeat
Use FIXTURES_REQUIRED for QMTest test dependencies
For a nice introduction to `FIXTURES_*` in CTest see here.

`FIXTURES_REQUIRED` is set, in addition to `DEPENDS`, for every test that defines prerequisites. Using `FIXTURES_SETUP`, a fixture is defined for every test that is a dependency of another, where the fixture name is identical to the test name.
The main advantage of using fixtures is usability:
- `ctest -R <regex>` will also run the dependencies, unless explicitly ignored (e.g. with `-FA .*`).
- Moreover, when a dependency (fixture) fails, the dependent test is skipped ("Not Run").
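As a rough illustration of the mapping described above, here is a minimal Python sketch that turns a table of prerequisites into `set_tests_properties()` calls. The function name `fixture_properties` and the test names `Pkg.write` and `Pkg.read` are hypothetical, and this is not the code used in this merge request; only the CMake properties `DEPENDS`, `FIXTURES_SETUP` and `FIXTURES_REQUIRED` are taken from the description.

```python
def fixture_properties(prerequisites):
    """Return CMake lines for a {test name: [prerequisite names]} mapping.

    Hypothetical helper: it only illustrates the rules in the description.
    """
    # Every test that is a prerequisite of another test defines a fixture
    # whose name is identical to the test name.
    setup_tests = sorted({dep for deps in prerequisites.values() for dep in deps})
    lines = [
        f"set_tests_properties({name} PROPERTIES FIXTURES_SETUP {name})"
        for name in setup_tests
    ]
    # Every test with prerequisites gets FIXTURES_REQUIRED in addition to DEPENDS.
    for test, deps in sorted(prerequisites.items()):
        if not deps:
            continue
        joined = ";".join(deps)
        lines.append(
            f'set_tests_properties({test} PROPERTIES DEPENDS "{joined}" '
            f'FIXTURES_REQUIRED "{joined}")'
        )
    return lines


if __name__ == "__main__":
    # Hypothetical example: Pkg.read needs the output of Pkg.write.
    for line in fixture_properties({"Pkg.write": [], "Pkg.read": ["Pkg.write"]}):
        print(line)
```

With the example input, the sketch emits one `FIXTURES_SETUP` line for `Pkg.write` and one line setting both `DEPENDS` and `FIXTURES_REQUIRED` on `Pkg.read`.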
CI: Use ctest --repeat instead of retrying jobs
With `ctest --repeat until-pass:3`, tests are run until successful or at most 3 times. Tests are retried on any failure (timeout or other).

NB: ideally we should only retry after timeouts, with `ctest --repeat after-timeout:3`. The problem is that timeouts are handled by our test wrapper, so from CTest's point of view they look like ordinary failures. There is a workaround that lets us capture stack traces and still get the `Timeout` status, but the caveat is that the execution time will be reported as 0. Cleaner solutions (and the workaround) are discussed at https://gitlab.kitware.com/cmake/cmake/-/issues/17288.
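To make the difference between the two modes concrete, the following Python sketch models their retry rules. It is a simplified model and not CTest code: `run_with_repeat` and the outcome strings are made up for illustration.

```python
def run_with_repeat(run_test, mode, max_runs):
    """Model of `ctest --repeat <mode>:<max_runs>` for a single test.

    run_test() must return "passed", "failed" or "timeout".
    Returns (final outcome, number of runs). Hypothetical helper.
    """
    if mode not in ("until-pass", "after-timeout"):
        raise ValueError(f"unsupported mode: {mode}")
    if max_runs < 1:
        raise ValueError("max_runs must be at least 1")
    runs = 0
    outcome = None
    while runs < max_runs:
        outcome = run_test()
        runs += 1
        if outcome == "passed":
            break  # both modes stop at the first success
        if mode == "after-timeout" and outcome != "timeout":
            break  # ordinary failures are not retried in this mode
    return outcome, runs


if __name__ == "__main__":
    # A wrapper that reports its own timeouts as ordinary failures:
    # the first run "fails" (really a timeout), the second passes.
    for mode in ("until-pass", "after-timeout"):
        outcomes = iter(["failed", "passed"])
        print(mode, run_with_repeat(lambda: next(outcomes), mode, 3))
```

In this model a timeout that the wrapper reports as `"failed"` is retried under `until-pass` but not under `after-timeout`, which is the limitation noted above.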
Miscellaneous
- Fix edge case in QMTest discovery (`gaudi.qms/qmt_args.qmt` was wrongly contracted to `gaudi_args`)
- CI: Do not use deprecated globally-defined variables
- Looser conversion from QMTest to CTest test name
- Fail when a QMTest prerequisite test does not exist
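
For the last item, a check of this kind could look like the minimal Python sketch below. It is an illustration only: `check_prerequisites` and the test names are hypothetical, and the real check in `extract_qmtest_metadata.py` may be written differently.

```python
def check_prerequisites(prerequisites):
    """Raise ValueError if a test lists a prerequisite that is not a known test.

    `prerequisites` maps each discovered test name to its prerequisite names.
    Hypothetical helper, shown only to illustrate the failure condition.
    """
    known = set(prerequisites)
    missing = sorted(
        (test, dep)
        for test, deps in prerequisites.items()
        for dep in deps
        if dep not in known
    )
    if missing:
        details = ", ".join(f"{test} -> {dep}" for test, dep in missing)
        raise ValueError(f"unknown prerequisite(s): {details}")


if __name__ == "__main__":
    # Hypothetical input: Pkg.read refers to a test that was never discovered.
    try:
        check_prerequisites({"Pkg.read": ["Pkg.wirte"], "Pkg.write": []})
    except ValueError as err:
        print(err)
```

Failing early means a misspelled prerequisite is reported at configuration time rather than showing up later as a test that never gets its fixture.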
Activity
@clemenci I'm curious what you think regarding handling timeouts.
Some `ctest` examples when using fixtures.

All tests pass
    $ ctest -T test -R root_io.coll.read -j 10
       Site: lbquantaperf02
       Build name: Linux-g++
    Create new tag: 20210313-1954 - Experimental
    Test project /home/rmatev/stackx/Gaudi/build.x86_64-centos7-gcc9-opt
        Start 253: GaudiExamples.root_io.write
    1/4 Test #253: GaudiExamples.root_io.write ........ Passed 19.34 sec
        Start 246: GaudiExamples.root_io.coll.write
        Start 252: GaudiExamples.root_io.read
    2/4 Test #246: GaudiExamples.root_io.coll.write ... Passed 4.58 sec
    3/4 Test #252: GaudiExamples.root_io.read ......... Passed 4.73 sec
        Start 245: GaudiExamples.root_io.coll.read
    4/4 Test #245: GaudiExamples.root_io.coll.read .... Passed 4.73 sec
    100% tests passed, 0 tests failed out of 4
    Label Time Summary:
    Gaudi = 33.38 sec*proc (4 tests)
    GaudiExamples = 33.38 sec*proc (4 tests)
    QMTest = 33.38 sec*proc (4 tests)
    Total Test time (real) = 28.94 sec
Ignore the fixtures with `-FA .*` (prerequisites already ran before so test passes)
    $ ctest -T test -R root_io.coll.read -j 10 -FA .*
       Site: lbquantaperf02
       Build name: Linux-g++
    Create new tag: 20210313-1956 - Experimental
    Test project /home/rmatev/stackx/Gaudi/build.x86_64-centos7-gcc9-opt
        Start 245: GaudiExamples.root_io.coll.read
    1/1 Test #245: GaudiExamples.root_io.coll.read ... Passed 16.97 sec
    100% tests passed, 0 tests failed out of 1
    Label Time Summary:
    Gaudi = 16.97 sec*proc (1 test)
    GaudiExamples = 16.97 sec*proc (1 test)
    QMTest = 16.97 sec*proc (1 test)
    Total Test time (real) = 17.00 sec
Ignore the fixtures but run in a clean directory (test fails)
    $ ctest -T test -R root_io.coll.read -j 10 -FA .*
       Site: lbquantaperf02
       Build name: Linux-g++
    Create new tag: 20210313-1958 - Experimental
    Test project /home/rmatev/stackx/Gaudi/build.x86_64-centos7-gcc9-opt
        Start 245: GaudiExamples.root_io.coll.read
    1/1 Test #245: GaudiExamples.root_io.coll.read ...***Failed 16.19 sec
    0% tests passed, 1 tests failed out of 1
    Label Time Summary:
    Gaudi = 16.19 sec*proc (1 test)
    GaudiExamples = 16.19 sec*proc (1 test)
    QMTest = 16.19 sec*proc (1 test)
    Total Test time (real) = 16.22 sec
    The following tests FAILED:
        245 - GaudiExamples.root_io.coll.read (Failed)
    Errors while running CTest
Dependency (fixture) fails, requested test is not run
    $ ctest -T test -R root_io.coll.read -j 10
       Site: lbquantaperf02
       Build name: Linux-g++
    Create new tag: 20210313-2001 - Experimental
    Test project /home/rmatev/stackx/Gaudi/build.x86_64-centos7-gcc9-opt
        Start 253: GaudiExamples.root_io.write
    1/4 Test #253: GaudiExamples.root_io.write ........ Passed 18.50 sec
        Start 246: GaudiExamples.root_io.coll.write
        Start 252: GaudiExamples.root_io.read
    2/4 Test #252: GaudiExamples.root_io.read .........***Failed 2.97 sec
    3/4 Test #246: GaudiExamples.root_io.coll.write ... Passed 4.25 sec
        Start 245: GaudiExamples.root_io.coll.read
    Failed test dependencies: GaudiExamples.root_io.read
    4/4 Test #245: GaudiExamples.root_io.coll.read ....***Not Run 0.00 sec
    50% tests passed, 2 tests failed out of 4
    Label Time Summary:
    Gaudi = 25.72 sec*proc (4 tests)
    GaudiExamples = 25.72 sec*proc (4 tests)
    QMTest = 25.72 sec*proc (4 tests)
    Total Test time (real) = 22.82 sec
    The following tests FAILED:
        245 - GaudiExamples.root_io.coll.read (Not Run)
        252 - GaudiExamples.root_io.read (Failed)
    Errors while running CTest
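
The behaviour in these runs can be summarised with a small Python model: selecting a test pulls in the tests it requires, and a test whose prerequisite did not pass is reported as "Not Run". This is a sketch, not CTest's implementation. The function names are hypothetical, and the dependency table is an assumption inferred from the start order in the logs (only the `coll.read` to `read` dependency is stated explicitly in the output).

```python
def select_with_fixtures(requested, requires):
    """Return the requested tests plus, transitively, the tests they require."""
    selected, stack = set(), list(requested)
    while stack:
        test = stack.pop()
        if test not in selected:
            selected.add(test)
            stack.extend(requires.get(test, ()))
    return selected


def run_selected(selected, requires, outcome_of):
    """Run each test after its prerequisites (assumes no dependency cycles).

    `selected` is expected to come from select_with_fixtures(), so that all
    prerequisites are included. A test is marked "Not Run" when any of its
    prerequisites did not pass.
    """
    results = {}

    def visit(test):
        if test in results:
            return
        deps = requires.get(test, ())
        for dep in deps:
            visit(dep)
        if any(results[dep] != "Passed" for dep in deps):
            results[test] = "Not Run"
        else:
            results[test] = outcome_of(test)

    for test in sorted(selected):
        visit(test)
    return results


if __name__ == "__main__":
    # Assumed dependency table (shortened test names).
    requires = {
        "root_io.coll.read": ["root_io.coll.write", "root_io.read"],
        "root_io.coll.write": ["root_io.write"],
        "root_io.read": ["root_io.write"],
    }
    selected = select_with_fixtures(["root_io.coll.read"], requires)
    results = run_selected(
        selected, requires,
        lambda test: "Failed" if test == "root_io.read" else "Passed",
    )
    for test, result in sorted(results.items()):
        print(test, result)
```

Under these assumptions the model reproduces the last run: four tests are selected, `root_io.read` fails, and `root_io.coll.read` ends up as "Not Run".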
- Resolved by Marco Clemencic
When `extract_qmtest_metadata.py` exits with a non-zero return code, we only get a CMake warning. That's inconsistent with the error handling inside the script (it exits immediately when a problem is encountered).

I think we should either have an error from CMake (easier) or have the script continue on problems.
- Resolved by Marco Clemencic
Let me first comment about the fixtures business.
When using the actual QMTest scripts, the prerequisites had to pass for the dependent tests to be run. This behaviour caused some debate, as some people thought it better to let the dependent tests fail on their own. IMHO this view was due to the fact that some of our fixture setup tests do not check whether the setup worked; they only check that they managed to print the same messages as in a reference file.

I prefer the fixture-based approach, but only once the tests have been reviewed and the fixtures made more reliable. In general I believe it's better if we convert the pseudo-QMTest tests into proper CTest tests, with the freedom to choose whether we just want to control the order of execution or have proper fixture setup tests. This would also allow us to use CTest to control test timeouts and allow for `--repeat after-timeout:3`.

About timeouts, I do not agree that we should repeat only timeouts. Transient failures could be due to infrastructure problems or to output that is not (strictly) reproducible. Infrastructure problems could present themselves as timeouts or as normal errors. Non-reproducible output issues should not exist, but they are kind of unavoidable if we keep looking at the standard output of the jobs. Sometimes we have genuine random deadlocks, and I would prefer not to mask them as they are due to real bugs.

My opinion is that we should never use `--repeat`, but there are cases where we cannot do otherwise (at least until we fix the underlying problem), and for those I would prefer that we control which tests are allowed to repeat in case of failure.

Now one more consideration about using CTest to check for timeouts. Our test wrapper is not perfect, but I really like the stack trace produced in case of timeouts, and there are other useful features. CTest is great for unit tests, but our wrapper is tuned for our integration tests, and I'm not sure we are ready to give up some of the features we have.
added 27 commits
- d79408b7...e4424a93 - 21 commits from branch gaudi:master
- 199bd2da - Use FIXTURES_REQUIRED for QMTest test dependencies
- ca75999b - Fix edge case in QMTest discovery
- 73a7b03e - CI: Do not use deprecated globally-defined variables
- 3552d3a7 - CI: Use ctest --repeat instead of retrying jobs
- 685823d3 - Looser conversion from QMTest to CTest test name
- 6e827b1e - Fail when a QMTest prerequisite test does not exist
mentioned in issue #177
added 22 commits
- 6e827b1e...a4ae2988 - 16 commits from branch gaudi:master
- 08abc192 - Use FIXTURES_REQUIRED for QMTest test dependencies
- fd4191ad - Fix edge case in QMTest discovery
- 31ca0b15 - CI: Do not use deprecated globally-defined variables
- 305d12de - CI: Use ctest --repeat instead of retrying jobs
- b1dd6122 - Looser conversion from QMTest to CTest test name
- f25f7143 - Fail when a QMTest prerequisite test does not exist
added 10 commits
- f25f7143...903647ab - 4 commits from branch gaudi:master
- 3aa96c87 - Use FIXTURES_REQUIRED for QMTest test dependencies
- 4ed3f7f2 - Fix edge case in QMTest discovery
- bd1aa174 - CI: Do not use deprecated globally-defined variables
- 2bcf0e4f - CI: Use ctest --repeat instead of retrying jobs
- ef74720b - Looser conversion from QMTest to CTest test name
- c6476f31 - Fail when a QMTest prerequisite test does not exist
- Resolved by Marco Clemencic
I'd like to move forward with the fixtures change. We constantly have to explain that test B failed because it depends on test A that timed out (as the relationship between the tests is not obvious from the dashboard). Even when one happens to know about the test dependencies it takes time to interpret test runs with failures.
/ci-test --merge
- [2021-06-07 16:23] Validation started with lhcb-master-mr#2432
- [2021-06-30 00:04] Validation started with lhcb-gaudi-head#2973
- [2021-06-30 00:55] Validation started with lhcb-run2-gaudi-head#158
- [2021-06-30 01:35] Validation started with lhcb-gaudi-head#2973
- [2021-07-01 00:04] Validation started with lhcb-gaudi-head#2974
- [2021-07-01 00:49] Validation started with lhcb-run2-gaudi-head#159
- [2021-07-01 01:20] Validation started with lhcb-gaudi-head#2974
- [2021-07-02 00:04] Validation started with lhcb-gaudi-head#2975
- [2021-07-02 00:50] Validation started with lhcb-run2-gaudi-head#160
- [2021-07-02 01:18] Validation started with lhcb-gaudi-head#2975
- [2021-07-03 00:03] Validation started with lhcb-gaudi-head#2976
- [2021-07-03 00:55] Validation started with lhcb-run2-gaudi-head#161
- [2021-07-05 00:03] Validation started with lhcb-gaudi-head#2977
- [2021-07-05 10:00] Validation started with lhcb-gaudi-head#2978
- [2021-07-06 00:03] Validation started with lhcb-gaudi-head#2979
- [2021-07-06 00:50] Validation started with lhcb-run2-gaudi-head#162
- [2021-07-07 00:04] Validation started with lhcb-gaudi-head#2980
- [2021-07-07 00:52] Validation started with lhcb-run2-gaudi-head#163
- [2021-07-08 00:04] Validation started with lhcb-gaudi-head#2981
- [2021-07-08 00:59] Validation started with lhcb-run2-gaudi-head#164
- [2021-07-09 00:04] Validation started with lhcb-gaudi-head#2982
- [2021-07-10 00:04] Validation started with lhcb-gaudi-head#2983
- [2021-07-11 00:04] Validation started with lhcb-gaudi-head#2984
- [2021-07-13 00:06] Validation started with lhcb-gaudi-head#2985
- [2021-07-13 00:48] Validation started with lhcb-run2-gaudi-head#165
- [2021-07-13 00:59] Validation started with lhcb-run2-gaudi-head#165
- [2021-07-14 00:06] Validation started with lhcb-gaudi-head#2986
- [2021-07-14 00:54] Validation started with lhcb-run2-gaudi-head#166
- [2021-07-15 00:06] Validation started with lhcb-gaudi-head#2987
- [2021-07-15 01:28] Validation started with lhcb-run2-gaudi-head#167
Edited by Software for LHCb
assigned to @clemenci
changed milestone to %v36r1
added lhcb-gaudi-head label
mentioned in issue #199
added 100 commits
- c6476f31...94b59129 - 93 commits from branch gaudi:master
- b0ab3baf - Use FIXTURES_REQUIRED for QMTest test dependencies
- f1ced6f0 - Fix edge case in QMTest discovery
- 3a460c0e - CI: Do not use deprecated globally-defined variables
- 73e1a36e - CI: Use ctest --repeat instead of retrying jobs
- d4f126b1 - Looser conversion from QMTest to CTest test name
- 51a1ddc9 - Fail when a QMTest prerequisite test does not exist
- ac5e0863 - Fail CMake configuration when extract_qmtest_metadata.py fails
mentioned in commit 348b595e
mentioned in issue #120 (closed)
mentioned in issue #216
mentioned in merge request !1414 (closed)