Use CTest fixtures and --repeat

@clemenci I'm curious what you think about how to handle timeouts.

Some CTest examples using fixtures.

All tests pass

$ ctest -T test -R root_io.coll.read -j 10
   Site: lbquantaperf02
   Build name: Linux-g++
Create new tag: 20210313-1954 - Experimental
Test project /home/rmatev/stackx/Gaudi/build.x86_64-centos7-gcc9-opt
    Start 253: GaudiExamples.root_io.write
1/4 Test #253: GaudiExamples.root_io.write ........   Passed   19.34 sec
    Start 246: GaudiExamples.root_io.coll.write
    Start 252: GaudiExamples.root_io.read
2/4 Test #246: GaudiExamples.root_io.coll.write ...   Passed    4.58 sec
3/4 Test #252: GaudiExamples.root_io.read .........   Passed    4.73 sec
    Start 245: GaudiExamples.root_io.coll.read
4/4 Test #245: GaudiExamples.root_io.coll.read ....   Passed    4.73 sec

100% tests passed, 0 tests failed out of 4

Label Time Summary:
Gaudi            =  33.38 sec*proc (4 tests)
GaudiExamples    =  33.38 sec*proc (4 tests)
QMTest           =  33.38 sec*proc (4 tests)

Total Test time (real) =  28.94 sec

Ignore the fixtures with `-FA .*` (the prerequisites had already run, so their outputs are present and the test passes)

$ ctest -T test -R root_io.coll.read -j 10 -FA .*
   Site: lbquantaperf02
   Build name: Linux-g++
Create new tag: 20210313-1956 - Experimental
Test project /home/rmatev/stackx/Gaudi/build.x86_64-centos7-gcc9-opt
    Start 245: GaudiExamples.root_io.coll.read
1/1 Test #245: GaudiExamples.root_io.coll.read ...   Passed   16.97 sec

100% tests passed, 0 tests failed out of 1

Label Time Summary:
Gaudi            =  16.97 sec*proc (1 test)
GaudiExamples    =  16.97 sec*proc (1 test)
QMTest           =  16.97 sec*proc (1 test)

Total Test time (real) =  17.00 sec

Ignore the fixtures, but run in a clean directory (the prerequisites' outputs are missing, so the test fails)

$ ctest -T test -R root_io.coll.read -j 10 -FA .*
   Site: lbquantaperf02
   Build name: Linux-g++
Create new tag: 20210313-1958 - Experimental
Test project /home/rmatev/stackx/Gaudi/build.x86_64-centos7-gcc9-opt
    Start 245: GaudiExamples.root_io.coll.read
1/1 Test #245: GaudiExamples.root_io.coll.read ...***Failed   16.19 sec

0% tests passed, 1 tests failed out of 1

Label Time Summary:
Gaudi            =  16.19 sec*proc (1 test)
GaudiExamples    =  16.19 sec*proc (1 test)
QMTest           =  16.19 sec*proc (1 test)

Total Test time (real) =  16.22 sec

The following tests FAILED:
        245 - GaudiExamples.root_io.coll.read (Failed)
Errors while running CTest

A dependency (fixture setup test) fails, so the requested test is not run

$ ctest -T test -R root_io.coll.read -j 10
   Site: lbquantaperf02
   Build name: Linux-g++
Create new tag: 20210313-2001 - Experimental
Test project /home/rmatev/stackx/Gaudi/build.x86_64-centos7-gcc9-opt
    Start 253: GaudiExamples.root_io.write
1/4 Test #253: GaudiExamples.root_io.write ........   Passed   18.50 sec
    Start 246: GaudiExamples.root_io.coll.write
    Start 252: GaudiExamples.root_io.read
2/4 Test #252: GaudiExamples.root_io.read .........***Failed    2.97 sec
3/4 Test #246: GaudiExamples.root_io.coll.write ...   Passed    4.25 sec
    Start 245: GaudiExamples.root_io.coll.read
Failed test dependencies: GaudiExamples.root_io.read
4/4 Test #245: GaudiExamples.root_io.coll.read ....***Not Run   0.00 sec

50% tests passed, 2 tests failed out of 4

Label Time Summary:
Gaudi            =  25.72 sec*proc (4 tests)
GaudiExamples    =  25.72 sec*proc (4 tests)
QMTest           =  25.72 sec*proc (4 tests)

Total Test time (real) =  22.82 sec

The following tests FAILED:
        245 - GaudiExamples.root_io.coll.read (Not Run)
        252 - GaudiExamples.root_io.read (Failed)
Errors while running CTest
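The four runs above follow CTest's fixture rules. Selecting a test that lists a fixture in FIXTURES_REQUIRED also pulls in that fixture's setup tests, unless the fixture name matches the `-FA` regex. Setup tests run before the tests that require them. A test whose fixture setup did not pass is reported as Not Run.

The Python below is a toy model of those rules, not CTest's real scheduler. It makes several assumptions:

- The `Case` class and the fixture names are hypothetical.
- The dependency graph is inferred from the output above: root_io.read and root_io.coll.write require root_io.write, and root_io.coll.read requires both of them.
- Python's `re` stands in for CTest's regex matching.
- Test outcomes are passed in by hand.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Case:
    name: str
    fixtures_setup: set = field(default_factory=set)     # plays FIXTURES_SETUP
    fixtures_required: set = field(default_factory=set)  # plays FIXTURES_REQUIRED


def select(tests, include, exclude_any=None):
    """Tests matching `include` (like -R) plus, transitively, the setup tests of
    every required fixture whose name does not match `exclude_any` (like -FA)."""
    setup_of = {}
    for t in tests.values():
        for fx in t.fixtures_setup:
            setup_of.setdefault(fx, []).append(t.name)
    selected = {n for n in tests if re.search(include, n)}
    pending = list(selected)
    while pending:
        for fx in tests[pending.pop()].fixtures_required:
            if exclude_any is not None and re.search(exclude_any, fx):
                continue  # excluded fixture: its setup tests are not added
            for s in setup_of.get(fx, []):
                if s not in selected:
                    selected.add(s)
                    pending.append(s)
    return selected


def run(tests, selected, outcome):
    """Run setup tests before their dependents; `outcome` maps name -> passed.
    A test whose fixture setup (within this run) did not pass is 'Not Run'."""
    deps = {n: {s for s in selected if s != n
                and tests[s].fixtures_setup & tests[n].fixtures_required}
            for n in selected}
    status = {}
    while len(status) < len(selected):
        ready = sorted(n for n in selected
                       if n not in status and deps[n].issubset(status))
        if not ready:
            raise RuntimeError("cyclic fixture dependencies")
        for n in ready:
            if all(status[d] == "Passed" for d in deps[n]):
                status[n] = "Passed" if outcome[n] else "Failed"
            else:
                status[n] = "Not Run"
    return status


# Hypothetical fixture graph inferred from the ctest output above.
TESTS = {t.name: t for t in [
    Case("root_io.write", {"write"}),
    Case("root_io.read", {"read"}, {"write"}),
    Case("root_io.coll.write", {"coll.write"}, {"write"}),
    Case("root_io.coll.read", set(), {"read", "coll.write"}),
]}
```

For example, mark root_io.read as failing with `outcome = {n: n != "root_io.read" for n in TESTS}`. Then `run(TESTS, select(TESTS, r"root_io\.coll\.read"), outcome)` reproduces the last run: root_io.coll.read ends up Not Run. With `exclude_any=".*"`, only root_io.coll.read is selected, and it runs on its own.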

added 2 commits

88633a6b - Looser conversion from QMTest to CTest test name
d79408b7 - Fail when a QMTest prerequisite test does not exist

changed the description

When extract_qmtest_metadata.py exits with a non-zero return code, we only get a CMake warning. That is inconsistent with the error handling inside the script, which exits as soon as it encounters a problem.

I think we should either raise an error from CMake (easier) or have the script continue past problems.
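The second option could look like the minimal Python sketch below. It checks prerequisites, keeps going after a problem, reports all of them, and only then fails.

The sketch assumes the prerequisites have already been parsed into a dict that maps each test name to the names of its prerequisites. `check_prerequisites` and `main` are hypothetical helpers, not the actual extract_qmtest_metadata.py code.

```python
import sys


def check_prerequisites(prerequisites):
    """Return one message per prerequisite that names an unknown test.

    `prerequisites` maps a test name to the list of tests it depends on
    (hypothetical input format)."""
    errors = []
    for test, prereqs in sorted(prerequisites.items()):
        for p in prereqs:
            if p not in prerequisites:
                errors.append(f"{test}: prerequisite {p!r} does not exist")
    return errors


def main(prerequisites):
    errors = check_prerequisites(prerequisites)
    for e in errors:  # report every problem, not only the first one
        print(f"error: {e}", file=sys.stderr)
    return 1 if errors else 0  # a non-zero status tells the caller to stop
```

The script's entry point would then call `sys.exit(main(parsed))`. For the first option, the CMake side can check the exit status (for example through the `RESULT_VARIABLE` of `execute_process`) and call `message(FATAL_ERROR ...)` instead of emitting a warning.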

Let me first comment on the fixtures business.

When we used the actual QMTest scripts, the prerequisites had to pass for the dependent tests to be run. This behaviour was a cause of debate, as some people thought it was better to let the following tests fail on their own. IMHO this view came from the fact that some of our fixture setup tests do not check whether the setup worked; they only check that they printed the same messages as a reference file.

I prefer the fixture-based approach, but only once the tests have been reviewed and the fixtures made more reliable. In general, I believe it's better to convert the pseudo-QMTest tests into proper CTest tests, with the freedom to choose whether we just want to control the order of execution or to have proper fixture setup tests. This would also allow us to use CTest to control test timeouts and to use --repeat after-timeout:3.
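The two choices can be made concrete with a hypothetical Python helper that turns QMTest prerequisites into CMake `set_tests_properties` calls:

- In "order" mode it emits `DEPENDS`. This only orders the tests: prerequisites are neither pulled in by `-R` nor required to pass.
- In "fixtures" mode it emits `FIXTURES_SETUP`/`FIXTURES_REQUIRED`. CTest then adds the setup tests to the run and skips dependents when a setup test fails.

Naming each fixture after its prerequisite test is an assumption of this sketch. `TIMEOUT` is CTest's standard per-test timeout property.

```python
def cmake_properties(prerequisites, mode="fixtures", timeout=None):
    """Return CMake lines turning QMTest prerequisites into CTest properties.

    prerequisites: dict, test name -> list of prerequisite test names
    mode: "order" (DEPENDS only) or "fixtures" (FIXTURES_SETUP/_REQUIRED)
    timeout: optional per-test TIMEOUT in seconds
    """
    if mode not in ("order", "fixtures"):
        raise ValueError(f"unknown mode {mode!r}")
    used_as_prereq = {p for prereqs in prerequisites.values() for p in prereqs}
    lines = []
    for test in sorted(prerequisites):
        prereqs = prerequisites[test]
        joined = ";".join(prereqs)  # CMake list syntax
        props = []
        if mode == "order":
            if prereqs:
                props.append(f'DEPENDS "{joined}"')
        else:
            # Assumption: each prerequisite test sets up a fixture named after itself.
            if test in used_as_prereq:
                props.append(f'FIXTURES_SETUP "{test}"')
            if prereqs:
                props.append(f'FIXTURES_REQUIRED "{joined}"')
        if timeout is not None:
            props.append(f"TIMEOUT {timeout}")
        if props:
            lines.append(f"set_tests_properties({test} PROPERTIES {' '.join(props)})")
    return lines
```

For example, `cmake_properties({"root_io.write": [], "root_io.read": ["root_io.write"]})` yields a `FIXTURES_SETUP` line for root_io.write and a `FIXTURES_REQUIRED` line for root_io.read. With `mode="order"` it yields a single `DEPENDS` line for root_io.read.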

About timeouts: I do not agree that we should repeat only timeouts. Transient failures can be due to infrastructure problems or to output that is not (strictly) reproducible. Infrastructure problems can show up either as timeouts or as ordinary errors. Non-reproducible output should not exist, but it is somewhat unavoidable as long as we keep checking the standard output of the jobs. Sometimes we also have genuine random deadlocks, and I would prefer not to mask them, as they are due to real bugs.

In my opinion we should never use --repeat, but there are cases where we cannot do otherwise (at least until we fix the underlying problem), and for those I would prefer to control which tests are allowed to repeat on failure.

One more consideration about using CTest to check for timeouts: our test wrapper is not perfect, but I really like the stack trace it produces on a timeout, and it has other useful features. CTest is great for unit tests, but our wrapper is tuned for our integration tests, and I'm not sure we are ready to give up some of those features.

added 27 commits

d79408b7...e4424a93 - 21 commits from branch gaudi:master
199bd2da - Use FIXTURES_REQUIRED for QMTest test dependencies
ca75999b - Fix edge case in QMTest discovery
73a7b03e - CI: Do not use deprecated globally-defined variables
3552d3a7 - CI: Use ctest --repeat instead of retrying jobs
685823d3 - Looser conversion from QMTest to CTest test name
6e827b1e - Fail when a QMTest prerequisite test does not exist

mentioned in issue #177

added 22 commits

6e827b1e...a4ae2988 - 16 commits from branch gaudi:master
08abc192 - Use FIXTURES_REQUIRED for QMTest test dependencies
fd4191ad - Fix edge case in QMTest discovery
31ca0b15 - CI: Do not use deprecated globally-defined variables
305d12de - CI: Use ctest --repeat instead of retrying jobs
b1dd6122 - Looser conversion from QMTest to CTest test name
f25f7143 - Fail when a QMTest prerequisite test does not exist

added 10 commits

f25f7143...903647ab - 4 commits from branch gaudi:master
3aa96c87 - Use FIXTURES_REQUIRED for QMTest test dependencies
4ed3f7f2 - Fix edge case in QMTest discovery
bd1aa174 - CI: Do not use deprecated globally-defined variables
2bcf0e4f - CI: Use ctest --repeat instead of retrying jobs
ef74720b - Looser conversion from QMTest to CTest test name
c6476f31 - Fail when a QMTest prerequisite test does not exist

I'd like to move forward with the fixtures change. We constantly have to explain that test B failed because it depends on test A, which timed out (the relationship between the tests is not obvious from the dashboard). Even for someone who knows the test dependencies, it takes time to interpret test runs with failures.

/ci-test --merge

[2021-06-07 16:23] Validation started with lhcb-master-mr#2432
[2021-06-30 00:04] Validation started with lhcb-gaudi-head#2973
[2021-06-30 00:55] Validation started with lhcb-run2-gaudi-head#158
[2021-06-30 01:35] Validation started with lhcb-gaudi-head#2973
[2021-07-01 00:04] Validation started with lhcb-gaudi-head#2974
[2021-07-01 00:49] Validation started with lhcb-run2-gaudi-head#159
[2021-07-01 01:20] Validation started with lhcb-gaudi-head#2974
[2021-07-02 00:04] Validation started with lhcb-gaudi-head#2975
[2021-07-02 00:50] Validation started with lhcb-run2-gaudi-head#160
[2021-07-02 01:18] Validation started with lhcb-gaudi-head#2975
[2021-07-03 00:03] Validation started with lhcb-gaudi-head#2976
[2021-07-03 00:55] Validation started with lhcb-run2-gaudi-head#161
[2021-07-05 00:03] Validation started with lhcb-gaudi-head#2977
[2021-07-05 10:00] Validation started with lhcb-gaudi-head#2978
[2021-07-06 00:03] Validation started with lhcb-gaudi-head#2979
[2021-07-06 00:50] Validation started with lhcb-run2-gaudi-head#162
[2021-07-07 00:04] Validation started with lhcb-gaudi-head#2980
[2021-07-07 00:52] Validation started with lhcb-run2-gaudi-head#163
[2021-07-08 00:04] Validation started with lhcb-gaudi-head#2981
[2021-07-08 00:59] Validation started with lhcb-run2-gaudi-head#164
[2021-07-09 00:04] Validation started with lhcb-gaudi-head#2982
[2021-07-10 00:04] Validation started with lhcb-gaudi-head#2983
[2021-07-11 00:04] Validation started with lhcb-gaudi-head#2984
[2021-07-13 00:06] Validation started with lhcb-gaudi-head#2985
[2021-07-13 00:48] Validation started with lhcb-run2-gaudi-head#165
[2021-07-13 00:59] Validation started with lhcb-run2-gaudi-head#165
[2021-07-14 00:06] Validation started with lhcb-gaudi-head#2986
[2021-07-14 00:54] Validation started with lhcb-run2-gaudi-head#166
[2021-07-15 00:06] Validation started with lhcb-gaudi-head#2987
[2021-07-15 01:28] Validation started with lhcb-run2-gaudi-head#167

assigned to @clemenci

changed milestone to %v36r1

added lhcb-gaudi-head label

mentioned in issue #199

changed the description

added 100 commits

c6476f31...94b59129 - 93 commits from branch gaudi:master
b0ab3baf - Use FIXTURES_REQUIRED for QMTest test dependencies
f1ced6f0 - Fix edge case in QMTest discovery
3a460c0e - CI: Do not use deprecated globally-defined variables
73e1a36e - CI: Use ctest --repeat instead of retrying jobs
d4f126b1 - Looser conversion from QMTest to CTest test name
51a1ddc9 - Fail when a QMTest prerequisite test does not exist
ac5e0863 - Fail CMake configuration when extract_qmtest_metadata.py fails

resolved all threads

approved this merge request

merged

mentioned in commit 348b595e

mentioned in issue #120 (closed)

mentioned in issue #216

mentioned in merge request !1414 (closed)
