CI refactoring (!547) · Merge requests · LHCb / Allen

Ryunosuke O'Neil requested to merge roneil/ci-matrix-refactoring into master Mar 18, 2021

Jobs are defined using the parallel matrix keyword.
Scripts which run the tests are to be placed in scripts/ci/jobs/TEST_NAME.sh
CI configuration is split across YAML files
- main.yaml included into .gitlab-ci.yml - entry point for the workflow
  - should be clear what is running, and on what.
- common*.yaml - do not actually contain any jobs, just re-usable keys
  - common-build.yaml - parallel:matrix: configurations for builds
  - common-run.yaml - parallel:matrix: configurations for run jobs, to be run on each device
  - common.yaml - keys reusable across all jobs
- devices.yaml - contains keys to set the correct tags: and variables:
There is also a README.md at scripts/ci/config/README.md that briefly explains how to add tests and devices.
extends: is now used instead of YAML anchors, which have been removed because anchors cannot be referenced across included files.
build.sh restarts the build from the last target if the build failed due to an out-of-memory (OOM) error
- fewer job failures, and builds are not rerun from the beginning
- waits between retries; the wait time is picked at random and scales with the number of tries
- if the job times out after 1h30m, it is failed and retried as before
- example: https://gitlab.cern.ch/lhcb/Allen/-/jobs/13019643#L459
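
The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the contents of build.sh: the function names `retry_wait_seconds` and `retry_with_wait`, the `RETRY_BASE_SECONDS` / `RETRY_JITTER_SECONDS` variables and their default values are all assumptions. The sketch also retries on any failure, whereas build.sh retries only when the failure is an OOM; that detection is left out here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: names and constants are illustrative, not taken from build.sh.

# Print a wait time in seconds for the given attempt number (1-based).
# The wait is drawn at random from [attempt*base, attempt*base + jitter),
# so it grows with the number of tries.
retry_wait_seconds() {
  local attempt=$1
  local base=${RETRY_BASE_SECONDS:-30}      # assumed default
  local jitter=${RETRY_JITTER_SECONDS:-30}  # assumed default, must be >= 1
  echo $(( attempt * base + RANDOM % jitter ))
}

# Run a command up to max_tries times, sleeping between failed attempts.
# Returns 0 on the first success, otherwise the exit status of the last attempt.
retry_with_wait() {
  local max_tries=$1
  shift
  local attempt=1 status=0
  while true; do
    if "$@"; then
      return 0
    else
      status=$?
    fi
    if [ "$attempt" -ge "$max_tries" ]; then
      return "$status"
    fi
    sleep "$(retry_wait_seconds "$attempt")"
    attempt=$(( attempt + 1 ))
  done
}
```

With an incremental build tool, re-running the same build command picks up from the targets that are not yet built, so a call such as `retry_with_wait 5 cmake --build build` (directory name hypothetical) would not start over from the beginning.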
The Catch2 executable is called directly in run_built_tests.sh
- Calling the executable directly allows a JUnit XML report to be generated and passed to the GitLab CI unit test report feature.
- The unit tests are run on each device with the RelWithDebInfo and Debug clang10 builds.
- If more Catch2 executable targets become available, they should also be added to the run_built_tests.sh script.
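
As a rough illustration of the points above, a Catch2 executable can be asked for a JUnit report with its standard `--reporter` and `--out` command-line options. The function name `run_catch2_junit`, the report directory layout and the executable paths are assumptions made for this sketch; it is not the actual content of run_built_tests.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: function name and report layout are illustrative.

# Run each given Catch2 executable and write one JUnit XML report per
# executable into report_dir, named after the executable.
# Returns non-zero if any executable fails, but still runs all of them.
run_catch2_junit() {
  local report_dir=$1
  shift
  local exe status=0
  mkdir -p "$report_dir"
  for exe in "$@"; do
    # --reporter junit selects Catch2's JUnit reporter; --out sets the output file.
    "$exe" --reporter junit --out "$report_dir/$(basename "$exe").xml" || status=1
  done
  return "$status"
}
```

A call such as `run_catch2_junit reports ./build/unit_tests` (paths hypothetical) would leave `reports/unit_tests.xml`, which the job can then list under `artifacts:reports:junit` so GitLab picks it up. Adding a new Catch2 target means appending one more executable to the argument list.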
post_telegraf.py is commented out
Jobs allowed to fail:
- "full test" physics-efficiency - scifi_v6 efficiency comparisons sometimes don't match #232 (closed)
- "full test" run-changes - flaky
- "test" run-changes - flaky
I experimented with 'metrics', but these are only available on the GitLab Premium tier and above, so they can be ignored here.

At the moment there is a 'minimal' pipeline (merge requests, master, web, schedules) and a 'full' pipeline (master, web, schedules).

Todo:

- "full" pipeline in MR can be triggered manually, auto-runs for master, web, schedules
- build (partially complete)
- throughput
- run changes with / w/o
- efficiencies
- publishing
- (correct me if something is missing) parallel:matrix: configurations match all combinations from the old configuration
- address discussions in !553 (closed)

Edited Apr 13, 2021 by Ryunosuke O'Neil
