Improve GitLab CI pipelines
As it stands, the CI file is becoming very difficult to manage. The issue, in my view, is that it is not possible to define a group of tasks once and apply it to a set of architectures. Here are some use cases that scale badly as a consequence:
- For instance, if one wants to add a new throughput test for the SciFi, this should conceptually not be very hard, but in practice it implies creating build jobs for every architecture, adding throughput jobs for every architecture, and adding a publish job for that architecture as well. This amounts to 10 jobs that have to be added by hand.
- If one wants to add a new architecture to execute on, one needs to go through every single "group of tasks", add jobs for the architecture there, and add those jobs to the requirements of each publish job.
- Adding jobs to check the efficiency also doesn't scale. Since every validation now requires its own sequence, it is necessary to add build jobs, runs and efficiency checkers for every architecture that needs to be validated.
- The recent issue of seemingly having a maximum number of characters in required job names makes it very hard to go over all jobs by hand, change the naming scheme and move the information into variables.
- Due to the growing number of jobs, it is increasingly hard to keep track of what is being tested on which architecture.
With the exception of the last one, these are tasks that come up relatively frequently. It is true that we would be adding a layer of complexity by moving to code generation, but in my mind this code generation file should only be used for adding throughput tests, efficiency tests and build jobs (these "group tasks"), and nothing else. A "base" CI file could still exist, written in the same CI syntax. For other use cases, we should transition to creating normal tests instead of writing the tests in bash, moving them out of the gitlab-ci file (e.g. the run changes test).
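As a rough illustration of what such a generator could look like, the sketch below expands a few "group tasks" over a list of architectures and merges the result into a hand-written base CI. Every name in it (`ARCHITECTURES`, `TASK_GROUPS`, `generate_jobs`, `generate_ci`, the stage names, the architecture labels and the `./scripts/...` commands) is a hypothetical placeholder, not the actual configuration. The result is dumped as JSON only to keep the example free of dependencies; a real script would more likely use a YAML library.

```python
import itertools
import json

# Hypothetical inputs: the real architecture and task lists would differ.
ARCHITECTURES = ["cpu", "gpu_a", "gpu_b"]
TASK_GROUPS = ["scifi_throughput", "velo_throughput"]


def generate_jobs(task_groups, architectures):
    """Return a dict of jobs: build + run per (task, arch), one publish per task."""
    jobs = {}
    for task, arch in itertools.product(task_groups, architectures):
        # Short job names; the details live in variables, not in the name.
        jobs[f"build:{task}:{arch}"] = {
            "stage": "build",
            "variables": {"TASK": task, "ARCH": arch},
            "script": ["./scripts/build.sh"],  # placeholder command
        }
        jobs[f"run:{task}:{arch}"] = {
            "stage": "run",
            "variables": {"TASK": task, "ARCH": arch},
            "needs": [f"build:{task}:{arch}"],
            "script": ["./scripts/run.sh"],  # placeholder command
        }
    for task in task_groups:
        # The publish job depends on the run job of every architecture.
        jobs[f"publish:{task}"] = {
            "stage": "publish",
            "variables": {"TASK": task},
            "needs": [f"run:{task}:{arch}" for arch in architectures],
            "script": ["./scripts/publish.sh"],  # placeholder command
        }
    return jobs


def generate_ci(base_ci, task_groups, architectures):
    """Merge generated jobs into the base CI, refusing to overwrite base entries."""
    generated = generate_jobs(task_groups, architectures)
    clashes = set(base_ci) & set(generated)
    if clashes:
        raise ValueError(f"generated jobs clash with base CI: {sorted(clashes)}")
    return {**base_ci, **generated}


# Hypothetical base CI holding the jobs that already scale well.
BASE_CI = {"stages": ["build", "run", "publish"]}
ci = generate_ci(BASE_CI, TASK_GROUPS, ARCHITECTURES)
print(json.dumps(ci, indent=2))
```

With this structure, adding an architecture or a task group becomes a one-line change to the corresponding list, and the requirements of each publish job are derived rather than maintained by hand.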
A proposed set of items to do:
- Create a base CI file, maintaining the same syntax, that can be used to add jobs that already scale well. Add a script that generates the `.gitlab-ci.yml` file from the base CI. (`parallel:matrix:` jobs used throughout)
- Move sections of the base CI to code generation in Python, namely throughput and efficiency checking. (most jobs moved to `run` stage as part of matrix builds)
- Move functionality from the job's name to variables, make names of jobs shorter.
- Attempt to combine the quick and the full pipelines into a single one by using `when:manual`.
- Use both branch and MR pipelines without duplication.
- Run the `docker_build` job only when needed by using `only:changes`.
- Signal significant changes in throughput w.r.t. master (fail throughput jobs where there is a significant decrease in throughput; do nothing when there is an increase?). Tracked in #236 (closed)
- Remove or fix post_telegraf.py (see https://gitlab.cern.ch/lhcb/Allen/-/jobs/12601185)
- Do not use `allow_failure` unless absolutely necessary (e.g. known flaky tests that can't be fixed easily).
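The throughput item in the list could be sketched roughly as follows: a job compares its measured throughput against the value from master and fails only on a significant decrease, while increases pass silently. The function names `check_throughput` and `exit_code`, the 5% default threshold and the way the two numbers reach the script are all assumptions made for illustration; how the master reference is obtained is left open here.

```python
def check_throughput(current, reference, max_rel_decrease=0.05):
    """Return (ok, relative_change) for a throughput measurement.

    relative_change is (current - reference) / reference. ok is False only
    when throughput dropped by more than max_rel_decrease; increases pass.
    """
    if reference <= 0:
        raise ValueError("reference throughput must be positive")
    rel_change = (current - reference) / reference
    return rel_change >= -max_rel_decrease, rel_change


def exit_code(current, reference, max_rel_decrease=0.05):
    """Print the change and return a process exit code (non-zero fails the job)."""
    ok, rel_change = check_throughput(current, reference, max_rel_decrease)
    print(f"throughput change w.r.t. master: {rel_change:+.1%}")
    return 0 if ok else 1
```

A CI job could pass the result of `exit_code(...)` to `sys.exit`, so that a significant drop fails the job without resorting to `allow_failure`.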