Improve CI's instrumentation
Problem to solve
The current CI setup always runs the full pipeline, including static analysis, compilation and generation of a new image, even when you are working only on system tests, which require neither recompiling the CTA code nor generating a new container image and uploading it to the registry. Recently I added some very basic functionality (see #659 (closed)) to run only the system tests against the latest image built for main. This implementation has its drawbacks: you cannot choose which image is used, so during working hours you get the one built with the Alma9 compilation, as it is the last one created by the nightly scheduled pipelines.
I would like to keep moving in this direction and extend it into a more flexible and configurable CI for current and future needs. For example, during the investigation of the failing eviction in the system tests (see #662 (closed)), the basic implementation was used to run the tests every hour over the weekend to see whether adding extra delay would fix the problem. I had manually run the tests 20 times and every run was successful, but the automated runs over the weekend produced 10 failures out of 103 executions.
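To put those numbers in perspective, the short calculation below estimates how likely it is that 20 manual runs all pass when the true failure rate is the one observed over the weekend. It is only a back-of-the-envelope sketch: it assumes every run fails independently with the same probability, which may not hold for a timing-related issue.

```python
# Observed over the weekend: 10 failures in 103 automated executions.
failures, executions = 10, 103
manual_runs = 20

failure_rate = failures / executions

# Assumption: each run fails independently with the same probability.
p_all_pass = (1 - failure_rate) ** manual_runs

print(f"observed failure rate: {failure_rate:.1%}")
print(f"chance that {manual_runs} runs in a row all pass: {p_all_pass:.1%}")
```

Under that assumption, 20 clean runs in a row happen roughly 13% of the time, so a short manual campaign can easily miss a failure of this frequency.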
Having more flexibility in the CI makes it easier to investigate issues without wasting resources, i.e., consuming shared runner resources or running tests that are not actually needed.
Stakeholders
- CTA Dev Team
- External collaborators
Proposal
New CI requirements:
- Run system tests using an already generated image:
  - From a fixed source, i.e., the exact image tag.
  - From a fuzzy source:
    - Source options: main; the branch that is triggering the pipeline; a specific branch.
    - Type of image to search for: any image; specific compilation flags, i.e., scheduler type or Oracle support (everything else will go away: xrootd4/5, cc7/alma9).
- Be able to run only a subset of the tests:
  - By specifying a list of tests
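
The selection rules above could be sketched roughly as follows. This is only an illustration of the intended behaviour, not an existing implementation: the `Image` and `ImageRequest` records, the `resolve_image` and `select_tests` helpers, and the flag and test names are all hypothetical, and how the list of available images is obtained from the registry is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Image:
    """One image in the registry (hypothetical record layout)."""
    tag: str
    branch: str
    created: int                      # build time as a Unix timestamp
    flags: frozenset = frozenset()    # compilation flags, e.g. {"oracle"}


@dataclass(frozen=True)
class ImageRequest:
    """Which image the pipeline should test against (hypothetical)."""
    tag: Optional[str] = None         # fixed source: the exact image tag
    branch: str = "main"              # fuzzy source: branch to search
    flags: frozenset = frozenset()    # fuzzy source: required flags; empty means any image


def resolve_image(request, images):
    """Return the image matching the request, preferring the newest match."""
    if request.tag is not None:
        # Fixed source: the tag must match exactly.
        for image in images:
            if image.tag == request.tag:
                return image
        raise LookupError(f"no image with tag {request.tag!r}")

    # Fuzzy source: same branch, and all requested flags present.
    candidates = [
        image for image in images
        if image.branch == request.branch and request.flags <= image.flags
    ]
    if not candidates:
        raise LookupError(f"no image for branch {request.branch!r} with the requested flags")
    return max(candidates, key=lambda image: image.created)


def select_tests(available, requested=None):
    """Return the tests to run, keeping the order of the available list."""
    if not requested:
        return list(available)        # no list given: run every test
    unknown = [name for name in requested if name not in available]
    if unknown:
        raise ValueError(f"unknown tests: {unknown}")
    return [name for name in available if name in requested]


if __name__ == "__main__":
    # Made-up registry content and test names, for illustration only.
    registry = [
        Image("main-1", "main", 100, frozenset({"oracle"})),
        Image("main-2", "main", 200),
        Image("feature-1", "my-feature", 300, frozenset({"oracle"})),
    ]
    request = ImageRequest(branch="main", flags=frozenset({"oracle"}))
    print(resolve_image(request, registry).tag)
    print(select_tests(["archive", "retrieve", "evict"], ["evict"]))
```

For the "branch that is triggering the pipeline" option, the caller would pass the current branch name as `branch`, for instance taken from GitLab's predefined `CI_COMMIT_REF_NAME` variable. Failing loudly when nothing matches, instead of silently falling back to the latest main image, is a design choice of this sketch and not part of the proposal.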