Analysis Productions CI improvements
The current method of impersonating a GitLab runner is increasingly fragile and was probably a bad idea in hindsight (though perhaps necessary given the GitLab features available at the time). A better way to do this now would be to use the official GitLab Runner with a custom executor. That said, this does have some downsides:
- Running jobs is still tightly coupled to GitLab so we can’t easily restart individual tests from LbAPWeb.
- The executor is a long-running process which needs to be hosted somewhere, which is less OpenShift-friendly than the current message-queue-based design.
- It would impose a limit on the number of concurrently running tests.
- The current issue of jobs getting stuck in the “running” state would still exist (though perhaps this could be designed around).
A better idea might be to use the commit status API to impersonate an external CI provider. This would still allow the status to be reported in GitLab; however, clicking on it would take people straight to LbAPWeb, skipping the step where you see a (mostly useless) log in GitLab.
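As a rough sketch of what reporting an external status could look like, the function below builds (but does not send) a request to GitLab's commit status endpoint, `POST /projects/:id/statuses/:sha`. The host name, the `lbapweb` status name and the function name are placeholders chosen for illustration, not part of any existing code.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder host, replace with the real GitLab instance.
GITLAB_API = "https://gitlab.example.com/api/v4"

# States accepted by the GitLab commit status endpoint.
ALLOWED_STATES = {"pending", "running", "success", "failed", "canceled"}


def build_commit_status_request(project_id, sha, state, target_url, token,
                                name="lbapweb", description=None):
    """Build (without sending) a request that sets an external commit status.

    `target_url` is where GitLab sends people who click on the status,
    e.g. the corresponding page in LbAPWeb.
    """
    if state not in ALLOWED_STATES:
        raise ValueError(f"unsupported state: {state!r}")
    payload = {"state": state, "name": name, "target_url": target_url}
    if description is not None:
        payload["description"] = description
    # Project paths such as "group/project" must be URL-encoded.
    project = urllib.parse.quote(str(project_id), safe="")
    return urllib.request.Request(
        f"{GITLAB_API}/projects/{project}/statuses/{sha}",
        data=json.dumps(payload).encode(),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the payload easy to check without a live GitLab instance.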
This would be a big change; however, we can implement it in parallel with the existing system and just run twice as many tests for each production. Then, once it’s ready, we can start reporting the status in GitLab. After a short period of running both systems, we can switch over to using only the new one and disable the current pipeline.
A change that I think should be included alongside this is to move the database to a schema which is more amenable to running efficient queries. This currently isn’t easy, as the model is overly complex and requires a lot of joins to report the status summary of a single pipeline.
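To illustrate the kind of query this is aiming for, here is a minimal sketch using SQLite. The table and column names are hypothetical and deliberately simplified; the real schema still needs to be designed. The point is only that a flat `tests` table keyed by pipeline lets the status summary come from one indexed `GROUP BY` with no joins.

```python
import sqlite3

# Hypothetical, simplified schema: one row per test, keyed by pipeline.
SCHEMA = """
CREATE TABLE pipelines (
    id INTEGER PRIMARY KEY,
    commit_sha TEXT NOT NULL
);
CREATE TABLE tests (
    id INTEGER PRIMARY KEY,
    pipeline_id INTEGER NOT NULL REFERENCES pipelines(id),
    production TEXT NOT NULL,
    job_name TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
);
CREATE INDEX idx_tests_pipeline_status ON tests (pipeline_id, status);
"""


def pipeline_summary(conn, pipeline_id):
    """Return {status: count} for one pipeline using a single join-free query."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM tests WHERE pipeline_id = ? GROUP BY status",
        (pipeline_id,),
    )
    return dict(rows.fetchall())
```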
- Remove legacy functionality from LbAnalysisProductions (especially flask-related stuff from the old website)
- Remove any remaining uses of dirac_prod or direct uses of DIRAC APIs
- Identify if there is any missing functionality that needs to be ported to LbAPI
- Test that webhooks and the commit status API are able to do what we need
- Design the new schema for the database
- Make a new “start job” task which makes better use of celery’s ability to spawn new tasks (i.e. one task per test rather than running them all in a single one)
- Add a webhook to LbAPI which will trigger a new “start job” task in celery
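The last two items could fit together roughly as sketched below: validate an incoming GitLab push webhook, then produce one unit of work per test rather than one for the whole pipeline. This uses only the standard library. The function names and the `list_tests` callable (which would look up the tests for a commit) are hypothetical, and headers are treated as a plain dict, whereas a real web framework would give case-insensitive access.

```python
import hmac
import json


def verify_gitlab_token(headers, expected_secret):
    """GitLab sends the configured secret verbatim in the X-Gitlab-Token header."""
    received = headers.get("X-Gitlab-Token", "")
    # Constant-time comparison to avoid leaking the secret through timing.
    return hmac.compare_digest(received.encode(), expected_secret.encode())


def plan_test_tasks(headers, body, expected_secret, list_tests):
    """Turn a push webhook into one task description per test.

    `list_tests` is a hypothetical callable mapping a commit SHA to test names.
    """
    if not verify_gitlab_token(headers, expected_secret):
        raise PermissionError("invalid webhook token")
    event = json.loads(body)
    if event.get("object_kind") != "push":
        return []  # ignore other event types
    sha = event.get("checkout_sha")
    if sha is None:
        return []  # e.g. a branch deletion has nothing to test
    return [
        {"project_id": event["project_id"], "sha": sha, "test": name}
        for name in list_tests(sha)
    ]
```

With celery, each returned dict could become the arguments of its own task, dispatched together using `celery.group` so that tests run, fail and retry independently. That wiring is an assumption about the eventual design and is not shown here.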