Add Step pipeline creation
Currently, after announce/harvest is done, every subsequent step has to be started manually. This change would allow the creation of step chains (also known as pipelines). If any step fails for a specific archive, the pipeline stops for that archive.
Use-cases to consider:
- This change would allow users to create a pipeline on the UI where they can pick and choose which steps to execute in what order.
- Creating automatic pipelines on the backend (e.g. a periodic task running weekly for CDS-RDM).
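Example of how to schedule it in settings.py:

```python
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    'preserve_cds_sandbox': {
        # Name of the registered Celery task to run
        'task': 'archive_records_periodically',
        # Every Sunday at 22:00 (in the configured Celery timezone)
        'schedule': crontab(hour=22, minute=0, day_of_week='sunday'),
        'args': ("cds-rdm-sandbox",),  # parameters for the task
        'options': {
            # Revoke the task if it is not started within 15 seconds
            'expires': 15.0,
        },
    },
}
```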
Things to consider: how to calculate the possible next steps. Currently, once the SIP is done, the options are AM, Registry and Tape, and after one of them runs, the same options are offered again. Also, Extract title is only available for specific archives.
The same step may appear multiple times. In our case, the pipeline should look like this: Harvest, Preserve on AM, Push to Registry (SIP, AIP), Notify Source, Push to Tape, Push to Registry (with tape info).
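A minimal sketch of how the possible next steps could be calculated and how a pipeline with repeated steps could be checked before it is created. Everything here is an assumption for illustration: the `Step` members and their values, the `possible_next_steps` and `is_valid_pipeline` helpers, treating Harvest as the step that produces the SIP, and counting Notify Source among the post-SIP options.

```python
from __future__ import annotations

from enum import Enum


class Step(Enum):
    # Hypothetical members and values; the real Steps enum may differ.
    HARVEST = 1
    ARCHIVEMATICA = 2
    REGISTRY = 3
    NOTIFY_SOURCE = 4
    TAPE = 5
    EXTRACT_TITLE = 6


# Options offered once the SIP exists (assumption: NOTIFY_SOURCE is one of them).
POST_SIP_OPTIONS = frozenset(
    {Step.ARCHIVEMATICA, Step.REGISTRY, Step.TAPE, Step.NOTIFY_SOURCE}
)


def possible_next_steps(last: Step | None, allow_extract_title: bool = False) -> set[Step]:
    """Return the steps that may follow `last` (None means nothing has run yet)."""
    if last is None:
        return {Step.HARVEST}  # the SIP has to be created first
    # After any step, the same options are offered again.
    options = set(POST_SIP_OPTIONS)
    if allow_extract_title:  # only for specific archives
        options.add(Step.EXTRACT_TITLE)
    return options


def is_valid_pipeline(
    steps: list[Step], last_run: Step | None = None, allow_extract_title: bool = False
) -> bool:
    """Check every transition; steps may repeat, so a list (not a set) is used."""
    prev = last_run
    for step in steps:
        if step not in possible_next_steps(prev, allow_extract_title):
            return False
        prev = step
    return True


EXAMPLE_PIPELINE = [
    Step.HARVEST,
    Step.ARCHIVEMATICA,
    Step.REGISTRY,  # SIP, AIP
    Step.NOTIFY_SOURCE,
    Step.TAPE,
    Step.REGISTRY,  # again, with tape info
]
```

Passing `last_run` lets the same check cover pipelines created from the UI after harvest, e.g. `is_valid_pipeline([Step.ARCHIVEMATICA, Step.TAPE], last_run=Step.HARVEST)`.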
Initial design idea for the UI: on the Archives page, after selecting entries, the action menu would offer a new option, for example "Create pipeline". Clicking it opens a modal where the user can select the steps. Maybe we could reuse the current pipeline component from the Archive detail page.
Suggestions:
- Add a new field to the Archive called `pipeline`, which holds the selected steps in an array and is handled as a FIFO queue.
- If only one pipeline can run at a time, and once a pipeline has run we no longer need to know that its steps were part of a pipeline, then when a request comes in for an archive with a pipeline like [5, 7] (numbers representing the `Steps` enum), the steps are created with status set to `NOT_RUN` and pushed into an array: `archive.pipeline == [Step34, Step35]`. The archive then has a method that checks whether there is a step in the pipeline and, if so, pops it from the beginning of the array and executes it. This popping should also be part of the `finalize` method.
If a step fails, it can be rerun from the UI, and if the rerun succeeds, the pipeline continues. In the meantime, the `NOT_RUN` steps remain visible on the UI.
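The queue behaviour described above could look roughly like this plain-Python sketch (no Django, no Celery). The `Status` members other than `NOT_RUN`, the `create_pipeline`, `run_next_step` and `rerun` methods, placing `finalize` on the archive, and the ID counter starting at 34 are all hypothetical.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Only NOT_RUN is named in the proposal; the other members are hypothetical.
    NOT_RUN = "NOT_RUN"
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Hypothetical ID generator starting at 34 to mirror the [Step34, Step35] example.
_step_ids = itertools.count(34)


@dataclass
class Step:
    name: int  # value from the Steps enum, e.g. 5 or 7
    id: int = field(default_factory=lambda: next(_step_ids))
    status: Status = Status.NOT_RUN


@dataclass
class Archive:
    steps: list[Step] = field(default_factory=list)  # all steps, shown on the UI
    pipeline: list[Step] = field(default_factory=list)  # FIFO queue of pending steps

    def create_pipeline(self, names: list[int]) -> None:
        """Create a NOT_RUN step per requested name, queue them, start the first."""
        for name in names:
            step = Step(name)
            self.steps.append(step)
            self.pipeline.append(step)
        self.run_next_step()

    def run_next_step(self) -> Step | None:
        """Pop the first queued step, if any, and mark it as started."""
        if not self.pipeline:
            return None
        step = self.pipeline.pop(0)
        step.status = Status.IN_PROGRESS  # real code would dispatch the task here
        return step

    def finalize(self, step: Step, succeeded: bool) -> None:
        """Record the outcome; the pipeline only continues after a success."""
        step.status = Status.COMPLETED if succeeded else Status.FAILED
        if succeeded:
            self.run_next_step()

    def rerun(self, step: Step) -> None:
        """Retry a failed step (e.g. from the UI); finalize() resumes the queue."""
        step.status = Status.IN_PROGRESS
```

For example, `create_pipeline([5, 7])` creates two `NOT_RUN` steps, starts the first and leaves the second queued; a failed first step keeps the second waiting until a rerun is finalized successfully. Keeping `pipeline` as a plain list mirrors the proposed array field; `pop(0)` is linear in the queue length, which is negligible for a handful of steps.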
For periodic tasks, a different pipeline could be defined for each task and added to every Archive the task creates, so this also fits into the design.
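One way to attach a per-task pipeline to every created archive, sketched under assumptions: the `PERIODIC_PIPELINES` mapping, keyed by the source argument passed to the task (as `"cds-rdm-sandbox"` is in the settings.py example), the `create_archives` helper that the periodic task would call, and the use of step names instead of `Steps` enum values are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical per-task pipeline definitions, keyed by the task's source argument.
PERIODIC_PIPELINES = {
    "cds-rdm-sandbox": [
        "HARVEST", "ARCHIVEMATICA", "REGISTRY", "NOTIFY_SOURCE", "TAPE", "REGISTRY",
    ],
}


@dataclass
class Archive:
    recid: str
    source: str
    pipeline: list[str] = field(default_factory=list)


def create_archives(source: str, record_ids: list[str]) -> list[Archive]:
    """Create one Archive per record and give each its own copy of the pipeline."""
    template = PERIODIC_PIPELINES.get(source, [])
    # list(template) copies the template, so popping steps from one archive's
    # queue does not affect the others (or the shared definition).
    return [Archive(recid, source, list(template)) for recid in record_ids]
```

Sources without an entry get an empty pipeline, so archives created by other tasks keep today's manual behaviour.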
To comply with oais-web#119 (closed), steps can be added while the last one is still in progress. In that case, when creating the pipeline, only pop the first step if the last step has finished; if it has not, the `finalize` method will continue executing the pipeline.
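A sketch of this case, assuming the archive keeps a reference to its most recently started step (`last_step`, hypothetical) and that "finished" means the step completed successfully, so a failed step leaves the queue waiting for a rerun. The `add_to_pipeline` and `run_next_step` names and the extra `Status` members are also hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    NOT_RUN = "NOT_RUN"  # the other members are hypothetical
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


@dataclass
class Step:
    name: int
    status: Status = Status.NOT_RUN


@dataclass
class Archive:
    last_step: Step | None = None
    pipeline: list[Step] = field(default_factory=list)

    def add_to_pipeline(self, names: list[int]) -> None:
        """Queue new steps; start one only if the last step has finished."""
        self.pipeline.extend(Step(name) for name in names)
        # Assumption: "finished" means COMPLETED. While a step is IN_PROGRESS,
        # its finalize() call continues the queue instead.
        if self.last_step is None or self.last_step.status == Status.COMPLETED:
            self.run_next_step()

    def run_next_step(self) -> None:
        """Pop the first queued step, if any, and mark it as started."""
        if self.pipeline:
            self.last_step = self.pipeline.pop(0)
            self.last_step.status = Status.IN_PROGRESS

    def finalize(self, succeeded: bool) -> None:
        """Called when the running step ends; continue the queue on success."""
        if self.last_step is None:
            return
        self.last_step.status = Status.COMPLETED if succeeded else Status.FAILED
        if succeeded:
            self.run_next_step()
```

Checking the last step before popping prevents two steps of the same archive from running at once when the user extends the pipeline mid-run.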
Open questions:
- Should there be a maximum number of steps in a pipeline?
- Does pipeline history need to be stored? If so, a separate Pipeline table has to be created.