Draft: Refactoring the Code Base and Adding Features
This merge request aims to fix several issues, make the code more intuitive, and add new features.
Features:
- Adding new datasets and samples with a restart of the analysis
  - [Done] The task `DatasetsStatus` compares the `datasets_status` contents with the `datasets.yaml` configured by the user (see the sketch below this list)
- Deletion of ntuples that are no longer used due to changes in the config; they will then be resubmitted with the updated config
  - [Done] Samples/datasets will be marked for deletion and deleted in a following task
- Added wrapper task `SkimDatasets` (renaming soon)
  - [Done] This task runs the data preparation (NTuple) tasks of the analysis
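A minimal sketch of how such a comparison could work, assuming both files are flat YAML mappings keyed by dataset name and that entries carry a `marked_for_deletion` flag; this is an illustration, not the actual implementation:

```python
import yaml  # assumes PyYAML, since the user config is a datasets.yaml


def compare_datasets(datasets_yaml_path, datasets_status_path):
    """Compare the user config with the status file.

    Returns the datasets that still need to be submitted and those that
    are only present in the status file (candidates for deletion).
    """
    with open(datasets_yaml_path) as f:
        configured = set(yaml.safe_load(f) or {})
    with open(datasets_status_path) as f:
        status = yaml.safe_load(f) or {}

    new_datasets = configured - set(status)       # to be (re)submitted
    obsolete_datasets = set(status) - configured  # no longer in the config

    for name in obsolete_datasets:
        # assumed flag; a later task would perform the actual deletion
        entry = status[name] if isinstance(status[name], dict) else {}
        entry["marked_for_deletion"] = True
        status[name] = entry

    return new_datasets, obsolete_datasets, status
```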
Fixed Issues:
- The task `CheckSample` will no longer be called a seemingly infinite number of times
  - [DONE] Solved through refactoring of `CheckSample`
- Safeguard for the `datasets_status` file
  - [WIP] The task `DatasetsStatus` is now an external task, therefore law will not delete its output (see the first sketch below this list)
- Job status information not being accessible for long runs (longer than 2 weeks), which breaks the analysis
  - [WIP] Fixed by checking in the `datasets_status` file whether a sample is already `done` before searching for its status in the `RetrieveJobStatus` log, where it is no longer accessible after 2 weeks (see the second sketch below this list). Edge cases for jobs with very long processing times remain an open issue.
- Race conditions in `checkloop`: the `UpdateDatasetsStatus` task rewrites the `datasets_status` file while the sample updates try to merge into the same file at the same time. Previously the problem was dodged by writing temporary files and renaming them, but this should not be necessary, and the workflow can be linearized so that no such problems arise.
  - [DONE] Solved by refactoring `CheckSample` and linearizing the tasks instead of running `UpdateDatasetsStatus` and `CheckSample` in parallel
- Tasks counted as complete although they are actually in a failed state
  - [WIP] Solved through refactoring
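The external-task safeguard could look roughly like the following sketch. law is built on luigi, so plain luigi classes are used here for illustration; the class name matches the task above, but the parameter and path are assumptions. An external task only declares an output and is never run, so the framework treats the file as an input it must not delete or regenerate:

```python
import luigi


class DatasetsStatus(luigi.ExternalTask):
    """Illustrative sketch: declare the datasets_status file as an external
    input so the workflow engine never deletes or recreates it itself."""

    # hypothetical parameter; the real task presumably derives the path
    # from the analysis config
    status_path = luigi.Parameter(default="datasets_status.yaml")

    def output(self):
        return luigi.LocalTarget(self.status_path)
```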
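The long-run fix could be guarded along these lines; the file layout, the `state`/`done` keys, and the log format are assumptions for illustration only:

```python
import yaml  # assumes the status file is YAML, like datasets.yaml


def get_sample_status(sample, status_path, job_log_path):
    """Check the datasets_status file first and only fall back to the
    RetrieveJobStatus log (whose entries expire after roughly two weeks)
    when the sample is not already marked as done."""
    with open(status_path) as f:
        status = yaml.safe_load(f) or {}

    # assumed layout: each sample maps to a dict with a "state" key
    if (status.get(sample) or {}).get("state") == "done":
        return "done"

    # Fallback: scan the job log. For very old jobs the entry may already
    # be gone; that edge case is listed above as still open.
    with open(job_log_path) as f:
        for line in f:
            if sample in line:
                return line.strip().split()[-1]  # assumed "<sample> <state>" lines
    return "unknown"
```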
Convenience changes:
- [WIP] Terminal printouts are reduced and can be configured manually with individual logging levels per task (see the logging sketch below this list)
- [Done] Analysis parameters no longer need to be specified on the CLI for task execution - all tasks can now access the config and read their parameters from it
- [DONE] The task `CheckSample` is now split up into a loop task, `PeriodicSampleUpdating`, and a chain of tasks called by the loop task: `RetrieveJobStatus`, `UpdateSampleStatusAndProgress`, `RetryJob`, `DownloadSample`, `UpdateDatasetsStatusFile` (see the chain sketch below this list). This should make the code more readable and resolve the oddly interacting parallel loops of `CheckSample` and `UpdateDatasetsStatusFile`.
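The per-task logging levels could be wired up roughly like this; the level mapping is hard-coded here for illustration, whereas in the analysis it would come from the config:

```python
import logging

# illustrative mapping; in the analysis this would live in the config file
LOG_LEVELS = {
    "RetrieveJobStatus": "WARNING",
    "UpdateSampleStatusAndProgress": "INFO",
    "DatasetsStatus": "DEBUG",
}


def get_task_logger(task_name, default_level="INFO"):
    """Return a logger whose verbosity is configured individually per task."""
    logger = logging.getLogger(task_name)
    logger.setLevel(getattr(logging, LOG_LEVELS.get(task_name, default_level)))
    return logger
```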
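And a rough luigi-style sketch of the linearized chain; only the first two links and the loop task are shown, and the parameter names, targets, and file paths are placeholders rather than the actual code:

```python
import json

import luigi


class RetrieveJobStatus(luigi.Task):
    """First link of the chain: fetch the job status for one sample."""
    sample = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget(f"{self.sample}_job_status.json")

    def run(self):
        # placeholder: the real task would query the batch system here
        with self.output().open("w") as f:
            json.dump({"sample": self.sample, "state": "running"}, f)


class UpdateSampleStatusAndProgress(luigi.Task):
    """Second link: runs strictly after RetrieveJobStatus for the same sample."""
    sample = luigi.Parameter()

    def requires(self):
        return RetrieveJobStatus(sample=self.sample)

    def output(self):
        return luigi.LocalTarget(f"{self.sample}_progress.json")

    def run(self):
        with self.input().open() as f:
            job_status = json.load(f)
        with self.output().open("w") as f:
            json.dump({"progress": job_status}, f)  # placeholder


class PeriodicSampleUpdating(luigi.WrapperTask):
    """Loop task: requires the chain for every sample, so the shared status
    file is only touched after the per-sample steps have finished."""
    samples = luigi.ListParameter(default=("sample_a", "sample_b"))

    def requires(self):
        # the real chain continues with RetryJob, DownloadSample and
        # UpdateDatasetsStatusFile; only two links are sketched here
        return [UpdateSampleStatusAndProgress(sample=s) for s in self.samples]
```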
Renaming:
- `checkloop` -> `data_preparation`
- `processloop` -> `processing_data`
- `CreateStatusFile` -> `DatasetsStatus`