Implement tool for job management
Related to #2, but more general. We should have a tool, presumably a class, called `JobsManager`, which takes care of splitting files into jobs and of submitting and monitoring those jobs. A `JobsManager` would take as inputs:
```python
job_manager = JobsManager(
    batch="local",  # or "condor"/"dask"/etc.
    n_events_per_job=10**6,  # split files such that we have ~10**6 events per job
    # n_files_per_job=10,  # alternatively to n_events_per_job, might just want to specify the number of input files per job
    target=<function>,
)
```
and we could add jobs to the manager with:
```python
job_manager.add_jobs(
    files=["f1.root", "f2.root", ...],
    target=<function>,
    args={},  # in case there are extra args for the function
)
```
where `target` is some function that runs the whole analysis: it takes a list of files as input, runs the `SystematicsProducer` and `TagSequence` on these files, and then presumably writes the results to an output format. This could be done through an `Analysis` class, which owns a `TagSequence`, a `SystematicsProducer` (which may vary by sample), etc.
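As a rough sketch of what such a `target` could look like (the `load_events`/`write_output` helpers and the `.apply(...)` method names below are placeholders, not an agreed-upon interface for `SystematicsProducer` or `TagSequence`):

```python
class Analysis:
    """Owns the per-sample pieces: a TagSequence, a SystematicsProducer, etc."""

    def __init__(self, tag_sequence, systematics_producer, output_dir):
        self.tag_sequence = tag_sequence
        self.systematics_producer = systematics_producer
        self.output_dir = output_dir

    def run(self, files):
        # load_events / .apply / write_output are stand-ins for whatever
        # interfaces the real classes end up exposing.
        for f in files:
            events = load_events(f)
            events = self.systematics_producer.apply(events)
            events = self.tag_sequence.apply(events)
            write_output(events, self.output_dir)


def run_analysis(files, analysis):
    """A possible `target`: takes a list of files and runs the full analysis on them."""
    analysis.run(files)
```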
I think it makes the most sense to have one call of `add_jobs` per sample, since the exact details of running will in principle differ between samples, and the `target` and/or `args` can be adjusted for each. The `JobsManager` should not need to know any details of the physics analysis being done in its jobs; we would simply pass a different function (or the same function with different arguments) for each set of jobs.
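To make the interface concrete, here is a minimal sketch of the class (bookkeeping only, splitting by a fixed number of files; splitting by `n_events_per_job` would additionally need an event count per file, e.g. from a metadata lookup). All names here are tentative:

```python
class JobsManager:
    """Splits input files into jobs and handles their submission/monitoring."""

    def __init__(self, batch="local", n_events_per_job=None, n_files_per_job=None, target=None):
        self.batch = batch                    # "local", "condor", "dask", ...
        self.n_events_per_job = n_events_per_job
        self.n_files_per_job = n_files_per_job
        self.default_target = target          # default function to run in each job
        self.jobs = []                        # one (target, files, args) tuple per job

    def add_jobs(self, files, target=None, args=None):
        """Split `files` into chunks and register one job per chunk."""
        target = target or self.default_target
        args = args or {}
        for chunk in self._split(files):
            self.jobs.append((target, chunk, args))

    def _split(self, files):
        # Simplest case: a fixed number of files per job. Event-based splitting
        # would instead query the number of events in each file.
        n = self.n_files_per_job or len(files)
        for i in range(0, len(files), n):
            yield files[i:i + n]
```

One `add_jobs` call per sample would then look something like this (file lists and `Analysis` instances are purely illustrative):

```python
job_manager = JobsManager(batch="local", n_files_per_job=10, target=run_analysis)

# One call per sample; only the files and args differ.
job_manager.add_jobs(files=data_files, args={"analysis": analysis_data})
job_manager.add_jobs(files=mc_files, args={"analysis": analysis_mc})
```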
I'd suggest we start by implementing local submission, and then add tools for running on HPC clusters as described in #2.
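For local submission, a plain process pool would probably be enough as a first step; something along these lines, assuming the sketched internals above where each registered job is a `(target, files, args)` tuple:

```python
import concurrent.futures


def run_local(jobs, n_workers=4):
    """Run registered jobs in parallel on the local machine and collect results."""
    with concurrent.futures.ProcessPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(target, files, **args) for target, files, args in jobs]
        # Rudimentary monitoring: block until all jobs finish; an exception in
        # any job is re-raised here by .result().
        return [f.result() for f in concurrent.futures.as_completed(futures)]
```

Something like `results = run_local(job_manager.jobs)` would then run everything locally, and a Condor/Dask backend could later implement the same submit-and-collect step for batch systems.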