Implement tool for job management
Related to #2, but more general. We should have a tool, presumably a class, called `JobsManager`, which takes care of splitting files into jobs and of submitting and monitoring those jobs. A `JobsManager` would take as inputs:
```python
job_manager = JobsManager(
    batch="local",  # or "condor"/"dask"/etc.
    n_events_per_job=10**6,  # split files such that we have ~10**6 events per job
    # n_files_per_job=10,  # alternatively to n_events_per_job, might just want to specify the number of input files per job
    target=<function>,
)
```
and we could add jobs to the manager with:
```python
job_manager.add_jobs(
    files=["f1.root", "f2.root", ...],
    target=<function>,
    args={},  # in case there are extra args for the function
)
```
where `target` is some function that runs the whole analysis: it takes a list of files as input, runs the `SystematicsProducer` and `TagSequence` on these files, and then presumably writes the results to an output format. This could be done through an `Analysis` class, which owns a `TagSequence`, a `SystematicsProducer` (which may vary by sample), etc.
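As a rough sketch of what such a `target` could look like (the `load_events`/`write_output` helpers and the `.apply(...)` method names below are placeholders, not an agreed-upon interface for `SystematicsProducer` or `TagSequence`):

```python
class Analysis:
    """Owns the per-sample pieces: a TagSequence, a SystematicsProducer, etc."""

    def __init__(self, tag_sequence, systematics_producer, output_dir):
        self.tag_sequence = tag_sequence
        self.systematics_producer = systematics_producer
        self.output_dir = output_dir

    def run(self, files):
        # load_events / .apply / write_output are stand-ins for whatever
        # interfaces the real classes end up exposing.
        for f in files:
            events = load_events(f)
            events = self.systematics_producer.apply(events)
            events = self.tag_sequence.apply(events)
            write_output(events, self.output_dir)


def run_analysis(files, analysis):
    """A possible `target`: takes a list of files and runs the full analysis on them."""
    analysis.run(files)
```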
I think it makes the most sense to have one call of `add_jobs` per sample, since the exact details of running will in principle differ between samples, and the `target` and/or `args` can be adjusted for each. The `JobsManager` should not need to know any details of the physics analysis being done in its jobs; we would simply pass a different function (or the same function with different arguments) for each set of jobs.
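To make the interface concrete, here is a minimal sketch of the class (bookkeeping only, splitting by a fixed number of files; splitting by `n_events_per_job` would additionally need an event count per file, e.g. from a metadata lookup). All names here are tentative:

```python
class JobsManager:
    """Splits input files into jobs and handles their submission/monitoring."""

    def __init__(self, batch="local", n_events_per_job=None, n_files_per_job=None, target=None):
        self.batch = batch                    # "local", "condor", "dask", ...
        self.n_events_per_job = n_events_per_job
        self.n_files_per_job = n_files_per_job
        self.default_target = target          # default function to run in each job
        self.jobs = []                        # one (target, files, args) tuple per job

    def add_jobs(self, files, target=None, args=None):
        """Split `files` into chunks and register one job per chunk."""
        target = target or self.default_target
        args = args or {}
        for chunk in self._split(files):
            self.jobs.append((target, chunk, args))

    def _split(self, files):
        # Simplest case: a fixed number of files per job. Event-based splitting
        # would instead query the number of events in each file.
        n = self.n_files_per_job or len(files)
        for i in range(0, len(files), n):
            yield files[i:i + n]
```

One `add_jobs` call per sample would then look something like this (file lists and `Analysis` instances are purely illustrative):

```python
job_manager = JobsManager(batch="local", n_files_per_job=10, target=run_analysis)

# One call per sample; only the files and args differ.
job_manager.add_jobs(files=data_files, args={"analysis": analysis_data})
job_manager.add_jobs(files=mc_files, args={"analysis": analysis_mc})
```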
I'd suggest we start by implementing local submission, and then add tools for running on HPC clusters as described in #2.
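For local submission, a plain process pool would probably be enough as a first step; something along these lines, assuming the sketched internals above where each registered job is a `(target, files, args)` tuple:

```python
import concurrent.futures


def run_local(jobs, n_workers=4):
    """Run registered jobs in parallel on the local machine and collect results."""
    with concurrent.futures.ProcessPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(target, files, **args) for target, files, args in jobs]
        # Rudimentary monitoring: block until all jobs finish; an exception in
        # any job is re-raised here by .result().
        return [f.result() for f in concurrent.futures.as_completed(futures)]
```

Something like `results = run_local(job_manager.jobs)` would then run everything locally, and a Condor/Dask backend could later implement the same submit-and-collect step for batch systems.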