
Adaptive job splitting

Andrey Popov requested to merge adaptive-job-splitting into master

For each dataset, the number of files processed per job is chosen according to the desired number of events per job (currently set to 1M). This limits the spread of job run times. The splitting relies on a new parameter, num_selected_events, in the dataset definition files, which gives the total number of events stored in the files of the dataset. The parameter is optional; if it is not found, the old default splitting of 25 files per job is used.
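For reference, a minimal sketch of how such adaptive splitting could look. Only num_selected_events, the 1M target, and the 25-file fallback come from this MR; the Dataset container, function names, and keyword arguments below are illustrative, not the actual implementation in prepareAllJobs.py.

```python
# Sketch of adaptive job splitting, assuming a dataset description with a list
# of input files and the optional num_selected_events parameter.
# (Illustrative only; names other than num_selected_events are hypothetical.)
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dataset:
    files: List[str]
    num_selected_events: Optional[int] = None  # total events in all files


def files_per_job(dataset: Dataset,
                  target_events_per_job: int = 1_000_000,
                  default_files_per_job: int = 25) -> int:
    """Choose how many files each job should process.

    If num_selected_events is available, aim for roughly
    target_events_per_job events per job; otherwise fall back to the old
    fixed splitting of default_files_per_job files per job.
    """
    if dataset.num_selected_events is None or not dataset.files:
        return default_files_per_job

    events_per_file = dataset.num_selected_events / len(dataset.files)
    if events_per_file <= 0:
        return default_files_per_job

    n = math.ceil(target_events_per_job / events_per_file)
    # Process at least one file and never more than the dataset contains
    return max(1, min(n, len(dataset.files)))


def split_into_jobs(dataset: Dataset, **kwargs) -> List[List[str]]:
    """Group the dataset's files into per-job chunks."""
    n = files_per_job(dataset, **kwargs)
    return [dataset.files[i:i + n] for i in range(0, len(dataset.files), n)]
```

As a worked example under these assumptions, a dataset with 500 files and num_selected_events = 10,000,000 averages 20k events per file, so each job would be assigned 50 files instead of the fixed 25.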

I checked that prepareAllJobs.py runs and creates a reasonable number of jobs for each dataset.

@npostiau, the scripts that produce dataset definition files need to be updated to include this parameter.
