Introduce TaskList handlers to automate generation of tasks
The TaskList class is now a general class that provides an interface for dealing with a tasklist. All implementations are assumed to be multi-process safe.
A tasklist definition is a directory. It is up to the TaskList implementation to define the format for parsing.
The work directory is the directory where the progress of a task list run is stored (task status, lock files, log files). Note that there is not a one-to-one mapping between a task list and a work directory. For example, a single task list can be processed by two runners (e.g. the task list is a filelist and the two runners use different reconstruction configurations).
The base TaskList class provides a mechanism to ensure multiprocess safety via lock files.
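The description does not spell out how the lock files are used. One common approach, sketched here as an assumption rather than as the actual TaskList code, is to create the lock file atomically and retry until that succeeds. The file_lock helper and its parameters are hypothetical.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def file_lock(lock_path, poll_interval=0.05, timeout=10.0):
    """Hold an exclusive lock represented by the existence of lock_path.

    Hypothetical helper for illustration, not part of pytaskfarmer.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # process at a time can create the lock file.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_interval)
    try:
        yield
    finally:
        # Release the lock by closing and removing the file.
        os.close(fd)
        os.remove(lock_path)
```

A runner would wrap every read or update of the shared task status files in `with file_lock(...)`, so that two processes never pop the same task.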
Implementations need to implement the following:
- Constructor that takes paths to the tasklist definition and work directories as arguments
- __len__ to get the total number of tasks
- pop_task to get the next available task
- __getitem__ to get task information
- record_toprocess_task to mark a task as still needing to be processed
- record_done_task to mark a task as done
- get_finished to get the list of finished tasks
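The required interface can be summarised as an abstract base class. This is a minimal sketch: only the method names come from this description, so the argument names (tasklist, workdir, task_id) and the return types are assumptions. The InMemoryTaskList class exists purely to exercise the interface and is not multi-process safe.

```python
from abc import ABC, abstractmethod


class TaskListInterface(ABC):
    """Sketch of the methods a TaskList implementation provides."""

    def __init__(self, tasklist, workdir):
        self.tasklist = tasklist  # path to the tasklist definition
        self.workdir = workdir    # path to the work directory

    @abstractmethod
    def __len__(self): ...

    @abstractmethod
    def pop_task(self): ...

    @abstractmethod
    def __getitem__(self, task_id): ...

    @abstractmethod
    def record_toprocess_task(self, task_id): ...

    @abstractmethod
    def record_done_task(self, task_id): ...

    @abstractmethod
    def get_finished(self): ...


class InMemoryTaskList(TaskListInterface):
    """Toy implementation that keeps all state in memory (illustration only)."""

    def __init__(self, tasklist, workdir, commands):
        super().__init__(tasklist, workdir)
        self._commands = list(commands)
        self._toprocess = list(range(len(self._commands)))
        self._done = []

    def __len__(self):
        return len(self._commands)

    def pop_task(self):
        # Return the next task ID, or None when nothing is left.
        return self._toprocess.pop(0) if self._toprocess else None

    def __getitem__(self, task_id):
        return self._commands[task_id]

    def record_toprocess_task(self, task_id):
        self._toprocess.append(task_id)

    def record_done_task(self, task_id):
        self._done.append(task_id)

    def get_finished(self):
        return list(self._done)


# A runner-style loop body: pop a task, look up its command, mark it done.
tasks = InMemoryTaskList("tasks", "work", ["echo a", "echo b"])
task_id = tasks.pop_task()
command = tasks[task_id]
tasks.record_done_task(task_id)
```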
The "task information" is the command that should be executed by a runner.
ListTaskList Implementation
This is the current implementation using files to maintain lists of commands.
Note that ListTaskList can also be extended. The __getitem__ function can be overridden to apply extra formatting to a task. For example, the tasklist can be treated as a list of files, with the tasklist handler responsible for building the processing command for each file.
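A sketch of that extension pattern follows. Because the real ListTaskList is not shown here, the example uses a small stand-in class with the same name. The FileListTaskList subclass, its command argument, and the process.sh command are hypothetical.

```python
import shlex


class ListTaskList:
    """Stand-in for the real ListTaskList: one task per line (assumption)."""

    def __init__(self, lines):
        self._tasks = [line.strip() for line in lines if line.strip()]

    def __len__(self):
        return len(self._tasks)

    def __getitem__(self, task_id):
        return self._tasks[task_id]


class FileListTaskList(ListTaskList):
    """Treats every entry as an input file and wraps it in a command."""

    def __init__(self, lines, command):
        super().__init__(lines)
        self.command = command

    def __getitem__(self, task_id):
        # The parent returns the raw entry (a file path); format it here.
        path = super().__getitem__(task_id)
        return f"{self.command} {shlex.quote(path)}"


tasks = FileListTaskList(
    ["data/run1.slcio\n", "data/run 2.slcio\n"], command="process.sh"
)
print(tasks[0])
```

Only __getitem__ changes; the bookkeeping for popping and recording tasks is inherited unchanged, which is what makes the same filelist reusable with different handlers.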
TaskList Handler Definitions
TaskList definitions are loaded from pytaskfarmer/tasklists.d and the current working directory. All files ending in *.ini are loaded and are expected to be in INI format. The following scheme is expected:
[tasklisthandlername]
TaskList = tasklist.python.class
Arg0 = value0
Arg1 = value1
The extra arguments are passed to the TaskList constructor as keyword arguments.
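The loading step can be sketched with the standard configparser and importlib modules. This is an assumption about how such a loader could work, not the actual pytaskfarmer code. The load_handlers and build_tasklist helpers are hypothetical, as is passing the two paths by keyword, and the demo points TaskList at types.SimpleNamespace only so that the snippet runs without pytaskfarmer installed.

```python
import configparser
import glob
import importlib
import os
import tempfile
import textwrap


def load_handlers(search_dirs):
    """Read every *.ini file in search_dirs into {handler name: options}."""
    # Interpolation is disabled so values such as ${OUTPUT} are kept verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    for directory in search_dirs:
        parser.read(sorted(glob.glob(os.path.join(directory, "*.ini"))))
    return {name: dict(parser[name]) for name in parser.sections()}


def build_tasklist(options, tasklist, workdir):
    """Import the class named by TaskList; pass the rest as keyword arguments."""
    kwargs = dict(options)
    module_name, class_name = kwargs.pop("TaskList").rsplit(".", 1)
    cls = getattr(importlib.import_module(module_name), class_name)
    # Passing the two paths by keyword is an assumption about the constructor.
    return cls(tasklist=tasklist, workdir=workdir, **kwargs)


# Demonstration with a throwaway definition file.
with tempfile.TemporaryDirectory() as confdir:
    with open(os.path.join(confdir, "demo.ini"), "w") as fh:
        fh.write(textwrap.dedent("""\
            [demo]
            TaskList = types.SimpleNamespace
            Arg0 = value0
        """))
    handlers = load_handlers([confdir])

handler = build_tasklist(handlers["demo"], "tasks.txt", "workdir")
```

In this sketch, listing the current working directory last in search_dirs would let a project-local *.ini file override an option of a packaged handler with the same section name.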
Including the working directory in the search path is useful for defining project-specific tasklist handlers. For example, one can treat the tasklist as a list of files with events. The tasklist handler would then construct the command needed to run over a single file with the desired job options. The job options are part of the handler definition.
Example for Marlin
The following tasklist handler treats the tasklist as a filelist. The Marlin command is then executed on each file with the actsseed_steer.xml steering file.
[actsseed_bib]
TaskList = mcctaskfarmer.MarlinTaskList
steering = /global/cfs/cdirs/atlas/kkrizka/MCC/build/packages/ACTSTracking/example/actsseed_steer.xml
OUTPUT=data_actsseed_bib
MyLCParquet.OutputDir=${OUTPUT}
MyAIDAProcessor.FileName=${OUTPUT}/${SAMPLE}
MyLCParquet.SampleName=${SAMPLE}
TODO
- Redefine how tasklist progress is stored. Storing it per tasklist (i.e. tasklist_toprocess) no longer works, as multiple tasklist handlers can run over the same tasklist at the same time to produce different results. For example, when tasklist = filelist and the different handlers use different job options.
- Update README.md