
Introduce TaskList handlers to automate generation of tasks

Karol Krizka requested to merge kk_tasklist into master

The TaskList class is now a general class that provides an interface for working with a tasklist. All implementations are assumed to be multiprocess safe.

A tasklist definition is a directory. Each TaskList implementation defines how that definition is parsed.

The work directory is where the progress of a task list run is stored (task status, lock files, log files). Note that there is not a one-to-one mapping between task lists and work directories. For example, a single task list can be processed by two runners (e.g., the task list is a file list and the two runners use different reconstruction configurations), each with its own work directory.

The base TaskList class provides a mechanism to ensure multiprocess safety via lock files.
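A lock-file guard of this kind might look like the following minimal sketch. It assumes a POSIX system with `fcntl.flock`; the helper name `locked` and the lock-file location are hypothetical, not the actual pytaskfarmer implementation.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def locked(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block.

    Hypothetical helper: assumes a POSIX system where fcntl.flock is available.
    """
    # Create the lock file if needed; its content is irrelevant, only the lock matters.
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks until no other process holds the lock on this file.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

A method such as pop_task would wrap its read-modify-write of the status files in `with locked(...)`, so concurrent runners cannot claim the same task.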

Implementations must provide the following:

  • A constructor that takes the paths to the tasklist definition and the work directory as arguments
  • __len__ to get the total number of tasks
  • pop_task to get the next available task
  • __getitem__ to get the task information
  • record_toprocess_task to mark a task as still needing to be processed
  • record_done_task to mark a task as done
  • get_finished to get the list of finished tasks
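The required interface can be summarized as an abstract base class. This is a sketch only: it assumes tasks are identified by an integer index and that `pop_task` returns `None` when no task is left, neither of which is specified above.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class TaskList(ABC):
    """Hypothetical sketch of the TaskList interface (not the actual pytaskfarmer class)."""

    def __init__(self, path: str, workdir: str):
        self.path = path        # tasklist definition
        self.workdir = workdir  # task status, lock files and logs live here

    @abstractmethod
    def __len__(self) -> int:
        """Total number of tasks."""

    @abstractmethod
    def pop_task(self) -> Optional[int]:
        """Claim the next available task; assumed to return None when none are left."""

    @abstractmethod
    def __getitem__(self, task: int) -> str:
        """Command that a runner should execute for this task."""

    @abstractmethod
    def record_toprocess_task(self, task: int) -> None:
        """Mark a task as still needing to be processed."""

    @abstractmethod
    def record_done_task(self, task: int) -> None:
        """Mark a task as done."""

    @abstractmethod
    def get_finished(self) -> List[int]:
        """List of finished tasks."""
```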

The "task information" is the command that should be executed by a runner.
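A runner could consume a tasklist with a loop like the one below. It is a hypothetical sketch that assumes the task information is a shell command string and that `pop_task` returns `None` once the tasklist is exhausted. Failed tasks are re-queued only after the loop, so the same runner does not retry them indefinitely.

```python
import subprocess


def run_tasks(tasklist):
    """Hypothetical runner loop: execute tasks until the tasklist is exhausted.

    Returns the list of tasks whose command exited with a non-zero status.
    """
    failed = []
    while True:
        task = tasklist.pop_task()
        if task is None:  # `is None`, since task 0 is a valid identifier
            break
        result = subprocess.run(tasklist[task], shell=True)
        if result.returncode == 0:
            tasklist.record_done_task(task)
        else:
            failed.append(task)
    # Put failures back for another runner, after this runner has stopped popping.
    for task in failed:
        tasklist.record_toprocess_task(task)
    return failed
```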

ListTaskList Implementation

This is the current implementation, which uses files to maintain lists of commands.
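To illustrate how a file-based list can combine with lock files and a separate work directory, here is a minimal sketch. The status file names (`claimed`, `done`, `lock`) and the use of integer line indices as task identifiers are assumptions, not the actual ListTaskList storage format.

```python
import fcntl
import os
from contextlib import contextmanager


class ListTaskList:
    """Hypothetical file-backed tasklist: one shell command per non-empty line.

    Progress is kept in the work directory as two files of task indices,
    'claimed' (handed to a runner) and 'done', guarded by a lock file.
    """

    def __init__(self, path, workdir):
        with open(path) as f:
            self.commands = [line.strip() for line in f if line.strip()]
        self.workdir = workdir
        os.makedirs(workdir, exist_ok=True)

    @contextmanager
    def _lock(self):
        fd = os.open(os.path.join(self.workdir, "lock"), os.O_RDWR | os.O_CREAT, 0o644)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX)  # serialize access across processes
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
            os.close(fd)

    def _read(self, name):
        p = os.path.join(self.workdir, name)
        if not os.path.exists(p):
            return []
        with open(p) as f:
            return [int(x) for x in f.read().split()]

    def _write(self, name, indices):
        with open(os.path.join(self.workdir, name), "w") as f:
            f.writelines(f"{i}\n" for i in indices)

    def __len__(self):
        return len(self.commands)

    def __getitem__(self, task):
        return self.commands[task]

    def pop_task(self):
        with self._lock():
            claimed = self._read("claimed")
            for i in range(len(self.commands)):
                if i not in claimed:
                    self._write("claimed", claimed + [i])
                    return i
        return None

    def record_done_task(self, task):
        with self._lock():
            self._write("done", self._read("done") + [task])

    def record_toprocess_task(self, task):
        # Release the claim so the task can be popped again.
        with self._lock():
            self._write("claimed", [i for i in self._read("claimed") if i != task])

    def get_finished(self):
        with self._lock():
            return self._read("done")
```

Because all progress lives under the work directory, two instances pointing at the same tasklist file but different work directories track their progress independently.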

Note that ListTaskList can also be extended. The __getitem__ function can be overridden to add extra formatting to a task. For example, the tasklist can be treated as a list of files, with the tasklist handler responsible for formatting the processing command.
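The following sketch shows the kind of override described above. `_ListTaskListStub` is a hypothetical stand-in for ListTaskList so the example runs on its own, and `FileListTaskList` and its `program` parameter are illustrative names.

```python
import shlex


class _ListTaskListStub:
    """Stand-in for ListTaskList: each tasklist entry is returned verbatim."""

    def __init__(self, entries):
        self.entries = entries

    def __getitem__(self, task):
        return self.entries[task]


class FileListTaskList(_ListTaskListStub):
    """Hypothetical extension: entries are input files, not commands."""

    def __init__(self, entries, program):
        super().__init__(entries)
        self.program = program

    def __getitem__(self, task):
        path = super().__getitem__(task)
        # Turn the file name into a full command, quoting it for the shell.
        return f"{self.program} {shlex.quote(path)}"
```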

TaskList Handler Definitions

TaskList handler definitions are loaded from pytaskfarmer/tasklists.d and from the current working directory. All files ending in .ini are loaded and are expected to be in INI format, following this scheme:

[tasklisthandlername]
TaskList = tasklist.python.class
Arg0 = value0
Arg1 = value1

The extra arguments are passed to the TaskList constructor as keyword arguments.
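Loading these definitions could be done with the standard `configparser` and `importlib` modules, as in this sketch. The function names are hypothetical; it assumes that definitions found later in the search path override earlier ones and that `${...}` placeholders are left for the handler to expand, so interpolation is disabled.

```python
import configparser
import glob
import importlib
import os


def load_handler_definitions(search_dirs):
    """Hypothetical loader: map handler name -> (class path, extra options)."""
    handlers = {}
    for directory in search_dirs:
        for ini in sorted(glob.glob(os.path.join(directory, "*.ini"))):
            # interpolation=None keeps values such as ${SAMPLE} verbatim.
            parser = configparser.ConfigParser(interpolation=None)
            parser.optionxform = str  # keep option names case-sensitive
            parser.read(ini)
            for name in parser.sections():
                options = dict(parser[name])
                class_path = options.pop("TaskList")
                handlers[name] = (class_path, options)  # later dirs override
    return handlers


def make_tasklist(handlers, name, path, workdir):
    """Import the class named in the definition; pass extra options as kwargs."""
    class_path, kwargs = handlers[name]
    module_name, _, class_name = class_path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(path, workdir, **kwargs)
```

Option names such as `MyLCParquet.OutputDir` are not valid Python identifiers, so a handler class receiving them has to accept them through `**kwargs`.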

Including the working directory in the search path is useful for defining project-specific tasklist handlers. For example, one can treat the tasklist as a list of files with events. The tasklist handler then constructs the command needed to run over a single file with the desired job options, which are part of the handler definition.

Example for Marlin

The following tasklist handler treats the tasklist as a file list. Marlin is then executed on each file with the actsseed_steer.xml steering file.

[actsseed_bib]
TaskList = mcctaskfarmer.MarlinTaskList
steering = /global/cfs/cdirs/atlas/kkrizka/MCC/build/packages/ACTSTracking/example/actsseed_steer.xml
OUTPUT=data_actsseed_bib
MyLCParquet.OutputDir=${OUTPUT}
MyAIDAProcessor.FileName=${OUTPUT}/${SAMPLE}
MyLCParquet.SampleName=${SAMPLE}
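One way such a handler might build its per-file command is sketched below. It is not the mcctaskfarmer.MarlinTaskList code: `MarlinCommandBuilder` is a hypothetical name, and the sketch assumes that SAMPLE is the input file name without its extension, that options without a dot (such as OUTPUT) are substitution variables, and that options with a dot are passed to Marlin as `--Processor.Parameter=value` command-line overrides.

```python
import os
import shlex
from string import Template


class MarlinCommandBuilder:
    """Hypothetical sketch: turn one input file into a Marlin command line."""

    def __init__(self, steering, **options):
        self.steering = steering
        self.options = options  # e.g. OUTPUT, MyLCParquet.OutputDir, ...

    def command(self, input_file):
        # Assumption: SAMPLE is the file name without directory or extension.
        sample = os.path.splitext(os.path.basename(input_file))[0]
        variables = {"SAMPLE": sample}
        # Assumption: options without a dot (e.g. OUTPUT) are substitution variables.
        variables.update({k: v for k, v in self.options.items() if "." not in k})
        args = ["Marlin", self.steering, f"--global.LCIOInputFiles={input_file}"]
        for key, value in self.options.items():
            if "." in key:  # Processor.Parameter override
                value = Template(value).safe_substitute(variables)
                args.append(f"--{key}={value}")
        return " ".join(shlex.quote(a) for a in args)
```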

TODO

  • Redefine how tasklist progress is stored. Storing it per tasklist (e.g., tasklist_toprocess) no longer works, because multiple tasklist handlers can run over the same tasklist at the same time to produce different results. For example, the tasklist is a file list and each handler applies different job options.
  • Update README.md
Edited by Karol Krizka
