# Multi-event scheduler

## What's that?

This is a scheduler that can execute a control and dataflow configuration, similar to HLTControlFlowMgr in LHCb.
(see https://iopscience.iop.org/article/10.1088/1742-6596/1525/1/012052/pdf if you are not familiar)
## Sharing functionality
Much of the functionality that we introduce here already exists in Moore and the CPU HLT, specifically how control and dataflow are set up in the configuration: `CompositeNode`s define control flow constraints, while dataflow constraints are defined implicitly by requiring that a producer of data runs before its consumers. We have ripped out the Moore/PyConf functionality for that (MiniPyConf here), specifically the `data_flow` and `components` modules.
Overall, Allen configuration looks similar to Moore configuration with this setup.
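To make the control flow conventions concrete, here is a minimal toy sketch of the concepts described above. The class and attribute names mimic PyConf but are simplified stand-ins, not the actual PyConf API:

```python
from enum import Enum
from collections import namedtuple

# Toy stand-ins for the PyConf concepts described above; the real
# classes live in PyConf / MiniPyConf and have richer interfaces.
class NodeLogic(Enum):
    LAZY_AND = "LAZY_AND"  # short-circuit: stop at the first failing child
    LAZY_OR = "LAZY_OR"    # short-circuit: stop at the first passing child

Algorithm = namedtuple("Algorithm", ["name"])

class CompositeNode:
    """A control flow node combining children under a boolean logic."""
    def __init__(self, name, children, combine_logic, force_order=True):
        self.name = name
        self.children = children
        self.combine_logic = combine_logic
        self.force_order = force_order

# "Run the GEC first; only if it passes, run reconstruction, then a line."
gec = Algorithm("gec")
reco = Algorithm("velo_tracking")
line = Algorithm("passthrough_line")
top = CompositeNode("hlt1", [gec, reco, line],
                    combine_logic=NodeLogic.LAZY_AND, force_order=True)
print([child.name for child in top.children])
```

In the real configuration, data dependencies between the algorithms are not written down here; they are discovered from the algorithms' inputs and outputs, as described below.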
## How does it work?
1. From the `CompositeNode` tree that defines the application, we first extract execution masks for all algorithms. Consider a simple tree: you would like to run algorithms A, B, C in a lazy fashion, with a connecting OR. The masks that the scheduler extracts from this tree are given in the leaves: B only has to run in case A did not pass, and C only if A and B did not pass.
2. For algorithms that appear multiple times, we merge the execution masks with an ANY (OR) relationship.
3. We simplify the boolean mask expressions using a solver (`sympy.simplify`).
4. We gather data and control flow dependencies:
   - Data dependencies are found by backtracing inputs (a PyConf feature).
   - Control flow dependencies are extracted by parsing the simplified boolean expressions (from 3.) back into control flow trees and extracting the algorithms.
5. We order the algorithms:
   - Data dependencies serve as hard constraints.
   - Control flow dependencies serve as soft constraints.
   - We might find ourselves in a position where data and control flow dependencies cannot all be accounted for, in which case we loosen the control flow constraints by loosening the execution mask of one of the algorithms that is insertable according to the data dependencies. Generic mask loosening is done by substituting an algorithm in a mask by True and by False and then simplifying again. Example: `(A & B)` loosened by `B` gives `(A & True) | (A & False) -> A`.
6. We receive an ordered collection of algorithms with their respective execution masks.
7. For every unique, nontrivial execution mask, a combiner algorithm is built and inserted into the sequence right before the first algorithm with that execution mask.
8. The Allen executable sequence is generated and compiled.
9. Execution works as follows:
   - Algorithms with execution mask `True` are executed on every event.
   - Execution is governed by event lists: algorithms execute on every event in the event list that they get as input.
   - Algorithms that can reduce the event list, like the GEC, export one as output, which is then consumed by algorithms with `GEC` as execution mask.
   - For more complicated masks, like `GEC & BLUB`, the combiner algorithms take care of event list union (OR), intersection (AND) and inversion (NOT) before the algorithm actually executes.
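The mask extraction, simplification and loosening steps above can be sketched with `sympy`. The MR mentions `sympy.simplify`; this sketch uses `simplify_logic`, the boolean-specific variant, and toy symbols `A`, `B`, `C` for the three algorithms of the lazy-OR example:

```python
from sympy import symbols, simplify_logic, Or, true, false

A, B, C = symbols("A B C")

# Step 1: execution masks extracted from a lazy OR node over [A, B, C].
# B only runs if A did not pass; C only if neither A nor B passed.
masks = {A: true, B: ~A, C: ~A & ~B}

# Step 3: simplify the boolean mask expressions.
simplified = {alg: simplify_logic(mask) for alg, mask in masks.items()}
print(simplified[C])  # ~A & ~B

# Step 5: generic mask loosening — substitute an algorithm by True and
# by False, OR the results, and simplify again.
def loosen(mask, alg):
    return simplify_logic(Or(mask.subs(alg, true), mask.subs(alg, false)))

print(loosen(A & B, B))  # A
```

This reproduces the loosening example from the text: `(A & B)` loosened by `B` reduces to `A`.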
## Algorithm order optimizations
With every control flow tree there are multiple possible orderings that the scheduler might consider, and the throughput depends on the chosen order.
There are two types of order swaps that one might consider. In a lazy control flow node that does not require a specific order, swapping children yields different execution masks for the different algorithms. As a simple heuristic, one can assume that more expensive algorithms are better paired with sparser execution masks.
Defining how expensive an algorithm is and how sparse an execution mask is, is a highly non-trivial task. In fact, doing so perfectly would require actually running the application in all possible orders, which is something we would like to avoid. Currently, we hardcode educated guesses for the weight of an algorithm execution. Some weights are taken from a profiling run; other heuristics help in setting weights that result in acceptable sequences, such as the fact that data providers should be spread out as far as possible so as not to create I/O bottlenecks.
In summary, trying to model accurate execution weights and average efficiencies for algorithms on this heterogeneous architecture seems like a bad idea. Instead, it might make more sense to employ optimization algorithms that operate over possible orderings and automatically test each ordering with a quick benchmark. We expect this procedure to take a long time to complete, but maybe we don't have to optimize at such a high level for too many configurations. Ideas include genetic algorithms or simulated annealing. (None of these are implemented yet, but that's not in the scope of this MR anyway.)
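As an illustration of what such an optimization could look like (nothing of this kind is implemented in this MR), here is a toy simulated-annealing sketch over algorithm orderings. The `benchmark` function is a made-up stand-in for a real throughput measurement; it merely rewards spreading the two "data provider" algorithms apart, echoing the I/O heuristic above:

```python
import math
import random

random.seed(0)

# Hypothetical toy benchmark: a real cost function would be a quick
# throughput measurement of the generated sequence. Here we reward
# spreading the two "provider" algorithms apart (larger is better).
def benchmark(order):
    providers = [i for i, alg in enumerate(order) if alg.startswith("provider")]
    return abs(providers[0] - providers[1])

def swap_neighbour(order):
    i, j = random.sample(range(len(order)), 2)
    new = list(order)
    new[i], new[j] = new[j], new[i]
    return new

def anneal(order, steps=500, t0=1.0):
    cur, best = order, order
    for step in range(steps):
        t = t0 * (1.0 - step / steps) + 1e-9  # linear cooling schedule
        cand = swap_neighbour(cur)
        # NOTE: a real implementation must only propose orderings that
        # still satisfy the data flow constraints; this sketch ignores that.
        delta = benchmark(cand) - benchmark(cur)
        if delta >= 0 or random.random() < math.exp(delta / t):
            cur = cand
        if benchmark(cur) > benchmark(best):
            best = cur
    return best

algorithms = ["provider_velo", "decode_velo", "track_velo", "provider_ut", "fit"]
best = anneal(algorithms)
print(best, benchmark(best))
```

A genetic algorithm would work analogously, with crossover and mutation acting on (constraint-respecting) orderings instead of single swaps.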
Built on top of !393 (merged)
TODO:

- Some more tests for the python core functionality, specifically `cftree_ops.py` and `event_list_utils.py`.
- Merge MiniPyConf and PyConf (done in LHCb!2964 (merged)).
- Cleanup of the physics configurations (part of this MR).
- Check that every `host_datatype` is only assigned to `host_datatype`s, and that every `device_datatype` is only assigned to `device_datatype`s.
- Update documentation.
- Once LHCb!2964 (merged) is merged, change `GenerateConfiguration.cmake` to use `HEAD` instead of a branch of LHCb.
## List of changes
One general remark: this MR changes the code generation steps of Allen, but otherwise makes minimal changes to the rest of the codebase. That means that, with the exception of the introduction of the `MASK_` types and the event list intersection, union and inversion, headers and sources are for the most part not modified.
Here is a list of requirements and changes introduced to Allen as part of this MR:
- Pregenerated sequences are removed. Python 3 and libClang are now requirements.
- `git` is required in `STANDALONE` builds to be able to fetch PyConf from LHCb.
- The option `SEQUENCE_GENERATION` is therefore gone as well, and so is its CI job.
- The obsolete `gaudi` configurations and previous configurations are gone.
- The following directory structure has been created: `AllenConf` contains an "extension" to PyConf to enable Multi Event Scheduling, `sequences` contains all sequences, `sequences/definitions` contains definition files used by the sequences, and `tests` includes MES checks.
- The following configurations exist and can therefore be passed to the cmake `SEQUENCE` option:
```
|-- sequences
|   |-- forward.py
|   |-- hlt1_complex_validation.py
|   |-- hlt1_pp_default.py
|   |-- hlt1_pp_no_gec.py
|   |-- hlt1_pp_no_gec_validation.py
|   |-- hlt1_pp_non-restricted_UT.py
|   |-- hlt1_pp_scifi_v6.py
|   |-- hlt1_pp_scifi_v6_validation.py
|   |-- hlt1_pp_validation.py
|   |-- muon.py
|   |-- pv.py
|   |-- velo.py
|   `-- veloUT.py
```
- `sequences/definitions` files have been refactored heavily. Files now correspond to subdetector reconstructions (e.g. `velo_reconstruction.py`, `ut_reconstruction.py`), and there are separate files for lines (e.g. `hlt1_technical_lines.py`, `hlt1_muon_lines.py`), validators, persistency, and so on.
- A number of python tests have been added to run as part of the Allen CI to test the MES functionality.
- A complex sequence test that runs various instances of reconstruction algorithms has been added.
- Combiner algorithms `event_list_intersection`, `event_list_union` and `event_list_inversion` have been added.
- All event list arguments have become `MASK_INPUT` or `MASK_OUTPUT`. There can be at most a single `MASK_INPUT` and a single `MASK_OUTPUT` parameter per algorithm; these are used internally by the MES and don't need to be configured as part of the algorithms.
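For intuition, the semantics of the three combiner algorithms can be modelled on the host with plain Python. In Allen they operate on device event lists; here an event list is just a sorted list of event numbers out of `n_events`, and `BLUB` stands in for any second selection, as in the example mask above:

```python
# Host-side toy models of the Allen combiner algorithms.
def event_list_intersection(a, b):      # mask: A & B
    return sorted(set(a) & set(b))

def event_list_union(a, b):             # mask: A | B
    return sorted(set(a) | set(b))

def event_list_inversion(a, n_events):  # mask: ~A
    return [e for e in range(n_events) if e not in set(a)]

gec = [0, 2, 3]    # events passing the GEC
blub = [1, 2, 3]   # events passing a hypothetical BLUB selection
print(event_list_intersection(gec, blub))  # events with mask GEC & BLUB: [2, 3]
print(event_list_inversion(gec, 5))        # events with mask ~GEC: [1, 4]
```

An algorithm with execution mask `GEC & BLUB` would then receive the intersection as its input event list.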
Should be merged after LHCb!2964 (merged) (and should be tested with it too).