Cenk Yildiz committed
* [Introduction](#introduction)
* [HLT Multi Processor Processing Unit](#hlt-multi-processor-processing-unit)
  * [Communication with Run Control](#communication-with-run-control)
    * [Online](#online)
    * [Offline](#offline)
  * [Configuration](#configuration)
    * [Online](#online-1)
    * [Offline](#offline-1)
    * [Extra Params](#extra-params)
  * [DataSource Module](#datasource-module)
    * [DFDataSource](#dfdatasource)
    * [Online](#online-2)
    * [Offline](#offline-2)
  * [Trigger Module](#trigger-module)
  * [Monitoring Module](#monitoring-module)
  * [HLTMPPU Sequence Diagram (TODO)](#hltmppu-sequence-diagram-todo)
* [Dcm Emulator](#dcm-emulator)
  * [Interprocess Communication](#interprocess-communication)
* [HLTMPPy Python Framework](#hltmppy-python-framework)
  * [Examples](#examples)
    * [pudummy](#pudummy)
    * [Athena](#athena)
* [Tests](#tests)
* [Also see](#also-see)

# Introduction

HLTMPPU is the DAQ component that executes HLT code.
It establishes communication between the Trigger and DAQ systems via well-defined interfaces.

# HLT Multi Processor Processing Unit

The main components of this package are the HLTMPPU library and the `HLTMPPU_main` application. HLTMPPU implements
HLTInterface and is called by the executable `HLTMPPU_main`.

HLTMPPU is part of the TDAQ software; however, historically it was kept relatively decoupled from the DAQ system in
terms of OKS and Run Control.

Throughout this document, you'll see the terms Online and Offline:
- **Online**: HLTMPPU runs in the online environment within a TDAQ partition. For HLTMPPU to run, the online system
  requires a set of applications such as `HLTMPPU_main`, `HLTRC`, `dcm` and `HLTSV`.
- **Offline**: HLTMPPU runs standalone, without a partition and its surrounding applications. It can run as a separate
  C++ or Python application.

## Communication with Run Control

### Online
`HLTMPPU_main` is the main application. It is not a Run Control application, but it follows TDAQ state transitions using
`HLTReceiver` from the [HLTRC](../../HLTRC/) package. HLTRC sends commands to HLTMPPU via `HLTSender`.

### Offline
Offline HLTMPPU applications must drive the state transitions manually
(see [test/HLTMPPU_test.cxx](test/HLTMPPU_test.cxx) and [HLTMPPy Python Framework](#hltmppy-python-framework)).
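A minimal sketch of what "driving the state transitions manually" means, assuming a simplified state machine with only the three transitions named in this README (`configure()`, `prepareForRun()`, `stopRun()`). The class `OfflineRunner`, its state names and its `transition()` method are hypothetical and are not the HLTMPPU or HLTMPPy API; a real driver would call the corresponding HLTMPPU method at the marked line.

```python
class OfflineRunner:
    """Hypothetical driver that only allows transitions in a legal order."""

    # Simplified state machine: current state -> {transition name: next state}
    TRANSITIONS = {
        "INITIAL": {"configure": "CONFIGURED"},
        "CONFIGURED": {"prepareForRun": "RUNNING"},
        "RUNNING": {"stopRun": "CONFIGURED"},
    }

    def __init__(self):
        self.state = "INITIAL"
        self.history = []

    def transition(self, name):
        allowed = self.TRANSITIONS[self.state]
        if name not in allowed:
            raise RuntimeError(f"{name}() not allowed in state {self.state}")
        # A real driver would call the matching HLTMPPU method here.
        self.history.append(name)
        self.state = allowed[name]
        return self.state


runner = OfflineRunner()
for step in ("configure", "prepareForRun", "stopRun"):
    runner.transition(step)
print(runner.state)  # CONFIGURED
```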

## Configuration
HLTMPPU and all its plugins receive their configuration as `boost::property_tree::ptree` objects. A ptree can easily be
serialized to XML, JSON and other formats. This makes it possible to store a whole ptree in a string, read it from a
file and transfer it to other processes through the usual mechanisms.
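The ptree idea can be sketched in Python with the standard `xml.etree.ElementTree` module: a nested dictionary plays the role of the ptree, is serialized to an XML string and parsed back, and values are looked up with dot-separated paths in the spirit of `ptree::get<T>(path, default)`. The configuration keys (`numForks`, `numSlots`) are hypothetical placeholders, not the real HLTMPPU schema.

```python
import xml.etree.ElementTree as ET


def dict_to_xml(tag, value):
    """Build an Element from a nested dict; leaf values become text nodes."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(dict_to_xml(key, child))
    else:
        elem.text = str(value)
    return elem


def xml_to_dict(elem):
    """Inverse of dict_to_xml; leaf text comes back as a string."""
    children = list(elem)
    if not children:
        return elem.text or ""
    return {child.tag: xml_to_dict(child) for child in children}


def get_path(tree, path, default=None):
    """ptree-style lookup with a dot-separated path such as 'a.b.c'."""
    node = tree
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


# Hypothetical keys, for illustration only.
config = {"HLTMPPU": {"numForks": 2, "numSlots": 4}}
text = ET.tostring(dict_to_xml("Configuration", config), encoding="unicode")
restored = xml_to_dict(ET.fromstring(text))
print(get_path(restored, "HLTMPPU.numForks"))            # 2 (as a string)
print(get_path(restored, "HLTMPPU.missing", "default"))  # default
```

Unlike a real ptree, this dictionary-based sketch cannot hold repeated child keys.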

### Online
One needs to create the OKS configuration. The OKS schema of HLTMPPU is part of the [HLTPUDal](../../HLTPUDal) package.
HLTRC uses HLTPUDal and the OKS configuration to build ptree objects and passes them to HLTMPPU. An example OKS
configuration that consists of all necessary segments and applications can be generated by
[PartitionMaker](https://gitlab.cern.ch/atlas-tdaq-software/PartitionMaker), for example:

```bash
pm_part_hlt.py -p mypart
```

### Offline
One can build the configuration as XML, convert it to a ptree and pass it to HLTMPPU. For example XML configuration
files, see the [data](/data) folder. For building a ptree from command line arguments, see the
[Python](#hltmppy-python-framework) section.

### Extra Params
The configuration is mostly well defined in HLTPUDal or in the command line options of the Python programs. However,
there are some extra parameters that modify specific behaviors. For implementation details, check
[HLTMPPU.cxx](src/HLTMPPU.cxx).

- `dumpFDs`: Dump open file descriptors before and after forking.
- `dumpThreads`: Dump active threads before and after forking.
- `ExitWithoutCleanup`: Exit from the program through `std::exit` at the end of the event loop.
- `forkDelay`: Upper limit, in milliseconds, of the random amount of time to sleep between forks.
- `keepForks`: Reset the number of children back to the configured number of forks at the end of `stopRun()`.
- `preForkSleep`: Amount of time, in milliseconds, to sleep before starting to fork.
- `publishInterval`: Publication interval for HLTMPPU monitoring information such as histograms and IS.
- `SaveInputParams`: Save the ptree to files at each FSM state.
- `SkipFinalize`: Skip the `hltSteering->stopRun()` step in children.
- `SkipFinalizeWorker`: Skip the `hltSteering->finalizeWorker()` step in children.
- `MaxTermStagger_s`: Maximum number of seconds to wait for enough memory to become available for unsharing.
- `allowedThreadsAtFork`: Maximum number of extra threads allowed at forking. Used together with `threadWaitTimeout`.
- `threadWaitTimeout`: Number of seconds to wait for threads to exit before forking.
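
To make the two timing parameters concrete, here is a sketch of a fork schedule, assuming `preForkSleep` is applied once before the first fork and each later fork waits a uniformly distributed random time between 0 and `forkDelay` milliseconds. The uniform distribution and the function `fork_schedule` are assumptions for illustration; check HLTMPPU.cxx for the actual behavior.

```python
import random


def fork_schedule(num_forks, pre_fork_sleep_ms=0, fork_delay_ms=0, rng=None):
    """Return the sleep in milliseconds applied before each fork (hypothetical)."""
    rng = rng or random.Random()
    delays = []
    for i in range(num_forks):
        if i == 0:
            # preForkSleep: fixed wait before the first fork.
            delays.append(pre_fork_sleep_ms)
        elif fork_delay_ms > 0:
            # forkDelay: random wait with forkDelay as its upper limit.
            delays.append(rng.uniform(0, fork_delay_ms))
        else:
            delays.append(0)
    return delays


schedule = fork_schedule(4, pre_fork_sleep_ms=500, fork_delay_ms=200, rng=random.Random(1))
print(schedule[0], [round(d) for d in schedule[1:]])
```

Randomizing the delay spreads the children's start-up over time instead of starting them all at the same instant.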

------

The HLTMPPU class configures three main modules: Data Source, Trigger and Monitoring.

## DataSource Module
The Data Source plugin is responsible for getting the data in (L1 results, ROB fragments) and sending the data out (HLT
results). Its interface is the base class `HLTMP::DataSource`, which extends `hltinterface::DataCollector`.

Since Run-III, the event loop is managed by the HLT implementation, not by HLTMPPU. The HLT implementation communicates
with the Data Source via `hltinterface::DataCollector`, which is a singleton.
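The singleton pattern means the framework installs one data source per process and the HLT code retrieves it without being handed a reference. A Python analog is sketched below: `getNext()` and `eventDone()` are method names this README uses for `hltinterface::DataCollector`, but the Python signatures, the `instance()` accessor, `FakeDataSource`, and the use of `None` to signal "no more events" are hypothetical simplifications.

```python
class DataCollector:
    """Hypothetical Python analog of a process-wide DataCollector singleton."""

    _instance = None

    @classmethod
    def instance(cls, impl=None):
        # Passing an implementation installs it; calling without one returns it.
        if impl is not None:
            DataCollector._instance = impl
        if DataCollector._instance is None:
            raise RuntimeError("no DataCollector implementation installed")
        return DataCollector._instance

    def getNext(self):
        raise NotImplementedError

    def eventDone(self, result):
        raise NotImplementedError


class FakeDataSource(DataCollector):
    """Serves events from a list and records the results it receives."""

    def __init__(self, events):
        self._events = list(events)
        self.done = []

    def getNext(self):
        return self._events.pop(0) if self._events else None

    def eventDone(self, result):
        self.done.append(result)


# The framework installs the data source once...
DataCollector.instance(FakeDataSource(["ev1", "ev2"]))

# ...and the "HLT" event loop finds it later without being handed a reference.
dc = DataCollector.instance()
while (event := dc.getNext()) is not None:
    dc.eventDone(f"processed {event}")
print(dc.done)  # ['processed ev1', 'processed ev2']
```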

Other DataSource implementations can be written if needed. Currently the only implementation is DFDataSource.

### DFDataSource

This module implements a Data Source using [dfinterface](../../dfinterface) so that it can load the `dfinterface`
implementations provided in DAQ. It has to be configured with the library that contains the implementations of
`dfinterface::Event` and `dfinterface::Session`.

Each HLTMPPU fork configures a `DFDataSource`, and each `DFDataSource` initializes multiple Sessions (as many as the
number of event slots). Event requests from HLT are dispatched to available sessions in a round-robin fashion.
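A sketch of round-robin dispatch over sessions, assuming "available" means a session that is not currently handling an event: the search starts after the last session used and skips busy ones. `SessionDispatcher` and its methods are hypothetical names, not part of `DFDataSource`.

```python
class SessionDispatcher:
    """Round-robin over sessions, skipping sessions that are busy (hypothetical)."""

    def __init__(self, num_sessions):
        self.busy = [False] * num_sessions
        self.next_index = 0

    def acquire(self):
        n = len(self.busy)
        for offset in range(n):
            idx = (self.next_index + offset) % n
            if not self.busy[idx]:
                self.busy[idx] = True
                # Continue the rotation after the session just handed out.
                self.next_index = (idx + 1) % n
                return idx
        return None  # every session is already handling an event

    def release(self, idx):
        self.busy[idx] = False


d = SessionDispatcher(3)
print([d.acquire() for _ in range(4)])  # [0, 1, 2, None]
d.release(1)
print(d.acquire())  # 1
```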

`DFDataSource` also gathers statistics on the processing time of each event.
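One way such per-event timing statistics could be accumulated is sketched below. The class `ProcessingStats` and the choice of count, mean and maximum are hypothetical; the real `DFDataSource` may record different quantities.

```python
import time


class ProcessingStats:
    """Accumulates per-event processing times in seconds (hypothetical sketch)."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.max = 0.0
        self._start = {}

    def start(self, event_id):
        # Remember when processing of this event began.
        self._start[event_id] = time.monotonic()

    def done(self, event_id):
        # Measure elapsed time for this event and fold it into the totals.
        elapsed = time.monotonic() - self._start.pop(event_id)
        self.record(elapsed)
        return elapsed

    def record(self, elapsed):
        self.count += 1
        self.total += elapsed
        self.max = max(self.max, elapsed)

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


stats = ProcessingStats()
stats.record(0.25)
stats.record(0.75)
print(stats.count, stats.mean, stats.max)  # 2 0.5 0.75
```

`time.monotonic()` is used because, unlike wall-clock time, it cannot jump backwards during a run.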

### Online
The Data Source is [dcm](../../dcm). `dcm` also implements dfinterface; this implementation is called `dfinterfaceDcm`.

### Offline
There are two implementations:
- `DFFileBackend`: Ideal for running with one fork and one athenaMT event slot. Its Session implements reading/writing of
  ATLAS bytestream files. It doesn't require an external data source such as `dcm`.
- `DFDcmEmuBackend`: (See: [DcmEmulator](#dcm-emulator))
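HLTMPPy makes this choice automatically based on the number of forks and event slots. A minimal sketch of such a rule follows, assuming `DFFileBackend` is chosen only for exactly one fork with one event slot; the function `choose_backend` is hypothetical and the actual condition used by HLTMPPy may differ.

```python
def choose_backend(num_forks, num_slots):
    """Pick an offline dfinterface backend name (simplified, hypothetical rule)."""
    if num_forks < 1 or num_slots < 1:
        raise ValueError("need at least one fork and one event slot")
    # One fork with one event slot: a single session can read and write the
    # bytestream file directly.
    if num_forks == 1 and num_slots == 1:
        return "DFFileBackend"
    # Several sessions: let one DcmEmulator be the single reader/writer.
    return "DFDcmEmuBackend"


for forks, slots in [(1, 1), (1, 4), (2, 4)]:
    print(forks, slots, choose_backend(forks, slots))
```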

## Trigger Module

Trigger modules are responsible for generating an accept/reject decision. A Trigger module implements
`hltinterface::HLTInterface` and uses `hltinterface::DataCollector` to read the L1 result of an event from the data
source, ask for the relevant ROB fragments, process the event and send the HLT result back to the data source module.
The library must contain a factory function called `hlt_factory` that returns a `hltinterface::HLTInterface*`.

It can be the HLT (implemented in Athena) or [pudummy](../../pudummy).
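The factory convention lets HLTMPPU create a trigger object while knowing only a symbol name, never the concrete class. A Python analog is sketched below: `HLTInterface`, `DummyTrigger`, `load_trigger` and the plugin module are hypothetical stand-ins, and a module object simulates a loaded library. In C++ such a lookup is commonly done with `dlopen`/`dlsym`; this README does not say exactly how HLTMPPU does it.

```python
import types
from abc import ABC, abstractmethod


class HLTInterface(ABC):
    """Hypothetical Python stand-in for hltinterface::HLTInterface."""

    @abstractmethod
    def doEventLoop(self):
        ...


class DummyTrigger(HLTInterface):
    def doEventLoop(self):
        return "event loop finished"


def hlt_factory():
    # The loader only knows this symbol name, never the concrete class.
    return DummyTrigger()


def load_trigger(plugin, symbol="hlt_factory"):
    """Look up the factory by name in a loaded plugin and check its product."""
    factory = getattr(plugin, symbol, None)
    if factory is None:
        raise ImportError(f"{plugin.__name__} has no {symbol}()")
    trigger = factory()
    if not isinstance(trigger, HLTInterface):
        raise TypeError(f"{symbol}() did not return an HLTInterface")
    return trigger


# Simulate a loaded plugin library as a module object exposing hlt_factory.
plugin = types.ModuleType("example_trigger_plugin")
plugin.hlt_factory = hlt_factory
trigger = load_trigger(plugin)
print(trigger.doEventLoop())  # event loop finished
```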

## Monitoring Module

The Monitoring Module is responsible for managing monitoring information. It needs to implement `hltinterface::IInfoRegister`.

It has two implementations:

### MonSvcInfoService
This service uses [monsvc](../../monsvc) to publish histograms registered in Athena or directly in HLTMPPU. During
registration it parses the histogram paths and modifies them as necessary to match the online system. It expands
environment variables before passing configuration information to monsvc. If a referenced environment variable is not
set, it uses a default (e.g. Histogramming for the OH server).
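Expansion with a fallback can be sketched as follows: `$VAR` and `${VAR}` are replaced from the environment, then from a table of defaults, and left untouched otherwise. The variable name `TDAQ_OH_SERVER` and the function `expand` are hypothetical, used only for illustration; in real use one would pass `os.environ` as `env`.

```python
import re

# Matches ${NAME} (group 1) or $NAME (group 2).
_VAR = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def expand(value, env, defaults):
    """Replace $VAR / ${VAR} using env, falling back to defaults."""
    def repl(match):
        name = match.group(1) or match.group(2)
        if name in env:
            return env[name]
        # Unknown variables without a default are left as written.
        return defaults.get(name, match.group(0))
    return _VAR.sub(repl, value)


# TDAQ_OH_SERVER is a hypothetical variable name.
defaults = {"TDAQ_OH_SERVER": "Histogramming"}
print(expand("${TDAQ_OH_SERVER}.HLT", {}, defaults))                          # Histogramming.HLT
print(expand("${TDAQ_OH_SERVER}.HLT", {"TDAQ_OH_SERVER": "MyOH"}, defaults))  # MyOH.HLT
```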

### MonSvcInfoServiceDummPublisher
This plugin is identical to the `MonSvcInfoService` plugin except that it takes an extra argument, `MonInfoDummyFilePath`,
pointing to an XML file that describes histograms, usually generated from an OH dump file using
[scripts/dumpHistSizes.py](scripts/dumpHistSizes.py). The plugin then creates and registers the histograms described in
the file, together with any histograms registered from other places such as HLT. **This plugin can be used for testing
(for example when there is no HLT) and for stressing the system, and should not be used during operation!**

## HLTMPPU Sequence Diagram (TODO)

HLTMPPU starts with the `configure()` step, where it loads the plugin libraries and instantiates and configures the
plugins. At the `prepareForRun()` step, the mother process prepares for forking by shutting down IPC and preparing the
InfoRegistry and DataSource plugins. It then forks the child processes. Afterwards, the mother process reinitializes IPC
and starts monitoring threads to watch for child processes that crash or exit unexpectedly. If HLTRC asks for the
termination of a child process or for the forking of new processes, the mother process complies. At the `stopRun()`
transition, the mother process starts a timer and waits for the child processes to exit. Any child processes still
active at the timeout are sent a KILL signal.
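The timeout-then-KILL step of `stopRun()` can be sketched with the standard `subprocess` module. This is an analog only: HLTMPPU forks its children directly rather than launching them through `subprocess`, and the function `stop_children` and its polling interval are hypothetical.

```python
import subprocess
import sys
import time


def stop_children(procs, timeout_s, poll_s=0.05):
    """Wait up to timeout_s for children to exit, then SIGKILL the survivors."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if all(p.poll() is not None for p in procs):
            return []  # everyone exited in time
        time.sleep(poll_s)
    killed = []
    for p in procs:
        if p.poll() is None:
            p.kill()  # SIGKILL on POSIX
            p.wait()  # reap the child so it does not linger as a zombie
            killed.append(p.pid)
    return killed


quick = subprocess.Popen([sys.executable, "-c", "pass"])
stuck = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
killed = stop_children([quick, stuck], timeout_s=3.0)
print("killed:", killed)
```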

After the fork, child processes redirect their logs to their respective files, reinitialize IPC and then prepare the
InfoRegistry and DataSource plugins for processing. The child processes then call `doEventLoop()` of the HLT
implementation and wait for it to return. Within this call, the HLT implementation asks the DataSource for L1 results,
processes events, and sends HLT results in the form of an `eformat::FullEventFragment` back to the DataSource. The
accept/reject decision is encoded in the metadata of the full event: if the stream tags have at least one entry, the
event is accepted; otherwise it is rejected. The result is passed down to the DataSource library either for saving or
for discarding. When the event loop terminates, `doEventLoop()` returns, and the child calls `stopRun()` on the Trigger
plugin and then `finalizeWorker()`. These steps may be bypassed by configuration options (`SkipFinalize`,
`SkipFinalizeWorker`) since they cause unsharing and swapping. Finally, HLTMPPU finalizes the InfoRegister and
DataSource plugins and returns to `main()`, which terminates the process.
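The accept/reject rule reduces to a check on the stream tag list. A sketch, assuming a simplified result record: `HLTResult` is a hypothetical stand-in for the relevant part of `eformat::FullEventFragment`, and stream tags are modeled as plain strings rather than real eformat stream tag objects.

```python
from dataclasses import dataclass, field


@dataclass
class HLTResult:
    """Simplified, hypothetical stand-in for an HLT result event."""
    global_id: int
    stream_tags: list = field(default_factory=list)


def is_accepted(result):
    # Accepted if and only if at least one stream tag is present.
    return len(result.stream_tags) > 0


results = [
    HLTResult(1, ["physics_Main"]),
    HLTResult(2),
    HLTResult(3, ["express", "calibration"]),
]
print([r.global_id for r in results if is_accepted(r)])  # [1, 3]
```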

# Dcm Emulator

Dcm Emulator was developed to overcome the limitations of DFFileBackend in standalone testing or athenaHLT reprocessing.
Its main role is to act as a single point for reading events from the input file and for writing the HLT processing
results to a new output file.

It implements `dfinterface::Session` and `dfinterface::Event` in DFDcmEmuBackend. When running with DcmEmulator,
DFDataSource must use this dfinterface implementation.

Dcm Emulator can run as a separate application (similar to dcm), or the `DcmEmulator` class can be instantiated in the
main application. The simplest way to run Dcm Emulator is through the Python framework, which takes care of everything.

**Dcm Emulator is only used offline. It cannot be run in a partition.**

## Interprocess Communication
Dcm Emulator communicates with each child process via the DFDcmEmuBackend implementation. The communication is done via
`boost::interprocess::managed_shared_memory` and `boost::interprocess::shared_memory_object`. Originally, DcmEmulator
was designed so that each Session could handle multiple events. Later it was decided that each child starts multiple
Sessions, each handling a single event. Because of this, the DcmEmulator communication mechanism may be more complex
than it needs to be.

- Each time HLT calls `hltinterface::DataCollector::getNext()`, `DFDcmEmuSession` notifies `DcmEmulator` that it is
  ready to receive an event. `DcmEmulator` reads the event from the file, puts it in shared memory and notifies the
  `DFDcmEmuSession`.
- At `hltinterface::DataCollector::collect()` calls, ROBs are read from the event in shared memory, so no communication
  with Dcm Emulator is needed.
- At the `hltinterface::DataCollector::eventDone()` call, `DFDcmEmuSession` puts the result event and ROS statistics in
  shared memory and notifies `DcmEmulator`. `DcmEmulator` then reads the event from memory, combines it with the input
  event and, if configured to do so, writes it to an output file.
- Several `boost::interprocess::interprocess_mutex` and `boost::interprocess::interprocess_condition` objects
  synchronize this communication.
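The `getNext()` handshake can be sketched as a single shared slot guarded by a mutex and a condition variable. This is a thread-based analog only: the real code uses `boost::interprocess` primitives in shared memory between separate processes. The class `EventSlot` and its methods are hypothetical, and the `eventDone()` direction is omitted.

```python
import threading


class EventSlot:
    """One shared event slot guarded by a lock plus condition (hypothetical)."""

    def __init__(self):
        self.cond = threading.Condition()
        self.requested = False
        self.event = None

    def get_next(self):
        # Session side: announce readiness, then wait until an event arrives.
        with self.cond:
            self.requested = True
            self.cond.notify_all()
            self.cond.wait_for(lambda: self.event is not None)
            event, self.event = self.event, None
            return event

    def serve_one(self, event):
        # Emulator side: wait for a request, publish one event, notify.
        with self.cond:
            self.cond.wait_for(lambda: self.requested)
            self.requested = False
            self.event = event
            self.cond.notify_all()


slot = EventSlot()
events = ["ev1", "ev2", "ev3"]
emulator = threading.Thread(target=lambda: [slot.serve_one(e) for e in events])
emulator.start()
received = [slot.get_next() for _ in events]
emulator.join()
print(received)  # ['ev1', 'ev2', 'ev3']
```

`wait_for()` re-checks its predicate after every wake-up, which protects against spurious wake-ups in the same way a predicate loop around `interprocess_condition::wait` does.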

# HLTMPPy Python Framework

To run HLTMPPU outside of a partition, one can use the HLTMPPy framework. It consists of:
- `HLTMPPy_boost` library: creates Python bindings for HLTMPPU and DcmEmulator. Instead of ptree, it uses serialized
  XML strings.
- `HLTMPPy` Python module.

To use the `HLTMPPy` module, call the `runHLTMPPy()` method with a Python dictionary that has five main elements, each
of which is also a dictionary: global, HLTMPPU, datasource, trigger, monitoring.
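The shape of that dictionary can be sketched together with a sanity check for the five sections. All inner keys below are hypothetical placeholders, not the real HLTMPPy option names (see `runHLTMPPy.py` for those), and `check_config` is an illustrative helper. The call to `runHLTMPPy()` itself is not shown because its import path is not documented here.

```python
REQUIRED_SECTIONS = ("global", "HLTMPPU", "datasource", "trigger", "monitoring")


def check_config(config):
    """Raise if a top-level section is missing or is not itself a dictionary."""
    missing = [name for name in REQUIRED_SECTIONS if name not in config]
    if missing:
        raise KeyError(f"missing sections: {missing}")
    not_dicts = [name for name in REQUIRED_SECTIONS if not isinstance(config[name], dict)]
    if not_dicts:
        raise TypeError(f"sections must be dictionaries: {not_dicts}")
    return config


# All inner keys are hypothetical placeholders.
config = {
    "global": {"log_dir": "."},
    "HLTMPPU": {"num_forks": 2, "num_slots": 4},
    "datasource": {"module": "dffileds", "file": "input.data"},
    "trigger": {"module": "pudummy", "probability": 0.5},
    "monitoring": {},
}
check_config(config)
```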

HLTMPPy converts this dictionary into serialized XML strings and passes them to the Python binding created by
HLTMPPy_boost. `HLTMPPy` inherited much of its functionality from the Run-II `athenaHLT.py`, such as running with
infrastructure, interactive running and attaching a debugger.

The input dictionary can be generated, and `runHLTMPPy()` called, by one of two programs:
- [athenaHLT.py](https://gitlab.cern.ch/atlas/athena/blob/master/HLT/Trigger/TrigControl/TrigCommon/bin/athenaHLT.py):
  Part of the Athena release, used for testing and reprocessing. Its command line options are tailored to the trigger
  community.
- [runHLTMPPy.py](scripts/runHLTMPPy.py): Part of this software package. It can run using the trigger module from
  Athena (similar to athenaHLT) or in a pure TDAQ environment using pudummy.

Command line options are explained in the help of each program. Note that HLTMPPy does many things automatically behind
the scenes, such as:
- Getting file metadata (run number, lumiblock, detector_mask, ...) from the input file and passing it in the
  prepareForRun ptree.
- Deciding whether to initialize/configure DcmEmulator depending on the number of forks/event slots.
- Starting and killing TDAQ infrastructure (initial partition, HLTMPPU partition, IS servers, OH servers) if configured
  to do so.

## Examples

### pudummy
This requires setting up only the TDAQ environment. Only runHLTMPPy.py can run with pudummy, and it requires the
`--ros2robs` argument. The pudummy module is not as configurable as Athena: its only parameter, `--probability`, defines
the probability of accepting an event.
```bash
# Run with ros2robs from file, process 95 events, with 2 forks, 4 slots
runHLTMPPy.py --ros2robs ros2robs_full.txt -l "." \
   dffileds --file /cvmfs/atlas-nightlies.cern.ch/repo/data/data-art/TrigP1Test/data18_13TeV.00364485.physics_EnhancedBias.merge.RAW._lb0705._SFO-1._0001.1 --numEvents 95 --outFile outFileName \
   pudummy --probability 0.5 \
   HLTMPPU -L childLogName --num-forks 2 --num-slots 4 --num-threads 4
```
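The `--probability` option can be read as "accept each event independently with probability p". A sketch of that semantics follows; the function `pudummy_like_decision` is hypothetical and pudummy's own random number handling may differ.

```python
import random


def pudummy_like_decision(probability, rng):
    """Accept with the given probability (sketch of the --probability semantics)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    # rng.random() is uniform in [0, 1), so p=0 never accepts and p=1 always does.
    return rng.random() < probability


rng = random.Random(42)
decisions = [pudummy_like_decision(0.5, rng) for _ in range(10_000)]
print(sum(decisions) / len(decisions))  # close to 0.5
```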

For more examples of running with pudummy, one can run `test_runHLTMPPy.py` (see [Tests](#tests)).

### Athena
This requires setting up the Athena environment. Once Athena is set up, one can use either `athenaHLT.py` or `runHLTMPPy.py`:
```bash
# Simple athena test with athenaHLT.py: process 95 events with 4 forks (--nprocs), 4 slots and 2 threads
athenaHLT.py --threads=2 --concurrent-events=4 --nprocs=4 --number-of-events=95 --file=/cvmfs/atlas-nightlies.cern.ch/repo/data/data-art/TrigP1Test/data17_13TeV.00327265.physics_EnhancedBias.merge.RAW._lb0100._SFO-1._0001.1 AthExHelloWorld/HelloWorldOptions.py
# Same job options with runHLTMPPy.py: process 95 events with 2 forks, 4 slots and 4 threads
runHLTMPPy.py -l "." \
  dffileds --file /cvmfs/atlas-nightlies.cern.ch/repo/data/data-art/TrigP1Test/data18_13TeV.00364485.physics_EnhancedBias.merge.RAW._lb0705._SFO-1._0001.1 --numEvents 95 --outFile outFileName \
  joboptions --joFile AthExHelloWorld/HelloWorldOptions.py \
  HLTMPPU -L childLogName --num-forks 2 --num-slots 4 --num-threads 4
```

# Tests

There are some unit tests and applications that help with testing.
- [HLTMPPUTestApp.cxx](test/HLTMPPUTestApp.cxx): Simple C++ program to run HLTMPPU without a partition. You can run it
  in the `data` folder so that it has direct access to the configuration files. It doesn't use any testing framework, so
  one needs to look at the log files to see whether something went wrong.
- [test_runHLTMPPy.py](test/test_runHLTMPPy.py): Runs runHLTMPPy.py with different scenarios, using pudummy. It doesn't
  use any testing framework. A new folder is created for each run, and one can inspect the output files with some shell
  commands to verify that no process is stuck, that child processes completed processing and exited with the correct
  code, and that there are no unexpected errors. See the explanations at the beginning of the file. This program is also
  a good way to learn the different command line arguments.
- [test_HLTMPPy.py](test/test_HLTMPPy.py): Test program that verifies `TriggerConfig` objects can be generated
  successfully. It uses the Python `unittest` module.

# Also see

## Related software
- [HLTInterface](../../hltinterface/)
- [dfinterface](../../dfinterface/)
- [HLTRC](../../HLTRC/)
- [HLTPUDal](../../HLTPUDal/)
- [pudummy](../../pudummy/)
- [monsvc](../../monsvc)

## Useful Links 
- [Old Twiki](https://twiki.cern.ch/twiki/bin/viewauth/Atlas/HLTMPPU)
- [ADHI JIRA](https://its.cern.ch/jira/browse/ADHI-4671)