\chapter{The \corrybold Framework}
\label{ch:framework}

\corry  is based on a collection of modules read from a configuration file and an event loop which sequentially processes data from all detectors assigned to one event.
In each loop iteration, all modules are executed in the linear order they appear in the configuration file and have access to all data created by previous modules.
At the end of the loop iteration, the data cache is cleared and the framework continues with processing the next iteration.
This chapter provides basic information about the different components of the framework, its executable as well as the available configuration parameters that are processed on a global level.

\section{The \texttt{corry} Executable}
\label{sec:executable}
The \corry executable \command{corry} functions as the interface between the user and the framework.
It is primarily used to provide the main configuration file, but also allows options from the main configuration file to be added or overwritten.
This is useful for quick testing as well as for batch processing of many runs to be reconstructed.

The executable handles the following arguments:
\begin{itemize}
\item \texttt{-c <file>}: Specifies the configuration file to be used for the reconstruction, relative to the current directory.
This is the only \textbf{required} argument and the program will fail to start if this argument is not given.
\item \texttt{-l <file>}: Specifies an additional location, such as a file, to forward log output into. This is used as an additional destination alongside the standard output and the location specified in the framework parameters described in Section~\ref{sec:framework_parameters}.
\item \texttt{-v <level>}: Sets the global log verbosity level, overwriting the value specified in the configuration file described in Section~\ref{sec:framework_parameters}.
Possible values are \texttt{FATAL}, \texttt{STATUS}, \texttt{ERROR}, \texttt{WARNING}, \texttt{INFO}, \texttt{DEBUG}, and \texttt{TRACE}, where all options are case-insensitive.
The module-specific logging level introduced in Section~\ref{sec:logging_verbosity} is not overwritten.
\item \texttt{-o <option>}: Passes extra framework or module options which are added to and/or overwrite those in the main configuration file.
This argument may be specified multiple times to add multiple options.
Options are specified as key-value pairs in the same syntax as used in the configuration files described in Chapter~\ref{ch:configuration_files}, but the key is extended to include a reference to a configuration section or instantiation in shorthand notation.
There are three types of keys that can be specified:
\begin{itemize}
    \item Keys to set \textbf{framework parameters}: These have to be provided in exactly the same way as they would be in the main configuration file (a section does not need to be specified). An example to overwrite the standard output directory would be \texttt{corry -c <file> -o output\_directory="run123456"}.
    \item Keys for \textbf{module configurations}: These are specified by adding a dot (\texttt{.}) between the module and the key as it would be given in the configuration file (thus \textit{module}.\textit{key}). An example to overwrite the number of hits required for accepting a track candidate would be \command{corry -c <file> -o Tracking4D.min_hits_on_track=5}.
    \item Keys to specify values for a particular \textbf{module instantiation}: The identifier of the instantiation and the name of the key are split by a dot (\texttt{.}), in the same way as for keys for module configurations (thus \textit{identifier}.\textit{key}). The unique identifier for a module can contain one or more colons (\texttt{:}) to distinguish between various instantiations of the same module. The exact name of an identifier depends on the name of the detector. An example to change the neighbor pixel search radius of the clustering algorithm for a particular instantiation of the detector named \textit{my\_dut} could be \command{corry -c <file> -o Clustering4D:my_dut.neighbour_radius_row=2}.
\end{itemize}
Note that only the single argument directly following the \texttt{-o} is interpreted as the option. If the key-value pair contains whitespace, it should be enclosed in quotation marks to ensure that the argument is parsed correctly.
\item \texttt{-g <option>}: Passes extra detector options which are added to and/or overwrite those in the detector configuration file.
This argument can be specified multiple times to add multiple options.
The options are parsed in the same way as described above for module options, but only one type of key can be specified to overwrite an option for a single detector.
These are specified by adding a dot (\texttt{.}) between the detector and the key as it would be given in the detector configuration file (thus \textit{detector}.\textit{key}). An example to change the orientation of a particular detector named \texttt{detector1} would be \texttt{corry -c <file> -g detector1.orientation=0deg,0deg,45deg}.
\end{itemize}
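
Putting these arguments together, a full invocation could for example look as follows; the configuration file name is a placeholder, and the option values are taken from the examples above:
\begin{verbatim}
corry -c reconstruction.conf \
      -v WARNING \
      -o output_directory="run123456" \
      -o Tracking4D.min_hits_on_track=5 \
      -g detector1.orientation=0deg,0deg,45deg
\end{verbatim}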

No direct interaction with the framework is possible during the reconstruction. Signals can be sent using keyboard shortcuts to terminate the run, either gracefully or with force. The executable understands the following signals:
\begin{itemize}
\item SIGINT (\texttt{CTRL+C}): Request a graceful shutdown of the reconstruction. This means the event currently being processed is finished, while all other events requested in the configuration file are ignored. After finishing the event, the finalization stage is executed for every module to ensure all modules finish properly.
\item SIGTERM: Same as SIGINT, request a graceful shutdown of the reconstruction. This signal is emitted e.g.\ by the \command{kill} command or by cluster computing schedulers to ask for a termination of the job.
\item SIGQUIT (\texttt{CTRL+\textbackslash}): Forcefully terminates the framework. It is not recommended to use this signal as it will normally lead to the loss of all generated data. This signal should only be used when graceful termination is not possible.
\end{itemize}
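
For reference, a graceful shutdown can also be triggered from outside the terminal running the reconstruction, e.g.\ by a batch system or from a second shell. A minimal sketch using standard shell tools, assuming a single running \command{corry} process, is:
\begin{verbatim}
# send SIGTERM to the running corry process to request a graceful shutdown
kill -TERM $(pgrep corry)
\end{verbatim}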

\section{The Clipboard}

The clipboard is the framework infrastructure for temporarily storing information during the event processing.
Every module can access the clipboard to both read and write information.
Collections or individual elements on the clipboard are accessed via their data type, and an optional key can additionally be used to associate them with a certain detector by its name.

The clipboard consists of three parts: the event, a temporary data storage, and a persistent storage space.

\subsection{The Event}

The event is the central element storing meta-information about the data currently processed.
This includes the time frame (or data slice) within which all data are located as well as trigger numbers associated with these data.
New data to be added are always compared to the event on the clipboard to determine whether they should be discarded, included, or buffered for later use.
A detailed description of the event building process is provided in Chapter~\ref{ch:events}.

\subsection{Temporary Data Storage}
The temporary data storage is only available during the processing of each individual event.
It is automatically cleared at the end of the event processing and has to be populated with new data for each subsequent event.

The temporary storage acts as the main data structure to communicate information between different modules and can hold multiple collections of \corry objects such as pixel hits, clusters, or tracks.
In order to be able to flexibly store different data types on the clipboard, the access methods for the temporary data storage are implemented as templates, and vectors of any data type deriving from \parameter{corry::Object} can be stored and retrieved.

\subsection{Persistent Storage}
The persistent storage is not cleared at the end of processing each event and can be used to store information used in multiple events.
Currently this storage only allows for the caching of double-precision floating point numbers.


\section{Global Framework Parameters}
\label{sec:framework_parameters}
The \corry framework provides a set of global parameters that control and alter its behavior. These parameters are inherited by all modules.
The currently available global parameters are:

\begin{itemize}
\item \parameter{detectors_file}: Location of the file containing the detector configuration, as described in Section~\ref{sec:detector_config}.
This is the only \textit{required} global parameter; the framework will fail to start if it is not specified.
This parameter can take multiple paths; all provided files will be combined into one geometry description of the setup.
\item \parameter{detectors_file_updated}: Location of the file that the (potentially) updated detector configuration should be written into. If this file does not already exist, it will be created. If the same file is given as for \parameter{detectors_file}, the file is overwritten. If no file is specified using this parameter then the updated geometry is not written to file.
\item \parameter{histogram_file}: Location of the file where the ROOT output histograms of all modules will be written to. The file extension \texttt{.root} will be appended if not present. Directories within the ROOT file will be created automatically for all modules.
\item \parameter{number_of_events}: Determines the total number of events the framework will process, where negative numbers allow for the processing of all data available.
After reaching the specified number of events, the reconstruction is stopped.
Defaults to $-1$.
\item \parameter{number_of_tracks}: Determines the total number of tracks the framework should reconstruct, where negative numbers indicate that there is no limit on the number of reconstructed tracks.
After reaching the specified number of tracks, the reconstruction is stopped.
Defaults to $-1$.
\item \parameter{run_time}: Determines the total wall-clock time of data acquisition to be reconstructed, where negative numbers indicate that there is no limit on the time slice to reconstruct.
Defaults to $-1$.
\item \parameter{log_level}: Specifies the lowest log level that should be reported.
Possible values are \texttt{FATAL}, \texttt{STATUS}, \texttt{ERROR}, \texttt{WARNING}, \texttt{INFO}, \texttt{DEBUG}, and \texttt{TRACE}, where all options are case-insensitive.
Defaults to the \texttt{INFO} level.
More details and information about the log levels, including how to change them for a particular module, can be found in Section~\ref{sec:logging_verbosity}.
Can be overwritten by the \texttt{-v} parameter on the command line (see Section~\ref{sec:executable}).
\item \parameter{log_format}: Determines the log message format to be displayed.
Possible options are \texttt{SHORT}, \texttt{DEFAULT}, and \texttt{LONG}, where all options are case-insensitive.
More information can be found in Section~\ref{sec:logging_verbosity}.
\item \parameter{log_file}: File where the log output will be written, in addition to the standard printing (usually the terminal).
Another (additional) location to write to can be specified on the command line using the \texttt{-l} parameter (see Section~\ref{sec:executable}).
\item \parameter{library_directories}: Additional directories to search for module libraries, before searching the default paths.
\item \parameter{output_directory}: Directory to write all output files into.
Subdirectories are created automatically for all module instantiations.
This directory will also contain the \parameter{histogram_file} specified via the parameter described above.
Defaults to the current working directory with the subdirectory \dir{output/} attached.
\item \parameter{purge_output_directory}: Decides whether the content of an already existing output directory is deleted before a new run starts. Defaults to \texttt{false}, i.e. files are kept but will be overwritten by new files created by the framework.
\item \parameter{deny_overwrite}: Forces the framework to abort the run and throw an exception when attempting to overwrite an existing file. Defaults to \texttt{false}, i.e. files are overwritten when requested. This setting is inherited by all modules, but can be overwritten in the configuration section of each of the modules.
\end{itemize}
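
To illustrate how these parameters are combined, a global configuration section could look as follows. This is only a sketch: the section header \texttt{[Corryvreckan]} and all file names are placeholders, and the exact syntax is described in Chapter~\ref{ch:configuration_files}:
\begin{verbatim}
[Corryvreckan]
detectors_file         = "geometry.conf"
detectors_file_updated = "geometry_updated.conf"
histogram_file         = "histograms.root"
number_of_events       = 100000
log_level              = "INFO"
output_directory       = "run123456"
\end{verbatim}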

\section{Modules and the Module Manager}
\label{sec:module_manager}
\corry is a modular framework and one of the core ideas is to partition functionality into independent modules that can be inserted or removed as required.
These modules are located in the subdirectory \dir{src/modules/} of the repository, with the name of the directory as the unique name of the module.
The suggested naming scheme is CamelCase, thus an example module name would be \module{OnlineMonitor}.
Any \emph{specifying} part of a module name should precede the \emph{functional} part of the name, e.g.\ \module{EventLoaderCLICpix2} rather than \module{CLICpix2EventLoader}.
There are three different kinds of modules that can be defined:
\begin{itemize}
    \item \textbf{Global}: Modules for which a single instance runs, irrespective of the number of detectors.
    \item \textbf{Detector}: Modules that are concerned with only a single detector at a time.
    These are then replicated for all required detectors.
    \item \textbf{DUT}: Similar to the Detector modules, these modules run for a single detector.
    However, they are only replicated if the respective detector is marked as device-under-test (DUT).
\end{itemize}
The type of module determines the constructor used, the internal unique name, and the supported configuration parameters.
For more details about the instantiation logic for the different types of modules, see Section~\ref{sec:module_instantiation}.

Furthermore, detector modules can be restricted to certain types of detectors.
This allows the framework to automatically determine for which of the given detectors the respective module should be instantiated.
For example, the \module{EventLoaderCLICpix2} module will only be instantiated for detectors with the correct type \parameter{CLICpix2} without any additional configuration effort required by the user.
This procedure is described in more detail in Section~\ref{sec:module_files}.
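
As a sketch of this mechanism (the detector name and file contents are hypothetical, and the detector file syntax follows Section~\ref{sec:detector_config}), a detector of type \parameter{CLICpix2} declared in the detector configuration file is automatically picked up by the corresponding event loader section in the main configuration file:
\begin{verbatim}
# excerpt from the detector configuration file
[dut_0]
type = "CLICpix2"

# main configuration file: instantiated only for detectors of type CLICpix2
[EventLoaderCLICpix2]
\end{verbatim}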

\subsection{Module Status Codes}

The \command{run()} function of each module returns a status code to the module manager to indicate the status of the module.
These codes can be used to e.g.\ request the end of the current run or to signal problems in data processing.
The following status codes are currently supported:

\begin{description}
    \item[\command{Success}]: Indicates that the module successfully finished processing all data. The framework continues with the next module.
    \item[\command{NoData}]: Indicates that the respective module did not find any data to process. The framework continues with the next module.
    \item[\command{DeadTime}]: Indicates that the detector handled by the respective module is currently in data acquisition dead time. The framework skips all remaining modules for this event and continues with the subsequent event. This ensures that measurements such as the efficiency are not affected by known inefficiencies during detector dead times.
    \item[\command{EndRun}]: Allows the module to request a premature end of the run. This can e.g.\ be used by alignment modules to stop the run after they have accumulated enough tracks for the alignment procedure. The framework executes all remaining modules for the current event and then enters the finalization stage.
    \item[\command{Failure}]: Indicates that there was a severe problem when processing data in the respective module. The framework skips all remaining modules for this event and enters the finalization stage.
\end{description}

\subsection{Execution Order}

Modules are executed in the order in which they appear in the configuration file; the sequence is repeated for every time frame or event.
If one module in the chain raises e.g.\ a \command{DeadTime} status, the processing of the next event is begun without executing the remaining modules for the current event.

The execution order is of special importance in the event building process, since the first module will always have to define the event while subsequent event loader modules will have to adhere to the time frame defined by the first module.
A complete description of the event building process is provided in Chapter~\ref{ch:events}.

This behavior should also be taken into account when choosing the order of modules in the configuration file, since e.g.\ data from detectors handled by subsequent event loaders are not processed and hit maps are not updated if an earlier module requested to skip the rest of the module chain.
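
A typical ordering is sketched below; the module names are taken from the examples in this manual, the selection is purely illustrative, and the comment syntax follows Chapter~\ref{ch:configuration_files}. The event-defining module comes first, followed by clustering, tracking, and monitoring modules:
\begin{verbatim}
# the first module defines the event; subsequent modules adhere to it
[EventLoaderCLICpix2]

# clustering and tracking operate on the data loaded above
[Clustering4D]
[Tracking4D]

# monitoring runs last so its histograms reflect the fully processed event
[OnlineMonitor]
\end{verbatim}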

\subsection{Module Instantiation}
\label{sec:module_instantiation}
Modules are dynamically loaded and instantiated by the Module Manager.
They are constructed, initialized, executed, and finalized in the linear order in which they are defined in the configuration file; for this reason, the order of the modules in the configuration file should follow the order of the actual reconstruction chain.
For each section in the main configuration file (see Chapter~\ref{ch:configuration_files} for more details), a corresponding library containing the module is searched for (the exception being the global framework section).
Module libraries are always named following the scheme \textbf{libCorryvreckanModule\texttt{ModuleName}}, reflecting the \texttt{ModuleName} configured via CMake.
The module search order is as follows:
\begin{enumerate}
\item Modules already loaded from an earlier section header
\item All directories in the global configuration parameter \parameter{library_directories} in the provided order, if this parameter exists.
\item The internal library paths of the executable, which should automatically point to the libraries that are built and installed together with the executable.
These library paths are stored in \dir{RPATH} on Linux, see the next point for more information.
\item The other standard locations to search for libraries depending on the operating system.
Details about the procedure Linux follows can be found in~\cite{linuxld}.
\end{enumerate}

If the loading of the module library is successful, the module is checked to determine if it is a global, detector, or DUT module.
A single module may be called multiple times in the configuration with overlapping requirements (such as a module that runs on all detectors of a given type, followed by the same module but with different parameters for one specific detector, also of this type). Hence the Module Manager must establish which instantiations to keep and which to discard.
The instantiation logic determines a unique name and priority for every instantiation. Internally these are handled as numerical values, where a lower number indicates a higher priority.
The name and priority of an instantiation are determined differently for global and detector/DUT modules:
\begin{itemize}
\item \textbf{Global}: Name of the module, the priority is always \emph{high}.
\item \textbf{Detector/DUT}: Combination of the name of the module and the name of the detector this module is executed for.
If the name of the detector is specified directly by the \parameter{name} parameter, the priority is \emph{high}.
If the detector is only matched by the \parameter{type} parameter, the priority is \emph{medium}.
If the \parameter{name} and \parameter{type} are both unspecified and the module is instantiated for all detectors, the priority is \emph{low}.
\end{itemize}
In the end, only a single instance for every unique name is allowed.
If there are multiple instantiations with the same unique name, the instantiation with the highest priority is kept.
If multiple instantiations with the same unique name and the same priority exist, an exception is raised.
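
For illustration, the following sketch (module, parameter, and detector names are taken from the examples above, the values are hypothetical) creates a medium-priority instantiation for every detector of type \parameter{CLICpix2} and a high-priority instantiation for the detector \texttt{my\_dut}, which replaces the generic one for that detector:
\begin{verbatim}
# instantiated for all detectors of this type (medium priority)
[Clustering4D]
type = "CLICpix2"
neighbour_radius_row = 1

# instantiated only for the detector my_dut (high priority)
[Clustering4D]
name = "my_dut"
neighbour_radius_row = 2
\end{verbatim}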

\section{Logging and Verbosity Levels}
\label{sec:logging_verbosity}
\corry is designed to identify mistakes and implementation errors as early as possible and to provide the user with clear indications of the location and cause of the problem.
The amount of feedback can be controlled using different log levels, which are inclusive, i.e.\ lower levels also include messages from all higher levels.
The global log level can be set using the global parameter \parameter{log_level}.
The log level can be overridden for a specific module by adding the \parameter{log_level} parameter to the respective configuration section.
The following log levels are supported:
\begin{itemize}
\item \textbf{FATAL}: Indicates a fatal error that will lead to the direct termination of the application.
Typically only emitted in the main executable after catching exceptions, as they are the preferred way of fatal error handling.
An example of a fatal error is an invalid value for an existing configuration parameter.
\item \textbf{STATUS}: Important information about the status of the reconstruction.
This level is only used for messages that have to be logged in every run, such as initial information on the module loading, opened data files, and the current progress of the run.
\item \textbf{ERROR}: Severe error that should not occur during a normal, well-configured reconstruction.
Frequently leads to a fatal error and can be used to provide extra information that may help in finding the problem. For example, it is used to indicate the reason a dynamic library cannot be loaded.
\item \textbf{WARNING}: Indicates conditions that should not occur normally and possibly lead to unexpected results.
The framework will however continue without problems after a warning.
A warning is, for example, issued to indicate that a calibration file for a certain detector cannot be found and that the reconstruction is therefore performed with uncalibrated data.
\item \textbf{INFO}: Information messages about the reconstruction process.
Contains summaries of the reconstruction details for every event and for the overall run.
This should typically produce at maximum one line of output per event and module.
\item \textbf{DEBUG}: In-depth details about the progress of the reconstruction, such as information on every cluster formed or on track fitting results.
Produces large volumes of output per event and should therefore only be used for debugging the reconstruction process.
\item \textbf{TRACE}: Messages to trace what the framework or a module is currently doing.
Unlike the \textbf{DEBUG} level, it does not contain any direct information about the physics but rather indicates which part of the module or framework is currently running.
Mostly used for software debugging or determining performance bottlenecks in the reconstruction.
\end{itemize}

\begin{warning}
    It is not recommended to set the \parameter{log_level} higher than \textbf{WARNING} in a typical reconstruction as important messages may be missed.
    Setting the logging level too low should also be avoided, since printing many log messages will significantly slow down the reconstruction.
\end{warning}

The logging system supports several formats for displaying the log messages.
The following formats are supported via the global parameter \parameter{log_format} or the individual module parameter with the same name:
\begin{itemize}
\item \textbf{SHORT}: Displays the data in a short form.
Includes only the first character of the log level followed by the configuration section header and the message.
\item \textbf{DEFAULT}: The default format.
Displays the system time, log level, section header, and the message itself.
\item \textbf{LONG}: Detailed logging format.
Displays all of the above but also indicates the source code file and line where the log message was produced.
This can help when debugging modules.
\end{itemize}
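
As a sketch, the verbosity and format can be adjusted for a single module by adding the \parameter{log_level} and \parameter{log_format} parameters to its configuration section; the module name below is only an example:
\begin{verbatim}
[Tracking4D]
log_level  = "DEBUG"
log_format = "LONG"
\end{verbatim}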

\section{Coordinate Systems}
\label{sec:coordinate_systems}

Local coordinate systems for each detector and a global frame of reference for the full setup are defined.
The global coordinate system is chosen as a right-handed Cartesian system, and the rotations of individual devices are performed around the geometrical center of their sensor.
Here, the beam direction defines the positive z-axis at the origin of the x-y-plane.
The origin along the z-axis is fixed by the placement of the detectors in the geometry of the setup.

There are several kinds of detectors with different local coordinate systems, e.g.\ Cartesian coordinates for pixel detectors and polar coordinates for some specific strip detectors. For this reason, the \textsl{Detector} class is designed as an interface; a detector with a specific coordinate system is created according to the corresponding parameter in the detector configuration file. In the current version, Cartesian coordinates are the only available option.

The local Cartesian coordinate system is right-handed, with the x- and y-axes defining the sensor plane. The origin of this coordinate system is the center of the lower left pixel in the grid, i.e.\ the pixel with indices (0,0). This simplifies calculations in the local coordinate system, as all positions can either be stated in absolute numbers or in fractions of the pixel pitch.
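
As a generic illustration of the relation between the two frames (a sketch of the usual convention rather than the exact implementation in \corry), a local position $\vec{x}_{\mathrm{local}}$ on a detector with rotation matrix $R$ and displacement $\vec{d}$, both taken from the geometry description, transforms as
\[
  \vec{x}_{\mathrm{global}} = R\,\vec{x}_{\mathrm{local}} + \vec{d},
  \qquad
  \vec{x}_{\mathrm{local}} = R^{-1}\left(\vec{x}_{\mathrm{global}} - \vec{d}\right).
\]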