framework.tex 26.9 KB
Newer Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
\chapter{The \corrybold Framework}
\label{ch:framework}

This chapter provides some crucial information about the framework, its executables and the available configuration parameters which are processed on a global level.

\section{The \texttt{corry} Executable}
\label{sec:executable}
The \corry executable \texttt{corry} functions as the interface between the user and the framework.
It is primarily used to provide the main configuration file, but also allows to add and overwrite options from the main configuration file.
This is both useful for quick testing as well as for batch processing of many runs to be reconstructed.

The executable handles the following arguments:
\begin{itemize}
\item \texttt{-c <file>}: Specifies the configuration file to be used for the reconstruction, relative to the current directory.
This is the only \textbf{required} argument, the simulation will fail to start if this argument is not given.
\item \texttt{-l <file>}: Specify an additional location such as a file to forward log output to. This is used as additional destination alongside the standard output and the location specified in the framework parameters described in Section~\ref{sec:framework_parameters}.
\item \texttt{-v <level>}: Sets the global log verbosity level, overwriting the value specified in the configuration file described in Section~\ref{sec:framework_parameters}.
Possible values are \texttt{FATAL}, \texttt{STATUS}, \texttt{ERROR}, \texttt{WARNING}, \texttt{INFO} and \texttt{DEBUG}, where all options are case-insensitive.
The module specific logging level introduced in Section~\ref{sec:logging_verbosity} is not overwritten.
\item \texttt{-o <option>}: Passes extra options which are added and overwritten in the main configuration file.
This argument may be specified multiple times, to add multiple options.

Options are specified as key/value pairs in the same syntax as used in the configuration files (refer to Chapter~\ref{ch:configuration_files} for more details), but the key is extended to include a reference to a configuration section in shorthand notation.
There are two types of keys that can be specified:
\begin{itemize}
\item Keys to set \textbf{framework parameters}. These have to be provided in exactly the same way as they would be in the main configuration file (a section does not need to be specified). An example to overwrite the standard output directory would be \command{corry -c <file> -o output_directory="run123456"}.
\item Keys for \textbf{module configurations}. These are specified by adding a dot (\texttt{.}) between the module and the actual key as it would be given in the configuration file (thus \textit{module}.\textit{key}). An example to overwrite the information written by the FileWiter module would be \command{corry -c <file> -o FileWriter.onlyDUT="true"}.
\end{itemize}
Note that only the single argument directly following the \texttt{-o} is interpreted as the option. If there is whitespace in the key/value pair this should be properly enclosed in quotation marks to ensure the argument is parsed correctly.
\end{itemize}

No interaction with the framework is possible during the reconstruction. Signals can however be send using keyboard shortcuts to terminate the run, either gracefully or with force. The executable understand the following signals:
\begin{itemize}
\item \texttt{CTRL+C} (SIGINT): Request a graceful shutdown of the reconstruction. This means the currently processed event is finished, while all other events requested in the configuration file are ignored. After finishing the event, the finalization stage is executed for every module to ensure all modules finish properly.
\item \texttt{CTRL+\textbackslash} (SIGQUIT): Forcefully terminates the framework. It is not recommended to use this signal as it will normally lead to the loss of all generated data. This signal should only be used when graceful termination is for any reason not possible.
\end{itemize}

Simon Spannagel's avatar
Simon Spannagel committed
38
39
\section{The Clipboard}

40
41
42
43
44
45
46
47
48
49
50
51
52
53
The clipboard is the framework's infrastructure for temporary storing information during the event processing.
Every module can access the clipboard and both read and write information.
Collections or individual elements on the clipboard are accessed via their name, and is internally stored as map.

The clipboard consists of two parts, a temporary storage and a persistent storage space.

\subsection{Temporary Data Storage}
The temporary data storage is only available during the processing of a single event.
It is automatically cleared at the end of the event processing and has to be populated with new data in the new event to be processed.
The temporary storage acts as the main data structure to communicate information between different modules and can hold multiple collections of \corry objects such as pixel hits, clusters or tracks.

\subsection{Persistent Storage}
The persistent storage is not cleared at the end of each event processing and can be used to store information used in multiple events.
Currently this storage only allows for the caching of double-precision floating point numbers.
Simon Spannagel's avatar
Simon Spannagel committed
54
55
56


\section{Global Framework Parameters}
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
\label{sec:framework_parameters}
The \corry framework provides a set of global parameters which control and alter its behavior. These parameters are inherited by all modules.
The currently available global parameters are:

\begin{itemize}
\item \parameter{detectors_file}: Location of the file describing the detector configuration described in Section~\ref{sec:detector_config}.
The only \textit{required} global parameter: the framework will fail to start if it is not specified.
\item \parameter{detectors_file_updated}: Location of the file that the (potentially) updated detector configuration should be written into. If this file does not already exist, it will be created. If the same file is given as for \parameter{detectors_file}, the file is overwritten with the updated values.
\item \parameter{histogram_file}: Location of the file where the ROOT output histograms of all modules will be written to. The file extension \texttt{.root} will be appended if not present. Directories within the ROOT file will be created automatically for all modules.
\item \parameter{number_of_events}: Determines the total number of events the framework should process, negative numbers allow processing of all data available.
After reaching the specified number of events, reconstruction is stopped.
Defaults to $-1$.
\item \parameter{number_of_tracks}: Determines the total number of tracks the framework should reconstruct, negative numbers indicate no limit on the number of reconstructed tracks.
After reaching the specified number of events, reconstruction is stopped.
Defaults to $-1$.
\item \parameter{run_time}: Determines the wall-clock time of data acquisition the framework should reconstruct up until. Negative numbers inidcate no limit on the time slice to reconstruct.
Defaults to $-1$.
\item \parameter{log_level}: Specifies the lowest log level which should be reported.
Possible values are \texttt{FATAL}, \texttt{STATUS}, \texttt{ERROR}, \texttt{WARNING}, \texttt{INFO} and \texttt{DEBUG}, where all options are case-insensitive.
Defaults to the \texttt{INFO} level.
More details and information about the log levels, including how to change them for a particular module, can be found in Section~\ref{sec:logging_verbosity}.
Can be overwritten by the \texttt{-v} parameter on the command line (see Section~\ref{sec:executable}).
\item \parameter{log_format}: Determines the log message format to display.
Possible options are \texttt{SHORT}, \texttt{DEFAULT} and \texttt{LONG}, where all options are case-insensitive.
More information can be found in Section~\ref{sec:logging_verbosity}.
\item \parameter{log_file}: File where the log output should be written to in addition to printing to the standard output (usually the terminal).
Another (additional) location to write to can be specified on the command line using the \texttt{-l} parameter (see Section~\ref{sec:executable}).
\item \parameter{library_directories}: Additional directories to search for module libraries, before searching the default paths.
\end{itemize}

87
88
89
90
91
92
\section{Modules and the Module Manager}
\label{sec:module_manager}
\corry is a modular framework and one of the core ideas is to partition functionality in independent modules which can be inserted or removed as required.
These modules are located in the subdirectory \textit{src/modules/} of the repository, with the name of the directory the unique name of the module.
The suggested naming scheme is CamelCase, thus an example module name would be \textit{OnlineMonitor}.
A \emph{specifying} part of a module name should precede the \emph{functional} part of the name, e.g.\ \textit{EventLoaderCLICpix2} rather than \textit{CLICpix2EventLoader}.
Simon Spannagel's avatar
Simon Spannagel committed
93
There are three different kind of modules which can be defined:
94
95
96
97
98
99
100
101
102
\begin{itemize}
    \item \textbf{Global}: Modules for which a single instance runs, irrespective of the number of detectors.
    \item \textbf{Detector}: Modules which are concerned with only a single detector at a time.
    These are then replicated for all required detectors.
    \item \textbf{DUT}: Similar to the Detector modules, these modules run for a single detector.
    However, they are only replicated if the respective detector is marked as DUT.
\end{itemize}
The type of module determines the constructor used, the internal unique name and the supported configuration parameters.
For more details about the instantiation logic for the different types of modules, see Section~\ref{sec:module_instantiation}.
103

Simon Spannagel's avatar
Simon Spannagel committed
104
105
106
107
108
109
110
111
112
113
114
115
116
117
\subsection{Module Status Codes}

The \command{run()} function of each module returns a status code to the module manager, indicating the status of the module.
These codes can be used to e.g. request an end of the current run, or to signal problems in data processing.
The following status codes are currently supported:

\begin{description}
    \item{\command{Success}}: Indicates that the module successfully finished processing all data. The framework continues with the next module.
    \item{\command{NoData}}: Indicates that the respective module did not find any data to process. The framework continues with the next module.
    \item{\command{DeadTime}}: Indicates that the detector handled by the respective module currently is in data acquisition dead time. The framework skips all remaining modules for this event and continues with the subsequent event. By this, measurements such as efficiency are not affected by known inefficiencies during detector dead times.
    \item{\command{EndRun}}: Allows the module to request a premature end of the run. This can e.g.\ be used by alignment modules to stop the run after they have accumulated enough tracks for the alignment procedure. The framework executes all modules for the current event and then enters the finalization stage.
    \item{\command{Failure}}: Indicates that there was a severe problem when processing data in the respective module. The framework skips all remaining modules for this event end enters the finalization stage.
\end{description}

Simon Spannagel's avatar
Simon Spannagel committed
118
119
120
121
122
123
\subsection{Execution Order}

Modules are executed in the order in which they appear in the configuration file, the sequence is repeated for every time frame or event.
If one module in the chain raises e.g.\ a \command{DeadTime} status, subsequent modules are not executed anymore.
This should be taken into account when choosing the order of modules in the configuration, since e.g.\ data from detectors catered by subsequent event loaders is not processed and hit maps are not updated if an earlier module requested to skip the rest of the module chain.

124
125
126
\subsection{Files of a Module}
\label{sec:module_files}
Every module directory should at minimum contain the following documents (with \texttt{ModuleName} replaced by the name of the module):
127
\begin{itemize}
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
\item \textbf{CMakeLists.txt}: The build script to load the dependencies and define the source files of the library.
\item \textbf{README.md}: Full documentation of the module.
\item \textbf{\textit{ModuleName}.h}: The header file of the module.
\item \textbf{\textit{ModuleName}.cpp}: The implementation file of the module.
\end{itemize}
These files are discussed in more detail below.
By default, all modules added to the \textit{src/modules/} directory will be built automatically by CMake.
If a module depends on additional packages which not every user may have installed, one can consider adding the following line to the top of the module's \textit{CMakeLists.txt}:
\begin{minted}[frame=single,framesep=3pt,breaklines=true,tabsize=2,linenos]{cmake}
CORRYVRECKAN_ENABLE_DEFAULT(OFF)
\end{minted}

\paragraph{CMakeLists.txt}
Contains the build description of the module with the following components:
\begin{enumerate}
\item On the first line either \parameter{CORRYVRECKAN_DETECTOR_MODULE(MODULE_NAME)}, \parameter{CORRYVRECKAN_DUT_MODULE(MODULE_NAME)} or \parameter{CORRYVRECKAN_GLOBAL_MODULE(MODULE_NAME)} depending on the type of module defined.
The internal name of the module is automatically saved in the variable \parameter{${MODULE_NAME}} which should be used as an argument to other functions.
Another name can be used by overwriting the variable content, but in the examples below, \parameter{${MODULE_NAME}} is used exclusively and is the preferred method of implementation.
For DUT and Detector modules, the type of detector this module is capable of handling can be specified by adding so-called type restrictions, e.g.\
\begin{minted}[frame=single,framesep=3pt,breaklines=true,tabsize=2,linenos]{cmake}
CORRYVRECKAN_DETECTOR_TYPE(${MODULE_NAME} "Timepix3" "CLICpix2")
\end{minted}
The module will then only be instantiated for detectors of one of the given types. This is particularly useful for event loader modules which read a very specific file format.

\item The following lines should contain the logic to load possible dependencies of the module (below is an example to load Geant4).
Only ROOT is automatically included and linked to the module.
\item A line with \texttt{\textbf{CORRYVRECKAN\_MODULE\_SOURCES(\$\{MODULE\_NAME\} \textit{sources})}} defines the module source files. Here, \texttt{sources} should be replaced by a list of all source files relevant to this module.
\item Possible lines to include additional directories and to link libraries for dependencies loaded earlier.
\item A line containing \parameter{CORRYVRECKAN_MODULE_INSTALL(${MODULE_NAME})} to set up the required target for the module to be installed to.
\end{enumerate}

A simple CMakeLists.txt for a module named \parameter{Test} which should run only on DUT detectors of type \emph{Timepix3} is provided below as an example.
\vspace{5pt}

\begin{minted}[frame=single,framesep=3pt,breaklines=true,tabsize=2,linenos]{cmake}
# Define module and save name to MODULE_NAME
CORRYVRECKAN_DUT_MODULE(MODULE_NAME)
CORRYVRECKAN_DETECTOR_TYPE(${MODULE_NAME} "Timepix3")

# Add the sources for this module
CORRYVRECKAN_MODULE_SOURCES(${MODULE_NAME}
    Test.cpp
)

# Provide standard install target
CORRYVRECKAN_MODULE_INSTALL(${MODULE_NAME})
\end{minted}

\paragraph{README.md}
The \file{README.md} serves as the documentation for the module and should be written in Markdown format~\cite{markdown}.
It is automatically converted to \LaTeX using Pandoc~\cite{pandoc} and included in the user manual in Chapter~\ref{ch:modules}.
By documenting the module functionality in Markdown, the information is also viewable with a web browser in the repository within the module sub-folder.

The \file{README.md} should follow the structure indicated in the \file{README.md} file of the \parameter{Dummy} module in \dir{src/modules/Dummy}, and should contain at least the following sections:
\begin{itemize}
\item The H1-size header with the name of the module and at least the following required elements: the \textbf{Maintainer}, the \textbf{Module Type} and the \textbf{Status} of the module.
The module type should be either \textbf{GLOBAL}, \textbf{DETECTOR}, \textbf{DUT}.
If the module is working and well-tested, the status of the module should be \textbf{Functional}.
By default, new modules are given the status \textbf{Immature}.
The maintainer should mention the full name of the module maintainer, with their email address in parentheses.
A minimal header is therefore:
\begin{verbatim}
# ModuleName
Maintainer: Example Author (<example@example.org>)
Module Type: GLOBAL
Status: Functional
\end{verbatim}
In addition, the \textbf{Detector Type} should be mentioned for modules of types \textbf{DETECTOR} and \textbf{DUT}.
\item An H3-size section named \textbf{Description}, containing a short description of the module.
\item An H3-size section named \textbf{Parameters}, with all available configuration parameters of the module.
The parameters should be briefly explained in an itemised list with the name of the parameter set as an inline code block.
\item An H3-size section named \textbf{Plots Created}, listing all plots created by this module.
\item An H3-size section with the title \textbf{Usage} which should contain at least one simple example of a valid configuration for the module.
201
202
\end{itemize}

203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
\paragraph{\texttt{ModuleName}.h and \texttt{ModuleName}.cpp}
All modules should consist of both a header file and a source file.
In the header file, the module is defined together with all of its methods.
Brief Doxygen documentation should be added to explain what each method does.
The source file should provide the implementation of every method and also its more detailed Doxygen documentation.
Methods should only be declared in the header and defined in the source file in order to keep the interface clean.

\subsection{Module structure}
\label{sec:module_structure}
All modules must inherit from the \texttt{Module} base class, which can be found in \textit{src/core/module/Module.hpp}.
The module base class provides two base constructors, a few convenient methods and several methods which the user is required to override.
Each module should provide a constructor using the fixed set of arguments defined by the framework; this particular constructor is always called during by the module instantiation logic.
These arguments for the constructor differ for global and detector/DUT modules.

For global modules, the constructor for a \texttt{TestModule} should be:
\begin{minted}[frame=single,framesep=3pt,breaklines=true,tabsize=2,linenos]{c++}
TestModule(Configuration& config, std::vector<std::shared_ptr<Detector>> detectors): Module(std::move(config), detectors) {}
\end{minted}

For detector and DUT modules, the first argument are the same, but the last argument is a \texttt{std::shared\_ptr} to the linked detector.
It should always forward this detector to the base class together with the configuration object.
Thus, the constructor of a detector module is:
\begin{minted}[frame=single,framesep=3pt,breaklines=true,tabsize=2,linenos]{c++}
TestModule(Configuration& config, std::shared_ptr<Detector> detector): Module(std::move(config), std::move(detector)) {}
\end{minted}

In addition to the constructor, each module can override the following methods:
\begin{itemize}
\item \parameter{initialise()}: Called after loading and constructing all modules and before starting the analysis loop.
This method can for example be used to initialize histograms.
233
\item \parameter{run(std::shared_ptr<Clipboard> clipboard)}: Called for every time frame or triggered event to be analyzed in the simulation. The argument represents a pointer to the clipboard where the event data is stored.
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
A status code is returned to signal the framework whether to continue processing data or to end the run.
\item \parameter{finalise()}: Called after processing all events in the run and before destructing the module.
Typically used to save the output data (like histograms).
Any exceptions should be thrown from here instead of the destructor.
\end{itemize}

\subsection{Module instantiation}
\label{sec:module_instantiation}
Modules are dynamically loaded and instantiated by the Module Manager.
They are constructed, initialized, executed and finalized in the linear order in which they are defined in the configuration file; for this reason the configuration file should follow the order of the real process.
For each section in the main configuration file (see~\ref{sec:config_parameters} for more details), a corresponding library is searched for which contains the module (the exception being the global framework section).
Module libraries are always named following the scheme \textbf{libCorryvreckanModule\texttt{ModuleName}}, reflecting the \texttt{ModuleName} configured via CMake.
The module search order is as follows:
\begin{enumerate}
\item Modules already loaded before from an earlier section header
\item All directories in the global configuration parameter \parameter{library_directories} in the provided order, if this parameter exists.
\item \item The internal library paths of the executable, that should automatically point to the libraries that are built and installed together with the executable.
These library paths are stored in \dir{RPATH} on Linux, see the next point for more information.
\item The other standard locations to search for libraries depending on the operating system.
Details about the procedure Linux follows can be found in~\cite{linuxld}.
\end{enumerate}

If the loading of the module library is successful, the module is checked to determine if it is a global, detector or DUT module.
As a single module may be called multiple times in the configuration, with overlapping requirements (such as a module which runs on all detectors of a given type, followed by the same module but with different parameters for one specific detector, also of this type) the Module Manager must establish which instantiations to keep and which to discard.
The instantiation logic determines a unique name and priority, where a lower number indicates a higher priority, for every instantiation.
The name and priority for the instantiation are determined differently for the two types of modules:
\begin{itemize}
\item \textbf{Global}: Name of the module, the priority is always \emph{high}.
\item \textbf{Detector/DUT}: Combination of the name of the module and the name of detector this module is executed for.
If the name of the detector is specified directly by the \parameter{name} parameter, the priority is \emph{high}.
If the detector is only matched by the \parameter{type} parameter, the priority is \emph{medium}.
If the \parameter{name} and \parameter{type} are both unspecified and the module is instantiated for all detectors, the priority is \emph{low}.
\end{itemize}
In the end, only a single instance for every unique name is allowed.
If there are multiple instantiations with the same unique name, the instantiation with the highest priority is kept.
If multiple instantiations with the same unique name and the same priority exist, an exception is raised.
270

271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
\section{Logging and Verbosity Levels}
\label{sec:logging_verbosity}
\corry is designed to identify mistakes and implementation errors as early as possible and to provide the user with clear indications about the problem.
The amount of feedback can be controlled using different log levels which are inclusive, i.e.\ lower levels also include messages from all higher levels.
The global log level can be set using the global parameter \parameter{log_level}.
The log level can be overridden for a specific module by adding the \parameter{log_level} parameter to the respective configuration section.
The following log levels are supported:
\begin{itemize}
\item \textbf{FATAL}: Indicates a fatal error that will lead to direct termination of the application.
Typically only emitted in the main executable after catching exceptions as they are the preferred way of fatal error handling.
An example of a fatal error is an invalid configuration parameter.
\item \textbf{STATUS}: Important information about the status of the reconstruction.
Is only used for messages which have to be logged in every run such as initial information on the module loading, opened data files and the current progress of the run.
\item \textbf{ERROR}: Severe error that should not occur during a normal well-configured reconstruction.
Frequently leads to a fatal error and can be used to provide extra information that may help in finding the problem (for example used to indicate the reason a dynamic library cannot be loaded).
\item \textbf{WARNING}: Indicate conditions that should not occur normally and possibly lead to unexpected results.
The framework will however continue without problems after a warning.
A warning is for example issued to indicate that a calibration file for a certain detector cannot be found and that the reconstruction is therefore performed with uncalibrated data.
\item \textbf{INFO}: Information messages about the reconstruction process.
Contains summaries of the reconstruction details for every event and for the overall run.
Should typically produce maximum one line of output per event and module.
\item \textbf{DEBUG}: In-depth details about the progress of the reconstruction such as information on every cluster formed or on track fitting results.
Produces large volumes of output per event, and should therefore only be used for debugging the reconstruction.
\item \textbf{TRACE}: Messages to trace what the framework or a module is currently doing.
Unlike the \textbf{DEBUG} level, it does not contain any direct information about the physics but rather indicates which part of the module or framework is currently running.
Mostly used for software debugging or determining performance bottlenecks in the simulations.
\end{itemize}

\begin{warning}
    It is not recommended to set the \parameter{log_level} higher than \textbf{WARNING} in a typical reconstruction as important messages may be missed.
    Setting too low logging levels should also be avoided since printing many log messages will significantly slow down the simulation.
\end{warning}

The logging system supports several formats for displaying the log messages.
The following formats are supported via the global parameter \parameter{log_format} or the individual module parameter with the same name:
\begin{itemize}
\item \textbf{SHORT}: Displays the data in a short form.
Includes only the first character of the log level followed by the configuration section header and the message.
\item \textbf{DEFAULT}: The default format.
Displays system time, log level, section header and the message itself.
\item \textbf{LONG}: Detailed logging format.
Displays all of the above but also indicates source code file and line where the log message was produced.
This can help in debugging modules.
\end{itemize}
315

Simon Spannagel's avatar
Simon Spannagel committed
316
\section{Coordinate Systems}
317
318
319
320
321
322
323
324
\label{sec:coordinate_systems}

Local coordinate systems for each detector and a global frame of reference for the full setup are defined.
The global coordinate system is chosen as a right-handed Cartesian system, and the rotations of individual devices are performed around the geometrical center of their sensor.

Local coordinate systems for the detectors are also right-handed Cartesian systems, with the x- and y-axes defining the sensor plane.
The origin of this coordinate system is the center of the lower left pixel in the grid, i.e.\ the pixel with indices (0,0).
This simplifies calculations in the local coordinate system as all positions can either be stated in absolute numbers or in fractions of the pixel pitch.