Commit 817ab150 authored by Tadej Novak

Merge branch 'devtutorial_persistency' into 'main'

Migrating developers' tutorial to public pages: part 5 (persistency)

See merge request !96
parents fa1ca8ca e83d7123
Pipeline #11724081 passed with warnings
# Athena Persistency Example
This is a simple exercise showing how to add basic persistency to an Athena job. It builds on an existing transient Athena example used in the ATLAS Software Tutorial and adds writing functionality to it, so that the transient objects used in that example are stored in an output file for all processed events.
This exercise shows how to:
* create a class dictionary (for an existing Athena class)
* trigger an automatic creation of an AthenaPool converter for this class
* modify job options to produce an output file
* inspect the produced file with PyROOT
## Prerequisites
We will re-use the same `build` directory and `athena` clone as used elsewhere in this tutorial.
## Setup runtime
If you've started a fresh terminal/ssh session, remember to set up the Athena development environment for the latest nightly build of the Athena main branch:
```bash
setupATLAS
asetup main,Athena,latest
```
## Make a new branch
Now `cd` to the `athena` source directory and make a new branch:
```bash
cd athena
git fetch upstream
git checkout -b persistency-exercise upstream/main --no-track
```
## Create the new package
By convention, persistency support for classes defined in a given package (named `Package`) goes into a new package named `PackageAthenaPool`. We will implement persistency for `AthExHive`, so let's create a package `AthExHiveAthenaPool`.
From the `athena` directory do the following:
```bash
# Make the package directory:
mkdir AthExHiveAthenaPool
cd AthExHiveAthenaPool
# Make the headers and python subdirectories
mkdir AthExHiveAthenaPool python
```
Then create a `CMakeLists.txt` file in the package root directory with just the package name:
```cmake
# Declare the package name
atlas_subdir( AthExHiveAthenaPool )
```
* CMake in the context of ATLAS is covered in [this tutorial](/{{locations.athena_developers}}/tutorial/cmake.md).
## Add class dictionary
All classes for which instances are to be persistified require dictionaries. Dictionaries are created by the build system and added to the release.
First let's create the `selection.xml` file - a file in the XML format listing which types should be included in a given dictionary. This file usually resides in the include directory of a package (here that would be: `AthExHiveAthenaPool/selection.xml`).
We want a dictionary for the `HiveDataObj` class defined and used in `AthExHive` example, so the file should contain the following lines:
```xml
<lcgdict>
<class name="HiveDataObj" id="8AF5C571-6D5E-46A7-918F-145FB3AA2C43"/>
</lcgdict>
```
Note: all types that are to be persistified on their own (that is, not as part of another object) should be given an identifier, as shown in the above example. It is an XML attribute named *id* whose value is a unique identifier, also called a Globally Unique Identifier (GUID). This identifier can be generated randomly with the standard Unix command `uuidgen`, but it should never be changed once it has been used to write into a file (as long as we want to be able to read that file back later on).
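As a minimal sketch of the GUID rule above, the following standalone Python snippet generates a random identifier and checks that every `<class>` entry in a `selection.xml` string carries a well-formed one. The helper names (`new_guid`, `selection_entry`, `classes_with_guids`) are hypothetical and not part of Athena; the identifier is upper-cased only to match the style of the example above.

```python
import re
import uuid
import xml.etree.ElementTree as ET

# 8-4-4-4-12 hexadecimal groups, written in upper case as in the example above.
GUID_PATTERN = re.compile(
    r"^[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}$"
)


def new_guid() -> str:
    """Return a random GUID, similar to what `uuidgen` produces."""
    return str(uuid.uuid4()).upper()


def selection_entry(class_name: str, guid: str) -> str:
    """Format one <class> line for a selection.xml file."""
    return f'<class name="{class_name}" id="{guid}"/>'


def classes_with_guids(selection_xml: str) -> dict:
    """Map each class name in a selection.xml string to its GUID (or None)."""
    result = {}
    for node in ET.fromstring(selection_xml).iter("class"):
        guid = node.get("id")
        if guid is not None and not GUID_PATTERN.match(guid):
            raise ValueError(f"malformed GUID for {node.get('name')}: {guid}")
        result[node.get("name")] = guid
    return result


if __name__ == "__main__":
    print(selection_entry("HiveDataObj", new_guid()))
```

Generate the identifier once and keep it: as noted above, changing it later would break reading of files already written with it.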
Next we will create a C++ header file that will combine all C++ includes for the types that should have dictionaries. Let's call it according to the convention `AthExHiveAthenaPoolDict.h` and also put it into the package include directory (here `AthExHiveAthenaPool/AthExHiveAthenaPoolDict.h`).
```c++
#include "AthExHive/HiveDataObj.h"
```
Finally we need to add a CMake command to the `CMakeLists.txt` file in the package root directory that so far was mostly empty. This is the command to create a dictionary using the `selection.xml` and `AthExHiveAthenaPoolDict.h` files as input:
```cmake
atlas_add_dictionary( AthExHiveAthenaPoolDict
AthExHiveAthenaPool/AthExHiveAthenaPoolDict.h
AthExHiveAthenaPool/selection.xml
LINK_LIBRARIES AthExHiveLib )
```
`AthExHiveAthenaPoolDict` is the name of the dictionary that will be created.
At this point we should already be able to build the dictionary. Go to your working directory (the one where the `build` and `run` subdirectories were created earlier). Edit the `package_filters.txt` file there (or whatever you called it) so that it lists the newly created package (the last line says: "do not build other packages from the release"):
```
+ AthExHiveAthenaPool
- .*
```
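To illustrate how such a filter file selects packages, here is a rough Python sketch. It is not the ATLAS CMake implementation. It assumes that rules are evaluated top to bottom, that the first matching rule decides, that patterns are regular expressions searched anywhere in the package path, and that a package matching no rule is built. The function names are hypothetical.

```python
import re


def parse_filters(text):
    """Turn filter-file text into a list of (include, compiled_regex) rules."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumption: '#' starts a comment
            continue
        sign, pattern = line[0], line[1:].strip()
        if sign not in "+-":
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def is_selected(package_path, rules):
    """Return True if the package would be built under the assumed semantics."""
    for include, pattern in rules:
        if pattern.search(package_path):
            return include  # assumption: the first matching rule wins
    return True  # assumption: packages matching no rule are built


if __name__ == "__main__":
    rules = parse_filters("+ AthExHiveAthenaPool\n- .*\n")
    for pkg in ("AthExHiveAthenaPool", "Control/AthenaExamples/AthExHive"):
        print(pkg, is_selected(pkg, rules))
```

Under these assumptions the order of the lines matters: placing `- .*` first would exclude every package, including the new one.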
Enter the `build` directory and execute the following `cmake` command. If everything goes well, source the environment `setup.sh` script and execute `make`:
```bash
cd build
cmake -DATLAS_PACKAGE_FILTER_FILE=../package_filters.txt ../athena/Projects/WorkDir
source $LCG_PLATFORM/setup.sh
make
```
After `make` finishes, the dictionary should be ready and usable (thanks to the environment setting added by the sourced `setup.sh` script). It can also be used by PyROOT, so it should be possible to perform the following quick verification from the command line (you may remember that `HiveDataObj` wraps a single integer and provides the `val()` accessor to it):
```python
build% python
>>> import ROOT
>>> obj = ROOT.HiveDataObj(123)
>>> obj.val()
123
```
You can inspect the results of dictionary building in the `lib` subdirectory in your build location:
```bash
build% ls -l $LCG_PLATFORM/lib
-rw-r--r--. 1 mnowak zp 110 Nov 16 19:26 WorkDir.rootmap
-rwxr-xr-x. 1 mnowak zp 29424 Nov 16 19:26 libAthExHiveAthenaPoolDict.so
-rwxr-xr-x. 1 mnowak zp 146208 Nov 16 19:26 libAthExHiveAthenaPoolDict.so.dbg
-rw-r--r--. 1 mnowak zp 1096 Nov 16 19:26 libAthExHiveAthenaPoolDict_rdict.pcm
build% cat $LCG_PLATFORM/lib/WorkDir.rootmap
[ libAthExHiveAthenaPoolDict.so ]
# List of selected classes
class HiveDataObj
header AthExHive/HiveDataObj.h
```
What is shown here is the `rootmap` file, which tells the runtime dictionary discovery system which library contains the dictionary for the `HiveDataObj` class. The dynamic library contains, among other things, class factory functions, and the `.pcm` file contains C++ reflection information about those classes.
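The class-to-library lookup that the `rootmap` file enables can be sketched in a few lines of Python. This is only an illustration based on the file content printed above, not ROOT's own parser: it handles just the `[ library ]` and `class` lines and assumes a single library per bracketed section. The function name `parse_rootmap` is hypothetical.

```python
def parse_rootmap(text):
    """Map class names to the library listed in the preceding [ ... ] section."""
    class_to_lib = {}
    current_lib = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current_lib = line[1:-1].strip()
        elif line.startswith("class ") and current_lib is not None:
            class_to_lib[line.split(None, 1)[1]] = current_lib
        # other keywords (for example 'header') are ignored in this sketch
    return class_to_lib


if __name__ == "__main__":
    example = """[ libAthExHiveAthenaPoolDict.so ]
# List of selected classes
class HiveDataObj
header AthExHive/HiveDataObj.h
"""
    print(parse_rootmap(example))
```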
## Add AthenaPool converter
As mentioned in the tutorial, ATLAS CMake provides a command to build AthenaPool converters for a particular class. The following lines should be added to the `CMakeLists.txt` file:
```cmake
atlas_add_poolcnv_library( AthExHiveAthenaPoolCnv
FILES AthExHive/HiveDataObj.h
LINK_LIBRARIES AthExHiveLib AthenaPoolCnvSvcLib )
```
For this exercise we will let CMake create the converter automatically. CMake does that by default if it can't find converter source files in a well-known location.
After adding the `atlas_add_poolcnv_library` command to the `CMakeLists.txt` file go back to the `build` directory and execute `make` again.
Afterwards, a quick look into the `lib` directory shows new files:
```bash
build% ls -l $LCG_PLATFORM/lib
-rw-r--r--. 1 mnowak zp 50 Nov 16 20:02 WorkDir.components
-rw-r--r--. 1 mnowak zp 110 Nov 16 19:26 WorkDir.rootmap
-rw-r--r--. 1 mnowak zp 50 Nov 16 20:02 libAthExHiveAthenaPoolCnv.components
-rwxr-xr-x. 1 mnowak zp 132064 Nov 16 20:02 libAthExHiveAthenaPoolCnv.so
-rwxr-xr-x. 1 mnowak zp 1236032 Nov 16 20:02 libAthExHiveAthenaPoolCnv.so.dbg
-rwxr-xr-x. 1 mnowak zp 29424 Nov 16 19:26 libAthExHiveAthenaPoolDict.so
-rwxr-xr-x. 1 mnowak zp 146208 Nov 16 19:26 libAthExHiveAthenaPoolDict.so.dbg
-rw-r--r--. 1 mnowak zp 1096 Nov 16 19:26 libAthExHiveAthenaPoolDict_rdict.pcm
build% cat $LCG_PLATFORM/lib/WorkDir.components
v2::libAthExHiveAthenaPoolCnv.so:CNV_256_37539154
```
Here we can see the components manifest file, which lists all components that can be dynamically loaded by Athena at runtime, as needed. In particular, there is a converter for a class with `CLID=37539154` that can be found in the `libAthExHiveAthenaPoolCnv` shared library. The CLID assignment can be found in the class header files - [37539154 is assigned to HiveDataObj]({{data.athena_git_url}}/-/blob/main/Control/AthenaExamples/AthExHive/AthExHive/HiveDataObj.h#L42).
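The manifest line shown above can be taken apart with a short Python sketch. The layout `v2::<library>:CNV_<number>_<CLID>` is inferred from that single line only. Treating the middle number (256) as a storage type identifier is an assumption, since the text above only confirms the meaning of the CLID. The names `ConverterEntry` and `parse_components_line` are hypothetical.

```python
from typing import NamedTuple, Optional


class ConverterEntry(NamedTuple):
    library: str
    storage_type: int  # assumption: meaning of the middle number
    clid: int


def parse_components_line(line: str) -> Optional[ConverterEntry]:
    """Parse one manifest line; return None if it is not a converter entry."""
    version, _, rest = line.strip().partition("::")
    if version != "v2":
        raise ValueError(f"unexpected manifest version: {version!r}")
    library, _, factory = rest.partition(":")
    parts = factory.split("_")
    if len(parts) != 3 or parts[0] != "CNV":
        return None  # some other kind of component
    return ConverterEntry(library, int(parts[1]), int(parts[2]))


if __name__ == "__main__":
    entry = parse_components_line(
        "v2::libAthExHiveAthenaPoolCnv.so:CNV_256_37539154"
    )
    print(entry.library, entry.clid)
```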
## Prepare job options for writing
Job config files that are to be executed at runtime must be stored in the `python` package subdirectory. We created this subdirectory at the beginning; now let's add the job config there by copying the [WriteHiveDataObjConfig.py]({{data.athena_git_url}}/-/blob/main/Control/AthenaExamples/AthExHiveAthenaPool/python/WriteHiveDataObjConfig.py) file into the `python/` directory:
```python
--8<-- "{{data.athena_git_url}}/-/raw/main/Control/AthenaExamples/AthExHiveAthenaPool/python/WriteHiveDataObjConfig.py"
```
To make the config available at runtime, we need to instruct CMake to install it in the release. This can be done by adding the following command to the `CMakeLists.txt` file:
```cmake
atlas_install_python_modules( python/*.py )
```
Go to the `build` directory and execute `make` once more. After it finishes, we should finally be ready to execute the Athena job.
## Execute the Athena job to create an output file
Run the Athena job with the new job config. Do it from the `run` directory so we can clearly see the output:
```bash
cd ../run
python -m AthExHiveAthenaPool.WriteHiveDataObjConfig
```
## Inspect the results
The directory should contain two files produced by the job: `myExampleStream.pool.root` and `PoolFileCatalog.xml`. You can check the content of the XML file catalog directly, but the ROOT file is a bit more complicated. It can be opened with ROOT and browsed (hint: Athena data is stored in the "CollectionTree" `TTree`). For the purpose of this exercise, however, we can take advantage of PyROOT.
Save the [ReadHiveDataObjs.py]({{data.athena_git_url}}/-/blob/main/Control/AthenaExamples/AthExHiveAthenaPool/python/ReadHiveDataObjs.py) PyROOT script to the `run` directory:
```python
--8<-- "{{data.athena_git_url}}/-/raw/main/Control/AthenaExamples/AthExHiveAthenaPool/python/ReadHiveDataObjs.py"
```
and execute it using `python` to see a printout of the data from the ROOT file created earlier.
## Bonus: adding in-file metadata
While writing out event data, we are usually also interested in the metadata describing those events and the produced output file. We can achieve this by adding the relevant configuration to the job options. In this exercise, we will add two helper tools that create EventFormat and FileMetaData objects and persist them in the output file.
You can do so by using the configuration from [WriteHiveWithMetaData.py]({{data.athena_git_url}}/-/blob/main/Control/AthenaExamples/AthExHiveAthenaPool/python/WriteHiveWithMetaData.py) (note the additions to the configuration used previously in this exercise):
```python
--8<-- "{{data.athena_git_url}}/-/raw/main/Control/AthenaExamples/AthExHiveAthenaPool/python/WriteHiveWithMetaData.py"
```
To inspect the metadata in the output file, you can use the following command:
```bash
meta-reader -m full TestStream.pool.root
```
......@@ -89,6 +89,7 @@ nav:
- athena/developers/tutorial/testing_debugging.md
- athena/developers/tutorial/conditions.md
- athena/developers/tutorial/performance.md
- athena/developers/tutorial/persistency.md
- Git Workflow:
- athena/git/index.md
- athena/git/workflow-quick.md
......@@ -197,6 +198,9 @@ markdown_extensions:
- pymdownx.details
- pymdownx.superfences # code inside admonitions
- pymdownx.emoji
- pymdownx.snippets:
url_download: true
check_paths: true
plugins:
- search
......
......@@ -350,4 +350,7 @@ VtuneAmplifier
ROIs
customisable
templated
allocator
\ No newline at end of file
allocator
persistify
persistified
Unix
\ No newline at end of file