Unpacker design
Summary
First of all, perhaps we should think of this as something a bit larger than just an unpacker, rather a gemdatahandler. But for now, let's stick to the common jargon.
Scope of the unpacker:
- Convert the binary raw data to a ROOT tree which is used subsequently by analysis software
- Why an additional conversion step?
- ROOT provides pretty good compression algorithms; storing raw data in ROOT format significantly reduces the required disk space
- I think we should consider deleting the binary raw files after unpacking, provided all the data health checks pass (see below)
- We can consider alternative models for managing and storing data, e.g. HDF5, however we should be careful in accounting for development resources required and operational/developmental convenience
- ROOT provides a simple GUI for a quick look at the unpacked data in a human-readable way
- ROOT is the standard data format for HEP experiments
- During this step the events will be checked for structural correctness
- Optionally, an event recovery may be launched if the event has a corrupted structure
- Actually, apart from obvious bugs, corrupted events have never been observed, so I would consider this low-priority functionality
- Unpacker should support a variety of data sources
- FedKit readout
- AMC13 SDRAM readout
- Future hardware support has to be foreseen
- Direct readout from APX cards
- Direct readout from PCIe
- DTH readout
- And perhaps some more...
- Unpacker should support a number of GEM data formats
- Tracking data
- Lossless data format
- Various options of zero-suppressed data
- Trigger data
- This is up for discussion, however I think the project will benefit from such support
- Non-tracking calibration data
- Another option to discuss
- We may want to store the non-tracking calibration data directly in ROOT, HDF5 or another format
- However, this would bring an additional, potentially unwanted dependency into the core software
- Unpacker should also provide a packing/simulation mechanism for raw data generation
- Useful for a variety of simulation studies and testing
- Unpacker should be provided as a standalone module/library and should require minimal input (e.g. it should be agnostic of hardware configuration)
- input data
- DAQ source type (optional)
- Data format type (optional)
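The scope items above (structural checks, a packing counterpart for generating raw data, and a minimal input consisting of the data plus optional source and format hints) could be sketched roughly as follows. This is only an illustration under stated assumptions: the event layout, the marker values and the names `pack_event`, `unpack_stream` and `Event` are hypothetical and do not describe any real GEM data format or existing code.

```python
import struct
from typing import Iterator, List, NamedTuple, Optional

# Hypothetical toy layout, NOT a real GEM format:
#   header  : uint32 marker, uint32 event number, uint32 payload word count
#   payload : N little-endian uint32 words
#   trailer : uint32 marker
HEADER = struct.Struct("<III")
TRAILER = struct.Struct("<I")
HEADER_MARKER = 0xCAFE0001   # assumed value, for illustration only
TRAILER_MARKER = 0xCAFE00FF  # assumed value, for illustration only


class Event(NamedTuple):
    number: int
    words: List[int]
    ok: bool  # False when the structural check failed


def pack_event(number: int, words: List[int]) -> bytes:
    """Generate raw bytes for one event (the packing/simulation direction)."""
    body = struct.pack(f"<{len(words)}I", *words)
    header = HEADER.pack(HEADER_MARKER, number, len(words))
    return header + body + TRAILER.pack(TRAILER_MARKER)


def unpack_stream(data: bytes,
                  source: Optional[str] = None,
                  fmt: Optional[str] = None) -> Iterator[Event]:
    """Yield events from a raw byte buffer, checking their structure.

    `source` and `fmt` are optional hints. This sketch knows a single layout,
    so `source` is accepted but ignored and any `fmt` other than "toy" is
    rejected.
    """
    if fmt not in (None, "toy"):
        raise ValueError(f"unsupported data format: {fmt!r}")
    offset = 0
    while offset + HEADER.size <= len(data):
        marker, number, n_words = HEADER.unpack_from(data, offset)
        end = offset + HEADER.size + 4 * n_words + TRAILER.size
        if marker != HEADER_MARKER or end > len(data):
            # Corrupted or truncated structure: flag it and stop.
            # Event recovery is deliberately left out of this sketch.
            yield Event(number, [], False)
            return
        words = list(struct.unpack_from(f"<{n_words}I", data, offset + HEADER.size))
        (trailer,) = TRAILER.unpack_from(data, end - TRAILER.size)
        yield Event(number, words, trailer == TRAILER_MARKER)
        offset = end
```

A round trip such as `list(unpack_stream(pack_event(1, [10, 20])))` gives a quick self-test of both directions, which is the kind of use the packing mechanism is meant for.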
As mentioned in various discussions, performance is not critical here: the readout bandwidth puts a hard limit on the possible data rate, so we are never going to have too much data to unpack or process. Another consideration is how the unpacker will be coupled with the downstream data analysis and processing tools. ROOT is awkward in multiprocess/multicore applications (it is possible, but it brings additional dependencies, the documentation is lacking at times, the load distribution is not optimal for our applications, and managing installation and updates is cumbersome), while very nice data analysis tools are available in python. I would therefore propose writing the analysis tools in python (3.6+ to ensure long-term maintainability) using the uproot I/O library for ROOT files.
This naturally leads to providing the unpacker as a python package. Having the whole repository in one language simplifies building and distributing the package, lowers the requirements on developers (not everyone has to be, or become, a C++ expert) and provides a nice homogeneous development environment. On top of that, python is in general a bit more user-friendly and has all the bits and pieces needed for seamless integration with almost any other full-stack component (DBs, web GUIs, name your own).
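To make the coupling with python analysis tools a little more concrete: a ROOT tree is columnar, with one value per event in each branch, and uproot hands branches back as arrays. The sketch below uses plain lists from the standard library as a stand-in for those arrays, so it does not exercise uproot itself. The helper `to_columns` and the field names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def to_columns(events: Iterable[Mapping[str, object]]) -> Dict[str, List[object]]:
    """Turn per-event records into one list per field (one "branch" per key).

    Assumes every event carries the same set of keys.
    """
    columns: Dict[str, List[object]] = defaultdict(list)
    for event in events:
        for name, value in event.items():
            columns[name].append(value)
    return dict(columns)


# Hypothetical field names, for illustration only
events = [
    {"event_number": 1, "n_words": 2, "ok": True},
    {"event_number": 2, "n_words": 0, "ok": True},
    {"event_number": 3, "n_words": 5, "ok": False},
]
columns = to_columns(events)

# A typical analysis selection: keep only structurally valid events
good = [n for n, ok in zip(columns["event_number"], columns["ok"]) if ok]
```

With this layout, selections and per-branch summaries become simple operations over whole columns, which is the style the python analysis ecosystem is built around.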
Related to #2
What is the expected correct behavior?
We need to decide ASAP on the key design points:
- Sign off the unpacker scope
- Perhaps rename it to gemdatahandler? If you have other proposals, please comment.
- Compressed storage and data management model
- Proposal: ROOT trees. If you want to propose another option, please provide a complete solution covering the needs mentioned above
- Unpacker language: C/C++ vs python3
- Data formats supported by the unpacker (tracking data for sure, what about trigger and non-tracking calibration data?)