Review the 'EventPacker' codebase in light of the general simplification of the primary event model.
During Run 2 an infrastructure was developed, generally referred to as 'packing', that did a number of things aimed at reducing the persistent size of the data.
- Event model classes as used in Run 2 tended to have a lot of complex structure, e.g. inheritance, dynamically allocated storage containers, KeyedObject containers etc. Whilst ROOT could persist these objects, these complexities generally led to the persistent size being quite large. The packing first converted the data to more basic data structures, removing a lot of these details. (See Event/EventPacker in LHCb.)
- Floating point values were generally truncated, removing information below what was deemed the precision of those values in real data. This was done using a common trick which is, for each value
- a) scale the floating point value by some factor
- b) cast the floating point value to int, which removes all the decimal places.
- c) cast the result back to float, and divide by the same factor as in a)
The end result, depending on the value of the scale factor in a), is a floating point value truncated to a given precision.
In the packing, steps a) and b) happen during the writing stage, and the value persisted in the packed data representations is the int produced by these steps. This is done because in this form the int generally has many padding zero bits, which makes it easiest for the compression applied during file writing to squeeze the data size on disk.
There are a number of things to be reviewed for Run 3:
- The code base is heavily based around the old Run 2 event model. In Run 3 the event model is being re-designed and, generally speaking, getting simpler, and thus closer to the persistent packed form. It should be reviewed what this means in terms of the code base.
- We want to try to move the data truncation described in the second point above into the data provider algorithms (see #152). It needs to be seen whether this can be done in a way that keeps the persistent size of the floating point value as good as the int representation discussed above. One possible way to do this is, instead of truncating to a fixed decimal place precision, to perform a 'relative precision' truncation: rather than the procedure above, the lower 'noise' bits in the mantissa of the floating point representation are set to zero. Done this way, the actual floating point representation contains padding zeros and thus will compress better. See #152 for more details on this.