
Reformat unpacker structure, write Parquet files directly from C++

Rocco Ardino requested to merge reformat-parquet into master

The aim of this MR is to avoid the memory explosion caused by Python when running the unpacker. Unpacked output is now written directly to Parquet format from C++, so the files can easily be analyzed later with Spark on SWAN.
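To illustrate why writing in fixed-size chunks keeps memory bounded, here is a minimal sketch. It is not the code from this MR: the `Row` fields, the `ChunkBuffer` class and the `Sink` callback are all hypothetical names, and the callback only stands in for whatever actually writes a chunk of rows to the Parquet file.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical unpacked row; the real column layout is not shown in this MR.
struct Row {
    std::uint32_t orbit;
    std::uint32_t bx;
};

// Accumulates rows and hands them to a sink every `rows_per_chunk` rows,
// so at most one chunk of rows is held in memory at a time.
class ChunkBuffer {
public:
    // Assumption: the sink stands in for "write these rows to the output file".
    using Sink = std::function<void(const std::vector<Row>&)>;

    ChunkBuffer(std::size_t rows_per_chunk, Sink sink)
        : rows_per_chunk_(rows_per_chunk == 0 ? 1 : rows_per_chunk),
          sink_(std::move(sink)) {}

    // Store one row; flush automatically once the chunk is full.
    void push(const Row& row) {
        rows_.push_back(row);
        if (rows_.size() >= rows_per_chunk_) {
            flush();
        }
    }

    // Emit whatever is buffered (call once at end of input for the last,
    // possibly partial, chunk).
    void flush() {
        if (rows_.empty()) {
            return;
        }
        sink_(rows_);
        rows_.clear();
    }

private:
    std::size_t rows_per_chunk_;
    Sink sink_;
    std::vector<Row> rows_;
};
```

With a chunk size of 3 and 7 pushed rows followed by a final `flush()`, the sink is called three times, with 3, 3 and 1 rows. Peak memory then depends on the chunk size rather than on the size of the input file.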

The only existing file that this MR overwrites is CMakeLists.txt (which is not used by the current unpacker). The project has been reorganized into header files that take care of all masks, shifts, block structures and raw file converters. The main source file (compile with cmake + make) is scunpack.cc, which now also includes a command-line parser. A few notes:

  • at the moment, the application does not run as a service
  • it requires the Arrow and Parquet libraries (installation guide here) and has been tested on CentOS Stream 8 on pcgpu-c2f07-18-01
  • run with
cd scouting-unpacker/
mkdir build
cd build
cmake -H../src/ -B.
make
./scunpack -i INPUT_DAT_FILE -b BLOCKS_TO_UNPACK -r ROW_PER_CHUNK_IN_PARQUET
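
For readers who want to see how the three options in the command above could be handled, here is a rough sketch using only the standard library. It is not the parser in scunpack.cc: `Options`, `parse_count` and `parse_args` are hypothetical names, and treating all three flags as mandatory is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical container for the three options shown in the run command.
struct Options {
    std::string input_file;            // -i INPUT_DAT_FILE
    std::uint64_t blocks = 0;          // -b BLOCKS_TO_UNPACK
    std::uint64_t rows_per_chunk = 0;  // -r ROW_PER_CHUNK_IN_PARQUET
};

// Parse a non-negative decimal integer; reject empty, non-digit or
// overly long input (18 digits always fit in 64 bits).
std::optional<std::uint64_t> parse_count(const std::string& text) {
    if (text.empty() || text.size() > 18) {
        return std::nullopt;
    }
    std::uint64_t value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') {
            return std::nullopt;
        }
        value = value * 10 + static_cast<std::uint64_t>(c - '0');
    }
    return value;
}

// `args` holds the arguments after the program name, as "flag value" pairs.
// Returns std::nullopt on an unknown flag, a missing value, a bad number,
// or if any of -i, -b, -r is absent (assumed mandatory here).
std::optional<Options> parse_args(const std::vector<std::string>& args) {
    Options opt;
    bool have_input = false, have_blocks = false, have_rows = false;
    for (std::size_t k = 0; k < args.size(); k += 2) {
        if (k + 1 >= args.size()) {
            return std::nullopt;  // flag without a value
        }
        const std::string& flag = args[k];
        const std::string& value = args[k + 1];
        if (flag == "-i") {
            opt.input_file = value;
            have_input = true;
        } else if (flag == "-b" || flag == "-r") {
            const std::optional<std::uint64_t> n = parse_count(value);
            if (!n) {
                return std::nullopt;
            }
            if (flag == "-b") {
                opt.blocks = *n;
                have_blocks = true;
            } else {
                opt.rows_per_chunk = *n;
                have_rows = true;
            }
        } else {
            return std::nullopt;  // unknown flag
        }
    }
    if (!(have_input && have_blocks && have_rows)) {
        return std::nullopt;
    }
    return opt;
}
```

From `main(int argc, char** argv)` this could be called as `parse_args(std::vector<std::string>(argv + 1, argv + argc))`, printing a usage message when the result is empty.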
Edited by Rocco Ardino
