RecJobTransforms + SimuJobTransforms: Switch the compression of all temporary files to ZLIB to speed up reading/writing
4- amete authored
@@ -154,10 +154,10 @@ if hasattr(runArgs,"outputDESDM_BEAMSPOTFile"):
As we're discussing in ATEAM-656, this MR changes the compression algorithm for all temporary files to `ZLIB`. In this context, a file is temporary if either of the following two criteria is met:
1. No `--outputXYZFile` is specified for the intermediate step(s), e.g. `RAWtoESD` followed by `ESDtoAOD` without specifying `--outputESDFile` (this is not being done)
2. The job runs w/ `AthenaMP` (this is already being done)
In the first case, the output filename is set to be `tmp.XYZ`, where `XYZ` stands for the appropriate step, while in the second case `_000` is appended to the file name, both by convention.
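As a rough illustration of the two naming conventions above, here is a minimal sketch of a filename check. The helper `is_temporary_output` is hypothetical and is not part of the transforms touched by this MR; it only assumes the `tmp.` prefix and `_000` suffix described in the text.

```python
import os


def is_temporary_output(filename: str) -> bool:
    """Hypothetical helper: guess from the name whether an output file is temporary."""
    base = os.path.basename(filename)
    # Case 1: intermediate file of a chained transform, named tmp.XYZ by convention
    if base.startswith("tmp."):
        return True
    # Case 2: AthenaMP output, where _000 is appended to the file name by convention
    if base.endswith("_000"):
        return True
    return False


if __name__ == "__main__":
    for name in ("tmp.ESD", "myESD.pool.root_000", "myAOD.pool.root"):
        print(name, is_temporary_output(name))
```

The file names in the loop are made up for the example; a name-based check like this is only as reliable as the conventions it relies on.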
From a quick test based on `q431` w/ 50 events, here is the comparison of `StreamESD` performance, as well as resulting `ESD` file sizes, by different compression schemes (compression level is always set to 1):
| Compression | File Size [MB] | CPU-time [sec/evt] | Note |
|---|---|---|---|
| LZMA | 139 | 855 | Leading CPU consumer |
| ZLIB | 180 | 371 | 4th leading CPU consumer |
| ZSTD | 181 | 287 | 4th leading CPU consumer |
| LZ4 | 245 | 221 | 4th leading CPU consumer |
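For reference, ROOT encodes a compression choice as a single integer, `100 * algorithm + level`. The sketch below computes that value for the four schemes in the table at level 1. The algorithm codes follow ROOT's `RCompressionSetting::EAlgorithm`; the helper name `root_compression_setting` is an assumption made for this example, and how the transforms actually pass these values to the output streams is not shown here.

```python
# Algorithm codes as defined in ROOT's RCompressionSetting::EAlgorithm
ROOT_ALGORITHM = {"ZLIB": 1, "LZMA": 2, "LZ4": 4, "ZSTD": 5}


def root_compression_setting(algorithm: str, level: int = 1) -> int:
    """Hypothetical helper: combined ROOT compression setting (100 * algorithm + level)."""
    return 100 * ROOT_ALGORITHM[algorithm] + level


if __name__ == "__main__":
    for name in ("LZMA", "ZLIB", "ZSTD", "LZ4"):
        print(name, root_compression_setting(name, level=1))
```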
Again, we're not proposing to change the compression scheme for permanent files (which is `LZMA` for all upstream formats including `AOD`s and, at least for the time being, `ZLIB` for `DAOD`s), only for the temporary ones. Going from `ZLIB` to `LZ4` would increase the file size by about 35% while reducing the `StreamESD` CPU time by about 40%. In all three cases, `ZLIB`, `ZSTD`, and `LZ4`, the `ESDtoAOD` performances are practically the same in this test.
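The percentages quoted above can be re-derived from the table. The short sketch below does that arithmetic; the numbers are copied from the table, while the `measurements` dictionary and the `relative_change` helper are names introduced only for this example.

```python
# (file size in MB, StreamESD CPU time) per compression scheme, copied from the table
measurements = {
    "LZMA": (139, 855),
    "ZLIB": (180, 371),
    "ZSTD": (181, 287),
    "LZ4": (245, 221),
}


def relative_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (positive means an increase)."""
    return 100.0 * (new - old) / old


# ZLIB -> LZ4, the comparison made in the text
size_change = relative_change(measurements["ZLIB"][0], measurements["LZ4"][0])
cpu_change = relative_change(measurements["ZLIB"][1], measurements["LZ4"][1])

if __name__ == "__main__":
    print(f"ZLIB -> LZ4: file size {size_change:+.1f}%, CPU time {cpu_change:+.1f}%")
```

The same helper can be applied to the `LZMA` and `ZLIB` rows to quantify the change this MR makes for temporary files.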
This should especially help w/ high thread count `AthenaMT` jobs w/ chained workflows where the temporary intermediate files are currently being compressed w/ `LZMA`, which is very expensive.