The Size of compressed data counter from HltPackedDataWriter fluctuates slightly between runs.
| Counter | # | sum | mean/eff^* | rms/err^* | min | max |#First run | "Size of compressed data" | 421 | 3736579 | 8875.5 | 10531.0 | 550.00 | 59912.0 |#Second run | "Size of compressed data" | 421 | 3736634 | 8875.6 | 10531.0 | 550.00 | 59926.0 |#Third run | "Size of compressed data" | 421 | 3736653 | 8875.7 | 10531.0 | 550.00 | 59928.0 |#Fourth run | "Size of compressed data" | 421 | 3736534 | 8875.4 | 10530.0 | 550.00 | 59919.0 |
This became apparent when creating references for the Moore test included by Moore!985 (merged). The numbers above were obtained running the test on 4 threads. Running it single-threaded also shows fluctuation:
#First Run | "Size of compressed data" | 421 | 3760293 | 8931.8 | 10499.0 | 548.00 | 64510.0 |#Second run | "Size of compressed data" | 421 | 3760430 | 8932.1 | 10500.0 | 548.00 | 64516.0 |#Third run | "Size of compressed data" | 421 | 3760210 | 8931.6 | 10499.0 | 548.00 | 64511.0 |
Edit: Enabling checksum in the HltPackedDataWriter shows that the input uncompressed data changes between runs. The logs for two runs are attached (log1.log, log2.log). The locations whose checksums change are
@lmeyerga@rmatev@sstahl What algorithm/library does HltPackedDataWriter used to do the compression ? If the compressed size is varying for the exact same input uncompressed buffer (do we know this is true btw?) then it might be a slight instability in the compression library itself.
Ah right, I understand now. Yes, it would be good to have some sort of check on the contents (maybe a checksum on the buffer) as it could have the same number of ints/floats and thus the same size, but not the same content and that then causes fluctuations in the compression level achieved.
Indeed, the contents of the input uncompressed buffer change. I've updated the description with the logs of two runs with checksum enabled in the HltPackedDataWriter.
OK thanks, so next we need to investigate what is changing in those data objects (and likely the originals they are formed from). Either way this is no longer really a 'persistency' issue but more a reconstruction/selection instability issue.
FYI -- there is a warning in the gcc11 build (see here), specifically:
In file included from ../Hlt/HltDAQ/src/component/HltPackedDataDecoder.cpp:33:9 In lambda function,10 inlined from 'constexpr _Res std::__invoke_impl(std::__invoke_other, _Fn&&, _Args&& ...) [with _Res = void; _Fn = LHCb::Hlt::PackedData::PackedDataChecksum::process<char&&, unsigned char&&, const std::vector<LHCb::PackedRecVertex, std::allocator<LHCb::PackedRecVertex> >&, const std::vector<long long int, std::allocator<long long int> >&, const std::vector<std::pair<int, int>, std::allocator<std::pair<int, int> > >&, const std::vector<short int, std::allocator<short int> >&>(std::string_view, const std::tuple<char&&, unsigned char&&, const std::vector<LHCb::PackedRecVertex, std::allocator<LHCb::PackedRecVertex> >&, const std::vector<long long int, std::allocator<long long int> >&, const std::vector<std::pair<int, int>, std::allocator<std::pair<int, int> > >&, const std::vector<short int, std::allocator<short int> >&>&)::<lambda(const auto:74& ...)>; _Args = {char&, unsigned char&, const std::vector<LHCb::PackedRecVertex, std::allocator<LHCb::PackedRecVertex> >&, const std::vector<long long int, std::allocator<long long int> >&, const std::vector<std::pair<int, int>, std::allocator<std::pair<int, int> > >&, const std::vector<short int, std::allocator<short int> >&}]' at /cvmfs/sft.cern.ch/lcg/releases/gcc/11.1.0-e80bf/x86_64-centos7/include/c++/11.1.0/bits/invoke.h:61:36,11 inlined from 'constexpr typename std::__invoke_result<_Functor, _ArgTypes>::type std::__invoke(_Callable&&, _Args&& ...) [with _Callable = LHCb::Hlt::PackedData::PackedDataChecksum::process<char&&, unsigned char&&, const std::vector<LHCb::PackedRecVertex, std::allocator<LHCb::PackedRecVertex> >&, const std::vector<long long int, std::allocator<long long int> >&, const std::vector<std::pair<int, int>, std::allocator<std::pair<int, int> > >&, const std::vector<short int, std::allocator<short int> >&>(std::string_view, const std::tuple<char&&, unsigned char&&, const std::vector<LHCb::PackedRecVertex, std::allocator<LHCb::PackedRecVertex> >&, const std::vector<long long int, std::allocator<long long int> >&, const std::vector<std::pair<int, int>, std::allocator<std::pair<int, int> > >&, const std::vector<short int, std::allocator<short int> >&>&)::<lambda(const auto:74& ...)>; _Args = {char&, unsigned char&, const std::vector<LHCb::PackedRecVertex, std::allocator<LHCb::PackedRecVertex> >&, const std::vector<long long int, std::allocator<long long int> >&, const std::vector<std::pair<int, int>, std::allocator<std::pair<int, int> > >&, const std::vector<short int, std::allocator<short int> >&}]' at /cvmfs/sft.cern.ch/lcg/releases/gcc/11.1.0-e80bf/x86_64-centos7/include/c++/11.1.0/bits/invoke.h:96:40,12 inlined from 'constexpr decltype(auto) std::__apply_impl(_Fn&&, _Tuple&&, std::index_sequence<_Idx ...>) [with _Fn = LHCb::Hlt::PackedData::PackedDataChecksum::process<char&&, unsigned char&&, const std::vector<LHCb::PackedRecVertex, std::allocator<LHCb::PackedRecVertex> >&, const std::vector<long long int, std::allocator<long long int> >&, const std::vector<std::pair<int, int>, std::allocator<std::pair<int, int> > >&, const std::vector<short int, std::allocator<short int> >&>(std::string_view, const std::tuple<char&&, unsigned char&&, const std::vector<LHCb::PackedRecVertex, std::allocator<LHCb::PackedRecVertex> >&, const std::vector<long long int, std::allocator<long long int> >&, const std::vector<std::pair<int, int>, std::allocator<std::pair<int, int> > >&, const std::vector<short int, std::allocator<short int> >&>&)::<lambda(const auto:74& ...)>; _Tuple = const std::tuple<char&&, unsigned char&&, const std::vector<LHCb::PackedRecVertex, std::allocator<LHCb::PackedRecVertex> >&, const std::vector<long long int, std::allocator<long long int> >&, const std::vector<std::pair<int, int>, std::allocator<std::pair<int, int> > >&, const std::vector<short int, std::allocator<short int> >&>&; long unsigned int ..._Idx = {0, 1, 2, 3, 4, 5}]' at /cvmfs/sft.cern.ch/lcg/releases/gcc/11.1.0-e80bf/x86_64-centos7/include/c++/11.1.0/tuple:1806:27,13 inlined from 'constexpr decltype(auto) std::apply(_Fn&&, _Tuple&&) [with _Fn = LHCb::Hlt::PackedData::PackedDataChecksum::process<char&&, unsigned char&&, const std::vector<LHCb::PackedRecVertex, std::allocator<LHCb::PackedRecVertex> >&, const std::vector<long long int, std::allocator<long long int> >&, const std::vector<std::pair<int, int>, std::allocator<std::pair<int, int> > >&, const std::vector<short int, std::allocator<short int> >&>(std::string_view, const std::tuple<char&&, unsigned char&&, const std::vector<LHCb::PackedRecVertex, std::allocator<LHCb::PackedRecVertex> >&, const std::vector<long long int, std::allocator<long long int> >&, const std::vector<std::pair<int, int>, std::allocator<std::pair<int, int> > >&, const std::vector<short int, std::allocator<short int> >&>&)::<lambda(const auto:74& ...)>; _Tuple = const std::tuple<char&&, unsigned char&&, const std::vector<LHCb::PackedRecVertex, std::allocator<LHCb::PackedRecVertex> >&, const std::vector<long long int, std::allocator<long long int> >&, const std::vector<std::pair<int, int>, std::allocator<std::pair<int, int> > >&, const std::vector<short int, std::allocator<short int> >&>&]' at /cvmfs/sft.cern.ch/lcg/releases/gcc/11.1.0-e80bf/x86_64-centos7/include/c++/11.1.0/tuple:1817:31,14 inlined from 'void LHCb::Hlt::PackedData::PackedDataChecksum::process(std::string_view, const std::tuple<_Tps ...>&) [with T = {char&&, unsigned char&&, const std::vector<LHCb::PackedRecVertex, std::allocator<LHCb::PackedRecVertex> >&, const std::vector<long long int, std::allocator<long long int> >&, const std::vector<std::pair<int, int>, std::allocator<std::pair<int, int> > >&, const std::vector<short int, std::allocator<short int> >&}]' at ../Hlt/HltDAQ/src/component/PackedDataChecksum.h:77:17,15 inlined from 'void LHCb::Hlt::PackedData::PackedDataChecksum::processObject(const T&, std::string_view) [with T = LHCb::PackedRecVertices]' at ../Hlt/HltDAQ/src/component/PackedDataChecksum.h:43:14:16 ../Hlt/HltDAQ/src/component/PackedDataChecksum.h:77:47: warning: '<anonymous>' may be used uninitialized [-Wmaybe-uninitialized]17 77 | std::apply( [&]( const auto&... arg ) { ( process( key, arg ), ... ); }, x );18 | ^19 ../Hlt/HltDAQ/src/component/PackedDataChecksum.h: In member function 'void LHCb::Hlt::PackedData::PackedDataChecksum::processObject(const T&, std::string_view) [with T = LHCb::PackedRecVertices]':20 ../Hlt/HltDAQ/src/component/PackedDataChecksum.h:63:10: note: by argument 3 of type 'const char&' to 'void LHCb::Hlt::PackedData::PackedDataChecksum::process(std::string_view, const T&) [with T = char]' declared here21 63 | void process( std::string_view key, const T& x ) {22 | ^~~~~~~23 In file included from ../Hlt/HltDAQ/src/component/HltPackedDataDecoder.cpp:23:24 ../Event/EventPacker/include/Event/PackedRecVertex.h:159:35: note: '<anonymous>' declared here25 159 | return std::forward_as_tuple( packingVersion(), version(), m_vect, m_refs, m_extra, m_weights );26 | ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
which makes me wonder about the fact that forward_as_tuple does not (can not!) extend the lifetime of any temporary references -- but the 3rd and later argument to forward_as_tuple are solid const lvalue references 'anchored' in a class instance which definitely has sufficient lifetime, so that only leaves the first two arguments, but those are rvalues so the tuple returned from forward_as_tuple should have as type std::tuple<char,unsigned char,...> i.e. the first two are not references as all, but values, so their lifetime should not be a problem...
There's plenty of other stuff to be done, so no rush -- I just hope I remember to check once the nighly builds, as by that time I'll probably be trying something completely different ;-)
also, please note that this warning only affects the checksum, not the actual persisted data -- so at best these changes make the checksum repeatable, but it still does not affect the packed data itself... unless there is something else which also affects the checksum in the actual underlying data, in which case, it may look like these changes do not fix anything - except remove false positives, leaving only (hopefully) true positives.
@graven I repeated the test with your MR. The Size of compressed data counter still fluctuates, but most of the checksums no longer change. The only checksum that still changes is the one related to '/Event/HLT2/pPhys/Relations'.
OK, thanks for checking that -- that's good news for !3248 (merged), and we should go ahead and get it merged ASAP.
Now, unfortunately that still leaves open what / why is going on with /Event/HLT2/pPhys/Relations...
@lmeyerga is running the same job twice and comparing the stdout. The only difference with respect to the default is the output level and enabling the checksum. Or do you want a reference based test? This is easier to implement.
I would like whatever is considered easiest, and still tests that the checksums are reproducible ;-)
Having said that, I think that just enabling the output of the checksum and comparing to a reference will catch any problem with the checkum itself -- but it will also fail if the inputs change, and it thus also tests something else which is more likely to happen. So we should do at least that, and if the other (i.e. run twice, to test 'local reproducibility' as opposed to 'reproducability wrt. the previous incantation') is possible without breaking the bank, it would be very nice to have as well.
(basically: if you volunteer two options, the answer will often be: 'please do both' ;-)
last time I checked it was still fluctuating and counter is disabled in the tests. Before closing this, please remove the counter exception and test again.
FYI I think you'll see this if you look at the bandwidth in the UpgradeRateTest from 1 day to the next, where the rates don't change but the BWs fluctuate. The last 3 hlt2_2023_bandwidth jobs show you this behaviour. I have been asked to reduce the number of significant figures that we show the BW to on that page though, so this monitor might not live for too much longer :)
Tests that were disabled have been reenabled with Moore!3605 (merged) after stability fixes have been applied. Lets close this and reopen if they reappear.
The Size of compressed data counter from HltPackedDataWriter fluctuates slightly between runs
and the fixes introduced !4624 (merged) are explicitly to the checksums only -- which are (unfortunately) independent of the data actually written (compressed or not). So I don't see how that fix could address this issue...