Uncomplete event after running Moore succesfully

changed the description

Hi @pherrero, we have seen similar error messages when testing the mdf output produced after running hlt2 generic lines, which might be related:

Show error messages

HltPackedDataDe...  FATAL  Standard std::exception is caught
HltPackedDataDe...  ERROR PackedTracks packing version is not supported: 0
EventLoopMgr        FATAL .executeEvent(): Standard std::exception thrown by HltPackedDataDecoder
EventLoopMgr        ERROR PackedTracks packing version is not supported: 0
EventLoopMgr      WARNING Execution of algorithm HltPackedDataDecoder failed
UnpackTracks        ERROR Cannot retrieve /Event/HLT2/pRec/Track/Best from transient store.
UnpackTracks        ERROR Maximum number of errors ( 'ErrorMax':1) reached.
EventLoopMgr      WARNING Execution of algorithm UnpackTracks failed
UnpackPVs           ERROR Cannot retrieve /Event/HLT2/pRec/Vertex/Primary from transient store.
UnpackPVs           ERROR Maximum number of errors ( 'ErrorMax':1) reached.
EventLoopMgr      WARNING Execution of algorithm UnpackPVs failed
UnpackChargedPr...  ERROR Cannot retrieve /Event/HLT2/pRec/ProtoP/Charged from transient store.
UnpackChargedPr...  ERROR Maximum number of errors ( 'ErrorMax':1) reached.
EventLoopMgr      WARNING Execution of algorithm UnpackChargedProtos failed
UnpackNeutralPr...  ERROR Cannot retrieve /Event/HLT2/pRec/ProtoP/Neutrals from transient store.
UnpackNeutralPr...  ERROR Maximum number of errors ( 'ErrorMax':1) reached.
EventLoopMgr      WARNING Execution of algorithm UnpackNeutralProtos failed
UnpackParticles...  ERROR UnpackParticlesAndVertices:: Unpack('/Event/HLT2/pPhys/Relations'): missing destination '/Event/HLT2/Rec/Vertex/Primary', skip link StatusCode=FAILURE
EventLoopMgr        ERROR Error processing event loop.
EventLoopMgr        ERROR Terminating event processing loop due to errors
EventLoopMgr        ERROR Terminating event processing loop due to errors

To reproduce my error messages one just need to copy the following scripts (reproducer.py and test_output.py) to Moore folder and run the following reproducer.sh:

reproducer.py, test_output.py, reproducer.sh

I'm also posting the options and logs after the execution:

generic_lines.stdout, generic_lines.dump, test_filtered_generic_lines.log

These error messages are not reproducible in an old version of stack (e.g. from 21/10), so some MR merged between 21/10 and 10/11 (my current stack date) must have broken something...

@jonrob @sesen , do you know what is going on?

Are you running single or multi threaded ?

can you run in debug mode and post the outputs?

Main change that can cause this is I guess the updated functional (un)packers. @sponce can you take a look?

I would need a little bit more understanding of what is the context. Am I correct that :

you first run Moore, always on the same file, but potentially with different lines
you store out put to a DST
you run DaVinci on this DST
depending on the lines in Moore, DaVinci fails or not
the error mentions incomplete event, meaning some pieces are missing in the DST file ?

If that's all true, I agree with @sesen that functional (un)packers are a potential suspect. But in that case the first thing to do is to compare the 2 cases and identify the missing piece of data. We can then check whether it's present or not in the file and find out why it was not packed to the file, or not unpacked if it happens to be in the file.

The error posted by Paula was seen with the released Moore version, which was released before LHCb!3192 (merged) merged in. So I would expect it is not related to that.

okay, so the error from @pherrero is old which should be checked if it still happens with master.

And the one from @fsouzade is new.

HltPackedDataDe... ERROR PackedTracks packing version is not supported: 0

It's hard to tell if the file is saved badly or being read wrongly.

Ok, the error is a good start as it gives the failing packer. I'll try to have a look. I do not promise I manage today though

I tried with master and the error doesn't occur anymore. I only find it when I produce the dst using Moore v53r2.

I guess that closes this issue, whatever was the problem is fixed now.

Shall we click on "Close issue" then ?

No, see above the problem might still persist when writing an mdf.

what do you mean "might"? You mean the one reported by @fsouzade?

Yes, the test was performed on a master from 10/11 and the problem didn't exist in the released version, while for DST it seems to be the opposite. So before closing we should test on today's master for mdf output. @fsouzade is already on it.

I don't see that problem with that line anymore but there is a similar error I still get with the current master when using the JpsiKst line I attach.

I found this issue already some time ago when using Moore v52r2, and in that case, it corresponded to events that raised this error in the Moore log HltPackedDataWriter ERROR HltPackedDataWriter:: Packed objects too long to save. Even this error was raised, the Moore job continues and finishes successfully. This error is not raised anymore when running Moore with the current master but DaVinci still fails when arriving at that event with the message

HltPackedDataDe...  ERROR Failed to decompress HltPackedData.
HltPackedDataDe...  ERROR Maximum number of errors ( 'ErrorMax':1) reached.
EventLoopMgr      WARNING Execution of algorithm HltPackedDataDecoder failed
UnpackTracks        ERROR Cannot retrieve /Event/HLT2/pRec/Track/Best from transient store.
UnpackTracks        ERROR Maximum number of errors ( 'ErrorMax':1) reached.
EventLoopMgr      WARNING Execution of algorithm UnpackTracks failed
EventLoopMgr        ERROR Error processing event loop.
EventLoopMgr        ERROR Terminating event processing loop due to errors
EventLoopMgr        ERROR Terminating event processing loop due to errors

To reproduce moore_options_bd.py

DV_local.py

B2JpsiKst_loose.py

Even with an updated stack I also still see the same error messages I quoted before, some of them very similar to the ones @pherrero have. But in my case the problem seems related to the line Hlt2PhotonDecision, as these messages don't appear if I use another line for the turbo stream. Here is a simple reproducer:

reproducer2.py, test_output.py, reproducer2.sh

And the logs:

reproducer2.stdout, test_output_reproducer2.log

mentioned in merge request !1158 (merged)

changed the description

In would be helpful not to mixed up the different problems.

Test_output_reproducer2.log still says HltPackedDataDe... ERROR PackedTracks packing version is not supported: 0

@sponce promised to look into it. Running in debug mode or setting outputlevel to debug for packers might help.

Indeed I did. And the next message had been

I tried with master and the error doesn't occur anymore

followed by :

I guess that closes this issue, whatever was the problem is fixed now.

So should I still look at something or not ? If yes, which error and how can I reproduce it ? Sorry but I got a little bit lost

@sponce You can look at the comment above: #349 (comment 5007056)

My guess is something goes wrong when serializing PackedTracks with compression, then when HltPackedDataDecoder gets the wrong packing version from the header. If I am right, this can be easily seen in debug mode for HltPackedDataWriter.

So the point is indeed that the version found by the unpacker is 0 while it's expecting 4 or 5. Is that first point correct ? Do we indeed expect 4 or 5 ? I'll start to check the packer code but I would not be so surprised that it used to be 0 and just was not noticed so far, in other terms that the bug was there forever but only seen now that the code was cleaned up. Is there a way to check this hypothesis ?

And I have a first potential explanation, bug along the line of hidden error that appears now : could it be that this whole Pack/Unpack process was just void, that is there was nothing at all to pack/unpack ?

That might explain things because the old packers were not complaining is this case and doing nothing, so no data at all in the file. And the old Unpacker was similarly doing nothing and silently pretending it unpacked while there was nothing to do.

The new ones are doing their job. So the packer in creating an empty output, empty but existing, and the unpacker will expect it to exist, even if empty. So imagine somehow we still put nothing in the file for some reason in case of empty data, the new unpacker would exactly behave like what we see

Wait a minute, I've just noticed the previous message in the log, just before the unsupported version one :

TCK obtained from rawbank seems to be 0 -- blindly ASSUMING that the current HltANNSvc somehow has the same configuration as when the input data was written. Proceed at your own risk, good luck...

Well, if you get an empty bank and still try to read an object from it, there are indeed high chances things go wrong. And indeed the first byte read is version and will most probably be 0. Can you also check that part ?

Hi @sponce, thanks for looking into it. This message

TCK obtained from rawbank seems to be 0 -- blindly ASSUMING that the current HltANNSvc somehow has the same configuration as when the input data was written. Proceed at your own risk, good luck...

appears also if I run the reproducer with other lines, were no problem appears, as in the log test_output_reproducer2_hyperons.log

Running the same reproducer2.sh with the Hlt2PhotonLine with an old stack (from 21/10) I still get the same message, but the

error messages regarding PackedTracks

HltPackedDataDe...  FATAL  Standard std::exception is caught
HltPackedDataDe...  ERROR PackedTracks packing version is not supported: 0
EventLoopMgr        FATAL .executeEvent(): Standard std::exception thrown by HltPackedDataDecoder
EventLoopMgr        ERROR PackedTracks packing version is not supported: 0
EventLoopMgr      WARNING Execution of algorithm HltPackedDataDecoder failed
UnpackTracks        ERROR Cannot retrieve /Event/HLT2/pRec/Track/Best from transient store.
UnpackTracks        ERROR Maximum number of errors ( 'ErrorMax':1) reached.

are not there, as in the log: test_output_reproducer2_old_stack.log

There are other error messages which are the same, but the tests finishes "successfully" in the old version. So I guess you're right that the problem could be already there and the old packers/unpackers were just not complaining at all, making the test to finish "successfully".

In any case I also put Moore to run with HltPackedDataWriter in DEBUG mode as suggested by @sesen, and the logs are here:

reproducer2_old_stack.stdout, reproducer2.stdout

Making a diff of the two files gives diff_reproducer2.log, which shows some differences in the number of compressed bytes, in packed data checksum and in the lenght of raw banks with the same sourceID...

The message is usually a false alarm. At the moment the HltANNSvc is configured via a JSON file written when configuring the writing job and read when configuring the reading job. That's the "somehow".

Interesting. We seem to write less bytes, which would mean that the new packer writes less than the old. I was expecting the opposite actually : new one writing an empty object that old one was not writing.

Concerning the TCK message, it's still worrying. It would be better to understand it, as it can be at the source of the potentially unveiled problem (supposing it was there and unnoticed). I'll try to reproduce and add some print statements to see what happens.

@sstahl then let's fix/drop it please

Finally found the time to check this. And found the bug :-) LHCb!3328 (merged) should fix it. Can you please try ?

What happened is exactly what I supposed : we are in a completely degenerated case where packing and unpacking are called while they should not be. In such a case, we write an empty PackTrack to file. Now it needs a version or we fail unpacking it, even if it's empty... And the new version did not set the version in such a degenerated case.

We really need to clean up all this and call what we need to call properly, it's just a nightmare to maintain.

@fsouzade Have you tested Sebastian's solution?

@sponce For the MR I am working on, I removed all the default locations for un/packers so they will only be called when needed. It still needs some work to configure everything properly but it's almost there..

Hi @sponce, @sesen, sorry for the late response. Now with LHCb!3328 (merged) I don't see those messages anymore, and the test is 'successful' once again. Thank you! For the record, here I put the logs in case anyone wants to see the number of compressed bytes, checksum and length of raw banks with the same sourceID etc (and compare to the numbers from previous or future logs):

reproducer2_fix.stdout test_output_reproducer2_fix.log

mentioned in merge request LHCb!3328 (merged)

The original problem occurs again with the latest master, Moore runs without errors, and then DV fails because of an incomplete event with

HltPackedDataDe...  ERROR Failed to decompress HltPackedData.
HltPackedDataDe...  ERROR Maximum number of errors ( 'ErrorMax':1) reached.
EventLoopMgr      WARNING Execution of algorithm HltPackedDataDecoder failed
UnpackTracks        ERROR Cannot retrieve /Event/HLT2/pRec/Track/Best from transient store.
UnpackTracks        ERROR Maximum number of errors ( 'ErrorMax':1) reached.
EventLoopMgr      WARNING Execution of algorithm UnpackTracks failed
EventLoopMgr        ERROR Error processing event loop.
EventLoopMgr        ERROR Terminating event processing loop due to errors
EventLoopMgr        ERROR Terminating event processing loop due to errors

I have run Moore and DaVinci in debug mode and the logs can be found here /afs/cern.ch/work/p/pherrero/public/logs.

To reproduce in DaVinci all files are here /afs/cern.ch/work/p/pherrero/public/files and for Moore /afs/cern.ch/work/p/pherrero/public/moore_options_bs.py with the line /afs/cern.ch/work/p/pherrero/public/Bs2JpsiPhi_loose.py

Are you sure you are on latest LHCb? I see EventLoopMgr DEBUG Creating Top Algorithm UnpackTrackFunctional with name UnpackTracks in your log file. UnpackTrackFunctional is now just called UnpackTrack.

That's because I was using the released version of DaVinci to be able to use TupleTools, I included the TuplesTools to DaVinci from the latest master and still fails, the new log in debug mode is again here /afs/cern.ch/work/p/pherrero/public/logs and the UnpackTrackFunctional is changed to UnpackTrack, but this was not the problem.

so first error is

HltPackedDataDe...  DEBUG Compression 'ZSTD', payload size 9475791
HltPackedDataDe...  ERROR Failed to decompress HltPackedData.

From a quick look, there can be 2 reasons for this error : https://gitlab.cern.ch/lhcb/LHCb/-/blob/master/Event/EventPacker/src/lib/PackedDataBuffer.cpp#L69 and https://gitlab.cern.ch/lhcb/LHCb/-/blob/master/Event/EventPacker/src/lib/PackedDataBuffer.cpp#L73, https://gitlab.cern.ch/lhcb/LHCb/-/blob/master/Event/EventPacker/src/lib/PackedDataBuffer.cpp#L79. That is respectively unable to unzip header, wrong header size or wrong data size. If you have a working setup, can you identify the case we have but puting a simple print statement there, including the different sizes involved ?

I have added the following prints

    int data_size   = 0;
    int output_size = 0;
    if ( R__unzip_header( &data_size, (unsigned char*)data.data(), &output_size ) ) {
      // invalid header...
      std::cout<<"unzip_header_false" <<std::endl;
      return false;
    }
    if ( static_cast<size_t>( data_size ) != data.size() ) {
      // recorded size is not the size we have been given
      std::cout<<"header_size_false" <<std::endl;
      return false;
    }
    std::cout<<"header_size : data_size " << static_cast<size_t>( data_size ) <<" data.size "<< data.size() <<std::endl;
    std::cout<<" output_size_1 "<< output_size <<std::endl;
    m_buffer.resize( output_size );
    int buffer_size = m_buffer.size();
    R__unzip( &data_size, (unsigned char*)data.data(), &buffer_size, (unsigned char*)m_buffer.data(), &output_size );
    std::cout<<" output_size_2 "<< output_size << " buffer size "<< m_buffer.size() <<std::endl;
    return static_cast<size_t>( output_size ) ==
           m_buffer.size(); // did we find exactly the amount of (uncompressed) data expected?
    if ( static_cast<size_t>( output_size ) != m_buffer.size()) {
      std::cout<<"data_size_false" <<std::endl;
    }
    std::cout<<"data_size" <<std::endl;
  }

} // namespace LHCb::Hlt::PackedData

this is what I see for the event that makes it fail

header_size : data_size 9475822 data.size 9475822
 output_size_1 4337628
 output_size_2 0 buffer size 4337628
HltPackedDataDe...  ERROR Failed to decompress HltPackedData.

none of the if mentioned before seem to be false

full log log_DV.txt

So clearly it's the unziping itself that fails. Let me have a quick look.

PS : note that the print statements after the return statements will never be reached

From the (ROOT) code, it looks like the reason for the decompression failure should be given on stderr. Now I do not think stderr was captured in yout log_DV.txt, can you check and tell me was you see there ?

Which stderr do you mean?

THE stderr when running the job. When capturing the output, you're probably doing something like > log_DV.txt in order to capture stdout, but if you want to capture stderr, you need > log_DV.txt 2>&1

the log hasn't changed by running with > log_DV.txt 2>&1log_DV_2.txt

Weird. In ROOT code, any return statement leading to 0 in output_size have a message to stderr. And you indeed activated it as we see diffs at the top of the file. I fear I'm stuck on this, I have to reproduce it locally and attach a debugger. Let me try

I could not reproduce as I can not read the input file from root://x509up_u133448@ccxrootdlhcb.in2p3.fr//pnfs/in2p3.fr/data/lhcb/LHCb-Disk/lhcb/MC/Upgrade/XDIGI/00146968/0000/00146968_00000290_1.xdigi I probably do not have proper credentials

You probably need to do lhcb-proxy-init?

Oh I see. Now I'm not sure I even have a valid grid certificate ! And here is what I get when trying :

>lhcb-proxy-init                                                                                                                                        [master.]Wed Nov 24 8:37:37
the directory /cvmfs/lhcb.cern.ch/lhcbdirac/versions/v10.3.2-x86_64-1637575535/etc/grid-security/vomsdir does not exist
zsh: exit 1     lhcb-proxy-init

Maybe you could copy the file to afs, with the others ?

Here it is /afs/cern.ch/work/p/pherrero/public/files/00146968_00000290_1.xdigi

Thanks for the file. I can now successfully run your Moore reproducer. But that's the problem : successfully ! I have no error with my setup. I'm running with head of master, so maybe the problem has been solved recently ?

Do you also don't get an error when running DaVinci? I also run Moore successfully.

with DaVinci, I get :

ImportError: cannot import name 'DecayTreeTuple' from 'Gaudi.Configurables' (unknown location)

I'm running

./DaVinci/build.x86_64_v2-centos7-gcc11-dbg/run gaudirun.py DV_options.py

Is that correct ?

This is a bit complicated as DecayTreeTuple was removed from the master recently. To use this, we tried to make MooreAnalysis as dependency of DaVinci. You could checkout peilian_dtt branches (same names) both in DaVinci and MooreAnalysis, then you would be able to run the DaVinci with DecayTreeTuple.

With these branches, I cannot even configure DaVinci. Somehow it cannot find MooreAnalysis although I have it locally and it compiled fine. So still not able to reproduce

werid, I just tested and it compiled for me successfully. Do you have the whole stack? (I assumed you do). The branch peilian_dtt in DaVinci only modifies two files, please see diff below: (maybe you could just modify your master DaVinci)

--- a/cmake/DaVinciDependencies.cmake
+++ b/cmake/DaVinciDependencies.cmake
@@ -23,6 +23,7 @@ endif()
 
 # -- Public dependencies
 lhcb_find_package(Analysis REQUIRED)
+lhcb_find_package(MooreAnalysis 1.1 REQUIRED)
 
 find_data_package(AppConfig 3.0 REQUIRED)
 find_data_package(FieldMap 5.0 REQUIRED)
diff --git a/lhcbproject.yml b/lhcbproject.yml
index 6fb1ec7a..48d2d711 100644
--- a/lhcbproject.yml
+++ b/lhcbproject.yml
@@ -3,3 +3,4 @@ name: DaVinci
 license: GPL-3.0-only
 dependencies:
   - Analysis
+  - MooreAnalysis

I've finally managed to compile. Not sure what was the problem... But now to the next issue :

ToolSvc             ERROR Cannot create tool TupleToolTrackInfo (No factory found)
TuplePhi            FATAL DecayTreeTuple:: Exception throw: tool():: Could not retrieve Tool 'TupleToolTrackInfo' StatusCode=FAILURE
TuplePhi            FATAL  Exception with tag= is caught
TuplePhi            ERROR  	 TuplePhi:: tool():: Could not retrieve Tool 'TupleToolTrackInfo'	 StatusCode=FAILURE
EventLoopMgr        FATAL .executeEvent(): Exception with tag= thrown by TuplePhi
EventLoopMgr        ERROR  	 TuplePhi:: tool():: Could not retrieve Tool 'TupleToolTrackInfo'	 StatusCode=FAILURE

Just comment it out in the three places it appears in the DaVinci script, lines 99, 124 and 135. It needs also to be added to master, but it's not relevant for this problem.

So now I have a "successful" Davinci run, with zillions of warnings, but no Error. output See attached file

Do you also have the Moore log? Is the event 740 or 741 from the xdigi file the one that arises the error for me

Thanks ! That was the last issue. When trying to have Moore working I did change the number of events so that it was faster and did not have the problematic one. pfff... I hope the debugging will be faster that reproducing it !

I did not have time to completely solve this puzzle, but I've made good progress and maybe you'll be able to give me some helpful context.

What I found is that the problem is in the packing step, as expected, but more precisely in the zipping itself. To make a long story short, we are trying to zip a piece of data that is bigger than 0xffffff bytes (that is 16.7 M roughly) with an algo (ZSTD?) which seems to only support that maximum size. The net result is that the size of the original data is truncated in the compressed stream, preventing the unzipping to work properly.

Does that ring any bell ? It seems that there are at least 2 flows here : we do not check the size of what we send to ROOT for compression, and ROOT does not check either ! Now I'm surprised we discover this only now. Something must have changed somewhere, although I doubt it's related to the packer changes.

I'll dig further tomorrow morning but any input is welcome

We do saw similar problem long time ago, which is the case mentioned here and was reported in Upgrade HLT2 as well (unfortunately we were only told to check our selection to reduce the data size, no one looked into the real issue). In which case, Moore gave the ERROR "HltPackedDataWriter ERROR HltPackedDataWriter:: Packed objects too long to save." but didn't stop the process. Then when running the DV, it gave error about empty data. What I guessed was that the data was too big to be persisted due to maximum size limitation, so it persisted nothing but didn't skip that event, then it broke in the process of DV due to empty event.

The compression method was changed to ZSTD in LHCb!3191 (merged) which was merged 3 months ago. Maybe you can try the one used before.

Did you find documentation which specifies this maximum size or is this empirical? Do we know if there is an option to change that maximum size?

So the problem is not in ZSTD itself, but in the wrapping that ROOT has on top of it. See code around line 48-58 in https://github.com/root-project/root/blob/master/core/zstd/src/ZipZSTD.cxx. You'll see that ROOT adds a header of 9 bytes, namely ZS\1 as a magic number followed by 3 bytes for the deflated size and 3 bytes for the inflated size. There is no check that the original size (given as an int so 4 bytes long) fits... And we do not check either ! That's it.

And the bug was introduced by LHCb!3191 (merged) indeed, as we were previously using LZMA and you can check the lines 40-42 in https://github.com/root-project/root/blob/master/core/lzma/src/ZipLZMA.c :

 if (*srcsize > 0xffffff || *srcsize < 0) {
      return;
   }

So LZMA will return a proper error at compression time. I'll report to ROOT people. Now remains the question of what we want to do for too big events

for reference : https://github.com/root-project/root/issues/9334

"Now remains the question of what we want to do for too big events"

'Too Big events' will still need to be saved. No matter what the limit is, the real experiment will always find a way to exceed that. So I guess if his happens we will need to break the data up into smaller chunks that the zipper can handle, and handle each independently...

@sponce is the buffer size limit (0xffffff) a fundamental limitation of the underlaying compression algorithms used (ZSTD,LZMA) or just an artificial limitation imposed by ROOT just because it passes buffer size as a int rather than something else with a bigger range (say unsigned long long or std::size_t) ? If the later then I think we should also push for a fix on the ROOT side, to remove (or extend) the limit as having the current one forced on us when it not a real limitation of the compression algorithm is non-ideal.

Just to answer the original question of @jonrob

is the buffer size limit (0xffffff) a fundamental limitation of the underlaying compression algorithms used (ZSTD,LZMA) or just an artificial limitation imposed by ROOT ?

It's rather an artificial limitation due to ROOT. Or at least, ROOT limits it to 24 bits (so ~16MB) when the underlying algo goes up to 32 bits (~4GB) and would sort out our case. The problem is in the internal format ROOT creates in the byte stream. you can see the code here : https://github.com/root-project/root/blob/master/core/zstd/src/ZipZSTD.cxx#L50 :

    tgt[0] = 'Z';
    tgt[1] = 'S';
    tgt[2] = '\1';
    tgt[3] = deflate_size & 0xff;
    tgt[4] = (deflate_size >> 8) & 0xff;
    tgt[5] = (deflate_size >> 16) & 0xff;
    tgt[6] = inflate_size & 0xff;
    tgt[7] = (inflate_size >> 8) & 0xff;
    tgt[8] = (inflate_size >> 16) & 0xff;

I suppose they could create a "v2" by putting a \2 in the 3rd byte and extending the sizes to 4 bytes. To be discussed with them.

ok so ROOT people do not want to change their protocol as they want to be forward compatible. A recipe for disaster in my opinion as you have to deal with the accumulated set of mistakes of the past. But anyway they won't help here.

I see 2 possibilities :

we give up and live with an error being correctly thrown in the future
we fix the protocol ourselves by recoding the ROOT method we are using. It's only a few lines and can be backward compatible (but not forward, that is old code cannot read new files when big events are present, or at least not these big events)

Just tell me if it's worth investing a few hours in the second option.

In my opinion we have already concluded we cannot live with this limitation for PbPb datataking even for purely reconstructed quantities (i.e. without selections and combinatorics) and therefore we need to fix it. I don't have an opinion whether your proposed fix @sponce is viable long-term maybe @jonrob @graven can comment.

right, so the second option is the way to go (and we don't care about forward compatibility)

ok, noted. I'll take care of it as soon as I can.

LHCb!3360 (merged)

Not yet tested (already compiles though), but you may want to already have a look to check whether it makes sense at all to do it like this.

Now tested, and with a unit test added. Feel free to try it.

coming back to @peilian 's observation:

Moore gave the ERROR "HltPackedDataWriter ERROR HltPackedDataWriter:: Packed objects too long to save." but didn't stop the process. Then when running the DV, it gave error about empty data. What I guessed was that the data was too big to be persisted due to maximum size limitation, so it persisted nothing but didn't skip that event, then it broke in the process of DV due to empty event

The above is correct -- the code in question see here does:

void Writer::addBanks( RawEvent& rawEvent, span<const uint8_t> data, Compression compression ) const {
    /// Maximum bank payload size = 65535 (max uint16) - 8 (header) - 3 (alignment)
    constexpr size_t MAX_PAYLOAD_SIZE{65524};

    uint16_t sourceIDCommon = shift<SourceIDMasks::Compression>( static_cast<uint16_t>( compression ) );

    auto chunks = range::chunk( data, MAX_PAYLOAD_SIZE );
    if ( chunks.size() > extract<SourceIDMasks::PartID>( ~uint16_t{0} ) ) {
      Error( "Packed objects too large to save", StatusCode::SUCCESS, 50 ).ignore();
      return;
    }
    if ( msgLevel( MSG::DEBUG ) ) { debug() << "Writing " << chunks.size() << " banks" << endmsg; }
    for ( auto [ibank, chunk] : range::enumerate( chunks ) ) {
      uint16_t sourceID = sourceIDCommon | shift<SourceIDMasks::PartID>( ibank );
      if ( msgLevel( MSG::DEBUG ) ) {
        debug() << "Adding raw bank with sourceID=" << sourceID << ", length=" << chunk.size()
                << ", offset=" << ( chunk.data() - data.data() ) << endmsg;
      }
      rawEvent.addBank( sourceID, RawBank::DstData, kVersionNumber, chunk );
    }

i.e. indeed, as the size becomes larger than approx 255*64K = 16M (as SourceIDMasks::PartID = 0x00FF currently) , it aborts without persisting anything. So indeed currently, this results in an output which just has 'no raw bank' of type DstData. It would be better if, in case of error, instead of writing nothing, it explicitly would write some 'error status' so that the downstream code could keep track on what happened 'upstream' without second guessing -- i.e. just a trivial bank of a dedicated type flagging just that it failed because it attempted to write X bytes of data...

Note that we could assign more bits to PartID as there are reserved bits, but do we really want try and write such huge events? It would be best (as mentioned above) to 'flag' these events as 'too much DstData' and then only pass them on (rate limited...) with the detector data received as input and not with the DstData added... (as presumably one would be able to 'replay' the Hlt on the input data and recreate the problem in order to debug it)

Uncomplete event after running Moore succesfully

Designs

Child items ...

Activity