overflow in unpacking busy PbPb events
Unpacking ProtoParticles from busy PbPb collision events gives a segmentation fault.
Nicole Skidmore produced a minimal script to reproduce the error:
DSTexplore.py
It can be launched on an HLT2 reco DST output from PbPb simulation with a GEC of 50000 (the objective is ~80000 for the PbPb data-taking).
Launch it like this inside Moore:
./run python DSTminimalexplore.py -i /afs/cern.ch/work/s/sbelin/public/forNicole/hlt2_PbPb_dimuon_newseq_gec50000.dst -t /afs/cern.ch/work/s/sbelin/public/forNicole/hlt2_PbPb_dimuon_newseq_gec50000.tck.json >& log &
From Nicole: The event that appears to cause the segfault is event 15
EventSelector SUCCESS Reading Event record 15. Record number within stream 1: 15
LHCb::UnpackRaw... DEBUG start unpacking raw events
LHCb::UnpackRaw... DEBUG looking for bank types of 'BankTypes':[ 'ODIN' , 'DstData' , 'HltDecReports' ]
LHCb::UnpackRaw... DEBUG found 1 banks
LHCb::UnpackRaw... DEBUG found 19 banks
LHCb::UnpackRaw... DEBUG found 1 banks
19 is a lot of DstData banks! The ProtoParticles in particular:
ProtoParticleUn... DEBUG Unpacking ProtoParticles from '/Event/HLT2/pRec/ProtoP/Charged' m: 4 o: ProtoParticleUnpacker opt: 0 to '/Event/HLT2/Rec/ProtoP/Charged' m: 16 o: ProtoParticleUnpacker opt: 0
ProtoParticleUn... DEBUG Loading of object (CLID=1552 locationID=282) consumed 603606 bytes, and 5 links were stored!
ProtoParticleUn... DEBUG version 2
ProtoParticleUn... DEBUG packing version 1
From the discussion between @sesen and @graven on the Mattermost channel Hlt2 Upgrade:
The number of extra info fields that can be stored for a single container of protoparticles must fit in a 16-bit integer. For PbPb this limit was exceeded, and no check was in place to avoid the crash.
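To make the failure mode concrete, here is a minimal sketch of how a 16-bit index wraps around. It is a toy model, not the actual packing code: it assumes each protoparticle stores a `first`/`last` range into one shared extra-info array, and the function name `pack_ranges` and the counts used are hypothetical.

```python
# Toy model (assumption): every protoparticle keeps a [first, last) range into
# one shared extra-info array, and both indices are unsigned 16-bit integers.
UINT16_MAX = 0xFFFF


def pack_ranges(counts):
    """Return the (first, last) pairs as they look after 16-bit truncation."""
    ranges = []
    cursor = 0  # true running index into the shared extra-info array
    for n in counts:
        first = cursor & UINT16_MAX
        last = (cursor + n) & UINT16_MAX
        ranges.append((first, last))
        cursor += n
    return ranges


# Hypothetical busy event: 3000 protoparticles with 30 extra-info entries each,
# i.e. 90000 entries in total, which is more than 65535.
ranges = pack_ranges([30] * 3000)

# The range that straddles the 16-bit boundary ends up with last < first.
broken = [i for i, (first, last) in enumerate(ranges) if last < first]
```

In this toy event only one range comes out inverted (`last < first`), which is the condition an unpacker can detect. Every range after it looks well formed but points at the wrong entries, so detecting the inversion prevents the crash without recovering the data.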
@graven proposed:
1. in the unpacking, make sure that last is larger than first -- if not, abandon unpacking the extra info. This should fix the crash
2. in the packing, recognize overflow, and refuse to pack too many values, abandoning the too-many-values
Those two are just a stopgap to make sure the system 'as is' is aware of its limitations.
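As an illustration of the two stopgap checks, the following sketch models them in Python. It is written under assumptions, not taken from the real packer: the flat-array layout and the names `pack_extra_info` and `unpack_extra_info` are hypothetical.

```python
# Sketch of the two stopgap checks on a toy layout (assumption): one flat list
# of extra-info values plus a (first, last) index pair per protoparticle.
UINT16_MAX = 0xFFFF


def pack_extra_info(per_particle_values):
    """Pack values, refusing any that would push an index past 16 bits."""
    flat, ranges, dropped = [], [], 0
    for values in per_particle_values:
        first = len(flat)
        if first + len(values) > UINT16_MAX:
            # Packing-side check: drop the values and store an empty range.
            dropped += len(values)
            ranges.append((first, first))
            continue
        flat.extend(values)
        ranges.append((first, len(flat)))
    return flat, ranges, dropped


def unpack_extra_info(flat, ranges):
    """Unpack values, skipping any range that is inverted or out of bounds."""
    out = []
    for first, last in ranges:
        if last < first or last > len(flat):
            # Unpacking-side check: abandon the extra info for this particle.
            out.append([])
            continue
        out.append(flat[first:last])
    return out


flat, ranges, dropped = pack_extra_info([[1.0] * 40000, [2.0] * 40000, [3.0] * 10])
unpacked = unpack_extra_info(flat, ranges)
```

With these inputs the second particle's values are dropped at packing time, while the first and third survive. The unpacking check is independent of the packing check, so it also protects against files written before the packing check existed.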
and then there is a choice between 3a. and 3b.
3a. update the version of the packed protoparticle so that it can store more than 64K extra info
3b. avoid using extrainfo, and store the relevant information as individual relations
3c. it may be possible to do something smart in the unpacking, and recognize the point at which the overflow occurs, and then try to correct for it...
3b is the proper strategic answer, but is more invasive and will take more work and time to implement.
3a would be a reasonable quick hack, at the cost of increasing the event size. Fortunately, first and last do not both need to be 32 bits: we should instead store offset + size, in which case only the offset needs to be e.g. 32 bits, and the size can stay 16 bits, as who would ever need more than 64K extra info per protoparticle?
He also implemented a simple fix to help with debugging: !3567 (merged)
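The size trade-off in 3a can be checked with a short sketch using Python's `struct` module. The layouts below are illustrative assumptions only: the real packed class has other members, and its on-disk size also depends on alignment and compression. The helper names `pack_range` and `unpack_range` are hypothetical.

```python
import struct

# Three candidate per-protoparticle index layouts, little-endian, no padding.
FIRST_LAST_16 = struct.Struct("<HH")  # current scheme: 16-bit first + 16-bit last
FIRST_LAST_32 = struct.Struct("<II")  # naive widening: 32-bit first + 32-bit last
OFFSET_SIZE = struct.Struct("<IH")    # proposal: 32-bit offset + 16-bit size


def pack_range(offset, size):
    """Encode a range as offset + size; the size must still fit in 16 bits."""
    if not 0 <= size <= 0xFFFF:
        raise ValueError("more than 64K extra-info entries for one protoparticle")
    return OFFSET_SIZE.pack(offset, size)


def unpack_range(blob):
    """Decode offset + size back into a (first, last) pair."""
    offset, size = OFFSET_SIZE.unpack(blob)
    return offset, offset + size


# A range starting just below the old 16-bit limit now round-trips intact.
first, last = unpack_range(pack_range(65520, 30))
```

In this toy layout the index pair costs 4 bytes today, 8 bytes if both fields are widened, and 6 bytes with offset + size, which is why the latter is the cheaper way to lift the limit.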