Error in fit for one cell, one event - 2024-e session
In event 10522-1-14490 (run 10522, spill 1, event-in-spill 14490) the fit algorithm fails if the cell ECAL0-3-3 is enabled for the fit:
- If all cells except this one are enabled for the fit, the code works properly (tested using print-reco-event.exe).
- If this cell is also enabled, the code crashes:
===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0 0x00007fa757f182ca in wait4 () from /lib64/libc.so.6
#1 0x00007fa757e61953 in do_system () from /lib64/libc.so.6
#2 0x00007fa7592f8d3c in TUnixSystem::StackTrace() () from /cvmfs/sft.cern.ch/lcg/releases/LCG_102/ROOT/6.26.04/x86_64-centos9-gcc11-opt/lib/libCore.so
#3 0x00007fa7592f6435 in TUnixSystem::DispatchSignals(ESignals) () from /cvmfs/sft.cern.ch/lcg/releases/LCG_102/ROOT/6.26.04/x86_64-centos9-gcc11-opt/lib/libCore.so
#4 <signal handler called>
#5 0x0000000000000000 in ?? ()
#6 0x00000000006cff6d in na64util::MeanSquaredErrorMetric::operator() (this=0x344c558, waveform=0x34d01c0, firstPulsePtr=0x3446e90, lastPulsePtr=0x3446f80, loggers=0x344c458) at src/algo/wf-fitter-sorted-collection.cc:41
#7 0x00000000006ae918 in na64util::SortedWaveformFitProceduresCollection<na64util::MeanSquaredErrorMetric, std::less<double> >::_apply_metric (this=0x344c450, waveform=0x34d01c0) at include/na64/algo/wf-fitter-sorted-collection.hh:66
#8 0x00000000006ae1df in na64util::SortedWaveformFitProceduresCollection<na64util::MeanSquaredErrorMetric, std::less<double> >::fit_peaks (this=0x344c450, detName=0x6eb828 "(undefined-detector)", waveform=0x34d01c0, initialsAndConstraints=0x0) at include/na64/algo/wf-fitter-sorted-collection.hh:169
#9 0x00000000006978ac in na64util::Moyal01Reco::reconstruct (this=0x36b0a80, detName=0x6eb828 "(undefined-detector)", waveform=0x34d01c0, result=...) at src/reco/sadc.cc:367
#10 0x0000000000696dc7 in na64util::iMSADCReco::reconstruct (this=0x36b0a80, raw=..., p0=345.25, p1=343, detName=0x6eb828 "(undefined-detector)") at src/reco/sadc.cc:55
#11 0x0000000000697387 in na64util::Moyal01Reco::reconstruct (this=0x36b0a80, raw=..., p0=345.25, p1=343, detName=0x6eb828 "(undefined-detector)") at src/reco/sadc.cc:235
#12 0x000000000055a125 in RunWaveReco1(std::vector<unsigned short, std::allocator<unsigned short> > const&, CaloCalibData const&) ()
#13 0x000000000055b03a in RunCellReco(std::vector<Cell*, std::allocator<Cell*> >&, Cell&, std::vector<unsigned short, std::allocator<unsigned short> > const&, CaloCalibData const&) ()
#14 0x0000000000561cfa in RunP348Reco(CS::DaqEventsManager const&) ()
#15 0x000000000054baae in main ()
===========================================================
The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum https://root.cern/forum
Only if you are really convinced it is a bug in ROOT then please submit a
report at https://root.cern/bugs Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#5 0x0000000000000000 in ?? ()
#6 0x00000000006cff6d in na64util::MeanSquaredErrorMetric::operator() (this=0x344c558, waveform=0x34d01c0, firstPulsePtr=0x3446e90, lastPulsePtr=0x3446f80, loggers=0x344c458) at src/algo/wf-fitter-sorted-collection.cc:41
#7 0x00000000006ae918 in na64util::SortedWaveformFitProceduresCollection<na64util::MeanSquaredErrorMetric, std::less<double> >::_apply_metric (this=0x344c450, waveform=0x34d01c0) at include/na64/algo/wf-fitter-sorted-collection.hh:66
#8 0x00000000006ae1df in na64util::SortedWaveformFitProceduresCollection<na64util::MeanSquaredErrorMetric, std::less<double> >::fit_peaks (this=0x344c450, detName=0x6eb828 "(undefined-detector)", waveform=0x34d01c0, initialsAndConstraints=0x0) at include/na64/algo/wf-fitter-sorted-collection.hh:169
#9 0x00000000006978ac in na64util::Moyal01Reco::reconstruct (this=0x36b0a80, detName=0x6eb828 "(undefined-detector)", waveform=0x34d01c0, result=...) at src/reco/sadc.cc:367
#10 0x0000000000696dc7 in na64util::iMSADCReco::reconstruct (this=0x36b0a80, raw=..., p0=345.25, p1=343, detName=0x6eb828 "(undefined-detector)") at src/reco/sadc.cc:55
#11 0x0000000000697387 in na64util::Moyal01Reco::reconstruct (this=0x36b0a80, raw=..., p0=345.25, p1=343, detName=0x6eb828 "(undefined-detector)") at src/reco/sadc.cc:235
#12 0x000000000055a125 in RunWaveReco1(std::vector<unsigned short, std::allocator<unsigned short> > const&, CaloCalibData const&) ()
#13 0x000000000055b03a in RunCellReco(std::vector<Cell*, std::allocator<Cell*> >&, Cell&, std::vector<unsigned short, std::allocator<unsigned short> > const&, CaloCalibData const&) ()
#14 0x0000000000561cfa in RunP348Reco(CS::DaqEventsManager const&) ()
#15 0x000000000054baae in main ()
===========================================================
This is the result of dump.exe
...
Digit DetID=(name=ECAL0 number=43), DataID=562954349051904x=3 y=3
srcID=2 port=0 chip=1 channel=6
latch all mode, overflow=0, suppresion=0
Integrals: 0
Samples: 344 341 344 340 345 347 348 344 346 346 700 2207 2941 2093 1317 987 731 603 533 514 508 371 316 332 328 333 346 346 346 343 345 355
(SADC) detector=ECAL0 x=3 y=3 srcid=2 port=0 chip=1 channel=6 data=344 341 344 340 345 347 348 344 346 346 700 2207 2941 2093 1317 987 731 603 533 514 508 371 316 332 328 333 346 346 346 343 345 355
...
I do not understand the issue: the data look OK.
The error points to line 41 of src/algo/wf-fitter-sorted-collection.cc:
d += cPulse->fit_function(nSample*nsPerSample, cPulse->fitFunctionUserdata);
Just before this line I added:
printf("AAA DEBUG. cPulse: %p fitFunctionUserData: %p nSample: %i nsPerSample:%f\n",cPulse,cPulse->fitFunctionUserdata,nSample,nsPerSample);
The output is below. There are no null pointers among the printed values, but interestingly the cPulse pointer is the same for the first two calls:
AAA DEBUG. cPulse: 0x3db05a0 fitFunctionUserData: 0x4863420 nSample: 9 nsPerSample:12.500000
AAA DEBUG. cPulse: 0x3db05a0 fitFunctionUserData: 0x4863420 nSample: 10 nsPerSample:12.500000
AAA DEBUG. cPulse: 0x3db0618 fitFunctionUserData: 0x3bce570 nSample: 10 nsPerSample:12.500000
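One thing the debug output does not cover: frame #5 of the trace sits at address 0x0, which is what a call through a null function pointer typically looks like. The printf prints cPulse and fitFunctionUserdata but not cPulse->fit_function itself, so that member may still be null for one of the pulses. Below is a minimal sketch of a guard around the summation. It assumes a simplified, hypothetical Pulse struct and helper names (sum_models, constant_model); the real types in wf-fitter-sorted-collection.cc may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for the pulse record used by the fitter; only the
// two members mentioned in the report are modelled here.
struct Pulse {
    double (*fit_function)(double t, void* userdata);  // may be null if never set
    void* fitFunctionUserdata;
};

// Hypothetical example model: constant amplitude read from userdata.
static double constant_model(double /*t*/, void* userdata) {
    return *static_cast<double*>(userdata);
}

// Sums all pulse models at time t over the range [first, last).
// Returns false and leaves `sum` untouched if any pulse has no callable,
// instead of jumping to address 0.
static bool sum_models(const Pulse* first, const Pulse* last,
                       double t, double& sum) {
    assert(first <= last);
    double d = 0.0;
    for (const Pulse* p = first; p != last; ++p) {
        if (!p->fit_function) {
            std::fprintf(stderr, "pulse %td has a null fit_function\n",
                         p - first);
            return false;
        }
        d += p->fit_function(t, p->fitFunctionUserdata);
    }
    sum = d;
    return true;
}
```

Adding cPulse->fit_function to the existing debug printf would be a quicker way to confirm or rule this out for the event above.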
@rdusaev can you check this specific event for this specific cell in p348-reco? This is quite important: if the code crashes while running on the farm, the job is stopped.