Avoid storing ODIN objects in MEPProvider
Crashes were reported in the MEPProvider in production; see the message from @pdurante (thanks!) at the bottom.
It's not clear where the bug is, and I have not been able to reproduce it locally, but the info from @pdurante gives some good hints. Based on those I've made three changes:
- Avoid using, and especially storing, ODIN objects;
- Properly initialize stored data;
- Fix a potential synchronisation issue when "resetting" an ODIN.
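As a rough illustration of the three changes, here is a minimal sketch. It is not the actual MEPProvider code: `SliceStore`, `OdinWords`, `fill`, `release`, `get` and `is_free` are hypothetical names, and I'm assuming the stored data can be reduced to the 10 raw 32-bit words of the bank. The idea is to keep plain values instead of `DataObject`-derived ODIN objects, value-initialize them up front, and do the "reset" while holding the lock before notifying.

```cpp
#include <array>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the raw ODIN payload: 10 32-bit words (40 bytes),
// a trivially copyable value with no DataObject base and no LinkManager.
using OdinWords = std::array<std::uint32_t, 10>;

// Hypothetical slice bookkeeping, not the real MEPProvider.
class SliceStore {
public:
  // Every slot is value-initialized (all zeros) and marked free from the start.
  explicit SliceStore( std::size_t n_slices )
      : m_odins( n_slices, OdinWords{} ), m_free( n_slices, true ) {}

  // Store a copy of the raw words for a slice.
  void fill( std::size_t slice, OdinWords const& words ) {
    std::lock_guard<std::mutex> lock{ m_mutex };
    assert( slice < m_odins.size() );
    m_odins[slice] = words;
    m_free[slice]  = false;
  }

  // Reset the slot while holding the lock, then wake up one waiter.
  void release( std::size_t slice ) {
    {
      std::lock_guard<std::mutex> lock{ m_mutex };
      assert( slice < m_odins.size() );
      m_odins[slice] = OdinWords{}; // plain overwrite, no destructor with side effects
      m_free[slice]  = true;
    }
    m_slice_cond.notify_one();
  }

  // Return a copy so callers never hold a reference into the store.
  OdinWords get( std::size_t slice ) const {
    std::lock_guard<std::mutex> lock{ m_mutex };
    return m_odins.at( slice );
  }

  bool is_free( std::size_t slice ) const {
    std::lock_guard<std::mutex> lock{ m_mutex };
    return m_free.at( slice );
  }

private:
  mutable std::mutex      m_mutex;
  std::condition_variable m_slice_cond;
  std::vector<OdinWords>  m_odins;
  std::vector<bool>       m_free;
};
```

Because `OdinWords` has a trivial destructor, overwriting a slot cannot run into a corrupted `LinkManager` the way assigning `LHCb::ODIN{}` did in the reported crash.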
Message:
For a bit more context on the crashes happening today (which look a bit different from the crashes we were having on Friday night, but may still have the same root cause).
The segmentation violation is triggered here
if ( freed ) {
  if ( msgLevel( MSG::DEBUG ) ) debug() << "Freed slice " << slice_index << endmsg;
  m_odins[slice_index] = LHCb::ODIN{};
  m_slice_cond.notify_one();
}
The assignment causes the ODIN bank at the current slice_index (which is 3 in this instance) to be freed.
That ODIN bank is actually a valid DataObject (no zero fields in its data)
print m_odins._M_impl._M_start[3]
$9 = {<DataObject> = {_vptr.DataObject = 0x7f60030b7e90 <vtable
for LHCb::ODINImplementation::v7::ODIN+16>, m_refCount = 0, m_version = 7'\a', m_pRegistry = 0x0, m_pLinkMgr = {_M_t =
{<std::__uniq_ptr_impl<LinkManager, std::default_delete<LinkManager>>> =
{_M_t = {<std::_Tuple_impl<0, LinkManager*, std:: default_delete<LinkManager> >> = {<std::_Tuple_impl<1, std:: default_delete<LinkManager> >> = {<std::_Head_base<1, std:: default_delete<LinkManager>, true>> = {_M_head_impl = {<No data fields>}} , <No data fields>}, <std::_Head_base<0, LinkManager*, false>>
= {_M_head_impl = 0x7f541800bbc0}, <No data fields>}, <No data fields>}} , <No data fields>}}}, static BANK_VERSION = 7,
static BANK_SIZE = 40, data = {_M_elems = {261975, 20, 1090512056,
391822, 0, 32272, 41947691, 2112581, 1154637000, 0}}}
But the associated LinkManager looks corrupted
-exec print __ptr.m_linkVector._M_impl._M_finish
$15 = (std::_Vector_base<LinkManager::Link*, std:: allocator<LinkManager::Link*> >::pointer) 0xa04966e8c996d862
-exec print __ptr.m_linkVector._M_impl._M_start
$16 = (std::_Vector_base<LinkManager::Link*, std:: allocator<LinkManager::Link*> >::pointer) 0x7f5405fe95d1
-exec print __ptr.m_linkVector._M_impl._M_finish - __ptr.m_linkVector._M_impl._M_start
$20 = -862126025399400365
(no vector could possibly be that big) and calling its destructor causes the segmentation violation.
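To make the arithmetic behind `$20` explicit, the following sketch reproduces the pointer subtraction from the two addresses printed above. `element_count` and `pointer_size` are hypothetical names, and a 64-bit platform with 8-byte pointers is assumed. The byte difference is reinterpreted as a signed value and divided by the pointer size, which is what subtracting two `LinkManager::Link**` pointers amounts to.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 64-bit platform, so each LinkManager::Link* element is 8 bytes.
constexpr std::int64_t pointer_size = 8;

// Number of elements between two addresses, computed like a pointer difference:
// the byte distance (wrapping, then read as signed) divided by the element size.
constexpr std::int64_t element_count( std::uint64_t start, std::uint64_t finish ) {
  return static_cast<std::int64_t>( finish - start ) / pointer_size;
}

// The two addresses from the gdb session ($16 and $15) give the value shown as $20.
static_assert( element_count( 0x7f5405fe95d1ULL, 0xa04966e8c996d862ULL ) ==
               -862126025399400365LL );
```

A negative element count means `_M_finish` lies below `_M_start` once the addresses are read as signed, which cannot happen for a healthy `std::vector`; in addition, `_M_finish` is not an address in the usual user-space range that `_M_start` sits in.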