
Avoid storing ODIN objects in MEPProvider

Roel Aaij requested to merge avoid-odin-objects into master

Crashes were reported in the MEPProvider in production; see the message from @pdurante (thanks!) at the bottom.

It's not clear where the bug is, and I have not been able to reproduce it locally, but the information from @pdurante gives some good hints. Based on those, I've made three changes:

  • Avoid using, and especially storing, ODIN objects;
  • Properly initialize stored data;
  • Fix a potential synchronisation issue when "resetting" an ODIN.
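The MR diff itself is not reproduced here. The following is a minimal sketch of how the three changes could fit together, under these assumptions:

- Each slice keeps the raw ODIN bank words (10 × 32-bit, matching `BANK_SIZE = 40` in the gdb dump further down) instead of an `LHCb::ODIN` object. This means no `DataObject` and no `LinkManager` are stored.
- Every slot is value-initialised.
- A slot is reset only while the mutex is held.

`SliceOdinStore`, `OdinWords`, `free_slice` and `wait_free` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical sketch, not the MEPProvider code: store plain ODIN bank words
// per slice instead of LHCb::ODIN objects (no DataObject, no LinkManager).
// 10 x 32-bit words matches BANK_SIZE = 40 bytes seen in the gdb dump.
constexpr std::size_t kOdinWords = 10;
using OdinWords = std::array<std::uint32_t, kOdinWords>;

class SliceOdinStore {
public:
  // Value-initialise every slot: all words zero, every slice marked free.
  explicit SliceOdinStore(std::size_t n_slices)
      : m_odins(n_slices, OdinWords{}), m_free(n_slices, true) {
    assert(n_slices > 0);
  }

  // Copy the bank words into a slice and mark it as in use.
  void store(std::size_t slice, const OdinWords& words) {
    std::lock_guard<std::mutex> lock{m_mutex};
    m_odins.at(slice) = words;
    m_free.at(slice) = false;
  }

  // Reset the slot while holding the mutex, so no other thread can observe a
  // half-reset slice, then notify after releasing the lock.
  void free_slice(std::size_t slice) {
    {
      std::lock_guard<std::mutex> lock{m_mutex};
      m_odins.at(slice) = OdinWords{};
      m_free.at(slice) = true;
    }
    // Waiters may wait on different slices; each one re-checks its own.
    m_slice_cond.notify_all();
  }

  // Block until the given slice is free; the predicate handles spurious wake-ups.
  void wait_free(std::size_t slice) {
    std::unique_lock<std::mutex> lock{m_mutex};
    m_slice_cond.wait(lock, [&] { return static_cast<bool>(m_free.at(slice)); });
  }

  // Return a copy, so callers never hold a reference into a slot.
  OdinWords get(std::size_t slice) const {
    std::lock_guard<std::mutex> lock{m_mutex};
    return m_odins.at(slice);
  }

private:
  mutable std::mutex m_mutex;
  std::condition_variable m_slice_cond;
  std::vector<OdinWords> m_odins;
  std::vector<bool> m_free;
};
```

Because `get()` returns a copy, a reader never keeps a reference into a slot that another thread may be resetting. The sketch uses `notify_all` rather than the `notify_one` in the snippet quoted below, because waiters for different slices share one condition variable. With `notify_one`, the single woken thread could be one waiting on a slice that is still busy.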

Message:

Here is a bit more context on the crashes happening today. They look a bit different from the crashes we had on Friday night, but may still have the same root cause.

The segmentation violation is triggered here:

 if ( freed ) {
   if ( msgLevel( MSG::DEBUG ) ) debug() << "Freed slice " << slice_index << endmsg;
   m_odins[slice_index] = LHCb::ODIN{};
   m_slice_cond.notify_one();
 }
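In plain C++ terms, assigning a fresh temporary to an element that owns a resource through `std::unique_ptr` deletes the element's old resource as part of the assignment. The gdb dump further on shows that the `DataObject` base holds its `LinkManager` in exactly such a `unique_ptr` (`m_pLinkMgr`).

The sketch below uses hypothetical stand-ins (`FakeLinkManager`, `FakeOdin`, `reset_slot`), not the Gaudi/LHCb classes. It relies on the stand-in's implicit move assignment; the real `DataObject` may define its own assignment operators.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for LinkManager: counts how often it is destroyed.
struct FakeLinkManager {
  static inline int destroyed = 0;  // C++17 inline static member
  ~FakeLinkManager() { ++destroyed; }
};

// Hypothetical stand-in for an ODIN whose base owns a LinkManager through a
// std::unique_ptr, like DataObject::m_pLinkMgr in the gdb dump.
struct FakeOdin {
  std::unique_ptr<FakeLinkManager> link_mgr = std::make_unique<FakeLinkManager>();
};

// Mirrors `m_odins[slice_index] = LHCb::ODIN{};`: the implicit move assignment
// moves the temporary's unique_ptr into the slot, and unique_ptr's move
// assignment deletes the manager the slot owned before. The emptied temporary
// then deletes nothing when it is destroyed.
void reset_slot(std::vector<FakeOdin>& slots, std::size_t i) {
  slots[i] = FakeOdin{};
}
```

Exactly one old manager is deleted inside the assignment. If the slot's old `LinkManager` pointer or contents had already been corrupted, this line is where the crash would surface, even if the corruption happened somewhere else.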

The assignment causes the ODIN bank at the current slice_index (3 in this instance) to be freed.

That ODIN bank is actually a valid DataObject (its data is not zeroed out):

 print m_odins._M_impl._M_start[3]
 $9 = {<DataObject> = {_vptr.DataObject = 0x7f60030b7e90 <vtable for LHCb::ODINImplementation::v7::ODIN+16>,
   m_refCount = 0, m_version = 7 '\a', m_pRegistry = 0x0,
   m_pLinkMgr = {_M_t = {<std::__uniq_ptr_impl<LinkManager, std::default_delete<LinkManager>>> =
     {_M_t = {<std::_Tuple_impl<0, LinkManager*, std::default_delete<LinkManager> >> =
       {<std::_Tuple_impl<1, std::default_delete<LinkManager> >> =
         {<std::_Head_base<1, std::default_delete<LinkManager>, true>> = {_M_head_impl = {<No data fields>}}, <No data fields>},
       <std::_Head_base<0, LinkManager*, false>> = {_M_head_impl = 0x7f541800bbc0}, <No data fields>}, <No data fields>}}, <No data fields>}}},
   static BANK_VERSION = 7, static BANK_SIZE = 40,
   data = {_M_elems = {261975, 20, 1090512056, 391822, 0, 32272, 41947691, 2112581, 1154637000, 0}}}

But the associated LinkManager looks corrupted:

 -exec print __ptr.m_linkVector._M_impl._M_finish
 $15 = (std::_Vector_base<LinkManager::Link*, std::allocator<LinkManager::Link*> >::pointer) 0xa04966e8c996d862
 -exec print __ptr.m_linkVector._M_impl._M_start
 $16 = (std::_Vector_base<LinkManager::Link*, std::allocator<LinkManager::Link*> >::pointer) 0x7f5405fe95d1
 -exec print __ptr.m_linkVector._M_impl._M_finish - __ptr.m_linkVector._M_impl._M_start
 $20 = -862126025399400365

No vector could possibly have that size (it is even negative), and calling the LinkManager's destructor causes the segmentation violation.
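The size gdb reports can be reproduced from the two raw addresses. This helps confirm that the value is garbage rather than a real element count. The sketch assumes a 64-bit build where `sizeof(LinkManager::Link*) == 8`. `element_count` and `looks_plausible` are hypothetical helpers, not part of Gaudi.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 64-bit build, so each LinkManager::Link* element is 8 bytes.
constexpr std::int64_t kPointerSize = 8;

// Reproduce gdb's `_M_finish - _M_start` for a vector of pointers.
std::int64_t element_count(std::uint64_t start, std::uint64_t finish) {
  // Unsigned subtraction wraps modulo 2^64. Converting the result to a signed
  // byte offset is well defined since C++20 and is what mainstream compilers
  // already did before that.
  const auto bytes = static_cast<std::int64_t>(finish - start);
  // Integer division truncates toward zero.
  return bytes / kPointerSize;
}

// A healthy vector has finish >= start and spans a whole number of elements.
bool looks_plausible(std::uint64_t start, std::uint64_t finish) {
  return element_count(start, finish) >= 0 && (finish - start) % kPointerSize == 0;
}
```

Fed `_M_start = 0x7f5405fe95d1` and `_M_finish = 0xa04966e8c996d862`, `element_count` returns -862126025399400365, the same value as `$20`. The byte span is also not a multiple of 8, so neither check passes for the corrupted vector.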

Edited by Roel Aaij
