
Attempt to avoid multithreading-related SIGSEGV in DIM client

Laurent Petre requested to merge bugfix/dim-sigsegv into main

Description

The title is quite explicit. So is the commit message:

This commit continues the long story of the segmentation faults while executing DIM RPC calls. After adding a mutex to prevent simultaneous RPC commands in commit 1cc1d6e7 and later on removing it in commit 5341703c after in-depth testing, the issue appears to be back after the migration to AlmaLinux 9 and the update of the DIM library from version 20r17 to version 20r35.

On the recommendation of the DIM experts, this commit adds an explicit initialization of the library, via a call to dim_init(), before any thread is spawned. This change also explains why the RPC commands magically started to work in parallel between the two previously mentioned commits: a call to DimServer::start, a function that internally calls dim_init(), was added. A priori, one would expect the same behavior with this newer version of the OS and DIM.
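The ordering constraint described above (initialize the library once on the main thread, and only then create worker threads) can be sketched as follows. This is a minimal illustration, not the code of this merge request: init_library_stub() is a hypothetical stand-in for dim_init(), rpc_worker_stub() is a hypothetical stand-in for a thread issuing a DIM RPC call, and run_workers() is a name invented for the example. The DIM library itself is deliberately not used so that the snippet compiles on its own.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the library's global state. With the real
// library, the initialization below would be a call to dim_init().
static std::atomic<bool> g_library_ready{false};

// Hypothetical stand-in for dim_init(): marks the library as initialized.
void init_library_stub() { g_library_ready.store(true); }

// Hypothetical worker body: in the real client this would issue an RPC call.
// Here it only reports whether the library was already initialized.
bool rpc_worker_stub() { return g_library_ready.load(); }

// Initialize once on the calling thread, then spawn the workers.
// Returns the number of workers that observed an initialized library.
int run_workers(int n_workers) {
    init_library_stub();  // happens before any std::thread is created

    std::atomic<int> ok{0};
    std::vector<std::thread> workers;
    workers.reserve(n_workers);
    for (int i = 0; i < n_workers; ++i) {
        workers.emplace_back([&ok] {
            if (rpc_worker_stub()) ++ok;
        });
    }
    for (auto& t : workers) t.join();
    return ok.load();
}
```

Calling run_workers(4) from main() should report that all four workers saw the initialized state, because the std::thread constructor synchronizes with the start of the thread function. If the initialization were instead triggered lazily inside the workers, several threads could race on it, which is the kind of situation the explicit early call is meant to rule out.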

Two points must be noted:

  1. The call to dim_init() dramatically improved the situation in our synthetic testing. In regular operations (i.e. with both the DIM DNS server and the RPC services present), the issue was completely fixed. If either the DIM DNS server or the RPC services are missing, the segmentation fault still occurs.
  2. We are currently operating the P5 system in quite harsh conditions in the sense that the detector is entirely off most of the time. As such, many RPC commands are issued just to fail and/or time out. Working in regular operating conditions might give better results.

Future testing (and time!) will tell us whether or not this commit truly helps with the bug. In any case, it does not hurt and is, in all likelihood, good practice.

Related Issue

How Has This Been Tested?

That's the very best question at the moment... At least, it doesn't do worse.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
