Bugfix for the Event Service

Merged. Vakhtang Tsulaia requested to merge tsulaia/athena:master-mpes into master.
4 files changed, +67 -36
  • f2a79b01 Bugfix for the Event Service (authored by Vakhtang Tsulaia)
    Bugfix in the mechanism for handling failed MP workers in the Event Service.
    This patch covers the case in which a worker fails to transition from the
    event-processing state into the finalization state after receiving the
    "No more events" message from the range scatterer. Instead of releasing a
    fixed number of workers (nprocs), the scatterer now keeps releasing workers
    until it receives a signal that all of them have finished. The processor
    sends that signal as a pid of -1 on the shared failed-PID queue once its
    count of active workers (m_activeWorkers, initialized to nprocs in makePool)
    drops to zero after a successful worker exit.
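To see why a fixed release count can leave a worker hanging, here is a hypothetical, simplified model of the scatterer's shutdown phase; it is not Athena code. For simplicity it merges the worker-request channel and the failed-PID queue into one ordered inbox, which the real implementation does not do. A non-negative value stands for a worker asking for more work, and -1 stands for the "all workers have finished" signal. The function names are illustrative only.

```cpp
#include <cassert>
#include <deque>

// Hypothetical message encoding used only in this sketch:
//   id >= 0 : worker `id` asks for more work
//   -1      : sentinel meaning "all workers have finished"
constexpr int kAllWorkersDone = -1;

// Old scheme (simplified): answer exactly `nprocs` requests with
// "No more events", then stop listening.
// Returns how many worker requests are left unanswered.
int unansweredWithFixedCount(std::deque<int> inbox, int nprocs) {
  int released = 0;
  while (released < nprocs && !inbox.empty()) {
    if (inbox.front() != kAllWorkersDone) ++released;  // reply "No more events"
    inbox.pop_front();
  }
  int unanswered = 0;
  for (int msg : inbox) {
    if (msg != kAllWorkersDone) ++unanswered;  // nobody will ever reply to these
  }
  return unanswered;
}

// New scheme (simplified): answer every request with "No more events"
// until the sentinel arrives. Returns how many requests were answered.
int releasedUntilAllDone(std::deque<int> inbox) {
  int released = 0;
  while (!inbox.empty()) {
    const int msg = inbox.front();
    inbox.pop_front();
    if (msg == kAllWorkersDone) break;  // every worker has finished: stop listening
    ++released;                         // reply "No more events"
  }
  return released;
}
```

With nprocs = 2 and an inbox of {0, 1, 0, -1} (worker 0 asking a second time, for instance because it did not reach finalization), the fixed-count scheme leaves one request unanswered, while the sentinel-driven loop answers all three before stopping.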
 /*
-Copyright (C) 2002-2017 CERN for the benefit of the ATLAS collaboration
+Copyright (C) 2002-2020 CERN for the benefit of the ATLAS collaboration
 */
 #include "EvtRangeProcessor.h"
@@ -47,6 +47,7 @@ EvtRangeProcessor::EvtRangeProcessor(const std::string& type
   , m_isPileup(false)
   , m_rankId(-1)
   , m_nEventsBeforeFork(0)
+  , m_activeWorkers(0)
   , m_inpFile("")
   , m_chronoStatSvc("ChronoStatSvc", name)
   , m_incidentSvc("IncidentSvc", name)
@@ -120,6 +121,7 @@ int EvtRangeProcessor::makePool(int, int nprocs, const std::string& topdir)
   }
   m_nprocs = (nprocs==-1?sysconf(_SC_NPROCESSORS_ONLN):nprocs);
+  m_activeWorkers = m_nprocs;
   m_subprocTopDir = topdir;
   // Create rank queue and fill it
@@ -236,6 +238,15 @@ StatusCode EvtRangeProcessor::wait_once(pid_t& pid)
       return StatusCode::FAILURE;
     }
   }
+  else {
+    // The worker finished successfully and it was the last worker. Release the Event Range Scatterer
+    if(--m_activeWorkers==0
+       && !m_sharedFailedPidQueue->send_basic<pid_t>(-1)) {
+      // To Do: how to report this error to the pilot?
+      ATH_MSG_ERROR("Failed to release the Event Range Scatterer");
+      return StatusCode::FAILURE;
+    }
+  }
   // Erase the pid from m_procStates map
   m_procStates.erase(itProcState);
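The processor-side bookkeeping added in wait_once can be modelled in isolation. The sketch below is hypothetical and self-contained: FakePidQueue stands in for m_sharedFailedPidQueue and its send_basic<pid_t> call, and ProcessorModel and onWorkerFinishedSuccessfully are illustrative names, not Athena classes. It covers only the successful-exit branch shown in the diff; how failed workers affect the counter is not visible in this hunk and is not modelled.

```cpp
#include <cassert>
#include <deque>
#include <sys/types.h>

// Hypothetical stand-in for the shared failed-PID queue: a plain in-process deque.
// `failSends` lets the error path (scatterer cannot be released) be exercised.
struct FakePidQueue {
  std::deque<pid_t> items;
  bool failSends = false;
  bool send(pid_t pid) {
    if (failSends) return false;
    items.push_back(pid);
    return true;
  }
};

// Mirrors the counter added by the patch: it starts at nprocs (as in makePool)
// and each successful worker exit decrements it.
class ProcessorModel {
 public:
  ProcessorModel(int nprocs, FakePidQueue& queue)
      : m_activeWorkers(nprocs), m_queue(queue) {}

  // Returns false if the scatterer could not be released (the FAILURE path).
  bool onWorkerFinishedSuccessfully() {
    assert(m_activeWorkers > 0 && "more successful exits than workers");
    // Only the exit that brings the counter to zero sends the -1 sentinel.
    if (--m_activeWorkers == 0 && !m_queue.send(-1)) return false;
    return true;
  }

  int activeWorkers() const { return m_activeWorkers; }

 private:
  int m_activeWorkers;
  FakePidQueue& m_queue;
};
```

Because `&&` short-circuits, the queue is touched only by the last successful exit, so the scatterer receives exactly one sentinel regardless of how many workers were started.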