FPE handling in CA-based jobs
With this MR, a new flag Exec.FPE controls how floating-point exceptions (FPEs) are handled in ComponentAccumulator-based jobs.
A value of -1 is the equivalent of the old-style flag "rec.doFloatingPointException=True", i.e. the job is aborted with a core dump on the first FPE.
A value of 0 (default) sets up the FPEAuditor to print a one-line WARNING on each FPE, like the standard behavior with RecExCommon.
A value greater than 0 sets the property FPEAuditor.NStacktracesOnFPE to this value, i.e. stack traces are printed for that many FPEs.
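The three value ranges of Exec.FPE can be summarized in a small standalone sketch. It does not use any Athena code: the names FPEHandling and interpret_exec_fpe are hypothetical, and rejecting values below -1 is an assumption, since the description does not say how such values are treated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FPEHandling:
    """Hypothetical summary of what a given Exec.FPE value requests."""
    mode: str            # "abort", "warn" or "stacktrace"
    n_stacktraces: int   # value meant for FPEAuditor.NStacktracesOnFPE


def interpret_exec_fpe(value: int) -> FPEHandling:
    """Map an Exec.FPE value to the behavior described in the MR text."""
    if value == -1:
        # Old-style behavior: abort with a core dump on the first FPE.
        return FPEHandling(mode="abort", n_stacktraces=0)
    if value == 0:
        # Default: the FPEAuditor prints a one-line WARNING per FPE.
        return FPEHandling(mode="warn", n_stacktraces=0)
    if value > 0:
        # Stack traces are printed for the first `value` FPEs.
        return FPEHandling(mode="stacktrace", n_stacktraces=value)
    # Assumption: values below -1 are not defined by the description.
    raise ValueError(f"Unsupported Exec.FPE value: {value}")


if __name__ == "__main__":
    for v in (-1, 0, 10):
        print(v, interpret_exec_fpe(v))
```

In a real job the flag would be set through the configuration flags rather than through a helper like this; the sketch only spells out the mapping.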
While working on this, I realized that Auditors were not properly handled by the ComponentAccumulator. The first commit fixes this issue.
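As a rough illustration of what merging auditors "like any other component" could mean, here is a minimal sketch under stated assumptions. MiniAccumulator is a hypothetical toy class, not the real ComponentAccumulator; it assumes auditors are deduplicated by name and that conflicting property values are treated as an error.

```python
class MiniAccumulator:
    """Toy stand-in for an accumulator that tracks auditors by name."""

    def __init__(self):
        # Maps auditor name -> dict of property values.
        self._auditors = {}

    def add_auditor(self, name, **props):
        """Add an auditor, unifying it with an existing one of the same name."""
        existing = self._auditors.setdefault(name, {})
        for key, val in props.items():
            if key in existing and existing[key] != val:
                # Assumption: conflicting settings are an error, not an override.
                raise ValueError(
                    f"Conflicting values for {name}.{key}: "
                    f"{existing[key]!r} vs {val!r}"
                )
            existing[key] = val

    def merge(self, other):
        """Merge all auditors of another accumulator into this one."""
        for name, props in other._auditors.items():
            self.add_auditor(name, **props)


if __name__ == "__main__":
    a = MiniAccumulator()
    a.add_auditor("FPEAuditor", NStacktracesOnFPE=10)
    b = MiniAccumulator()
    b.add_auditor("FPEAuditor", NStacktracesOnFPE=10)
    a.merge(b)
    print(a._auditors)  # a single FPEAuditor entry remains
```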
This merge request affects 5 packages:
- Control/AthenaConfiguration
- Control/AthenaExamples/AthExUnittest
- Control/AthenaMonitoring
- TileCalorimeter/TileExample/TileRecEx
- TileCalorimeter/TileMonitoring
Affected files list will not be printed in this case
Adding @solodkov ,@iouri ,@pavol ,@ssnyder ,@rbianchi ,@harkusha as watchers
added Core DQ JetEtmiss Tile analysis-review-required master review-pending-level-1 labels
CI Result FAILURE (hash 7b125467)
Projects: Athena, AthSimulation, AthGeneration, AnalysisBase, AthAnalysis, DetCommon
Stages: externals, cmake, make, tests
Full details available on this CI monitor view. Check the JIRA CI status board for known problems.
Athena: number of compilation errors 0, warnings 0
AthSimulation: number of compilation errors 0, warnings 0
AthGeneration: number of compilation errors 0, warnings 0
AnalysisBase: number of compilation errors 0, warnings 0
AthAnalysis: number of compilation errors 0, warnings 0
DetCommon: number of compilation errors 0, warnings 0
For experts only: Jenkins output [CI-MERGE-REQUEST-CC7 60768]
added NewConfig label
- Resolved by Tadej Novak
CI Result FAILURE (hash cae6c032)
Projects: Athena, AthSimulation, AthGeneration, AnalysisBase, AthAnalysis
Stages: externals, cmake, make, tests
Full details available on this CI monitor view. Check the JIRA CI status board for known problems.
NEW: project list is tailored to contain projects affected by code changes
Athena: number of compilation errors 0, warnings 0
AthSimulation: number of compilation errors 0, warnings 0
AthGeneration: number of compilation errors 0, warnings 0
AnalysisBase: number of compilation errors 0, warnings 0
AthAnalysis: number of compilation errors 0, warnings 0
For experts only: Jenkins output [CI-MERGE-REQUEST-CC7 60800]
- Resolved by Christos Anastopoulos
- Resolved by Christos Anastopoulos
- Resolved by Walter Lampl
added review-user-action-required label and removed review-pending-level-1 label
In the meantime I found another problem that I don't understand yet: when running one of the failing ACTS tests with the option Exec.FPE=10, I expect to see stack traces from the FPEAuditor, but in fact the TDAQ ErrorHandler gets invoked. For the simpler CaloRecoConfig.py everything works as expected.
In the log of the ACTS example I also noticed that the FPEAuditor installs its signal handler twice. I don't understand why. I verified that the ComponentAccumulator has only one instance of FPEAuditor in AuditorSvc.Auditors and that self._auditors has no duplicates either.
I'm making this a draft for now ...
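The duplicate check mentioned above can be sketched generically as follows. This is not the code that was actually run: find_duplicates is a hypothetical helper, and the example lists merely stand in for the contents of AuditorSvc.Auditors and self._auditors.

```python
from collections import Counter


def find_duplicates(names):
    """Return the sorted names that occur more than once in `names`."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


if __name__ == "__main__":
    # Stand-in for a list of configured auditor names (illustrative values).
    auditors = ["FPEAuditor", "NameAuditor"]
    print(find_duplicates(auditors))

    # A list with an accidental second FPEAuditor entry.
    print(find_duplicates(auditors + ["FPEAuditor"]))
```

An empty result only shows that the configuration holds a single entry; it does not by itself explain why a signal handler would be installed twice at run time.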
added 236 commits
- cae6c032...5417bfc4 - 227 commits from branch atlas:master
- 403b0dea - ComponentAccumulator.py: Merge auditors like any other component
- 15a6e119 - introduce FPE handling in CA-based jobs
- a550e658 - introduce unit-test for FPE handling
- f24a055d - remove manual FPE-Auditor cfg, now done by MainServicesCfg
- a934def2 - re-add CompFactory to Run3DQTestingDriver.py
- 408569f9 - FPEAndCoreDumpConfig: Protect against the absence of AthenaAuditors in some projects
- d0876a3c - some cleanup following MR review
- f37f49ad - remove stray comma
- 92219529 - Set env var TDAQ_ERS_NO_SIGNAL_HANDLERS in FPEAuditor::initalize() to avoid...
Hi @pagessin ,
can you estimate how long it will take to fix the FPEs I described in ATLASRECTS-7341? I wonder whether we should wait for the fix or ignore the FPEs in the failing unit tests.
- Walter