Possible thread safety issue with MagneticField
I am running a RICH refractive index calibration test job with threading enabled (8 threads) over a data set containing many different runs, which is necessary to exercise the calibration update machinery on run changes.
After a while, I see a failure during an update of the magnetic field. See the attached log for full details.
calib-thread-error-mag-service.log
DeMagnetConditionCall INFO Loading mag field from /cvmfs/lhcb.cern.ch/lib/lhcb/DBASE/FieldMap/v5r7/cdf
MagneticFieldExtension INFO Scale factor: 0.999983
MagneticFieldGridReader INFO Opened magnetic field file: /cvmfs/lhcb.cern.ch/lib/lhcb/DBASE/FieldMap/v5r7/cdf/field.v5r0.c1.down.cdf
TCanvas::Print INFO Current canvas added to pdf file /mnt/work/pcmf/jonesc/cernbox/LHCb/output/Panoptes/RefIndexCalib/summaries/RichCalibSummaries/2023/01/12/RefIndex/Run-252565.pdf
RichRefCalibLong.Writer INFO Rich1Gas Current Scale Factor 1.0226 Correction 1.00582 -> Updated Scale Factor 1.02855
RichRefCalibLong.Writer INFO Re-using Rich1Gas conditions version 54 for Run 252565
RichRefCalibLong.Writer INFO Removing "/mnt/work/pcmf/jonesc/cernbox/LHCb/output/Panoptes/RefIndexCalib/conditions/lhcb-conditions-database/Conditions/Rich1/Environment/Gas.yml/.pool/v54"
RichRefCalibLong.Writer INFO Successfully wrote Rich1Gas Scale factor to "/mnt/work/pcmf/jonesc/cernbox/LHCb/output/Panoptes/RefIndexCalib/conditions/lhcb-conditions-database/Conditions/Rich1/Environment/Gas.yml/.pool/v54"
RichRefCalibLong.Writer INFO Removing "/mnt/work/pcmf/jonesc/cernbox/LHCb/output/Panoptes/RefIndexCalib/conditions/lhcb-conditions-database/Conditions/Rich1/Environment/Gas.yml/252565"
RichRefCalibLong.Writer INFO Copying "/mnt/work/pcmf/jonesc/cernbox/LHCb/output/Panoptes/RefIndexCalib/conditions/lhcb-conditions-database/Conditions/Rich1/Environment/Gas.yml/.pool/v54" -> "/mnt/work/pcmf/jonesc/cernbox/LHCb/output/Panoptes/RefIndexCalib/conditions/lhcb-conditions-database/Conditions/Rich1/Environment/Gas.yml/252565"
RichRefCalibLong.Writer INFO Gaus Const = 17862.1 +- 61.2653
RichRefCalibLong.Writer INFO Gaus Mean = 4.60847e-05 +- 2.52513e-06
RichRefCalibLong.Writer INFO Gaus Sigma = 0.000584706 +- 2.54756e-06
RichRefCalibLong.Writer INFO Gaus Asym = 0.0926516 +- 0.0145801
RichRefCalibLong.Writer INFO Bkg Par 0 = 24325.9 +- 45.0521
RichRefCalibLong.Writer INFO Bkg Par 1 = 332816 +- 23608.8
RichRefCalibLong.Writer INFO Bkg Par 2 = -8.94765e+07 +- 6.08572e+06
RichRefCalibLong.Writer INFO Bkg Par 3 = 1.19922e+10 +- 2.8171e+09
RichRefCalibLong.Writer INFO Rich2Gas Current Scale Factor 0.987814 Correction 1.00301 -> Updated Scale Factor 0.990785
MagneticFieldGridReader ERROR Number of points in field map does not match
MagneticFieldExtension ERROR Error loading magnetic field map
TCanvas::Print INFO Current canvas added to pdf file /mnt/work/pcmf/jonesc/cernbox/LHCb/output/Panoptes/RefIndexCalib/summaries/RichCalibSummaries/2023/01/12/RefIndex/Run-252565.pdf
RichRefCalibLong.Writer INFO Rich2Gas Current Scale Factor 0.987814 Correction 1.00301 -> Updated Scale Factor 0.990785
RichRefCalibLong.Writer INFO Re-using Rich2Gas conditions version 54 for Run 252565
RichRefCalibLong.Writer INFO Removing "/mnt/work/pcmf/jonesc/cernbox/LHCb/output/Panoptes/RefIndexCalib/conditions/lhcb-conditions-database/Conditions/Rich2/Environment/Gas.yml/.pool/v54"
RichRefCalibLong.Writer INFO Successfully wrote Rich2Gas Scale factor to "/mnt/work/pcmf/jonesc/cernbox/LHCb/output/Panoptes/RefIndexCalib/conditions/lhcb-conditions-database/Conditions/Rich2/Environment/Gas.yml/.pool/v54"
RichRefCalibLong.Writer INFO Removing "/mnt/work/pcmf/jonesc/cernbox/LHCb/output/Panoptes/RefIndexCalib/conditions/lhcb-conditions-database/Conditions/Rich2/Environment/Gas.yml/252565"
RichRefCalibLong.Writer INFO Copying "/mnt/work/pcmf/jonesc/cernbox/LHCb/output/Panoptes/RefIndexCalib/conditions/lhcb-conditions-database/Conditions/Rich2/Environment/Gas.yml/.pool/v54" -> "/mnt/work/pcmf/jonesc/cernbox/LHCb/output/Panoptes/RefIndexCalib/conditions/lhcb-conditions-database/Conditions/Rich2/Environment/Gas.yml/252565"
TCanvas::Print INFO pdf file /mnt/work/pcmf/jonesc/cernbox/LHCb/output/Panoptes/RefIndexCalib/summaries/RichCalibSummaries/2023/01/12/RefIndex/Run-252565.pdf has been closed
RichRefCalibLong.Writer INFO Created /mnt/work/pcmf/jonesc/cernbox/LHCb/output/Panoptes/RefIndexCalib/summaries/RichCalibSummaries/2023/01/12/RefIndex/Run-252565.pdf
load_magnetic_field_map ERROR Error loading magnetic field map
DependencyHandler ERROR +++ Exception while creating dependent Condition B8388B83 97281372:
DependencyHandler ERROR load_magnetic_field_map: Error loading magnetic field map
UserPool INFO +++ * Conditions for USER pool with IOV: run(0):[252568-252568] [14220 entries]
DependencyHandler ERROR ++ Exception while creating dependent Condition B8388B83 97281372.
DependencyHandler ERROR +++ Exception while creating dependent Condition 124D33D0 97281372:
DependencyHandler ERROR DependencyHandler: ++ Exception while creating dependent Condition B8388B83 97281372.
On previous updates (there are several in the job), everything works fine, for example:
DeMagnetConditionCall INFO Loading mag field from /cvmfs/lhcb.cern.ch/lib/lhcb/DBASE/FieldMap/v5r7/cdf
MagneticFieldExtension INFO Scale factor: 0.999983
MagneticFieldGridReader INFO Opened magnetic field file: /cvmfs/lhcb.cern.ch/lib/lhcb/DBASE/FieldMap/v5r7/cdf/field.v5r0.c1.down.cdf
MagneticFieldGridReader INFO Opened magnetic field file: /cvmfs/lhcb.cern.ch/lib/lhcb/DBASE/FieldMap/v5r7/cdf/field.v5r0.c2.down.cdf
MagneticFieldGridReader INFO Opened magnetic field file: /cvmfs/lhcb.cern.ch/lib/lhcb/DBASE/FieldMap/v5r7/cdf/field.v5r0.c3.down.cdf
MagneticFieldGridReader INFO Opened magnetic field file: /cvmfs/lhcb.cern.ch/lib/lhcb/DBASE/FieldMap/v5r7/cdf/field.v5r0.c4.down.cdf
The one thing that differs in the failing case is that, while the field map update above is running, another thread starts creating a condition update (which involves fitting some histograms and writing some files to disk).
These two tasks should have nothing in common: the fitting and writing do not need the field map, so on the face of it I cannot see how the two could interfere. Nevertheless, that appears to be what is happening.
I am still trying to understand what is going on here, and I am not yet sure whether the issue lies in the field service or in something in the RICH code. However, given that the first indication of a problem is
MagneticFieldGridReader ERROR Number of points in field map does not match
MagneticFieldExtension ERROR Error loading magnetic field map
I am currently leaning towards a problem in the field service.
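One hypothetical way two unrelated threads could interfere is through hidden process-wide state in a non-reentrant C library function. For example, `std::strtok` keeps its parsing cursor in static storage shared by every caller, so a second thread tokenizing something else (e.g. while fitting or writing files) could silently shift the position of the field map parser and change the number of values it reads. I have not checked whether `MagneticFieldGridReader` or the fitting/writing code actually uses `strtok`. The sketch below is only an illustration of that failure mode under this assumption. The functions `countPointsStrtok`, `countPoints` and `pointCountMatches` are hypothetical and are not the real reader code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, NOT the actual MagneticFieldGridReader implementation.
// Counts grid points using std::strtok. strtok stores its cursor in hidden
// static state, so if another thread calls strtok concurrently, the two
// tokenizations can corrupt each other and the token count becomes wrong.
std::size_t countPointsStrtok(const std::string& text, std::size_t valuesPerPoint) {
  assert(valuesPerPoint > 0);
  std::vector<char> buf(text.begin(), text.end());
  buf.push_back('\0');  // strtok needs a mutable, NUL-terminated buffer
  std::size_t nValues = 0;
  for (char* tok = std::strtok(buf.data(), " \t\n"); tok != nullptr;
       tok = std::strtok(nullptr, " \t\n")) {
    ++nValues;
  }
  return nValues / valuesPerPoint;
}

// Reentrant alternative: all parsing state lives in a local std::istringstream,
// so concurrent calls from different threads cannot disturb each other.
std::size_t countPoints(const std::string& text, std::size_t valuesPerPoint) {
  assert(valuesPerPoint > 0);
  std::istringstream in(text);
  std::size_t nValues = 0;
  double value = 0.0;
  while (in >> value) {
    ++nValues;
  }
  return nValues / valuesPerPoint;
}

// Check analogous to "Number of points in field map does not match": the
// parsed point count must equal the count expected from the file header.
bool pointCountMatches(const std::string& text, std::size_t valuesPerPoint,
                       std::size_t expectedPoints) {
  return countPoints(text, valuesPerPoint) == expectedPoints;
}
```

Single-threaded, both counting functions give the same result; the difference only shows up when another thread uses `strtok` at the same time, which would be consistent with the failure appearing only when the field map update overlaps with the condition writing.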