Wrong X509_CERT_DIR used
We're seeing jobs failing at some sites due to X509_CERT_DIR
being set incorrectly. I think this is an issue in the pilot.
If I look at job 843087622:
2024-02-07 14:57:29 UTC DEBUG ===========================================================
2024-02-07 14:57:29 UTC DEBUG Environment of execution host
...
2024-02-07 14:57:29 UTC DEBUG X509_CERT_DIR=/var/lib/condor/execute/dir_73294/nsXMDmcXyr4nhKlC3p2Rib9nABFKDmABFKDmF57VDm5DTLDmtbmySm/arc/certificates
...
2024-02-07 14:58:24 UTC WorkloadManagement/JobAgent/Singularity INFO: Creating singularity container
2024-02-07 14:58:24 UTC WorkloadManagement/JobAgent/WorkloadManagement/JobAgent INFO: Found Job LogLevel JDL parameter with value: INFO
2024-02-07 14:58:25 UTC WorkloadManagement/JobAgent/Singularity INFO: Execute singularity command: ['/cvmfs/lhcb.cern.ch/lhcbdirac/versions/v11.0.31-1707132678/Linux-x86_64/bin/singularity', 'exec', '--contain', '--ipc', '--workdir', 'DIRAC_containers/job843087622_38085qnh', '--home', '/tmp', '--userns', '--bind', '/cvmfs', '--bind', '/cvmfs/lhcb.cern.ch/lhcbdirac/versions/v11.0.31-1707132678/Linux-x86_64:/cvmfs/lhcb.cern.ch/lhcbdirac/versions/v11.0.31-1707132678/Linux-x86_64:ro', '--bind', '', '/cvmfs/cernvm-prod.cern.ch/cvm4', '/tmp/dirac_container.sh']
2024-02-07 14:58:25 UTC WorkloadManagement/JobAgent/Singularity INFO: Execute singularity env: {'XRD_STREAMTIMEOUT': '300', 'X509_CERT_DIR': '/var/lib/condor/execute/dir_73294/nsXMDmcXyr4nhKlC3p2Rib9nABFKDmABFKDmF57VDm5DTLDmtbmySm/arc/certificates', 'X509_VOMSES': '/cvmfs/lhcb.cern.ch/etc/grid-security/vomses', 'DIRAC_PILOT_STAMP': 'd7f0a9c1ee2b4260be4c4f3be1cfa587', 'XRD_RUNFORKHANDLER': '0', 'DIRAC_VOMSES': '/cvmfs/lhcb.cern.ch/etc/grid-security/vomses', 'X509_USER_CERT': '/var/lib/condor/execute/dir_73294/nsXMDmcXyr4nhKlC3p2Rib9nABFKDmABFKDmF57VDm5DTLDmtbmySm/user.proxy', 'X509_VOMS_DIR': '/cvmfs/lhcb.cern.ch/etc/grid-security/vomsdir', 'X509_USER_PROXY': '/tmp/proxy', 'XrdSecPROTOCOL': 'gsi,unix', 'XrdSecGSIDELEGPROXY': '1', 'TMP': '/tmp', 'TMPDIR': '/tmp', 'DIRACSYSCONFIG': '/tmp/pilot.cfg'}
X509_CERT_DIR should always be overridden by our directory on CVMFS:
bash -c 'export X509_CERT_DIR=/tmp; . /cvmfs/lhcb.cern.ch/lhcbdirac/lhcbdirac; env' | rg X509_CERT
X509_CERT_DIR=/cvmfs/lhcb.cern.ch/etc/grid-security/certificates
It seems almost as if the environment isn't being properly propagated by the pilot to the job agent?
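
To narrow down where the value gets lost, something like the following could be run at each stage (pilot environment, job agent, inside the container). This is only a rough sketch: `check_cert_dir` and `expected_cert_dir` are hypothetical names I made up for illustration, not anything that exists in the pilot or in DIRAC, and the expected path is simply the one printed by the `lhcbdirac` environment script above.

```shell
# Hypothetical diagnostic helper, not part of the pilot or DIRAC.
# Expected value, taken from the output of sourcing lhcbdirac above.
expected_cert_dir=/cvmfs/lhcb.cern.ch/etc/grid-security/certificates

check_cert_dir() {
    # $1: value to check; falls back to X509_CERT_DIR from the current environment
    actual="${1-${X509_CERT_DIR-}}"
    if [ "$actual" = "$expected_cert_dir" ]; then
        echo "OK: X509_CERT_DIR=$actual"
        return 0
    fi
    echo "MISMATCH: X509_CERT_DIR=${actual:-<unset>} (expected $expected_cert_dir)"
    return 1
}
```

Calling `check_cert_dir` with no argument checks the current environment; passing a value (for example the path copied from the "Execute singularity env" log line) checks that value instead. The first stage that reports MISMATCH would be where the override is dropped.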