correctly pass x509userproxy to condor
Closes ATLASG-2561. Tagging @krumnack. This should target main, but also 24.2, 22.2 (I don't see the tag, please clarify), and possibly also 21.2.
This MR fixes two problems listed in ATLASG-2561 that occur when submitting an EventLoop job without a shared filesystem using the condor driver:
- failed to run condor_submit
- failed to read the userproxy from /tmp
The first one occurs because, if the environment variable X509_USER_PROXY is not set, an empty line is written in the submission file. This MR adds a check: if the variable is not present, a warning is printed.
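As an illustration only (this is not the code of the MR), the check can be sketched in Python as follows. The helper name `proxy_submit_lines` is hypothetical, and the sketch assumes that the `x509userproxy` line is simply omitted when the variable is missing.

```python
import logging
import os

log = logging.getLogger("condor_submit_sketch")


def proxy_submit_lines(env=None):
    """Return the submit-file lines for the user proxy (possibly none).

    Hypothetical helper for illustration; not part of EventLoop.
    """
    env = os.environ if env is None else env
    proxy = env.get("X509_USER_PROXY", "").strip()
    if not proxy:
        # Assumption: warn and write nothing, instead of emitting
        # an empty line into the submission file.
        log.warning("X509_USER_PROXY is not set; x509userproxy not added to the submit file")
        return []
    return [f"x509userproxy = {proxy}"]
```

Calling `proxy_submit_lines()` with no argument reads the real environment; passing a dictionary makes the behaviour easy to check without touching `os.environ`.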
The second problem is due to an HTCondor behaviour that is not clear to me: although the official documentation does not mention it, condor cannot read from /tmp, where the userproxy is usually created. In fact, the official manual says:
x509userproxy = <full-pathname>
Used to override the default path name for X.509 user certificates. The default location for X.509 proxies is the /tmp directory,
but I got this error:
error reading from /tmp/x509up_u11547: (errno 2) No such file or directory; STARTER failed to receive file(s) from ...
Obviously the file exists and is readable. This problem is mentioned in the CERN manual:
If you provide $(Proxy_path) with the default location of your proxy in /tmp/x509up_u$(id -u), please note that that file is not readable for Condor:
In this MR the userproxy is copied into the submission folder.
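The copy step can be sketched roughly like this in Python, along the lines of the commented-out lines in the test patch. This is an illustration, not the code of the MR: `stage_proxy` is a hypothetical name, and restricting the copy to owner-only permissions is an assumption.

```python
import os
import shutil


def stage_proxy(proxy_path, submit_dir):
    """Copy the proxy into submit_dir and return the path of the copy.

    Hypothetical helper for illustration; not part of EventLoop.
    """
    if not os.path.isfile(proxy_path):
        raise FileNotFoundError(f"proxy not found: {proxy_path}")
    dest = os.path.join(submit_dir, os.path.basename(proxy_path))
    shutil.copyfile(proxy_path, dest)
    # Assumption: keep the copy readable and writable by the owner only.
    os.chmod(dest, 0o600)
    return dest
```

Presumably the returned path is what would then be given to `x509userproxy` in the submission file, so that condor reads the proxy from the submission folder rather than from /tmp.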
To test
I used PhysicsAnalysis/Algorithms/AnalysisAlgorithmsConfig/share/FullCPAlgorithmsTest_eljob.py with the following patch:
diff --git a/PhysicsAnalysis/Algorithms/AnalysisAlgorithmsConfig/share/FullCPAlgorithmsTest_eljob.py b/PhysicsAnalysis/Algorithms/AnalysisAlgorithmsConfig/share/FullCPAl
index 0be305f..f63b2ff 100755
--- a/PhysicsAnalysis/Algorithms/AnalysisAlgorithmsConfig/share/FullCPAlgorithmsTest_eljob.py
+++ b/PhysicsAnalysis/Algorithms/AnalysisAlgorithmsConfig/share/FullCPAlgorithmsTest_eljob.py
@@ -131,11 +131,23 @@ else :
# way it tests whether the code works correctly with that driver,
# which is a lot more similar to the way the batch/grid drivers work.
driver = ROOT.EL.LocalDriver()
+driver = ROOT.EL.CondorDriver()
+job.options().setBool(ROOT.EL.Job.optBatchSharedFileSystem, False)
+job.options().setString(ROOT.EL.Job.optCondorConf, "RequestMemory=8GB")
+#certificate_path = os.path.expandvars('$X509_USER_PROXY')
+#certificate_newpath = os.path.join(os.getcwd(), os.path.split(certificate_path)[1])
+#import shutil
+#logging.info("copying certificate from %s to %s", certificate_path, certificate_newpath)
+#shutil.copyfile(certificate_path, certificate_newpath)
+driver.shellInit = """
+pwd
+hostname
+ls
+echo "RUNNING THE TEST"
+voms-proxy-info -all
+"""
-if options.direct_driver :
- # this is for testing purposes, as only the direct driver respects
- # the limit on the number of events.
- driver = ROOT.EL.DirectDriver()
print ("submitting job now", flush=True)
-driver.submit( job, submitDir )
+driver.submitOnly( job, submitDir )
and with package_filter.txt:
+ PhysicsAnalysis/Algorithms/AsgAnalysisAlgorithms
+ PhysicsAnalysis/Algorithms/AnalysisAlgorithmsConfig
+ PhysicsAnalysis/D3PDTools/EventLoop
- .*
The code still does not work because of another bug (ATLASG-2563).