Submit script fix with cutflow merge (fixed)
Description
This addresses !810 (merged), which reverts !801 (merged). It adds a grid-submit option -g that disables merging (without breaking the case with merging) for the data/mc dumping modes that produce the additional cutflow_counts.json output, which cannot be merged with the usual merging script.
Additionally, this also fixes some DSID parsing issues with data DSIDs.
The main fix comes in with this change:
- EXTRA_ARGS+=' --mergeScript="hdf5-merge-nolock -o %OUT -i %IN" '
+ EXTRA_ARGS+=("--mergeScript=hdf5-merge-nolock -o %OUT -i %IN")
Turns out, if you define EXTRA_ARGS as a string and later use a bare $EXTRA_ARGS in the prun invocation, the quotes are not used by bash to squash the arguments, but are taken literally and passed to prun. Funnily enough, I started using a bash lsp recently which actually warned me about exactly this problem on this line, which is how I found out.
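To make the difference concrete, here is a minimal sketch that counts the arguments a command receives in both variants. The count_args helper and the variable names EXTRA_STR and EXTRA_ARR are made up for this illustration and are not part of the submit script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the submit script): prints how many
# arguments it received.
count_args() { echo "$#"; }

# String variant: the inner double quotes are literal characters, and the
# bare expansion is split on whitespace into several words.
EXTRA_STR=' --mergeScript="hdf5-merge-nolock -o %OUT -i %IN" '
n_string=$(count_args $EXTRA_STR)

# Array variant: one array element stays exactly one argument.
EXTRA_ARR=("--mergeScript=hdf5-merge-nolock -o %OUT -i %IN")
n_array=$(count_args "${EXTRA_ARR[@]}")

echo "string variant: $n_string arguments, array variant: $n_array argument"
```

With the string, the command sees five separate words, the first being --mergeScript="hdf5-merge-nolock with the quote character still attached; with the array it sees the single intended argument.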
The recommended solution is to use an array and "${EXTRA_ARGS[@]}" in the prun invocation instead (which has its own problems, because an array cannot be exported to the submit-job function). Since this pattern is used extensively in other places in this script, I preemptively applied the lsp suggestions there as well.
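The export limitation can be sketched as follows. This assumes the function runs in a child bash process, which may not match how the script actually calls it; submit_job here is a hypothetical stand-in that only reports its argument count, and passing the elements as positional parameters is one possible workaround, not necessarily the one used in this MR.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the submit-job function: reports its argument count.
submit_job() { echo "$#"; }
export -f submit_job

EXTRA_ARGS=("--mergeScript=hdf5-merge-nolock -o %OUT -i %IN" "--noEmail")
# bash accepts this, but array variables are not passed to child processes.
export EXTRA_ARGS

# Child shell that relies on the environment: the array is not there.
n_from_env=$(bash -c 'submit_job "${EXTRA_ARGS[@]}"')

# Child shell that receives the elements as positional parameters instead.
n_from_args=$(bash -c 'submit_job "$@"' _ "${EXTRA_ARGS[@]}")

echo "via environment: $n_from_env, via positional parameters: $n_from_args"
```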
As a last point, I also made the dryrun printing more precise: every "whole" argument that a dryrun command receives is printed in quotes, and quotes inside an argument are escaped with \. This makes it easier to spot this kind of error in a future dryrun, and to check that my changes actually work.
As a sanity check: I made direct changes to three options (-g, -m and -a), so I dry-ran all of them and checked the submission logs:
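A minimal sketch of such a printer, assuming a function named print_dryrun (the name and the exact implementation in the script may differ):

```shell
#!/usr/bin/env bash
# Hypothetical dry-run printer: wraps every received argument in double
# quotes and escapes embedded double quotes with a backslash.
print_dryrun() {
    local out="DRY RUNNING:" arg
    for arg in "$@"; do
        arg=${arg//\"/\\\"}   # replace each " with \"
        out+=" \"$arg\""
    done
    printf '%s\n' "$out"
}

# Usage: an argument containing spaces stays visibly grouped.
print_dryrun prun --exec "dump-single-btag %IN -c EMPFlow.json" --noEmail
```

Because each argument gets its own pair of quotes, an option that was split or merged by mistake shows up directly in the printed line.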
grid-submit -df single-btag
---
DRY RUNNING: "prun" "--exec" "dump-single-btag %IN -c EMPFlow.json" "--outDS" "user.jackersc.601589.e8547_s3797_r13144_p6698.tdd.EMPFlow.25_2_55.Valentine2025-53-ga8656e3" "--inDS" "mc20_13TeV.601589.PhPy8EG_A14_ttbar_hdamp258p75_nonallhadron.deriv.DAOD_FTAG1.e8547_s3797_r13144_p6698" "--useAthenaPackages" "--inTarBall=job.tgz" "--outputs" "output.h5" "--mergeScript=hdf5-merge-nolock -o %OUT -i %IN" "--noEmail"
grid-submit -dfg single-btag
---
DRY RUNNING: "prun" "--exec" "dump-single-btag %IN -c EMPFlow.json" "--outDS" "user.jackersc.601589.e8547_s3797_r13144_p6698.tdd.EMPFlow.25_2_55.Valentine2025-53-ga8656e3" "--inDS" "mc20_13TeV.601589.PhPy8EG_A14_ttbar_hdamp258p75_nonallhadron.deriv.DAOD_FTAG1.e8547_s3797_r13144_p6698" "--useAthenaPackages" "--inTarBall=job.tgz" "--outputs" "output.h5" "--noEmail"
grid-submit -dfgm single-btag
---
DRY RUNNING: "prun" "--exec" "dump-single-btag %IN -c EMPFlow.json" "--outDS" "user.jackersc.601589.e8547_s3797_r13144_p6698.tdd.EMPFlow.25_2_55.Valentine2025-53-ga8656e3" "--inDS" "mc20_13TeV.601589.PhPy8EG_A14_ttbar_hdamp258p75_nonallhadron.deriv.DAOD_FTAG1.e8547_s3797_r13144_p6698" "--useAthenaPackages" "--inTarBall=job.tgz" "--outputs" "output.h5" "--forceStaged" "--noEmail"
grid-submit -dfgma single-btag
---
DRY RUNNING: "prun" "--exec" "dump-single-btag %IN -c EMPFlow.json" "--outDS" "user.jackersc.601589.e8547_s3797_r13144_p6698.tdd.EMPFlow.25_2_55.Valentine2025-53-ga8656e3.test" "--inDS" "mc20_13TeV.601589.PhPy8EG_A14_ttbar_hdamp258p75_nonallhadron.deriv.DAOD_FTAG1.e8547_s3797_r13144_p6698" "--useAthenaPackages" "--inTarBall=job.tgz" "--outputs" "output.h5" "--forceStaged" "--nFiles 1" "--noEmail"
I will also soon test this script in production, if we want to wait for that.
Review checklist:
- CI Passing
- Comments addressed
- Source branch is up to date with target