generate DAGMan from a shell script (`transcribe` and `job`)
Given the following script (`scenario`):

```zsh
#!/bin/zsh
set -e
alias submit="job -d $PWD/output"
PNFS=/some/where
for slice in 15 30 50 80 120 170 300 470 600 800 1000 1400 1800 2400 3200
do
sample=Pythia18${slice}
submit mergeNtuples "$PNFS/QCD_Pt_${slice}_TuneCP5_13TeV_pythia8/cd1475d24c5fc_darwin21_RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_*/" M1_mergeNtuples/$sample -c meta.info -f
xsect=`grep "^${slice}to" $DARWIN_TABLES/xsections/Pythia_QCD_CP5.info | awk '{print $2}'`
if (( xsect == 0 ))
then
echo "Bad cross section"
exit 1
fi
submit applyMClumi M1_mergeNtuples/$sample M2_applyMClumi/$sample $xsect -f
submit applyPUcleaning M2_applyMClumi/$sample M3_applyPUcleaning/$sample -c meta.info -f
submit applyJEScorrections M3_applyPUcleaning/$sample M4_applyJEScorrections/$sample -c meta.info -f
submit applyJERsmearing M4_applyJEScorrections/$sample M5_applyJERsmearing/$sample -c meta.info -f
submit applyPUprofCorrection M5_applyJERsmearing/$sample M6_applyPUprofCorrection/$sample -c meta.info -f
# TODO: recycle `chain`'s idea without the CSV file?
cd M6_applyPUprofCorrection
submit getUnfHist $sample getUnfHist/$sample -c meta.info
cd -
done
# TODO: repeat for MadGraph+Pythia and MadGraph+Herwig (i.e. further loops)
for era in A B C D
do
sample=Run18${era}
submit mergeNtuples "$PNFS/JetHT/a245b1de0a829_darwin21_Run2018${era}-UL2018_MiniAODv2_GT36-v1/" D1_mergeNtuples/$sample -c meta.info -f
submit applyJEScorrections D1_mergeNtuples/$sample D2_applyJEScorrection/$sample ${DARWIN_TABLES}/JES/Summer19UL18_Run${era}_V5_DATA/ -f
submit applyDataNormalisation D2_applyJEScorrection/$sample D3_applyDataNormalisation/$sample -c meta.info -f
submit applyPrefiringWeights D3_applyDataNormalisation/$sample D4_applyPrefiringWeights/$sample -c meta.info -f
cd D4_applyPrefiringWeights
submit getUnfHist $sample getUnfHist/$sample -c meta.info
cd -
done
submit unfold D4_applyPrefiringWeights/getUnfHist/ M6_applyPUprofCorrection/getUnfHist/ unfold/Pythia18.root -c meta.info
# TODO
# cd unfold
# post getPhysDist Pythia18.root getPhysDist/Pythia18.root -c meta.info
```

one would like to be able to run

```
> transcribe scenario
```

to generate a DAG where each call to `submit` would correspond to a node.
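For illustration, the first two `submit` calls of the Pythia loop could map onto DAGMan entries along these lines (node and submit-file names are invented here; the actual naming scheme is up to `job`):

```
JOB    mergeNtuples_Pythia1815 mergeNtuples_Pythia1815.sub
JOB    applyMClumi_Pythia1815  applyMClumi_Pythia1815.sub
PARENT mergeNtuples_Pythia1815 CHILD applyMClumi_Pythia1815
```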
- preparation
  - add `-a` option to `parallel` and `submit`
    - Q: should this rather be automated based on the existence of an extension in the output?
    - Q: what if the command expects another format than ROOT (e.g. `getHighScalePUeventIDs`)?
  - fix `try` option to automatically add `0.root`
  - allow `submit` to run for non-splittable jobs
  - check supernumerary output files
- implement `job` (adapt `submit`)
  - input and output names are guessed from the job names
  - option `-i` gives the location of the DAG, where the text is output and where executables and libraries are copied (or use an environment variable)
  - provides `JOB` and `VARS` (similar to `submit`; may have to struggle a bit with regexes in the input, which must be preserved for the C++ executable)
  - add `PARENT` if the input does not exist yet (guess the parent job name from the input name)
  - run `try` in `SCRIPT PRE`
  - `hadd` if `-a` in `SCRIPT POST`
  - `rm` the input if `-r` in `SCRIPT POST`
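Putting the items above together, a single node written by `job` might look like the following DAGMan fragment (the submit-file name, `VARS` keys, and pre/post script names are assumptions for illustration, not the actual interface):

```
JOB    applyMClumi_Pythia1815 darwin.sub
VARS   applyMClumi_Pythia1815 input="M1_mergeNtuples/Pythia1815" output="M2_applyMClumi/Pythia1815"
SCRIPT PRE  applyMClumi_Pythia1815 try.sh applyMClumi_Pythia1815
SCRIPT POST applyMClumi_Pythia1815 hadd.sh M2_applyMClumi/Pythia1815
PARENT mergeNtuples_Pythia1815 CHILD applyMClumi_Pythia1815
```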
- implement `transcribe` (totally new executable)
  - replaces `submit` in the script with `job -i my.dag`
  - link libraries and dictionaries
  - execute the script given in input (DAGMan should fail smoothly on script failure)
  - forward the environment to the jobs
  - submit the DAG job (unless `-d`), possibly in the background (if `-b`)
  - if not in the background, watch `condor_q` with a constraint and kill on Ctrl+C (same as `submit`)
  - `DOT` (to produce a Graphviz visualisation of the DAG)
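As a rough sketch (not the actual implementation), `transcribe` could shadow `submit` with a shell function that appends nodes to the DAG while the scenario script runs; here the `job -i my.dag` call is mimicked inline, and parents are simply chained in call order rather than guessed from the input names:

```shell
#!/bin/sh
# Hypothetical sketch of the transcription step: redefine `submit` so that
# sourcing the scenario appends DAGMan entries to my.dag instead of
# submitting jobs. The node/submit-file naming scheme is an assumption.

dag=my.dag
: > "$dag"          # start from an empty DAG file
prev=""             # last node seen, used as parent of the next one

submit() {
    # expected args: <command> <input> <output> [options...]
    node="${1}_$(basename "$3")"
    printf 'JOB %s %s.sub\n' "$node" "$node" >> "$dag"
    if [ -n "$prev" ]; then
        printf 'PARENT %s CHILD %s\n' "$prev" "$node" >> "$dag"
    fi
    prev=$node
}

# stand-in for two chained calls from the scenario script
submit mergeNtuples /pnfs/some/where M1_mergeNtuples/Pythia1815 -c meta.info -f
submit applyMClumi M1_mergeNtuples/Pythia1815 M2_applyMClumi/Pythia1815 -f

cat "$dag"
```

The real executable would then hand `my.dag` to `condor_submit_dag` (unless `-d` is given).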
- Possible feature: implement a light version of `chain` (see https://gitlab.cern.ch/paconnor/ProtoDarwin/-/merge_requests/18) to generate a series of jobs when only the INFO config file is needed.
Edited by Patrick Connor