First iteration of CPGrid submission
This is the first iteration of the CP grid submission, namely `CPGridRun.py`. The current script generates a `prun ...` command from a few input flags. Automatic submission is simple to implement, but I want to make sure this is what we want first. Below are some features and tests you can try.
- I added a very nice `-h` flag. It separates the `prun` flags, the `CPRun.py` flags (base and framework-dependent), and the flags for `CPGridRun.py` itself. I moved the import of `CPRun.py` to avoid a long upfront import.
usage: CPGridRun.py [-h] [--input-list INPUT_LIST] [--output-files OUTPUT_FILES] [--destSE DESTSE] [--mergeType MERGETYPE] [--gridUsername GRIDUSERNAME] [--prefix PREFIX] [--suffix SUFFIX] [--outDS OUTDS] [--groupProduction] [--exec EXEC] [--noSubmit] [--testRun] [--recreateTar]
CPGrid runscript to submit CPRun.py jobs to the grid. This script will submit a job to the grid using files in the input text one by one.CPRun.py can handle multiple sources of input and create one output; but not this script
options:
-h, --help Show this help message and continue
Input/Output file configuration:
--input-list INPUT_LIST
Path to the text file containing list of containers on the panda grid. Each container will be passed to prun as --inDS and is run individually
--output-files OUTPUT_FILES
The output files of the grid job. Example: --output-files "A.root,B.txt,B.root" results in A/A.root, B/B.txt, B/B.root in the output directory. No need to specify if using CPRun.py
--destSE DESTSE Destination storage element (PanDA)
--mergeType MERGETYPE
Output merging type, [None, Default, xAOD]
Input/Output naming configuration:
--gridUsername GRIDUSERNAME
Grid username, or the groupname. Default is the current user. Only affect file naming
--prefix PREFIX Prefix for the output directory. Dynamically set with input container if not provided
--suffix SUFFIX Suffix for the output directory
--outDS OUTDS Name of an output dataset. OUTDS will contain all output files(PanDA). If not provided, support dynamic naming if input name is in the Atlas production format or typical user production format
CPGrid configuration:
--groupProduction Only use for official production
--exec EXEC Flags for the CPRun.py or custom script to run on the grid encapsulated in a double quote (PanDA).
CPRun.py with preset behavior: "-t config.yaml -e 50 --no-systematics"
CPRun.py but overwriting preset behavior: "CPRun.py --input-list myroot.root -t config.yaml -e 50 --no-systematics --flagA --flagB"
Custom script: "customRun.py -i inputs -o output --text-config config.yaml --flagA --flagB
Submission configuration:
--noSubmit Do not submit the job to the grid (PanDA)
--testRun Will submit job to the grid but greatly limit the number of files per job and number of events
--recreateTar Re-compress the source code. Source code are compressed by default in submission, this is useful when the source code is updated
If you are using CPRun.py, the following flags are for the CPRun.py in this framework
Runscript for CP Algorithm unit tests
options:
-h, --help show this help message and exit
Base Script Options:
--input-list INPUT_LIST
path to text file containing list of input files, or a single root file
--output-name OUTPUT_NAME
output name of the analysis root file
-e MAX_EVENTS, --max-events MAX_EVENTS
Number of events to run
-t TEXT_CONFIG, --text-config TEXT_CONFIG
path to the YAML configuration file
--no-systematics Disable systematics
EventLoop specific arguments:
--direct-driver Run the job with the direct driver
--strip Move the analysis root file to the top level, and delete the work directory. Mainly useful for standardizing the output with the Athena framework.
--work-dir WORK_DIR The work directory for the EL job
The example above was successfully submitted to the grid and finished without any problems.
- The script takes only a text file as input. The text file should list the grid containers the user wants to run over.
- The script has quite a lot of presets for `CPRun.py`, but users can override the presets (see the `--exec` flag) or supply their own run script. Both of these options are prone to I/O errors.
- `echo mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_PHYS.e6337_s3681_r13167_r13146_p6490 > input.txt` (the ASG_TEST_FILE_MC container)
The simplest command you can run is `CPGridRun.py --input-list input.txt --exec "-t test_configuration_Run2.yaml -e 150" --suffix improveName2`, which will generate
prun \
--inDS mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_PHYS.e6337_s3681_r13167_r13146_p6490 \
--outDS user.holau.PhPy8EG.410470.DAOD_PHYS.e6337_s3681_r13167_r13146_p6490.improveName2 \
--useAthenaPackages \
--cmtConfig x86_64-el9-gcc13-opt \
--writeInputToTxt IN:in.txt \
--outputs output:output.root \
--exec CPRun.py --input-list in.txt --strip -t test_configuration_Run2.yaml -e 150 \
--memory 2000 \
--addNthFieldOfInDSToLFN 2,3,6 \
--mergeOutput \
--outTarBall cpgrid.tar.gz
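The `--outDS` name in the generated command appears to be assembled from fields of the input container name. Below is a minimal sketch of that mapping, assuming the input follows the six dot-separated fields seen in the container above. `build_out_ds` and its parameters are hypothetical names chosen for illustration, not the parser actually used in `CPGridRun.py`.

```python
def build_out_ds(in_ds, username, suffix="", prefix=None, group=False):
    """Hypothetical helper: derive an outDS name from an ATLAS-style inDS name.

    Assumes the layout project.dsid.physicsShortName.step.dataType.tags,
    as in the container used in this description.
    """
    fields = in_ds.split(".")
    if len(fields) != 6:
        raise ValueError(f"unexpected container name format: {in_ds}")
    _project, dsid, short_name, _step, data_type, tags = fields

    # Without an explicit prefix, fall back to the first token of the
    # physics short name (e.g. "PhPy8EG").
    if prefix is None:
        prefix = short_name.split("_")[0]

    scope = "group" if group else "user"
    parts = [scope, username, prefix, dsid, data_type, tags]
    if suffix:
        parts.append(suffix)
    return ".".join(parts)


in_ds = ("mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad."
         "deriv.DAOD_PHYS.e6337_s3681_r13167_r13146_p6490")
print(build_out_ds(in_ds, "holau", suffix="improveName2"))
```

Under these assumptions the sketch reproduces the `--outDS` value shown in the generated command; container names in other formats would need their own handling.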
- The `--inTarBall` and `--outTarBall` flags are generated automatically, depending on the `--recreateTar` flag and a check of the environment. If the tarball exists and `--recreateTar` is not set, the script uses the existing tarball; otherwise it compresses a new one.
- It has a static parser for input container names that fit the ATLAS production group format.
- I added a `--testRun` option that submits a job to the grid but greatly limits the number of events and files. This is useful for checking that the code runs at a small scale.
- A more complicated example to see the full functionality:
`CPGridRun.py`, called with the following flags:
  - `--input-list input.txt`
  - `--exec "customRun.py -i hello.root -t test_configuration_Run2.yaml -e 150"`
  - `--suffix v3 --prefix refactor` (affecting `--outDS`)
  - `--testRun` (affecting `--nFiles` and `--nEventsPerFile`)
  - `--recreateTar` (affecting `--inTarBall` or `--outTarBall`)
  - `--groupProduction` (adds `--official`, changes the output name to group, and chooses the certificate)
  - `--gridUsername PHY-KYLE` (affecting the outDS name; for group production it also affects the certificate)
  - `--destSE UVIC-RINGROAD-HEP_LOCALGROUPDISK` (affecting `--destSE`)
  - `--output-files "A.root,B.root,B.txt"` (affecting `--outputs`; the output files become outDS/A/A.root, outDS/B/B.root and outDS/B/B.txt)
prun \
--inDS mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_PHYS.e6337_s3681_r13167_r13146_p6490 \
--outDS group.PHY-KYLE.refactor.410470.DAOD_PHYS.e6337_s3681_r13167_r13146_p6490.v3 \
--useAthenaPackages \
--cmtConfig x86_64-el9-gcc13-opt \
--writeInputToTxt IN:in.txt \
--outputs A:A.root,B:B.root,B:B.txt \
--exec customRun.py -i hello.root -t test_configuration_Run2.yaml -e 150 \
--memory 2000 \
--addNthFieldOfInDSToLFN 2,3,6 \
--mergeOutput \
--outTarBall cpgrid.tar.gz \
--official \
--voms atlas:/atlas/PHY-KYLE/Role=production \
--destSE UVIC-RINGROAD-HEP_LOCALGROUPDISK \
--nEventsPerFile 300 \
--nFiles 10
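The translation from `--output-files "A.root,B.root,B.txt"` to `--outputs A:A.root,B:B.root,B:B.txt` can be sketched as follows. This is an illustration only: `to_prun_outputs` is a hypothetical name, and the sketch assumes each file is grouped under its base name without the extension, which is what the example above suggests.

```python
import os


def to_prun_outputs(output_files):
    """Hypothetical helper: turn "A.root,B.txt" into "A:A.root,B:B.txt"."""
    pairs = []
    for name in output_files.split(","):
        name = name.strip()
        if not name:
            continue  # ignore empty entries such as a trailing comma
        # Group each file under its base name without the extension.
        stem = os.path.splitext(os.path.basename(name))[0]
        pairs.append(f"{stem}:{name}")
    return ",".join(pairs)


print(to_prun_outputs("A.root,B.root,B.txt"))
```

With this grouping, `B.root` and `B.txt` share the key `B`, matching the `outDS/B/B.root` and `outDS/B/B.txt` layout described for `--output-files`.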
- In the help message, flags marked (PanDA) are also available directly in `prun`.
The script does not yet have all the functionality of its counterpart in TopCPToolkit, but the missing features are not essential, so I'll implement them in future iterations.
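For reviewers, the tarball handling described in this description (reuse an existing tarball unless `--recreateTar` is given) can be sketched roughly as below. The function name `tarball_args` is hypothetical, and the sketch assumes the "environment check" reduces to testing whether the tarball file exists; the real script may check more than that.

```python
import os


def tarball_args(tar_path="cpgrid.tar.gz", recreate=False):
    """Hypothetical helper: pick the prun tarball flag for a submission."""
    if os.path.exists(tar_path) and not recreate:
        # Reuse the tarball that is already on disk.
        return ["--inTarBall", tar_path]
    # Otherwise ask prun to compress the sources into a new tarball.
    return ["--outTarBall", tar_path]


print(tarball_args(recreate=True))
```

Passing `recreate=True` always selects `--outTarBall`, which corresponds to the case where the source code has changed since the last submission.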