MC Job Options issues
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues
2021-06-21T16:50:31+02:00

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/144
Pipelines failing when only links are included?
2021-06-21T16:50:31+02:00
Spyros Argyropoulos

The following discussion from !1225 should be addressed:
- [ ] @jshahini started a [discussion](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/merge_requests/1225#note_4588898): (+1 comment)
> Hi @cgutscho
>
> Indeed it is a duplicate, but this is by design in order to clear the CI. To give some context, these JOs are for a SUSY grid expansion.
>
> I originally tried to upload everything using only symlinks to that control file, but the CI pipelines were failing, claiming that the jobs couldn't find ```MadGraphControl_SimplifiedModel_GG_directRPVLQD.py```
>
> So I duplicated the control file you pointed to and included it in this MR so that the pipelines would succeed. After the MR gets accepted, I was going to make another one where I change all the control files to be symlinks to ```/502xxx/502416/MadGraphControl_SimplifiedModel_GG_directRPVLQD.py```. That way, there would be no duplicated control files floating around.
>
> I realize this is remarkably convoluted, so I'm more than happy to hear other ideas about preparing the JOs for grid expansions in R21.
>
> Cheers,
> Jeff
Failed pipeline: https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/pipelines/2741834
![Screenshot_2021-06-21_at_14.50.38](/uploads/1b1ebf50941d6c15803a23b2ad2bcd32/Screenshot_2021-06-21_at_14.50.38.png)
S1.2021
Spyros Argyropoulos
2021-06-27

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/143
Follow-up from ATLMCPROD-9332
2021-06-07T14:15:29+02:00
Spyros Argyropoulos

The following discussion from !1198 should be addressed:
- [ ] @cgutscho started a [discussion](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/merge_requests/1198#note_4542318): (+5 comments)
> Hi @sargyrop - this might be a long shot, but do you think this is something we can catch in the CI? e.g. if there's a variable in MadGraph JOs that has `gridpack` or `grid_pack` in the name and it's still set to `True`, we put out a warning ... ?
>
> Cheers,
> Chris
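
A minimal sketch of the kind of check suggested in the quote, assuming the flag appears as a plain `name = True` assignment in the JO text. The function name `find_gridpack_flags` and the regular expression are hypothetical and not part of the existing CI scripts; string literals containing `#` are not handled.

```python
import re

# Hypothetical pattern: an assignment whose variable name contains
# "gridpack" or "grid_pack" and whose value is the literal True.
GRIDPACK_TRUE = re.compile(r"^\s*([\w.]*[Gg]rid_?[Pp]ack\w*)\s*=\s*True\b")


def find_gridpack_flags(jo_text):
    """Return (line number, variable name) for each uncommented gridpack flag set to True."""
    hits = []
    for lineno, line in enumerate(jo_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # drop comments (naive: ignores '#' inside strings)
        match = GRIDPACK_TRUE.match(code)
        if match:
            hits.append((lineno, match.group(1)))
    return hits
```

A CI step could print a warning for every entry returned and leave the decision between warning and error to the caller.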
Make logParser fail if the info in Chris's message below appears
S1.2021

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/142
Follow-up from "LO EFT samples for 4top"
2021-05-18T11:32:21+02:00
Spyros Argyropoulos

The following discussions from !1161 should be addressed:
> This shouldn't be part of a JobOption. The first part was fixed properly in 21.6.60 and the second part is obviously gonna cause problems. `ATHENA_PROC_NUMBER` is set to 8 because the machine has 8 cores, it shouldn't be set to 80 in the JOs.
Should we add the following checks/changes:
- if ATHENA_PROC_NUMBER > 1 and release < 21.6.60 => ERROR
- if ATHENA_PROC_NUMBER > 1 => run only 1 event in CI
- change the way we check whether the jO changes ATHENA_PROC_NUMBER. This would only be safe to catch in the transform, but until it is implemented there we could change the check to forbid any use of ATHENA_PROC_NUMBER in the jO (not even printing it), e.g. look in the jO and give an error if there is an uncommented line with "ATHENA_PROC_NUMBER" in it
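
The last bullet could be sketched roughly as follows. This is only an illustration: the helper name `uncommented_mentions` is hypothetical, and the comment stripping is naive (a `#` inside a string literal would be treated as the start of a comment).

```python
def uncommented_mentions(jo_text, token="ATHENA_PROC_NUMBER"):
    """Return the 1-based line numbers where `token` appears outside a '#' comment."""
    hits = []
    for lineno, line in enumerate(jo_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # keep only the part before any comment
        if token in code:
            hits.append(lineno)
    return hits
```

An empty list would mean the jO passes; any other result would be reported as an error together with the offending line numbers.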
@cgutscho
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/141
check_unique_controlFile.sh fails when it shouldn't?
2021-05-13T09:00:06+02:00
Jeff Shahinian

[check_unique_controlFile.sh](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/scripts/check_unique_controlFile.sh) is apparently a new part of the CI. I noticed that it fails even when given symlinks. For example, when uploading JOs (with symlinks to one control file) that look like this:
```
$ ls -a *
100001:
myJO_1.py
myControlFile.py
100002:
myJO_2.py
myControlFile.py -> ../100001/myControlFile.py
```
The CI job fails and recommends that you use symlinks (even if you already are):
```
ERROR: Duplicate file(s) found:
./100xxx/100001/myControlFile.py
If the files have exactly the same content, please only keep one physical file replacing the rest with symbolic links.
If the files have differences consider renaming the files that you added.
You can check for differences with diff -w file1 file2
```
Perhaps we need to add ```-type f``` to [this line](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/scripts/check_unique_controlFile.sh#L23) as well?
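
To illustrate the intended behaviour (not the actual shell script), here is a Python sketch that reports a file name as duplicated only when it exists more than once as a regular file, skipping symlinks in the same way `find -type f` would. The function name and the default `*Control*.py` pattern are assumptions made for this example.

```python
import fnmatch
import os
from collections import defaultdict


def duplicate_regular_files(root, pattern="*Control*.py"):
    """Map file name -> paths for names found more than once as a regular (non-symlink) file."""
    by_name = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Symlinks are skipped, so a link to an existing control file is not a duplicate.
            if fnmatch.fnmatchcase(name, pattern) and not os.path.islink(path):
                by_name[name].append(path)
    return {name: sorted(paths) for name, paths in by_name.items() if len(paths) > 1}
```

With the directory layout shown above, `100002/myControlFile.py` is a symlink and is ignored, so nothing is reported.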
Here's an example of a failing CI job:
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/jobs/13818890
Tagging @sargyrop
Best,
Jeff
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/140
Strange behaviour of commit script when athena is skipped and run time is > 1h
2021-05-06T11:03:19+02:00
Spyros Argyropoulos

For example in a case with 1 event, CPU=3.09h, the output is the following:
![Screenshot_2021-05-06_at_09.46.11](/uploads/63cb47b36de5f75bb8e9e6a275602971/Screenshot_2021-05-06_at_09.46.11.png)
which is correct, but when skipping athena:
![Screenshot_2021-05-06_at_09.45.42](/uploads/c01022a7990d31535fb7cca7aa2e6a4c/Screenshot_2021-05-06_at_09.45.42.png)
the
```
printGood -f "\tOK: CI job time estimate: $cpu hours, but athena will not run in the CI"
```
message is not printed because the script never reaches that point.
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/139
Handle EVNT->EVNT jobs in CI and logParser
2021-06-17T13:30:43+02:00
Spyros Argyropoulos

These jobs produce a `log.afterburn` instead of `log.generate`.
- [x] I would need an example to see how to treat this
- [x] How can we identify that it's an EVNT->EVNT job from the log?
- [x] Do we need to modify the Gen_tf command?
- [x] Test with `700267`
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/138
Check for multiple instances of TestHepMC (and TestLHE?)
2021-04-01T14:20:41+02:00
Christian Gutschow

In general, the transform will create an instance of TestHepMC (and in the future also TestLHE) and run some checks as part of the job. For some setups the default thresholds used in these packages may be too strict and occasionally we get JOs that try to loosen them a bit, which is usually fine.
We recently had a case (!1066) where a fresh instance of TestHepMC was created, and the thresholds were tweaked on the new instance but not on the one that the transform had already created, which then caused issues down the line.
Could we catch this sort of thing in the CI? I imagine it would just be a case of checking for a line like
```
genSeq += TestHepMC()
```
and throwing an error?
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/135
Sanity check for EVNT-to-EVNT transforms
2021-06-17T11:07:17+02:00
Christian Gutschow

Hi,
here's an [example JO](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/950xxx/950096/mc.Sh_2210_Zee_E2Etransform_valid.py) for an EVNT-to-EVNT transform.
This basically clones an input EVNT, but only copies the event if it passes some Athena filter, hence most of the logic being protected by the `if runArgs.trfSubstepName == 'afterburn':` statement.
Now, because it copies the original EVNT, the new EVNT would have the MC channel number (or run number in the HepMC GenEvent) set to the original DSID and not the new DSID (of the E2E transform JO).
This can now be patched using the `postSeq.CountHepMC.CorrectRunNumber = True` flag seen at the bottom. Could we use the CI to catch cases where such a JO is being added, but that tag is missing from the JO?
(In principle, there is a printout in the `log.afterburn` produced by an E2E transform which one could grep for, but the CI doesn't handle jobs without input EVNT files yet.)
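
A static check of the JO text along these lines might look like the sketch below. It assumes that an EVNT-to-EVNT JO can be recognised by the `runArgs.trfSubstepName == 'afterburn'` guard mentioned above. The function names are hypothetical, and the comment stripping is naive.

```python
import re

# Both patterns are taken from the snippets quoted in this issue.
AFTERBURN_GUARD = re.compile(r"runArgs\.trfSubstepName\s*==\s*['\"]afterburn['\"]")
RUN_NUMBER_FIX = re.compile(r"CountHepMC\.CorrectRunNumber\s*=\s*True\b")


def _strip_comments(jo_text):
    """Drop everything after '#' on each line (naive: ignores '#' inside strings)."""
    return "\n".join(line.split("#", 1)[0] for line in jo_text.splitlines())


def missing_run_number_fix(jo_text):
    """True if the JO looks like an EVNT-to-EVNT transform but never sets CorrectRunNumber = True."""
    code = _strip_comments(jo_text)
    return bool(AFTERBURN_GUARD.search(code)) and not RUN_NUMBER_FIX.search(code)
```

Because this works on the JO text alone, it would not depend on the CI being able to run jobs with input EVNT files.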
Thoughts/ideas?
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/133
scripts/commit_new_dsid.sh crashes when reading directories
2021-01-15T11:20:37+01:00
Petr Jacka

```
./scripts/commit_new_dsid.sh Var3c/* -m="Message" --dry-run
```
It fails when it tries to convert the JO directories inside the Var3c directory, with the message:
```
Traceback (most recent call last):
  File "scripts/jo_utils.py", line 87, in <module>
    _parse(args.DSIDs)
  File "scripts/jo_utils.py", line 10, in _parse
    dsids = [ int(d) for d in dsids ] # turn strings to integers
  File "scripts/jo_utils.py", line 10, in <listcomp>
    dsids = [ int(d) for d in dsids ] # turn strings to integers
ValueError: invalid literal for int() with base 10: 'Var3c/py8_yprod_var3cDown'
```
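
The traceback shows `int()` being applied to a directory path. One possible way to make such parsing tolerant is sketched below, purely as an illustration: `parse_dsids` is a hypothetical helper, not the code in `scripts/jo_utils.py`, and what the script should then do with non-numeric directories is left open.

```python
import os


def parse_dsids(args):
    """Split command-line arguments into integer DSIDs and non-numeric paths."""
    dsids, not_dsids = [], []
    for arg in args:
        # Use the last path component, so "100xxx/100001/" is read as 100001.
        name = os.path.basename(arg.rstrip("/"))
        if name.isdigit():
            dsids.append(int(name))
        else:
            not_dsids.append(arg)
    return dsids, not_dsids
```

The caller could then handle the second list explicitly instead of crashing on the first non-numeric entry.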
This issue was introduced in this commit: https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/commit/2546cd6015fd7a1b95ebfeafa31613c1645e421a
It is still possible to run the script when the directories are renamed to dummy DSID numbers:
`./scripts/commit_new_dsid.sh -d=100000,100001 -m="Adding ttgamma MG+Py8 Var3c variation samples" --dry-run`
I attached a tar file with Var3c directory.
[Var3c.tar.gz](/uploads/2d6676234f4a093b0b06806f8e4e3196/Var3c.tar.gz)
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/130
Support for Centos7 releases
2021-01-10T17:33:29+01:00
Christian Gutschow

Starting with release 21.6.51, the releases are built for Centos7 machines and so we should not be using SLC6 containers in the CI for those anymore (and gridpacks prepared on C7 machines are fine to use for those releases).
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/129
Improve handling of madgraph checks
2021-01-01T18:32:48+01:00
Spyros Argyropoulos

Instead of reading the whole file for the madgraphchecks, make use of an appropriate dictionary, where values can be overwritten.
ATLMCPROD-8252
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/128
evgen keywords not always being checked
2021-01-14T18:12:46+01:00
Christian Gutschow

Just stumbled across this by accident:
The key words listed in `evgenConfig.keywords` should match the ones in the [official list](https://gitlab.cern.ch/atlas/athena/-/blob/21.6/Generators/EvgenJobTransforms/share/file/evgenkeywords.txt). It turns out that when the transform doesn't find the official list in the JobOptions search path for some reason, it will be unable to check for potential mismatches and hence also not be able to print an error message.
If there's an undefined key word, the transform _should_ print:
```
msg = "evgenConfig.keywords contains non-standard keywords: %s. " % ", ".join(evil_keywords)
msg += "Please check the allowed keywords list and fix."
```
but if it cannot find the standard list it just says
```
08:29:01 Py:Gen_tf WARNING Could not find evgenkeywords.txt file EvgenJobTransforms/evgenkeywords.txt in $JOBOPTSEARCHPATH
```
in the log and the CI continues happily, see example log here:
```
/eos/atlas/atlascerngroupdisk/phys-gener/WeakBoson/SingleBoson/log/log.generate
```
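
For reference, a log-side version of this check could be sketched as follows. Both helpers are hypothetical. The second assumes the allowed keywords are available as a list of strings and compares them exactly, which may differ from what the transform does.

```python
def keyword_check_skipped(log_lines):
    """True if the transform warned that it could not find evgenkeywords.txt."""
    return any(
        "WARNING" in line and "Could not find evgenkeywords.txt" in line
        for line in log_lines
    )


def non_standard_keywords(keywords, allowed):
    """Return the keywords that are not in the allowed list (exact string match)."""
    allowed_set = {k.strip() for k in allowed if k.strip()}
    return [k for k in keywords if k not in allowed_set]
```

If `keyword_check_skipped` returns True, the parser could fall back to `non_standard_keywords` with its own copy of the official list.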
Could we get the logParser to perform the check as well?
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/124
New Pythia 8 checks for changing parameters
2023-10-26T16:13:33+02:00
Spyros Argyropoulos

Implement code to use new developments by Giancarlo mentioned in AGENE-1915.
- [ ] To be seen which of these should result in an error and which should be a warning.
- [ ] Also check if this catches the bug reported in ATLMCPROD-7723
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/104
Harmonise whitelist with Gen_tf
2021-01-04T15:27:52+01:00
Spyros Argyropoulos

Currently the transform allows setups which are explicitly excluded in the whitelist, e.g. `DSID/dat/*.dat` which is excluded here: https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/scripts/whitelist.sh#L9 as discussed in !298
I no longer remember why we excluded some cases but we should definitely harmonise what is done in the transform and what is done in the CI.
@ewelina could you go through the whitelist and let me know what is treated differently there and in `Gen_tf` so that we harmonise?
Tag @cgutscho @fsiegert
S1.2021

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/92
logParser: catch cases where LHE events are not enough (low efficiency)
2021-06-17T13:30:43+02:00
Spyros Argyropoulos

* [x] **OTF generation**: check that `N(LHE events) >= 1.1 * nEventsPerJob/(filter efficiency)`
* [x] **Showering with external LHE events**: same as above
* [x] This might require different code for each generator? To be checked
* [x] Might need to take into account how many input LHE files are used per job
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/64
Allow CI to run with external LHE inputs
2021-04-27T12:14:25+02:00
Spyros Argyropoulos

The CI cannot yet run jobs with input event files. Largely the same tasks appear for both LHE->EVNT and EVNT->EVNT files:
* [x] Upload mcgensvc grid certificate in CI and make sure `voms-proxy-init -voms atlas` works in the CI
* [x] Write (rucio) file that was used in local testing into `log.generate.short`, e.g. `inputGeneratorFile=TXT.<dsid>.tar.gz` for LHE or `inputEVNTFile=1231231.EVNT.pool.root` for EVNT input.
* [x] If input file is specified in `log.generate.short`, the CI `run_athena.sh` job should `rucio get` that file and add the corresponding arguments to the `Gen_tf.py` command line as described in [Twiki](https://twiki.cern.ch/twiki/bin/view/AtlasProtected/SpecialConfigurations#Using_event_input_LHE_EVNT_or_EV).
* [x] Special treatment for EVNT->EVNT jobs: Apparently such jobs do not produce a `log.generate` file but a `log.afterburn`. Disable logParser for these, or special treatment? (move to #139)
* [ ] Ideally, the CI should also check the `inputFilesPerJob` in the JO for reasonable values at this stage: it should not exceed `10GB/sizeof(downloadedTestFile)` to make sure they fit into a grid node.
* [x] Similar case that came up in #89 is running athena with `--inputGeneratorFile`
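
The `inputFilesPerJob` bound from the unchecked item above can be written down as a short sketch. The function names are hypothetical, and "10GB" is taken here as 10 * 10^9 bytes, which is an assumption.

```python
def max_input_files_per_job(test_file_bytes, node_limit_bytes=10 * 10**9):
    """Largest inputFilesPerJob for which the inputs still fit within the node limit."""
    if test_file_bytes <= 0:
        raise ValueError("test file size must be positive")
    return node_limit_bytes // test_file_bytes


def input_files_per_job_ok(input_files_per_job, test_file_bytes, node_limit_bytes=10 * 10**9):
    """True if inputFilesPerJob does not exceed 10GB / sizeof(downloadedTestFile)."""
    return input_files_per_job <= max_input_files_per_job(test_file_bytes, node_limit_bytes)
```

For example, with a 2 GB test file, values of `inputFilesPerJob` up to 5 would pass and 6 would be flagged.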
Example !203
LHE-only log: `~sargyrop/public/log.generate_LHE`
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/57
logParser gives missing weights for LHE-only generation
2021-05-08T12:30:45+02:00
Spyros Argyropoulos

Reported by @xiaohu
> Just to be 100% sure, we generate LHE files externally using Powheg-Box-V2. Weight variations due to PDF and scales are prepared. With checkMetaSG.py, we do see all the weights in evgen files (showered by Herwig7 with 21.6.12,AthGeneration) [1] and looked at the weight variations using truth derivation. But using logParser.py to check log.generate, it says “weights missing”.
Probably what happens is that the weights in logParser are read from lines that look like this:
`09:33:22 MetaData: weights = MUR=0.5 MUF=0.5 | PDF=260000 MemberID=1`
and for LHE-only generation this does not get written out.
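
A sketch of how such lines could be parsed, to make the suspected failure mode concrete. This is not the actual logParser code: the function name is hypothetical, and treating `|` as the separator between weight names is an assumption based on the single example line above.

```python
import re

# Matches the "MetaData: weights = ..." line format quoted above.
WEIGHTS_LINE = re.compile(r"MetaData:\s*weights\s*=\s*(.*)$")


def weight_names(log_lines):
    """Collect weight names from 'MetaData: weights = ...' lines (assumed '|'-separated)."""
    names = []
    for line in log_lines:
        match = WEIGHTS_LINE.search(line)
        if match:
            names.extend(part.strip() for part in match.group(1).split("|") if part.strip())
    return names
```

If a log contains no such line, the result is an empty list, which a parser could easily report as "weights missing" even when the weights are present in the output file.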
Need to confirm and perhaps fix for LHE only generation
S1.2021

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/12
Dynamical job creation for multiple DSID commits
2021-01-10T17:33:29+01:00
Spyros Argyropoulos

When we run athena the job will check if the expected time is > 1h and will abort if it is.
If several DSIDs are added simultaneously (e.g. 10-30 DSIDs for SUSY, EXO signal grids) the `run_athena` job will run for a very long time.
We should find a workaround for that. Could be handled with the `commit_new_dsid.sh` script making several branches.
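
One way the splitting could work is sketched below, under the assumption that a per-DSID CPU time estimate (in hours) is available. The function name, the greedy strategy and the 1 h default budget per batch are illustrative choices, not existing behaviour of `commit_new_dsid.sh`.

```python
def batch_dsids(cpu_hours_by_dsid, budget_hours=1.0):
    """Greedily group DSIDs so that each batch's summed CPU estimate stays within budget_hours."""
    batches, current, used = [], [], 0.0
    for dsid in sorted(cpu_hours_by_dsid):
        hours = cpu_hours_by_dsid[dsid]
        # Start a new batch when adding this DSID would exceed the budget.
        if current and used + hours > budget_hours:
            batches.append(current)
            current, used = [], 0.0
        current.append(dsid)
        used += hours
    if current:
        batches.append(current)
    return batches
```

Each batch could then be committed to its own branch, or mapped to its own CI job. A DSID whose estimate alone exceeds the budget ends up in a batch by itself.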
Some interesting material here: https://gitlab.com/gitlab-org/gitlab-ce/issues/45828
In particular there's a feature request for dynamic CI jobs: https://gitlab.com/gitlab-org/gitlab-ce/issues/44199 probably to be implemented in early 2020.
**Update**: this seems to be exactly what we need: https://gitlab.com/gitlab-org/gitlab/issues/35632
S1.2021
Spyros Argyropoulos