MC Job Options issues
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues
Updated: 2020-11-14T13:51:07+01:00

JO shouldn't hardcode ATHENA_PROC_NUMBER
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/125
Updated: 2020-11-14T13:51:07+01:00
Author: Christian Gutschow

The environment variable for multi-threading, `ATHENA_PROC_NUMBER`, should be set by prodsys, not by the JOs.
Can we make the CI fail if the JOs try to assign a value to it? The JOs are free to check whether this environment variable exists and what its value is (e.g. to pass it into MadGraph), but they shouldn't try to overwrite its value.
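A minimal sketch of how such a CI check could tell reads from writes, assuming the JO sets the variable through `os.environ[...] = ...`, `os.putenv(...)` or `os.environ.setdefault(...)`. The function name `find_proc_number_writes` and the pattern list are hypothetical and certainly not exhaustive; this is not the check used in the repository.

```python
import re

# Assumed ways a JO might write to the variable (illustrative, not exhaustive).
_WRITE_PATTERNS = [
    # os.environ["ATHENA_PROC_NUMBER"] = ...  (the lookahead skips comparisons with ==)
    re.compile(r"""environ\[\s*['"]ATHENA_PROC_NUMBER['"]\s*\]\s*=(?!=)"""),
    # os.putenv("ATHENA_PROC_NUMBER", ...)
    re.compile(r"""putenv\(\s*['"]ATHENA_PROC_NUMBER['"]"""),
    # os.environ.setdefault("ATHENA_PROC_NUMBER", ...)
    re.compile(r"""environ\.setdefault\(\s*['"]ATHENA_PROC_NUMBER['"]"""),
]


def find_proc_number_writes(jo_text):
    """Return (line number, stripped line) pairs where the JO appears to set the variable."""
    hits = []
    for number, line in enumerate(jo_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # crude comment stripping, ignores '#' inside strings
        if any(pattern.search(code) for pattern in _WRITE_PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

Reading the variable, e.g. `os.environ.get("ATHENA_PROC_NUMBER", 1)`, is not flagged, so the acceptable use case described above would still pass.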
See e.g. MR !745, where this had to be corrected, and compare [this JO](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/421xxx/421006/mc.MGPy8EG_A14NNPDF23_tWgamma_art.py), where it's used in an acceptable way.

Milestone: S2.2020
Assignee: Spyros Argyropoulos

logParser picks out wrong COM energy
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/123
Updated: 2020-09-21T16:08:05+02:00
Author: Christian Gutschow

See e.g. !676 where it extracted `ecmEnergy = 13000` even though the `log.generate` was for 8 TeV:
```
/afs/cern.ch/user/c/cgutscho/public/forSpyros/log.generate
```
Why though?

Milestone: S2.2020
Assignee: Spyros Argyropoulos

Bug: handling of jobs with external LHE file in logParser step
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/122
Updated: 2020-09-06T13:46:23+02:00
Author: Spyros Argyropoulos

When external LHE files are used, `log.generate.short` is added to the commit but `run_athena` just skips the job without producing any `log.generate_ci` file. The `check_logParser` job then treats this as a bug, because if `log.generate.short` is present, `log.generate_ci` should also be present at this point in the CI, and it complains; see !652.

Milestone: S2.2020
Assignee: Spyros Argyropoulos
Due: 2020-09-04

logParser rejects logs with nEventsPerJob > 10k
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/121
Updated: 2020-08-28T16:51:47+02:00
Author: Christian Gutschow

Following the successful test in ATLMCPROD-8659, we should allow cases where `nEventsPerJob` is a multiple of 10k.
Currently it fails saying
```
- CountHepMC Events passing all checks and written = 20000 <-- ERROR: Not an acceptable number of events for production (1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000)
```

Milestone: S2.2020
Assignee: Spyros Argyropoulos

Allow runArgs to be referred to in JOs but not to be overwritten by JOs
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/120
Updated: 2020-08-22T13:08:09+02:00
Author: Christian Gutschow

See !631 for an example.

Milestone: S2.2020
Assignee: Spyros Argyropoulos
Due: 2020-08-14

Add checks for `inputfilecheck` and `inputGeneratorFile`
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/118
Updated: 2020-08-03T10:25:32+02:00
Author: Christian Gutschow

Please see this test commit: 52aa8087
which has the following two lines in the JO:
```
evgenConfig.inputfilecheck = 'PhPy8EG_NNPDF30LO_EWK_ZZeeee'
runArgs.inputGeneratorFile = 'PhPy8EG_NNPDF30LO_EWK_ZZeeee._00052.events.tar.gz'
```
The first one I thought the CI would already be catching [along with `inputconfcheck`, no?] and the second one is clearly a problem for central production.
Can we catch these? I guess the logParser should already throw an error before the files are even committed to gitlab.

Milestone: S2.2020
Assignee: Spyros Argyropoulos

Check number of files in gridpack
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/117
Updated: 2023-03-01T07:42:53+01:00
Author: Christian Gutschow

The number of files in a gridpack shouldn't exceed 80k, otherwise some grid sites will crash. This has happened a number of times recently, e.g. for the FxFx job where the gridpack contained several files per Feynman diagram. MadGraph control cleans up logs and .o files in the latest release, but for older releases it would be good to have a dedicated pipeline step that throws an error if the number of files in the gridpack is larger than 80k. Probably something like `tar -ztvf *.tgz *.tar.gz` could work?

Milestone: S2.2020
Assignee: Spyros Argyropoulos

Wrong printing of branches using a DSID
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/115
Updated: 2020-08-01T16:41:21+02:00
Author: Spyros Argyropoulos

I got a wrong error message when I tried to commit JOs for 421332:
the message I got was that dsid_jveatch_600076 already uses this DSID.
I have checked this branch and it was not the case.
I found that this DSID was used in one of the earlier branches awaiting approval.
I think the problem is that the list of branches
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/scripts/check_jo_consistency.py#L118
is ordered from the newest branch to the oldest, and when a new branch is submitted for merging it is updated with the changes introduced in other branches awaiting approval. This way, in case of a conflict, the newest branch will always be reported as the one already using a given DSID.

Milestone: S2.2020
Assignee: Spyros Argyropoulos

Improve `check_modified_files` behaviour
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/113
Updated: 2020-08-04T10:02:29+02:00
Author: Spyros Argyropoulos

Do a local rebase before checking what changed to avoid failed pipelines for commits that are behind master.

Milestone: S2.2020
Assignee: Spyros Argyropoulos

Add checks for input files
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/102
Updated: 2020-07-30T07:37:56+02:00
Author: Spyros Argyropoulos

Add checks:
* [ ] no `evgenConfig.inputfilecheck`
* [ ] no `evgenConfig.inputconfcheck` allowed
both are always in the top JO
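
A rough sketch of what such a check on the top JO could look like, assuming the settings appear as plain assignments at the start of a line. The function name `find_forbidden_assignments` and the default list of names are hypothetical; other names (e.g. `runArgs.inputGeneratorFile`) could be passed in the same way.

```python
import re

# Settings the checklist above wants to reject in the top JO.
FORBIDDEN = ("evgenConfig.inputfilecheck", "evgenConfig.inputconfcheck")


def find_forbidden_assignments(jo_text, names=FORBIDDEN):
    """Return (line number, name) for every uncommented assignment to one of `names`."""
    alternatives = "|".join(re.escape(name) for name in names)
    # Match only at the start of a line; (?!=) avoids treating '==' as an assignment.
    pattern = re.compile(r"^\s*(" + alternatives + r")\s*=(?!=)")
    hits = []
    for number, line in enumerate(jo_text.splitlines(), start=1):
        match = pattern.match(line)
        if match:
            hits.append((number, match.group(1)))
    return hits
```

Lines that only read the setting, and lines that start with `#`, are not reported.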
Also
* [ ] Restructure checks so that everything related to reading the jO is done in one place and everything related to reading the log is done in `logParser`

Milestone: S2.2020

Extraction of nEventsPerJob from file
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/98
Updated: 2020-07-27T13:10:20+02:00
Author: Spyros Argyropoulos

### Bug
```
nEventsPerJob = 5000  # this is what the logParser reads from the file
# I can change it here
if efficiencyLow:
    nEventsPerJob *= 2
evgenConfig.nEventsPerJob = nEventsPerJob  # this is what the transform will use
```
### Solution
Credits to Frank Sauerburger
Put the following in `scriptB.py`
```
import argparse


def readParamFromJO(jOpath, param):
    # Execute the jO line by line in a namespace that only provides evgenConfig
    locals = {"evgenConfig": argparse.Namespace()}
    with open(jOpath) as jOFile:
        for line in jOFile.readlines():
            if "os.system" in line: continue # for security
            try:
                exec(line, {}, locals)
            except:
                # Deliberately broad: any line that cannot be executed is skipped
                # print(f"fail to parse {line}") # uncomment for debugging
                pass
    # None if the parameter was never set on evgenConfig
    return getattr(locals["evgenConfig"], param, None)


jOFile="./source/mc.scriptA.py"
nEventsPerJob=readParamFromJO(jOFile, 'nEventsPerJob')
# Check nEventsPerJob
if nEventsPerJob is None:
    print(f"WARNING: evgenConfig.nEventsPerJob is not defined in the jO. Will set to default=10000")
    nEventsPerJob=10000
else:
    print(f"nEventsPerJob from jO={nEventsPerJob}")
# Check minEvents
if readParamFromJO(jOFile, 'minEvents') is not None:
    print(f"ERROR: {jOFile} is using deprecated parameter evgenConfig.minEvents. Please switch to evgenConfig.nEventsPerJob")
```
### Testing:
Put the following in `./source/mc.scriptA.py`
```
import Sherpa_i.Sherpa_iConf
import os
import GeneratorFilters.GeneratorFiltersConf
include("./scriptA.py") # this doesn't work because python doesn't know what include is
evgenConfig.XVAR=5
filtSeq.YVAR=10
evgenConfig.nEventsPerJob=1
evgenConfig.nEventsPerJob=2
evgenConfig.nEventsPerJob*=3
evgenConfig.nEventsPerJob=os.system("rm test")
#evgenConfig.nEventsPerJob=10
#print(f"{evgenConfig.nEventsPerJob}")
```
Running `python3 scriptB.py` gives
```
fail to parse import Sherpa_i.Sherpa_iConf
fail to parse import GeneratorFilters.GeneratorFiltersConf
fail to parse include("./scriptA.py") # this doesn't work because python doesn't know what include is
fail to parse filtSeq.YVAR=10
Final Answer: nEventsPerJob=6
```
The added bonus is that if there is no `evgenConfig.nEventsPerJob` defined this would automatically throw an error.
## What is done in ProdSys
The first occurrence of `evgenConfig.nEventsPerJob` is used.

Milestone: S2.2020
Assignee: Spyros Argyropoulos

Add check to see if gridpack was used and if the grid pack is provided
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/95
Updated: 2020-08-04T13:43:36+02:00
Author: Spyros Argyropoulos

I was wondering how to catch such cases and avoid having pipeline jobs running for 1h and failing without apparent reason. We would need an indicator in log.generate that a gridpack was used.
I don't see PowhegConfig.gridpack printed in the log that Olga provided. I see
```
16:47:17 Py:PowhegControl INFO | powheginput keyword use-old-grid set to 1.0000000000000000
```
Does this tell us whether a gridpack was used?
Comment by @fsiegert
> Hi @sargyrop,
> I think there are things which we'll never be able to catch if requesters modify the DSID directory before submitting but after having run the evgen test. This is not only relevant for gridpacks, but also potentially removing include files etc. So I wouldn't put too much effort into catching these cases if it's not easy.
> We just need to educate users that they:
> 1. run the evgen test in a clean working directory
> 2. should not modify the DSID directory before submission
>
> Best,
> Frank
I think this is a pretty straightforward check: `if ((gridpack used) && !(gridpack present)) then ERROR`. So I am only asking how to determine (gridpack used).
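
For illustration only, the check could be sketched as follows, using the MadGraph log line quoted in this issue as the "gridpack used" marker. The gridpack file suffixes and the function names are assumptions; other generators would need their own marker.

```python
import re
from pathlib import Path

# Log marker quoted in this issue for MadGraph; other generators would need their own.
GRIDPACK_USED = re.compile(r"Generating events from gridpack")
# Assumed gridpack file suffixes; adjust to the naming actually used in the repository.
GRIDPACK_SUFFIXES = (".tar.gz", ".tgz")


def gridpack_used(log_text):
    """True if the log.generate text contains the gridpack marker."""
    return GRIDPACK_USED.search(log_text) is not None


def gridpack_present(dsid_dir):
    """True if the DSID directory contains a file with a gridpack-like suffix."""
    return any(path.name.endswith(GRIDPACK_SUFFIXES) for path in Path(dsid_dir).iterdir())


def check_gridpack(log_text, dsid_dir):
    """Return an error message if a gridpack was used but is not present, else None."""
    if gridpack_used(log_text) and not gridpack_present(dsid_dir):
        return "ERROR: log.generate reports a gridpack but none was found in the DSID directory"
    return None
```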
Comment by @amoroso :
> Hi @fsiegert, @sargyrop,
> I wonder if we couldn't catch case 2 within the CI. We could add a checksum of the DSID directory to the Gen_tf output, and have a pipeline check that the checksum in the attached logfile and the one recomputed by the CI are the same.
> cheers, Simone
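
A possible way to compute such a directory checksum, as a sketch only: the function name `directory_checksum` and the choice of SHA-256 are assumptions, and nothing like this is known to exist in Gen_tf. Files created by the run itself (e.g. the log) would have to be excluded in practice.

```python
import hashlib
from pathlib import Path


def directory_checksum(dsid_dir):
    """Hash the relative paths and contents of all files under dsid_dir, in sorted order."""
    root = Path(dsid_dir)
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Include the relative path so that renaming a file changes the checksum.
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

The CI would then recompute the value on the committed directory and compare it with the one recorded in the log.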
## Solution for Madgraph
GRID presence can be identified by lines like:
```
06:17:07 Py:MadGraphUtils INFO Generating events from gridpack
```

Milestone: S2.2020
Assignee: Spyros Argyropoulos

Allow arbitrary directory names in commit script
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/84
Updated: 2020-08-10T09:50:05+02:00
Author: Spyros Argyropoulos

* [ ] allow possibility to use something like `./scripts/commit_new_dsid.sh -n ../myintegrations/zjets/zee*var{1,2,0p5}`
* [x] when the above is implemented, add option to move dummy DSIDs to final DSID
* [ ] Need to think also about what we do if someone already picks a directory in the correct range. The scripts should make sure that it's the lowest possible.

Milestone: S2.2020
Assignee: Spyros Argyropoulos

Modify logParser/CI to handle cases with no TestHepMC results
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/66
Updated: 2020-08-17T10:40:40+02:00
Author: Spyros Argyropoulos

As mentioned here https://its.cern.ch/jira/browse/ATLHI-297, there might be cases where the requirement of TestHepMC results in logParser blocks a production.
@olszewsk I would need a log.generate file to provide a solution.

Milestone: S2.2020
Assignee: Spyros Argyropoulos

Handle LHE only generation in logParser and CI
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/53
Updated: 2020-09-24T17:49:51+02:00
Author: Spyros Argyropoulos

> I just tried to commit a few JOs for LHE-only production using Gen_tf.
It seems that the logParser doesn't really recognise this,
as it complains about many things related to the shower.
Can the checks below be removed if outputTXT is used?
That said, I feel I have encountered a large enough number of issues
by trying to produce only LHE events, that I am not sure this will be a very useful/used feature.
cheers, Simone
ERROR: generatorTune is missing!
Failed tests:
ERROR: TestHepMC Events passed is missing!
ERROR: TestHepMC Efficiency is missing!
WARNING: SimTimeEstimate RUN INFORMATION is missing!
- Total no. of events: 1 <-- WARNING: This total is low enough that the mu profile may be problematic - INFORM MC PROD
Logs in `/afs/cern.ch/user/a/amoroso/public/PowhegEWintegrations/600001`

Milestone: S2.2020
Assignee: Spyros Argyropoulos

Pipelines failing when only links are included?
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/144
Updated: 2021-06-21T16:50:31+02:00
Author: Spyros Argyropoulos

The following discussion from !1225 should be addressed:
- [ ] @jshahini started a [discussion](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/merge_requests/1225#note_4588898): (+1 comment)
> Hi @cgutscho
>
> Indeed it is a duplicate, but this is by design in order to clear the CI. To give some context, these JOs are for a SUSY grid expansion.
>
> I originally tried to upload everything using only symlinks to that control file, but the CI pipelines were failing, claiming that the jobs couldn't find ```MadGraphControl_SimplifiedModel_GG_directRPVLQD.py```
>
> So I duplicated the control file you pointed to and included it in this MR so that the pipelines would succeed. After the MR gets accepted, I was going to make another one where I change all the control files to be symlinks to ```/502xxx/502416/MadGraphControl_SimplifiedModel_GG_directRPVLQD.py```. That way, there would be no duplicated control files floating around.
>
> I realize this is remarkably convoluted, so I'm more than happy to hear other ideas about preparing the JOs for grid expansions in R21.
>
> Cheers,
> Jeff
Failed pipeline: https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/pipelines/2741834
![Screenshot_2021-06-21_at_14.50.38](/uploads/1b1ebf50941d6c15803a23b2ad2bcd32/Screenshot_2021-06-21_at_14.50.38.png)

Milestone: S1.2021
Assignee: Spyros Argyropoulos
Due: 2021-06-27

Follow-up from ATLMCPROD-9332
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/143
Updated: 2021-06-07T14:15:29+02:00
Author: Spyros Argyropoulos

The following discussion from !1198 should be addressed:
- [ ] @cgutscho started a [discussion](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/merge_requests/1198#note_4542318): (+5 comments)
> Hi @sargyrop - this might be a long shot, but do you think this is something we can catch in the CI? e.g. if there's a variable in MadGraph JOs that has `gridpack` or `grid_pack` in the name and it's still set to `True`, we put out a warning ... ?
>
> Cheers,
> Chris
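
A minimal sketch of the kind of warning suggested in the quoted comment, assuming the flag is a simple variable assigned at the start of a line. The function name `find_enabled_gridpack_flags` is hypothetical, and attribute assignments such as `obj.gridpack = True` are not covered.

```python
import re

# A simple name containing "gridpack" or "grid_pack" that is assigned the literal True.
GRIDPACK_FLAG = re.compile(r"^\s*(\w*grid_?pack\w*)\s*=\s*True\b", re.IGNORECASE)


def find_enabled_gridpack_flags(jo_text):
    """Return (line number, variable name) for gridpack-like flags still set to True."""
    hits = []
    for number, line in enumerate(jo_text.splitlines(), start=1):
        match = GRIDPACK_FLAG.match(line)
        if match:
            hits.append((number, match.group(1)))
    return hits
```

A non-empty result would translate into a warning in the CI output; commented-out lines are ignored because the pattern is anchored at the start of the line.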
Make logParser fail if the info in Chris's message below appears

Milestone: S1.2021

Follow-up from "LO EFT samples for 4top"
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/142
Updated: 2021-05-18T11:32:21+02:00
Author: Spyros Argyropoulos

The following discussions from !1161 should be addressed:
> This shouldn't be part of a JobOption. The first part was fixed properly in 21.6.60 and the second part is obviously gonna cause problems. `ATHENA_PROC_NUMBER` is set to 8 because the machine has 8 cores, it shouldn't be set to 80 in the JOs.
Should we add the following checks/changes:
- if ATHENA_PROC_NUMBER > 1 and release < 21.2.60 => ERROR
- if ATHENA_PROC_NUMBER > 1 => run only 1 event in CI
- change the way we check whether the jO changes ATHENA_PROC_NUMBER. This would only be safe to catch in the transform, but until it is implemented there we could forbid any use of ATHENA_PROC_NUMBER in the jO (not even printing it): e.g. look in the jO and, if there is an uncommented line with "ATHENA_PROC_NUMBER" in it, give an error
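
The strict variant in the last bullet could be sketched like this, assuming comments are introduced with `#` and that a plain substring search is good enough. The function name `lines_mentioning` is hypothetical.

```python
def lines_mentioning(jo_text, token="ATHENA_PROC_NUMBER"):
    """Return the numbers of all lines whose uncommented part contains the token."""
    hits = []
    for number, line in enumerate(jo_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # drop everything after the first '#'
        if token in code:
            hits.append(number)
    return hits
```

Any non-empty result would be an error under this rule, including reads and prints of the variable.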
@cgutscho

Milestone: S1.2021
Assignee: Spyros Argyropoulos

check_unique_controlFile.sh fails when it shouldn't?
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/141
Updated: 2021-05-13T09:00:06+02:00
Author: Jeff Shahinian

[check_unique_controlFile.sh](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/scripts/check_unique_controlFile.sh) is apparently a new part of the CI. I noticed that it fails even when given symlinks. For example, when uploading JOs (with symlinks to one control file) that look like this:
```
$ ls -a *
100001:
myJO_1.py
myControlFile.py
100002:
myJO_2.py
myControlFile.py -> ../100001/myControlFile.py
```
The CI job fails and recommends that you use symlinks (even if you already are):
```
ERROR: Duplicate file(s) found:
./100xxx/100001/myControlFile.py
If the files have exactly the same content, please only keep one physical file replacing the rest with symbolic links.
If the files have differences consider renaming the files that you added.
You can check for differences with diff -w file1 file2
```
Perhaps we need to add ```-type f``` to [this line](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/scripts/check_unique_controlFile.sh#L23) as well?
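
To illustrate the intended behaviour (not the actual logic of `check_unique_controlFile.sh`, which is not reproduced here), a sketch that counts only physical copies of each file name and ignores symbolic links, much as `find -type f` would. The function names are hypothetical.

```python
import os
from collections import defaultdict


def physical_copies(root):
    """Map each file name under root to the paths of its regular (non-symlink) copies."""
    copies = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # like `find -type f`, do not count symbolic links
            copies[name].append(path)
    return copies


def duplicated_names(root):
    """Return the sorted file names that exist as more than one physical file."""
    return sorted(name for name, paths in physical_copies(root).items() if len(paths) > 1)
```

With the directory layout shown above, only `100001/myControlFile.py` counts as a physical file, so nothing would be reported as a duplicate.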
Here's an example of a failing CI job:
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/jobs/13818890
Tagging @sargyrop
Best,
Jeff

Milestone: S1.2021
Assignee: Spyros Argyropoulos

Strange behaviour of commit script when athena is skipped and run time is > 1h
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/140
Updated: 2021-05-06T11:03:19+02:00
Author: Spyros Argyropoulos

For example, in a case with 1 event and CPU=3.09h, the output is the following:
![Screenshot_2021-05-06_at_09.46.11](/uploads/63cb47b36de5f75bb8e9e6a275602971/Screenshot_2021-05-06_at_09.46.11.png)
which is correct, but when skipping athena:
![Screenshot_2021-05-06_at_09.45.42](/uploads/c01022a7990d31535fb7cca7aa2e6a4c/Screenshot_2021-05-06_at_09.45.42.png)
the
```
printGood -f "\tOK: CI job time estimate: $cpu hours, but athena will not run in the CI"
```
message is not printed because the script never reaches that point.

Milestone: S1.2021
Assignee: Spyros Argyropoulos