MC Job Options issues
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues
2020-09-06T13:46:23+02:00

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/122
Bug: handling of jobs with external LHE file in logParser step
2020-09-06T13:46:23+02:00, Spyros Argyropoulos

When external LHE files are used, `log.generate.short` is added to the commit but `run_athena` just skips the job without producing any `log.generate_ci` file. The `check_logParser` job then treats this as a bug, because if `log.generate.short` is present, `log.generate_ci` should also be present at this point in the CI, and it complains; see !652.

S2.2020, Spyros Argyropoulos, 2020-09-04

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/98
Extraction of nEventsPerJob from file
2020-07-27T13:10:20+02:00, Spyros Argyropoulos

### Bug
```
nEventsPerJob = 5000  # this is what the logParser reads from the file
# I can change it here
if efficiencyLow:
    nEventsPerJob *= 2
evgenConfig.nEventsPerJob = nEventsPerJob  # this is what the transform will use
```
### Solution
Credits to Frank Sauerburger
Put the following in `scriptB.py`
```
import argparse

def readParamFromJO(jOpath, param):
    """Return evgenConfig.<param> as set in the jO, or None if it is never set."""
    # Each line is executed on its own; only evgenConfig attributes are of interest
    namespace = {"evgenConfig": argparse.Namespace()}
    with open(jOpath) as jOFile:
        for line in jOFile:
            if "os.system" in line:
                continue  # for security
            try:
                exec(line, {}, namespace)
            except:  # deliberately broad: lines that cannot be executed are ignored
                # print(f"fail to parse {line}")  # uncomment for debugging
                pass
    return getattr(namespace["evgenConfig"], param, None)

jOFile = "./source/mc.scriptA.py"
nEventsPerJob = readParamFromJO(jOFile, "nEventsPerJob")

# Check nEventsPerJob
if nEventsPerJob is None:
    print("WARNING: evgenConfig.nEventsPerJob is not defined in the jO. Will set to default=10000")
    nEventsPerJob = 10000
else:
    print(f"nEventsPerJob from jO={nEventsPerJob}")

# Check minEvents
if readParamFromJO(jOFile, "minEvents") is not None:
    print(f"ERROR: {jOFile} is using deprecated parameter evgenConfig.minEvents. Please switch to evgenConfig.nEventsPerJob")
```
### Testing:
Put the following in `./source/mc.scriptA.py`
```
import Sherpa_i.Sherpa_iConf
import os
import GeneratorFilters.GeneratorFiltersConf
include("./scriptA.py") # this doesn't work because python doesn't know what include is
evgenConfig.XVAR=5
filtSeq.YVAR=10
evgenConfig.nEventsPerJob=1
evgenConfig.nEventsPerJob=2
evgenConfig.nEventsPerJob*=3
evgenConfig.nEventsPerJob=os.system("rm test")
#evgenConfig.nEventsPerJob=10
#print(f"{evgenConfig.nEventsPerJob}")
```
Running `python3 scriptB.py` (with the debugging print uncommented) gives
```
fail to parse import Sherpa_i.Sherpa_iConf
fail to parse import GeneratorFilters.GeneratorFiltersConf
fail to parse include("./scriptA.py") # this doesn't work because python doesn't know what include is
fail to parse filtSeq.YVAR=10
Final Answer: nEventsPerJob=6
```
An added bonus is that a missing `evgenConfig.nEventsPerJob` is detected automatically: `readParamFromJO` returns `None` and the script reports it.
## What is done in ProdSys
The first occurrence of `evgenConfig.nEventsPerJob` is used.

S2.2020, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/115
Wrong printing of branches using a DSID
2020-08-01T16:41:21+02:00, Spyros Argyropoulos

I got a misleading error message when I tried to commit JOs for 421332:
the message said that dsid_jveatch_600076 already uses this DSID.
I checked that branch and this was not the case.
I found that the DSID was used in one of the earlier branches awaiting approval.
I think the problem is that the list of branches
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/scripts/check_jo_consistency.py#L118
is ordered from the newest branch to the oldest, and when a new branch is submitted for merging it is updated with the changes introduced in other branches awaiting approval. As a result, in case of a conflict the newest branch is always the one reported as already using a given DSID.

S2.2020, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/131
LogParser fails to pickup nevents keyword
2020-12-09T18:06:20+01:00, Christian Gutschow

From @avroy:
```
When trying to check some new JOs that were generated using 21.6.54 and cc7, the logParser failed with the following error
Traceback (most recent call last):
File "scripts/logParser.py", line 296, in madgraphChecks
neventsMG=int(float(generatorDict['"nevents"'][0]))
IndexError: list index out of range
I think the error is associated with the fact that in the new log file, the keyword is logged as nevents (i.e. without the quotes). You can find the log file in the uploaded zipball in https://its.cern.ch/jira/browse/ATLMCPROD-8926
Please look at JOs/200xxx/200001/log.generate
```

S2.2020, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/125
JO shouldn't hardcode ATHENA_PROC_NUMBER
2020-11-14T13:51:07+01:00, Christian Gutschow

The environment variable for multi-threading, `ATHENA_PROC_NUMBER`, should be set by prodsys, not by the JOs.
Can we make the CI fail if the JOs try to assign a value to it? (The JOs are free to check whether this environment variable exists and what its value is, e.g. to pass it to MadGraph, but they shouldn't try to overwrite it.)
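A CI check along these lines could be sketched as follows. This is only an illustration: the function name `sets_athena_proc_number` is hypothetical, and the regular expression covers only direct `os.environ[...] = ...` assignments, not other ways of changing the environment such as `os.putenv` or `os.environ.update`.

```python
import re

# Matches an assignment such as os.environ["ATHENA_PROC_NUMBER"] = "8",
# but not a comparison (==) or a plain read of the variable.
_ASSIGNMENT = re.compile(
    r"""os\.environ\[\s*['"]ATHENA_PROC_NUMBER['"]\s*\]\s*=(?!=)"""
)

def sets_athena_proc_number(jo_text):
    """Return the 1-based line number of the first offending line, or None."""
    for lineno, line in enumerate(jo_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # naive comment stripping
        if _ASSIGNMENT.search(code):
            return lineno
    return None
```

Reading the variable or testing for its presence passes this check; only an assignment is flagged.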
See e.g. MR !745, where this had to be corrected, and [this JO](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/421xxx/421006/mc.MGPy8EG_A14NNPDF23_tWgamma_art.py) for an example where the variable is used in an acceptable way.

S2.2020, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/121
logParser rejects logs with nEventsPerJob > 10k
2020-08-28T16:51:47+02:00, Christian Gutschow

Following the successful test in ATLMCPROD-8659, we should allow cases where `nEventsPerJob` is a multiple of 10k.
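The relaxed rule could be sketched as below. The set of small values is copied from the logParser error message quoted in this issue; the helper name `is_acceptable_nevents` is hypothetical and this is not the actual logParser code.

```python
# Values listed in the logParser error message quoted in this issue.
ALLOWED_VALUES = {1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000}

def is_acceptable_nevents(n_events):
    """Accept the listed values and, in addition, any multiple of 10k above 10k."""
    if n_events in ALLOWED_VALUES:
        return True
    return n_events > 10000 and n_events % 10000 == 0
```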
Currently it fails saying
```
- CountHepMC Events passing all checks and written = 20000 <-- ERROR: Not an acceptable number of events for production (1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000)
```

S2.2020, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/120
Allow runArgs to be referred to in JOs but not to be overwritten by JOs
2020-08-22T13:08:09+02:00, Christian Gutschow

See !631 for an example.

S2.2020, Spyros Argyropoulos, 2020-08-14

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/118
Add checks for `inputfilecheck` and `inputGeneratorFile`
2020-08-03T10:25:32+02:00, Christian Gutschow

Please see this test commit: 52aa8087
which has the following two lines in the JO:
```
evgenConfig.inputfilecheck = 'PhPy8EG_NNPDF30LO_EWK_ZZeeee'
runArgs.inputGeneratorFile = 'PhPy8EG_NNPDF30LO_EWK_ZZeeee._00052.events.tar.gz'
```
The first one I thought the CI would already be catching [along with `inputconfcheck`, no?] and the second one is clearly a problem for central production.
Can we catch these? I guess the logParser should already throw an error before the files are even committed to GitLab.

S2.2020, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/117
Check number of files in gridpack
2023-03-01T07:42:53+01:00, Christian Gutschow

The number of files in a gridpack shouldn't exceed 80k, otherwise some grid sites will crash. This has happened a number of times recently, e.g. for the FxFx job where the gridpack contained several files per Feynman diagram. MadGraph control cleans up logs and .o files in the latest release, but for older releases it would be good to have a dedicated pipeline step that throws an error if the number of files in the gridpack is larger than 80k. Probably something like `tar -ztvf *.tgz *.tar.gz` could work?

S2.2020, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/113
Improve `check_modified_files` behaviour
2020-08-04T10:02:29+02:00, Spyros Argyropoulos

Do a local rebase before checking what changed to avoid failed pipelines for commits that are behind master.

S2.2020, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/102
Add checks for input files
2020-07-30T07:37:56+02:00, Spyros Argyropoulos

Add checks:
* [ ] no `evgenConfig.inputfilecheck`
* [ ] no `evgenConfig.inputconfcheck` allowed
both are always in the top JO
Also
* [ ] Restructure checks so that everything related to reading the jO is done in one place and everything related to reading the log is done in `logParser`

S2.2020

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/95
Add check to see if gridpack was used and if the grid pack is provided
2020-08-04T13:43:36+02:00, Spyros Argyropoulos

I was wondering how to catch such cases and avoid having pipeline jobs running for 1h and failing without apparent reason. We would need an indicator in log.generate that a gridpack was used.
I don't see PowhegConfig.gridpack printed in the log that Olga provided. I see
```
16:47:17 Py:PowhegControl INFO | powheginput keyword use-old-grid set to 1.0000000000000000
```
Does this tell us whether a gridpack was used?
Comment by @fsiegert
> Hi @sargyrop,
>
> I think there are things which we'll never be able to catch if requesters modify the DSID directory before submitting but after having run the evgen test. This is not only relevant for gridpacks, but also potentially removing include files etc. So I wouldn't put too much effort into catching these cases if it's not easy.
>
> We just need to educate users that they:
> 1. run the evgen test in a clean working directory
> 2. should not modify the DSID directory before submission
>
> Best,
> Frank
I think this is a pretty straightforward check: `if ((gridpack used) && !(gridpack present)) then ERROR`. So I am only asking how to determine (gridpack used).
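The check could be sketched as follows. Both ingredients are assumptions: "gridpack used" is taken from the MadGraph log line quoted further down in this issue, and "gridpack present" is approximated by looking for `*.tgz` or `*.tar.gz` files in the DSID directory. The helper name `gridpack_missing` is hypothetical.

```python
import glob
import os

# Assumption: this MadGraph log message marks a job that ran from a gridpack.
GRIDPACK_MARKER = "Generating events from gridpack"

def gridpack_missing(log_text, dsid_dir):
    """Return True if the log shows a gridpack was used but none is in dsid_dir."""
    used = GRIDPACK_MARKER in log_text
    present = any(
        glob.glob(os.path.join(dsid_dir, pattern))
        for pattern in ("*.tgz", "*.tar.gz")  # assumed gridpack file patterns
    )
    return used and not present
```

Other generators would need their own marker, since the Powheg log quoted above does not obviously contain an equivalent line.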
Comment by @amoroso:
> Hi @fsiegert, @sargyrop,
>
> I wonder if we couldn't catch case 2 within the CI. We could add a checksum of the DSID directory to the Gen_tf output, and have a pipeline check that the checksum in the attached logfile and the one recomputed by the CI are the same.
>
> cheers, Simone
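A directory checksum of the kind suggested here could be sketched as below. This is an illustration only, not something Gen_tf provides: the function name `directory_checksum` and the choice of SHA-256 are assumptions, and symlinks are simply followed when files are opened.

```python
import hashlib
import os

def directory_checksum(root):
    """Hash relative file paths and file contents under root in a fixed order."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest.update(os.path.relpath(path, root).encode("utf-8"))
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
    return digest.hexdigest()
```

The value printed at generation time would then be compared with the one recomputed in the pipeline; any file added, removed, renamed or edited in between changes the result.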
## Solution for Madgraph
Gridpack use can be identified by log lines like:
```
06:17:07 Py:MadGraphUtils INFO Generating events from gridpack
```

S2.2020, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/84
Allow arbitrary directory names in commit script
2020-08-10T09:50:05+02:00, Spyros Argyropoulos

* [ ] allow possibility to use something like `./scripts/commit_new_dsid.sh -n ../myintegrations/zjets/zee*var{1,2,0p5}`
* [x] when the above is implemented, add option to move dummy DSIDs to final DSID
* [ ] Need to think also about what we do if someone already picks a directory in the correct range. The scripts should make sure that it's the lowest possible.

S2.2020, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/66
Modify logParser/CI to handle cases with no TestHepMC results
2020-08-17T10:40:40+02:00, Spyros Argyropoulos

As mentioned here https://its.cern.ch/jira/browse/ATLHI-297 there might be cases where the requirement of TestHepMC results in logParser blocks a production.
@olszewsk I would need a log.generate file to provide a solution.

S2.2020, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/53
Handle LHE only generation in logParser and CI
2020-09-24T17:49:51+02:00, Spyros Argyropoulos

> I just tried to commit a few JOs for LHE-only production using Gen_tf.
It seems that the logParser doesn't really recognise this,
as it complains about many things related to the shower.
Can the checks below be removed if outputTXT is used?
That said, I feel I have encountered a large enough number of issues
by trying to produce only LHE events that I am not sure this will be a very useful/used feature.
cheers, Simone
ERROR: generatorTune is missing!
Failed tests:
ERROR: TestHepMC Events passed is missing!
ERROR: TestHepMC Efficiency is missing!
WARNING: SimTimeEstimate RUN INFORMATION is missing!
- Total no. of events: 1 <-- WARNING: This total is low enough that the mu profile may be problematic - INFORM MC PROD
Logs in `/afs/cern.ch/user/a/amoroso/public/PowhegEWintegrations/600001`

S2.2020, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/144
Pipelines failing when only links are included?
2021-06-21T16:50:31+02:00, Spyros Argyropoulos

The following discussion from !1225 should be addressed:
- [ ] @jshahini started a [discussion](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/merge_requests/1225#note_4588898): (+1 comment)
> Hi @cgutscho
>
> Indeed it is a duplicate, but this is by design in order to clear the CI. To give some context, these JOs are for a SUSY grid expansion.
>
> I originally tried to upload everything using only symlinks to that control file, but the CI pipelines were failing, claiming that the jobs couldn't find ```MadGraphControl_SimplifiedModel_GG_directRPVLQD.py```
>
> So I duplicated the control file you pointed to and included it in this MR so that the pipelines would succeed. After the MR gets accepted, I was going to make another one where I change all the control files to be symlinks to ```/502xxx/502416/MadGraphControl_SimplifiedModel_GG_directRPVLQD.py```. That way, there would be no duplicated control files floating around.
>
> I realize this is remarkably convoluted, so I'm more than happy to hear other ideas about preparing the JOs for grid expansions in R21.
>
> Cheers,
> Jeff
Failed pipeline: https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/pipelines/2741834
![Screenshot_2021-06-21_at_14.50.38](/uploads/1b1ebf50941d6c15803a23b2ad2bcd32/Screenshot_2021-06-21_at_14.50.38.png)

S1.2021, Spyros Argyropoulos, 2021-06-27

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/139
Handle EVNT->EVNT jobs in CI and logParser
2021-06-17T13:30:43+02:00, Spyros Argyropoulos

These jobs produce a `log.afterburn` instead of `log.generate`.
- [x] I would need an example to see how to treat this
- [x] How can we identify that it's an EVNT->EVNT job from the log?
- [x] Do we need to modify the Gen_tf command?
- [x] Test with `700267`

S1.2021, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/133
scripts/commit_new_dsid.sh crashes when reading directories
2021-01-15T11:20:37+01:00, Petr Jacka

```
./scripts/commit_new_dsid.sh Var3c/* -m="Message" --dry-run
```
It fails when it tries to convert the JO directories inside the Var3c directory, with the message:
```
Traceback (most recent call last):
File "scripts/jo_utils.py", line 87, in <module>
_parse(args.DSIDs)
File "scripts/jo_utils.py", line 10, in _parse
dsids = [ int(d) for d in dsids ] # turn strings to integers
File "scripts/jo_utils.py", line 10, in <listcomp>
dsids = [ int(d) for d in dsids ] # turn strings to integers
ValueError: invalid literal for int() with base 10: 'Var3c/py8_yprod_var3cDown'
```
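The traceback shows `int()` being applied to a directory path. One way to tolerate both kinds of argument is sketched below; the helper name `split_dsid_args` is hypothetical and this is not the actual fix applied to `jo_utils.py`.

```python
def split_dsid_args(args):
    """Separate purely numeric DSID arguments from directory names."""
    dsids, directories = [], []
    for arg in args:
        if arg.isascii() and arg.isdigit():
            dsids.append(int(arg))  # safe: ASCII digits only
        else:
            directories.append(arg)
    return dsids, directories
```

The directory names can then be mapped to dummy DSIDs in a separate step instead of crashing the conversion.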
This issue was introduced in this commit: https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/commit/2546cd6015fd7a1b95ebfeafa31613c1645e421a
It is still possible to run the script when the directories are renamed to dummy DSID numbers:
`./scripts/commit_new_dsid.sh -d=100000,100001 -m="Adding ttgamma MG+Py8 Var3c variation samples" --dry-run`
I attached a tar file with Var3c directory.
[Var3c.tar.gz](/uploads/2d6676234f4a093b0b06806f8e4e3196/Var3c.tar.gz)

S1.2021, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/130
Support for Centos7 releases
2021-01-10T17:33:29+01:00, Christian Gutschow

Starting with release 21.6.51, the releases are built for Centos7 machines, so we should no longer use SLC6 containers in the CI for them (and gridpacks prepared on C7 machines are fine to use for those releases).

S1.2021, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/129
Improve handling of madgraph checks
2021-01-01T18:32:48+01:00, Spyros Argyropoulos

Instead of reading the whole file for the madgraphChecks, make use of an appropriate dictionary, where values can be overwritten.
ATLMCPROD-8252

S1.2021, Spyros Argyropoulos