MC Job Options issues
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues
2021-12-03T14:57:38+01:00

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/155
MR not being linked on JIRAs
2021-12-03T14:57:38+01:00
Matthew Gignac
In some recent requests, it was noticed that the MRs are not being linked on JIRA. For example, see: https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/merge_requests/1510
Spyros Argyropoulos
2021-12-05

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/154
CI cannot deal with integration grids for several ECM being present
2021-11-16T21:10:54+01:00
Jan Kretzschmar
I'm trying to commit new jO that come with integration grids. https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/merge_requests/1485
As we want to run it at different sqrt(s) values, we need several of those; the Gen_tf.py transform works this out correctly.
However, the CI bails when trying to copy the GRID file, because it expects just a single file: https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/scripts/run_athena.sh#L136
Probably the best solution would be to copy the right file for the sqrt(s) value in question, i.e. something like `mc_${sqrts}TeV.*.GRID.tar.gz` instead of `*.GRID.tar.gz`.
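As a rough illustration of the per-energy selection, here is a minimal Python sketch. It is not the logic in `run_athena.sh`; the helper names `sqrts_tag` and `select_grid_files` and the example file names are hypothetical, and the tag format (`13`, `13p6`, `8p16`) is assumed from this issue.

```python
import fnmatch


def sqrts_tag(ecm_tev):
    """Format a centre-of-mass energy in TeV as a tag: 13 -> '13', 13.6 -> '13p6'."""
    return f"{ecm_tev:g}".replace(".", "p")


def select_grid_files(filenames, ecm_tev):
    """Return only the GRID tarballs whose name carries the requested sqrt(s) tag."""
    # Pattern taken from the issue text; '*' matches the process-specific part.
    pattern = f"mc_{sqrts_tag(ecm_tev)}TeV.*.GRID.tar.gz"
    return sorted(name for name in filenames if fnmatch.fnmatchcase(name, pattern))


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    files = [
        "mc_13TeV.Zee.GRID.tar.gz",
        "mc_13p6TeV.Zee.GRID.tar.gz",
        "mc_8p16TeV.Zee.GRID.tar.gz",
    ]
    print(select_grid_files(files, 13.6))
```

With a selection like this, the copy step would see exactly one tarball per job even when several energies are committed side by side.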
And `${sqrts}` should be either an integer like 5, 7, 8, 13, 14 or, for special values, 8p16 or 13p6.
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/152
Skip the check for 10% extra LHE events in case of LHE-only jobs
2021-11-10T11:51:30+01:00
Jan Kretzschmar
Hi,
In preparing some LHE-only jobs with MG (e.g. https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/merge_requests/1462) I hit the issue that this check https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/scripts/logParser.py#L307-316 demands "10% more" LHE events. Obviously, in an LHE-only job this check can never pass, as the number of events requested is the number of events in the LHE file. Can this check be disabled? I guess the condition would be something like "--outputTXTFile" present and "--outputEVNTFile" absent.
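The suggested condition could be sketched as follows. This is only an illustration and not code from `logParser.py`: the function name `is_lhe_only` is hypothetical, and it assumes the transform command line is available as a single string.

```python
def is_lhe_only(command_line):
    """Return True if the command writes an LHE (TXT) output but no EVNT output."""
    tokens = command_line.split()
    has_txt = any(tok.startswith("--outputTXTFile") for tok in tokens)
    has_evnt = any(tok.startswith("--outputEVNTFile") for tok in tokens)
    return has_txt and not has_evnt


if __name__ == "__main__":
    # Hypothetical command lines, for illustration only.
    print(is_lhe_only("Gen_tf.py --outputTXTFile=lhe.tar.gz"))
    print(is_lhe_only("Gen_tf.py --outputEVNTFile=evnt.pool.root"))
```

A check such as the "10% more LHE events" test would then be skipped only when `is_lhe_only` returns `True`.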
Looking at this again, I wonder if skipping this check in case of externally supplied LHE files actually makes sense as stated in the comment `# This check only makes sense if no external LHE inputs are used` - you'd normally want this to be checked also in this case?
Thanks, Jan
PS: for the above MR I circumvented the issue by hacking logParser locally to remove these lines(!)
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/151
Support for multiple tarballs with different COM energy
2021-10-12T15:31:16+02:00
Christian Gutschow
This should not have crashed:
Job [#16844556](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/jobs/16844556) failed for 9efbabb68119e3a4a2bf19113c4d8f3ffeb2b9e9:
It used to be working, so not sure what's changed?
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/150
Make CI job that sends email to conveners
2021-09-30T13:43:46+02:00
Spyros Argyropoulos
- when commit message contains [skip modfiles]
- also when files are actually modified? (we probably want this as well - some people add skip modfiles when there's no reason to)
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/148
use unweighted filter efficiency to calculate number of required input events
2021-08-02T12:04:37+02:00
Jan Kretzschmar
In !1289 I had the issue that the commit script appears to use the "weighted filter efficiency" to compute the number of needed input events, as opposed to the "unweighted" one. This is not correct: take the example (attached) of highly-weighted input events, where the filter preferentially removes high-weight events, so we get
Filter Efficiency = 0.570255 [10000 / 17536]
Weighted Filter Efficiency = 0.014686 [26912615882800.000000 / 1832511495909600.000000]
The relevant number to see if the job runs is the unweighted number, as this really tells us how many events need to pass.
[log.generate.gz](/uploads/b16f899af8781a4312d63c03ce12cf79/log.generate.gz)
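To make the difference concrete, the sketch below reads the unweighted efficiency from the two lines quoted above and converts it into a number of input events. It is an illustration only, not the commit script's code: the function names are hypothetical, and it assumes the log lines look exactly like the excerpt (real logs may carry extra prefixes).

```python
import math
import re

LOG_EXCERPT = """\
Filter Efficiency = 0.570255 [10000 / 17536]
Weighted Filter Efficiency = 0.014686 [26912615882800.000000 / 1832511495909600.000000]
"""


def unweighted_efficiency(log_text):
    """Pick the 'Filter Efficiency' line, ignoring the 'Weighted ...' one."""
    match = re.search(r"^\s*Filter Efficiency = ([0-9.eE+-]+)", log_text, re.MULTILINE)
    if match is None:
        raise ValueError("no unweighted filter efficiency found")
    return float(match.group(1))


def required_input_events(n_output, efficiency):
    """Input events needed, on average, for n_output events to pass the filter."""
    return math.ceil(n_output / efficiency)


if __name__ == "__main__":
    eff = unweighted_efficiency(LOG_EXCERPT)
    print(required_input_events(10000, eff))       # close to the 17536 events actually processed
    print(required_input_events(10000, 0.014686))  # what the weighted number would suggest
```

Using the weighted value here would request roughly forty times more input events than the job actually needs.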
Maybe a separate issue: it appears there is a blanket 10% safety margin applied. Note that, while I have not calculated the right number exactly, this can be both too little and too much, and a safety margin of 4*sqrt[target output events]/[filter eff] would probably be better (this would be ~4sigma)
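The proposed margin can be written down as a small helper. This is a sketch of the formula quoted above, not existing CI code; the name `input_events_with_margin` and the `n_sigma` parameter are hypothetical.

```python
import math


def input_events_with_margin(n_output, efficiency, n_sigma=4.0):
    """Input events for n_output filtered events plus n_sigma * sqrt(n_output) extra.

    Both the target and the margin are divided by the (unweighted) filter
    efficiency, following 4*sqrt[target output events]/[filter eff].
    """
    margin = n_sigma * math.sqrt(n_output)
    return math.ceil((n_output + margin) / efficiency)


def relative_margin(n_output, n_sigma=4.0):
    """Size of the margin relative to the target number of output events."""
    return n_sigma / math.sqrt(n_output)


if __name__ == "__main__":
    print(input_events_with_margin(10000, 1.0))
    print(relative_margin(10000))
```

Because the relative margin scales like 1/sqrt(N), it shrinks for large jobs and grows for small ones, unlike a flat 10%.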
Example 1: number of output events is 10000, so "4sigma" would be ~400 events, or just 4%
Example 2: number of output events is 50, so "4sigma" would be ~28 events, or ~57%
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/145
Powheg-specific check needs adjusting
2021-07-22T12:24:42+02:00
Christian Gutschow
Seen in !1257: If the process is bb4l, i.e. it includes `PowhegControl/PowhegControl_bblvlv_Common.py`, then it won't need to include the Main31 include as well, which is what the logParser is currently checking.
cc @jkretz

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/144
Pipelines failing when only links are included?
2021-06-21T16:50:31+02:00
Spyros Argyropoulos
The following discussion from !1225 should be addressed:
- [ ] @jshahini started a [discussion](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/merge_requests/1225#note_4588898): (+1 comment)
> Hi @cgutscho
>
> Indeed it is a duplicate, but this is by design in order to clear the CI. To give some context, these JOs are for a SUSY grid expansion.
>
> I originally tried to upload everything using only symlinks to that control file, but the CI pipelines were failing, claiming that the jobs couldn't find ```MadGraphControl_SimplifiedModel_GG_directRPVLQD.py```
>
> So I duplicated the control file you pointed to and included it in this MR so that the pipelines would succeed. After the MR gets accepted, I was going to make another one where I change all the control files to be symlinks to ```/502xxx/502416/MadGraphControl_SimplifiedModel_GG_directRPVLQD.py```. That way, there would be no duplicated control files floating around.
>
> I realize this is remarkably convoluted, so I'm more than happy to hear other ideas about preparing the JOs for grid expansions in R21.
>
> Cheers,
> Jeff
Failed pipeline: https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/pipelines/2741834
![Screenshot_2021-06-21_at_14.50.38](/uploads/1b1ebf50941d6c15803a23b2ad2bcd32/Screenshot_2021-06-21_at_14.50.38.png)
S1.2021
Spyros Argyropoulos
2021-06-27

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/92
logParser: catch cases where LHE events are not enough (low efficiency)
2021-06-17T13:30:43+02:00
Spyros Argyropoulos
* [x] **OTF generation**: check that `N(LHE events) >= 1.1* nEventsPerJob/(filter efficiency)`
* [x] **Showering with external LHE events**: same as above
* [x] This might require different code for each generator? To be checked
* [x] Might need to take into account how many input LHE files are used per job
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/139
Handle EVNT->EVNT jobs in CI and logParser
2021-06-17T13:30:43+02:00
Spyros Argyropoulos
These jobs produce a `log.afterburn` instead of `log.generate`.
- [x] I would need an example to see how to treat this
- [x] How can we identify that it's an EVNT->EVNT job from the log?
- [x] Do we need to modify the Gen_tf command?
- [x] Test with `700267`
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/7
Investigate usage of caches to speed up pipelines
2021-06-17T13:05:27+02:00
Spyros Argyropoulos
See here: https://docs.gitlab.com/ee/ci/caching/
Something to be investigated, e.g. with the `check_logParser` or `check_links` jobs.
This probably requires building a big image that contains all of the other images used in all CI jobs, so not trivial.
Future

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/65
Running pipelines in custom swarm runner
2021-06-17T13:04:50+02:00
Spyros Argyropoulos
Instructions from Lukas here: https://clouddocs.web.cern.ch/containers/tutorials/swarmgitlab.html
The idea would be that, if we set this up, we could run with the full number of events exactly as in production.
The instructions below work. We have to understand whether this is what we want:
* [ ] does it have access to cvmfs? If not how would we set it up so that it has?
* [ ] does it buy us anything from using the shared runners?
* [ ] how tough would the maintenance be?
* [ ] is it better to just set up a dedicated machine? Maybe we should ask someone from the CERN IT to do it?
Future

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/132
Harmonisation of printouts in transform and generator interfaces
2021-06-17T13:03:33+02:00
Spyros Argyropoulos
Most of the printouts from the transform have the following format:
```
IDENTIFIER KEYWORD = VALUE
```
e.g.
```
08:29:01 Py:Gen_tf INFO .transform = Gen_tf
```
However, this is **very often not the case**. A few examples:
```
08:29:01 Py:Gen_tf INFO nEventsPerJob set to 2000
08:29:01 Py:Gen_tf INFO Requested output events 100
08:29:01 Py:Gen_tf WARNING Could not find evgenkeywords.txt file EvgenJobTransforms/evgenkeywords.txt in $JOBOPTSEARCHPATH
05:14:02 Nb of events : 20000
```
This means that new checks that would otherwise be trivial to implement require changes in several places (e.g. !863) and the introduction of logic which is "hacky".
We should make sure that new printouts always conform to the format `IDENTIFIER KEYWORD = VALUE`, both in the **transform** and in the **generator interfaces**, and that each such line is **printed only once in log.generate**.
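To show why a uniform format helps, here is a minimal parsing sketch. It is not logParser code: the regular expression and the name `parse_printout` are hypothetical, and the timestamp and log-level fields are assumed from the example lines quoted in this issue.

```python
import re

# time, identifier (e.g. Py:Gen_tf), level, then KEYWORD = VALUE
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<identifier>\S+)\s+(?P<level>[A-Z]+)\s+"
    r"(?P<keyword>\S+)\s+=\s+(?P<value>.*)$"
)


def parse_printout(line):
    """Return (keyword, value) for a conforming line, or None otherwise."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("keyword"), match.group("value").strip()


if __name__ == "__main__":
    print(parse_printout("08:29:01 Py:Gen_tf INFO .transform = Gen_tf"))
    print(parse_printout("08:29:01 Py:Gen_tf INFO nEventsPerJob set to 2000"))
```

A single pattern covers every conforming line, whereas lines such as `nEventsPerJob set to 2000` or `Nb of events : 20000` each need their own special case.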
I am not sure what the best approach is here. Perhaps put this in place as a "coding rule" and make everyone aware of it. (Strict checks would probably be more time-consuming to implement than just putting coding practices in place.)
@ewelina I just opened this so that we somehow bring it up with the generator experts to make things easier in the future. You probably know best how to address this and can maybe discuss it in a GIT meeting.
Future

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/135
Sanity check for EVNT-to-EVNT transforms
2021-06-17T11:07:17+02:00
Christian Gutschow
Hi,
here's an [example JO](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/950xxx/950096/mc.Sh_2210_Zee_E2Etransform_valid.py) for an EVNT-to-EVNT transform.
This basically clones an input EVNT, but only copies the event if it passes some Athena filter, hence most of the logic being protected by the `if runArgs.trfSubstepName == 'afterburn':` statement.
Now, because it copies the original EVNT, the new EVNT would have the MC channel number (or run number in the HepMC GenEvent) set to the original DSID and not the new DSID (of the E2E transform JO).
This can now be patched using the `postSeq.CountHepMC.CorrectRunNumber = True` flag seen at the bottom. Could we use the CI to catch cases where such a JO is being added, but that tag is missing from the JO?
(In principle, there is a printout in the `log.afterburn` produced by an E2E transform which one could grep for, but the CI doesn't handle jobs with input EVNT files yet.)
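A static check on the JO text, which would not require running the job, might look like the sketch below. It is an assumption-laden illustration rather than an existing CI check: the function name is hypothetical, and it treats any JO that tests `trfSubstepName == 'afterburn'` as an EVNT-to-EVNT transform.

```python
import re

SUBSTEP_RE = re.compile(r"trfSubstepName\s*==\s*['\"]afterburn['\"]")
FLAG_RE = re.compile(
    r"^\s*postSeq\.CountHepMC\.CorrectRunNumber\s*=\s*True\b", re.MULTILINE
)


def missing_run_number_fix(jo_text):
    """True if the JO looks like an E2E transform but never sets the flag."""
    return bool(SUBSTEP_RE.search(jo_text)) and not FLAG_RE.search(jo_text)


if __name__ == "__main__":
    jo = "if runArgs.trfSubstepName == 'afterburn':\n    pass\n"
    print(missing_run_number_fix(jo))
    print(missing_run_number_fix(jo + "postSeq.CountHepMC.CorrectRunNumber = True\n"))
```

Commented-out assignments are not counted as setting the flag, since the pattern only matches lines that start with the assignment.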
Thoughts/ideas?
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/143
Follow-up from ATLMCPROD-9332
2021-06-07T14:15:29+02:00
Spyros Argyropoulos
The following discussion from !1198 should be addressed:
- [ ] @cgutscho started a [discussion](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/merge_requests/1198#note_4542318): (+5 comments)
> Hi @sargyrop - this might be a long shot, but do you think this is something we can catch in the CI? e.g. if there's a variable in MadGraph JOs that has `gridpack` or `grid_pack` in the name and it's still set to `True`, we put out a warning ... ?
>
> Cheers,
> Chris
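
One way to flag such a variable is a simple pattern match on the JO text. This is only a sketch of the idea in the quoted comment, not logParser code: the function name is hypothetical, and it only catches plain top-level assignments of the form `name = True`.

```python
import re

GRIDPACK_RE = re.compile(
    r"^\s*(?P<name>\w*grid_?pack\w*)\s*=\s*True\b", re.IGNORECASE | re.MULTILINE
)


def gridpack_flags_left_on(jo_text):
    """Names of variables containing 'gridpack'/'grid_pack' that are set to True."""
    return [match.group("name") for match in GRIDPACK_RE.finditer(jo_text)]


if __name__ == "__main__":
    print(gridpack_flags_left_on("gridpack = True\nnevents = 1000\n"))
    print(gridpack_flags_left_on("# gridpack = True\ngridpack = False\n"))
```

Keyword arguments or dictionary entries would need a more careful parser, so a warning rather than a hard failure may be the safer first step.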

Make logParser fail if the situation described in Chris's message above appears
S1.2021

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/142
Follow-up from "LO EFT samples for 4top"
2021-05-18T11:32:21+02:00
Spyros Argyropoulos
The following discussions from !1161 should be addressed:
> This shouldn't be part of a JobOption. The first part was fixed properly in 21.6.60 and the second part is obviously gonna cause problems. `ATHENA_PROC_NUMBER` is set to 8 because the machine has 8 cores, it shouldn't be set to 80 in the JOs.
Should we add the following checks/changes:
- if ATHENA_PROC_NUMBER > 1 and release < 21.6.60 => ERROR
- if ATHENA_PROC_NUMBER > 1 => run only 1 event in CI
- change the way we check whether the jO changes ATHENA_PROC_NUMBER. This would only be safe to catch in the transform, but until it is implemented there we could change the check to forbid any use of ATHENA_PROC_NUMBER (not even printing it): e.g. look in the jO and, if there is an uncommented line with "ATHENA_PROC_NUMBER" in it, give an error
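The last point could be sketched as a plain text scan. This is an illustration under stated assumptions, not the current check: the function name is hypothetical, and the comment handling is naive (a `#` inside a string literal would be treated as the start of a comment).

```python
def lines_using_proc_number(jo_text, token="ATHENA_PROC_NUMBER"):
    """Return (line number, line) pairs where the token appears outside a comment."""
    hits = []
    for number, line in enumerate(jo_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # naive: ignores '#' inside string literals
        if token in code:
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    jo = (
        "import os\n"
        "# ATHENA_PROC_NUMBER is set by the site\n"
        "os.environ['ATHENA_PROC_NUMBER'] = '80'\n"
    )
    for number, line in lines_using_proc_number(jo):
        print(f"ERROR: line {number}: {line}")
```

Any non-empty result would be reported as an error, which matches the "not even printing it" rule proposed above.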
@cgutscho
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/141
check_unique_controlFile.sh fails when it shouldn't?
2021-05-13T09:00:06+02:00
Jeff Shahinian
[check_unique_controlFile.sh](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/scripts/check_unique_controlFile.sh) is apparently a new part of the CI. I noticed that it fails even when given symlinks. For example, when uploading JOs (with symlinks to one control file) that look like this:
```
$ ls -a *
100001:
myJO_1.py
myControlFile.py
100002:
myJO_2.py
myControlFile.py -> ../100001/myControlFile.py
```
The CI job fails and recommends that you use symlinks (even if you already are):
```
ERROR: Duplicate file(s) found:
./100xxx/100001/myControlFile.py
If the files have exactly the same content, please only keep one physical file replacing the rest with symbolic links.
If the files have differences consider renaming the files that you added.
You can check for differences with diff -w file1 file2
```
Perhaps we need to add ```-type f``` to [this line](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/scripts/check_unique_controlFile.sh#L23) as well?
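For comparison, the intended behaviour (count only physical files, as `-type f` would) can be sketched in Python. This is not a port of `check_unique_controlFile.sh`: the function name is hypothetical, and grouping every file by its base name is an assumption about what counts as a duplicate.

```python
import os
from collections import defaultdict


def duplicate_control_files(root):
    """Map file name -> paths, for names that exist as more than one physical file."""
    by_name = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symlinks are the recommended fix, so they must not count
            by_name[name].append(path)
    return {name: sorted(paths) for name, paths in by_name.items() if len(paths) > 1}


if __name__ == "__main__":
    for name, paths in duplicate_control_files(".").items():
        print(f"ERROR: Duplicate file(s) found for {name}: {paths}")
```

With the layout shown above, `100002/myControlFile.py` is a symlink and is skipped, so no duplicate is reported.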
Here's an example of a failing CI job:
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/jobs/13818890
Tagging @sargyrop
Best,
Jeff
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/57
logParser gives missing weights for LHE-only generation
2021-05-08T12:30:45+02:00
Spyros Argyropoulos
Reported by @xiaohu
> Just to be 100% sure, we generate LHE files externally using Powheg-Box-V2. Weight variations due to PDF and scales are prepared. With checkMetaSG.py, we do see all the weights in evgen files (showered by Herwig7 with 21.6.12,AthGeneration) [1] and looked at the weight variations using truth derivation. But using logParser.py to check log.generate, it says “weights missing”.
Probably what happens is that the weights in logParser are read from lines that look like this:
09:33:22 MetaData: weights = MUR=0.5 MUF=0.5 | PDF=260000 MemberID=1
and for LHE-only generation this does not get written out.
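The suspected mechanism can be illustrated with a small sketch. It is a guess at the behaviour described here, not the actual logParser code: the function name is hypothetical, and splitting the weight names on `|` is assumed from the single example line above.

```python
def weights_from_log(log_text, marker="MetaData: weights ="):
    """Collect weight names from every line containing the marker."""
    names = []
    for line in log_text.splitlines():
        _, found, rest = line.partition(marker)
        if found:
            names.extend(part.strip() for part in rest.split("|") if part.strip())
    return names


if __name__ == "__main__":
    line = "09:33:22 MetaData: weights = MUR=0.5 MUF=0.5 | PDF=260000 MemberID=1"
    print(weights_from_log(line))
    print(weights_from_log("no metadata line in an LHE-only log"))
```

If the marker line is never written, the result is an empty list, which a parser of this kind would report as "weights missing" even though the weights exist in the output files.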
Need to confirm and perhaps fix for LHE-only generation
S1.2021

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/140
Strange behaviour of commit script when athena is skipped and run time is > 1h
2021-05-06T11:03:19+02:00
Spyros Argyropoulos
For example in a case with 1 event, CPU=3.09h, the output is the following:
![Screenshot_2021-05-06_at_09.46.11](/uploads/63cb47b36de5f75bb8e9e6a275602971/Screenshot_2021-05-06_at_09.46.11.png)
which is correct, but when skipping athena:
![Screenshot_2021-05-06_at_09.45.42](/uploads/c01022a7990d31535fb7cca7aa2e6a4c/Screenshot_2021-05-06_at_09.45.42.png)
the
```
printGood -f "\tOK: CI job time estimate: $cpu hours, but athena will not run in the CI"
```
message is not printed because the script never reaches that point.
S1.2021
Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/64
Allow CI to run with external LHE inputs
2021-04-27T12:14:25+02:00
Spyros Argyropoulos
The CI cannot yet run jobs with input event files. Largely the same tasks apply to both LHE->EVNT and EVNT->EVNT jobs:
* [x] Upload mcgensvc grid certificate in CI and make sure `voms-proxy-init -voms atlas` works in the CI
* [x] Write (rucio) file that was used in local testing into `log.generate.short`, e.g. `inputGeneratorFile=TXT.<dsid>.tar.gz` for LHE or `inputEVNTFile=1231231.EVNT.pool.root` for EVNT input.
* [x] If input file is specified in `log.generate.short`, the CI `run_athena.sh` job should `rucio get` that file and add the corresponding arguments to the `Gen_tf.py` command line as described in [Twiki](https://twiki.cern.ch/twiki/bin/view/AtlasProtected/SpecialConfigurations#Using_event_input_LHE_EVNT_or_EV).
* [x] Special treatment for EVNT->EVNT jobs: Apparently such jobs do not produce a `log.generate` file but a `log.afterburn`. Disable logParser for these, or special treatment? (move to #139)
* [ ] Ideally, the CI should also check the `inputFilesPerJob` in the JO for reasonable values at this stage: it should not exceed `10GB/sizeof(downloadedTestFile)` to make sure they fit into a grid node.
* [x] Similar case that came up in #89 is running athena with `--inputGeneratorFile`
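The open `inputFilesPerJob` item amounts to a simple size bound. The sketch below illustrates it under stated assumptions: the function names are hypothetical, 10 GB is taken as 10^10 bytes, and the downloaded test file is assumed to be representative of all input files.

```python
def max_input_files_per_job(test_file_bytes, limit_bytes=10 * 1000**3):
    """Largest inputFilesPerJob such that the inputs stay below the size limit."""
    if test_file_bytes <= 0:
        raise ValueError("file size must be positive")
    return limit_bytes // test_file_bytes


def input_files_per_job_ok(input_files_per_job, test_file_bytes):
    """True if the requested inputFilesPerJob respects 10GB/sizeof(downloadedTestFile)."""
    return input_files_per_job <= max_input_files_per_job(test_file_bytes)


if __name__ == "__main__":
    size = 2 * 1000**3  # a hypothetical 2 GB test file
    print(max_input_files_per_job(size))
    print(input_files_per_job_ok(6, size))
```

In the CI, the file size would come from the file fetched with `rucio get`, for example via `os.path.getsize`.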
Example !203
LHE-only log: `~sargyrop/public/log.generate_LHE`
S1.2021
Spyros Argyropoulos