MC Job Options issues: https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues (feed updated 2024-02-01T14:46:57+01:00)

Issue #225: logParser failed due to missing platform info
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/225 (author: Yang Liu; last updated 2024-02-01T14:46:57+01:00)

Hi @sargyrop, as we discussed in [Fixing automatic determination of release for CI runs (!2861)](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/merge_requests/2861), it seems the line added to extract the platform info causes problems for some of the jobs.
[Here](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/pipelines/6821117) is one example.
Many thanks for your time and help.
Cheers
Yang

Issue #109: logParser fails in CI when run on MadGraph due to nevents check
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/109 (author: Spyros Argyropoulos; last updated 2020-05-17T14:24:09+02:00)

As seen in !412, when running a jO with:
```
evgenConfig.nEventsPerJob = 10000
nevents = runArgs.maxEvents*1.2 if runArgs.maxEvents>0 else 1.1*evgenConfig.nEventsPerJob
```
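
A minimal Python sketch of the arithmetic behind this failure. It assumes the CI run used `runArgs.maxEvents = 100` (inferred from 120 = 1.2 × 100 in the error below) and that logParser currently requires at least `1.1 * nEventsPerJob` MadGraph events (inferred from the 11000 it asks for). `required_nevents` is a hypothetical relaxed check, not logParser's actual logic.

```python
def jo_nevents(max_events, n_events_per_job):
    """The nevents expression from the jO above."""
    return max_events * 1.2 if max_events > 0 else 1.1 * n_events_per_job


def required_nevents(max_events, n_events_per_job, margin=1.1):
    """Hypothetical relaxed requirement: apply the 10% margin to the number of
    events the transform was actually asked for (maxEvents, when it is set)."""
    requested = max_events if max_events > 0 else n_events_per_job
    return margin * requested


if __name__ == "__main__":
    ci_nevents = jo_nevents(100, 10000)  # assumed CI setting: maxEvents = 100
    # A check against nEventsPerJob alone rejects the CI run ...
    print(ci_nevents >= 1.1 * 10000)
    # ... while the maxEvents-aware check accepts it.
    print(ci_nevents >= required_nevents(100, 10000))
```

When `maxEvents` is not set (a non-positive value here), both functions reduce to `1.1 * nEventsPerJob`, so the relaxed check only behaves differently when `maxEvents` is set.
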
`logParser` fails with
```
ERROR: Increase nevents to be generated in MG from 120 to 11000
```

[Milestone: S1.2020; assignee: Spyros Argyropoulos; due date: 2020-05-16]

Issue #131: LogParser fails to pickup nevents keyword
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/131 (author: Christian Gutschow; last updated 2020-12-09T18:06:20+01:00)

From @avroy:
```
When trying to check some new JOs that were generated using 21.6.54 and cc7, logParser failed with the following error:
Traceback (most recent call last):
File "scripts/logParser.py", line 296, in madgraphChecks
neventsMG=int(float(generatorDict['"nevents"'][0]))
IndexError: list index out of range
I think the error is associated with the fact that in the new log file, the keyword is logged as nevents (i.e. without the quotes). You can find the log file in the uploaded zipball in https://its.cern.ch/jira/browse/ATLMCPROD-8926
Please look at JOs/200xxx/200001/log.generate
```

[Milestone: S2.2020; assignee: Spyros Argyropoulos]

Issue #57: logParser gives missing weights for LHE-only geneation
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/57 (author: Spyros Argyropoulos; last updated 2021-05-08T12:30:45+02:00)

Reported by @xiaohu:
> Just to be 100% sure, we generate LHE files externally using Powheg-Box-V2. Weight variations due to PDF and scales are prepared. With checkMetaSG.py, we do see all the weights in evgen files (showered by Herwig7 with 21.6.12,AthGeneration) [1] and looked at the weight variations using truth derivation. But using logParser.py to check log.generate, it says “weights missing”.

Probably what happens is that logParser reads the weights from lines that look like this:
`09:33:22 MetaData: weights = MUR=0.5 MUF=0.5 | PDF=260000 MemberID=1`
and for LHE-only generation this line does not get written out.
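
A minimal sketch of reading weight names from such lines. It assumes the format of the example line above and that `|` separates individual weight names; `parse_weight_names` is a hypothetical helper, not the function in logParser.py. An empty result corresponds to the "weights missing" case, which is what would be reported if LHE-only jobs never print this line.

```python
import re

# Assumed line format, from the example in this issue:
#   "09:33:22 MetaData: weights = MUR=0.5 MUF=0.5 | PDF=260000 MemberID=1"
WEIGHTS_RE = re.compile(r"MetaData:\s+weights\s*=\s*(.*)$")


def parse_weight_names(log_lines):
    """Collect weight names from 'MetaData: weights = ...' lines ([] if none)."""
    names = []
    for line in log_lines:
        match = WEIGHTS_RE.search(line)
        if match:
            # Split on '|' (assumed separator) and drop empty pieces
            names.extend(w.strip() for w in match.group(1).split("|") if w.strip())
    return names


if __name__ == "__main__":
    line = "09:33:22 MetaData: weights = MUR=0.5 MUF=0.5 | PDF=260000 MemberID=1"
    print(parse_weight_names([line]))
```
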

Need to confirm and, if so, fix for LHE-only generation.

[Milestone: S1.2021]

Issue #123: logParser picks out wrong COM energy
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/123 (author: Christian Gutschow; last updated 2020-09-21T16:08:05+02:00)

See e.g. !676, where it extracted `ecmEnergy = 13000` even though the `log.generate` was for 8 TeV:
```
/afs/cern.ch/user/c/cgutscho/public/forSpyros/log.generate
```
Why though?

[Milestone: S2.2020; assignee: Spyros Argyropoulos]

Issue #157: logParser.py didn't grep inputGeneratorFile
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/157 (author: Yiming Abulaiti; last updated 2022-03-02T09:41:23+01:00)

Hi
---- This is for LHE files as inputs -----
Below is the content of log.generate.short (this log file was generated with 21.6.75):
---------
- estimated CPU for CI job = 0.00 hrs
- using release = AthGeneration-21.6.75
- ecmEnergy = 13000.0
- inputGeneratorFile = 09:20:14 Py:Gen_tf INFO inputGeneratorFile = TXT.440365._000001.tar.gz
- randomSeed = 1234
- EVNT to EVNT = False
- LHEonly = False
---------------
The inputGeneratorFile field is messed up here.
But logParser.py works fine with older releases such as 21.6.56. The reason is that the athena printout changed in the new release (this is what I observed):
Printout from 21.6.56:
`16:06:18 Py:Gen_tf INFO inputGeneratorFile used TXT.440329._000001.tar.gz`
Printout from 21.6.75:
`09:09:41 Py:Gen_tf INFO inputGeneratorFile = TXT.440363._000001.tar.gz`
Note the change from "used" to "=".
The line [L159](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/scripts/logParser.py#L159) has to check both "used" and "=" to accommodate changes made in athena.
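
A sketch of a pattern that accepts both printouts quoted above and captures only the file name. The regex and function name are illustrative and not taken from logParser.py.

```python
import re

# Accept both "inputGeneratorFile used <file>" (21.6.56) and
# "inputGeneratorFile = <file>" (21.6.75); capture only the file name.
INPUT_FILE_RE = re.compile(r"inputGeneratorFile\s+(?:used|=)\s+(\S+)")


def find_input_generator_file(line):
    """Return the input generator file named on `line`, or None."""
    match = INPUT_FILE_RE.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    for line in (
        "16:06:18 Py:Gen_tf INFO inputGeneratorFile used TXT.440329._000001.tar.gz",
        "09:09:41 Py:Gen_tf INFO inputGeneratorFile = TXT.440363._000001.tar.gz",
    ):
        print(find_input_generator_file(line))
```
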
Cheers,
Ablet

[Assignee: Spyros Argyropoulos]

Issue #121: logParser rejects logs with nEventsPerJob > 10k
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/121 (author: Christian Gutschow; last updated 2020-08-28T16:51:47+02:00)

Following the successful test in ATLMCPROD-8659, we should allow cases where `nEventsPerJob` is a multiple of 10k.
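
One possible form of the relaxed check, sketched in Python: keep the values listed in the current error message and additionally accept any multiple of 10k above 10000. The constant and function names are hypothetical.

```python
# Values quoted in the current logParser error message
ALLOWED_NEVENTS = (1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000)


def is_acceptable_nevents(n):
    """Accept the listed values, plus any multiple of 10k above 10000."""
    return n in ALLOWED_NEVENTS or (n > 10000 and n % 10000 == 0)


if __name__ == "__main__":
    print(is_acceptable_nevents(20000), is_acceptable_nevents(15000))
```
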
Currently it fails saying
```
- CountHepMC Events passing all checks and written = 20000 <-- ERROR: Not an acceptable number of events for production (1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000)
```

[Milestone: S2.2020; assignee: Spyros Argyropoulos]

Issue #219: logParser run failed.
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/219 (author: Yiming Abulaiti; last updated 2023-10-30T17:39:36+01:00)

logParser run failed due to the unprotected parameter `nEventsPerJob_fromJO`. See the error message below:
```
- Number of input LHE events: 65000
Traceback (most recent call last):
File "./scripts/logParser.py", line 782, in <module>
main()
File "./scripts/logParser.py", line 683, in main
if expected_EVNT_out > 2 * nEventsPerJob_fromJO:
TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'
```
In line https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/scripts/logParser.py?ref_type=heads#L683, the variable `nEventsPerJob_fromJO` is used, but it can be `None` when `nEventsPerJob` is not specified in the jO file.
You could just use the `nEventsPerJob` variable instead, since it has already been overwritten by `nEventsPerJob_fromJO`, or set to 10000 if `nEventsPerJob_fromJO` is `None`.
See line: https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blame/master/scripts/logParser.py#L518
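
The suggested fallback can be sketched as follows. The helper names are hypothetical; the default of 10000 is the value mentioned above, and the comparison mirrors the one at line 683 in the traceback.

```python
DEFAULT_NEVENTS_PER_JOB = 10000  # fallback quoted in this issue


def resolve_nevents_per_job(nevents_per_job_from_jo):
    """Use the jO value if it was found in the log, otherwise the default."""
    if nevents_per_job_from_jo is None:
        return DEFAULT_NEVENTS_PER_JOB
    return nevents_per_job_from_jo


def too_many_output_events(expected_evnt_out, nevents_per_job_from_jo):
    """Same comparison as in the traceback, but safe when the jO value is None."""
    return expected_evnt_out > 2 * resolve_nevents_per_job(nevents_per_job_from_jo)
```

With `nEventsPerJob` missing from the jO, `too_many_output_events(15000, None)` compares 15000 against 2 × 10000 instead of raising a `TypeError`.
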
For test, you can download a log.generate file here: https://cernbox.cern.ch/s/U86AjY5bTjTACwy
Cheers,
Ablet

[Assignee: Spyros Argyropoulos]

Issue #39: logParser switch from minEvents to nEventsPerJob
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/39 (author: Spyros Argyropoulos; last updated 2019-10-25T12:20:28+02:00)

From ewelina:
> Starting from 21.6.12, nEventsPerJob should be obligatory (the transform recognizes minevents, i.e. it does not crash when the parameter appears, but assigns no value or action to it).
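
A minimal sketch of the requested ERROR, assuming the obsolete parameter appears in jOs as `evgenConfig.minevents` (matched case-insensitively, so `minEvents` is caught too). The function name and message are hypothetical, and the comment stripping is deliberately naive: it ignores the possibility of `#` inside strings.

```python
import re

MINEVENTS_RE = re.compile(r"\bminevents\b", re.IGNORECASE)


def check_no_minevents(jo_text):
    """Return one ERROR message per jO line that still uses minEvents."""
    errors = []
    for lineno, line in enumerate(jo_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # naive: drop everything after '#'
        if MINEVENTS_RE.search(code):
            errors.append(
                f"ERROR: line {lineno} uses minEvents; use evgenConfig.nEventsPerJob instead"
            )
    return errors


if __name__ == "__main__":
    print(check_no_minevents("evgenConfig.minevents = 5000\n"))
```
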

Todo: give ERROR if minEvents is used.

[Milestone: Alpha; assignee: Spyros Argyropoulos]

Issue #82: logParser throw error if release used is not AthGeneration
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/82 (author: Spyros Argyropoulos; last updated 2020-03-02T08:58:06+01:00)

Recently a jO with `AthGenerationExternals` was used and this was not caught in logParser or any other test.
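
A sketch of such a test, assuming the release string has the `Project-version` form printed in log.generate.short (e.g. `AthGeneration-21.6.75`, as in issue #157). Comparing the project name exactly matters: a prefix test such as `startswith("AthGeneration")` would still accept `AthGenerationExternals`. The function names are hypothetical.

```python
def release_project(release):
    """'AthGeneration-21.6.75' -> 'AthGeneration' (assumed Project-version form)."""
    return release.split("-", 1)[0]


def check_release(release):
    """Return an ERROR message unless the project is exactly AthGeneration."""
    project = release_project(release)
    if project != "AthGeneration":
        return f"ERROR: release {release} uses project {project}, expected AthGeneration"
    return None
```
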
Should add the test and throw an error.

[Milestone: S1.2020; assignee: Ewelina Maria Lobodzinska]

Issue #168: logParser unsupported locale setting
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/168 (author: Spyros Argyropoulos; last updated 2022-04-21T20:53:54+02:00)

Full error message:
INFO: New DSID directory: 100xxx/100001 ...
OK: log.generate file found.
Traceback (most recent call last):
File "scripts/logParser.py", line 8, in <module>
locale.setlocale(locale.LC_CTYPE, f'{lang}.UTF-8')
File "/usr/lib64/python3.6/locale.py", line 598, in setlocale
return _setlocale(category, locale)
locale.Error: unsupported locale setting
ERROR: logParser run failed.
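
One way to make this step fail soft, sketched with the standard `locale` module: try `<lang>.UTF-8` first and fall back to other UTF-8 locales instead of crashing. The fallback list is an assumption, because which locales exist depends on the machine (hence the outputs requested below). `lang` stands for whatever value logParser.py derives it from.

```python
import locale


def set_utf8_ctype(lang):
    """Set LC_CTYPE to the first available UTF-8 locale; return its name or None."""
    # Candidate order is an assumption: requested locale first, then common fallbacks
    for name in (f"{lang}.UTF-8", "C.UTF-8", "en_US.UTF-8"):
        try:
            return locale.setlocale(locale.LC_CTYPE, name)
        except locale.Error:
            continue  # not installed on this machine; try the next one
    return None  # leave LC_CTYPE unchanged and let the caller decide
```
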
Need the output of:
- `locale`
- `locale -a`
- `env`
- which machine you are running on

@yanlin @nishu

[Assignee: Spyros Argyropoulos]

Issue #101: Long compilation time when running MadGraph in atlas/slc6-atlasos causing CI timeouts
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/101 (author: Jason Robert Veatch; last updated 2021-04-22T16:46:58+02:00)

## How to reproduce the problem
```
# Mount cvmfs
sudo mkdir -p /cvmfs/atlas.cern.ch
sudo mkdir -p /cvmfs/atlas-condb.cern.ch
sudo mkdir -p /cvmfs/grid.cern.ch
sudo mkdir -p /cvmfs/sft.cern.ch
sudo mount -t cvmfs atlas.cern.ch /cvmfs/atlas.cern.ch
sudo mount -t cvmfs atlas-condb.cern.ch /cvmfs/atlas-condb.cern.ch
sudo mount -t cvmfs grid.cern.ch /cvmfs/grid.cern.ch
sudo mount -t cvmfs sft.cern.ch /cvmfs/sft.cern.ch
# Get the docker image
docker pull atlas/slc6-atlasos
# Run image in a container and mount cvmfs
docker run -it -v /cvmfs:/cvmfs b4cfa1203c45
# Inside the docker container get the mcjoboptions repo (or alternatively you can copy it from your local area with docker cp)
kinit USER@CERN.CH
git clone https://:@gitlab.cern.ch:8443/atlas-physics/pmg/mcjoboptions.git
cd mcjoboptions
git checkout dsid_jveatch_500538
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
./scripts/run_athena.sh
```
## Debugging
#### Bottleneck: compilation time
Comparing the running times at several execution points on lxplus and in the container, the problem seems to lie in the compilation times:
```
Docker (running on a dual-core laptop with cvmfs mounted via fuse):
generate 19:24:54 INFO: Using LHAPDF v6.2.3 interface for PDFs
generate 19:26:19 INFO: Compiling source…
generate 19:31:53 INFO: ...done, continuing with P* directories => 334 sec
generate 19:31:53 INFO: Compiling StdHEP (can take a couple of minutes) ...
generate 19:45:23 INFO: …done. => 810 sec
generate 19:45:24 INFO: Compiling on 1 cores
generate 19:45:24 INFO: Compiling P0_gg_ttx...
generate 19:54:37 INFO: P0_gg_ttx done. => 553 sec
vs lxplus (interactive run)
10:15:08 INFO: Using LHAPDF v6.2.3 interface for PDFs
10:15:14 INFO: Compiling source...
10:15:26 INFO: ...done, continuing with P* directories => 12 sec
10:15:26 INFO: Compiling StdHEP (can take a couple of minutes) ...
10:16:04 INFO: …done. => 38 sec
10:16:05 INFO: Compiling on 1 cores
10:16:05 INFO: Compiling P0_gg_ttx...
10:16:45 INFO: P0_gg_ttx done. => 40 sec
```
#### Size/memory
The container has 53 GB of available space, and at the point where compilation becomes slow the container occupies only ~230 MB, so **disk size does not seem to be causing the slowdown**.
The available memory was increased from 1 GB to 8 GB without any effect on the compilation time in the container.
#### Reading from cvmfs
I ran a script that 1) reads all the lines of a file that lives on cvmfs and 2) copies that file to a local directory and removes it, 500 times each.
The local run on my laptop (with cvmfs mounted via fuse) gives this:
```
Reading 500 times
real 0m21.504s
user 0m12.937s
sys 0m8.429s
Copying 500 times
real 0m4.993s
user 0m0.620s
sys 0m2.440s
```
Running the script from the container, where the locally available cvmfs directory (see above) is mounted to the container as a volume, gives this:
```
Reading 500 times
real 1m44.217s
user 0m18.329s
sys 0m20.376s
Copying 500 times
real 0m3.716s
user 0m0.570s
sys 0m0.981s
```
**So reading a file seems to be ~5x slower when running from the docker container** (1m44.2s vs 21.5s real time), while copying is not slower (3.7s vs 5.0s).
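
The read/copy test described above could be reproduced with a short Python script along these lines. This is a sketch, not the script that was actually used: it measures wall-clock time only (the numbers above come from `time`, which also reports user and sys time), and the file to test, e.g. any file under /cvmfs, is passed on the command line.

```python
import os
import shutil
import sys
import tempfile
import time


def time_reads(path, n=500):
    """Read every line of `path` n times; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(n):
        with open(path, "rb") as f:  # binary mode: no decoding cost or errors
            for _line in f:
                pass
    return time.perf_counter() - start


def time_copies(path, n=500):
    """Copy `path` to a local temporary directory and delete the copy, n times."""
    workdir = tempfile.mkdtemp()
    dest = os.path.join(workdir, os.path.basename(path))
    try:
        start = time.perf_counter()
        for _ in range(n):
            shutil.copy(path, dest)
            os.remove(dest)
        elapsed = time.perf_counter() - start
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
    return elapsed


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]
    print(f"Reading 500 times: {time_reads(target):.3f} s")
    print(f"Copying 500 times: {time_copies(target):.3f} s")
```

Running it once on the host and once inside the container, against the same cvmfs file, gives a like-for-like comparison of the two setups.
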
#### Next steps
* [ ] To debug further we would need to know exactly how cvmfs is mounted in the gitlab runner
* [ ] Also need to check whether the slow reading times on cvmfs are correlated with the slow MG compilation: does MG call the compilers from cvmfs, or read any other information from cvmfs? Probably.
---
Original report from Jason (similar issues were observed with other processes that are apparently very different from this one: an NLO one and an LO one with a long decay chain):
Job [#7937441](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/jobs/7937441) failed for 9a6a4445a5bcf7ae08ac81888cccd79ef4cc4af3:
Dear experts,
The run_athena job for my branch times out. I have been trying to debug this from my side, but I am at a loss about how to proceed. The estimated execution time from each log.generate.short is ~0.1 hours, so I wouldn't expect this to be an issue. Could you please advise?
Thanks in advance,
Jason

[Milestone: Future; assignee: Spyros Argyropoulos]

Issue #34: Lower limit on amount of run_athena jobs that should be run
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/34 (author: Spyros Argyropoulos; last updated 2022-02-18T15:50:50+01:00)

As shown in [this pipeline](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/pipelines/1093023), we can completely miss cases where `log.generate` files are not provided.
Frank suggested the following check that can be implemented in e.g. the `run_athena` job:
```
# Nlogfiles/N_newDSIDs set earlier; fail if Nlogfiles < ceil(0.01 * N_newDSIDs)
if [ "$Nlogfiles" -lt $(( (N_newDSIDs + 99) / 100 )) ]; then
  exit 1
fi
```

[Milestone: Future]

Issue #5: Make CI jobs corresponding to `[jo ci]` run by default?
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/5 (author: Spyros Argyropoulos; last updated 2019-05-18T14:34:41+02:00)

We should collect some statistics of the use cases for the pipeline jobs.
If the majority run the `[jo ci]` pipeline we could make that the default (no specific commit message would be required) and keep `[skip ci]` and `[dev ci]` for extra control.
The steering of the jobs could be performed with the `only`, `refs`, `changes` and `except` keywords.

[Assignee: Spyros Argyropoulos]

Issue #150: Make CI job that sends email to conveners
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/150 (author: Spyros Argyropoulos; last updated 2021-09-30T13:43:46+02:00)

- when commit message contains `[skip modfiles]`
- also when files are actually modified? (we probably want this as well, since some people add `[skip modfiles]` when there's no reason to)

[Assignee: Spyros Argyropoulos]

Issue #100: Make -m obligatory in commit script
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/100 (author: Spyros Argyropoulos; last updated 2020-04-28T18:32:35+02:00)

* [x] Remove current parsing logic
* [x] Check that skipping athena, logParser works as before

[Milestone: S1.2020; assignee: Spyros Argyropoulos]

Issue #23: Make run_athena send jobs to lxbatch
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/23 (author: Spyros Argyropoulos; last updated 2019-09-23T08:27:03+02:00)

#17 #12

Issue #210: Make some jo checks stricter?
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/210 (author: Spyros Argyropoulos; last updated 2024-02-18T07:23:36+01:00)

```
runArgs.inputGeneratorFile = outputDS.replace('tar.gz', 'events')
```
passes the check since when running the jO outside the transform it leads to an undefined object.
Maybe we need another way to avoid such issues.

[Assignee: Spyros Argyropoulos]

Issue #119: Mentioning ATLMCPROD ticket in MR doesn't push link to Jira any longer
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/119 (author: Christian Gutschow; last updated 2020-07-31T10:47:09+02:00)

... not sure there's much we can do about this though?
Any ideas anyone?

Issue #25: Modification check should also veto new files in existing DSID directories
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/25 (author: Frank Siegert; last updated 2019-09-07T11:55:18+02:00)

As far as I understand, we currently only check for *modified* files in
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/blob/master/scripts/check_modified_files.sh
But this should be extended to reject *new* files in existing DSID directories, since otherwise someone could add a common base fragment like `Pythia8_i/Pythia8_..._Common.py` to an existing directory and thus override the one used from the release (changing the physics output).

[Assignees: Mukesh Kumar, Spyros Argyropoulos]
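
A sketch of the extended check, written in Python for illustration (the real check lives in `check_modified_files.sh`). It assumes the input is `git diff --name-status` output (a status letter, a tab, then the path) and the `NNNxxx/NNNNNN` DSID directory layout seen in this tracker (e.g. `100xxx/100001`). The function name, variable names and file names are hypothetical.

```python
import re

# DSID directory prefix such as "100xxx/100001/"
DSID_DIR_RE = re.compile(r"^(\d{3}xxx/\d{6})/")


def new_files_in_existing_dsids(name_status_lines, existing_dsid_dirs):
    """Return paths added ('A') inside DSID directories that already exist."""
    offending = []
    for line in name_status_lines:
        status, _, path = line.partition("\t")
        match = DSID_DIR_RE.match(path)
        if status == "A" and match and match.group(1) in existing_dsid_dirs:
            offending.append(path)
    return offending


if __name__ == "__main__":
    diff = [
        "A\t100xxx/100001/Pythia8_Common.py",  # new file in an existing DSID: veto
        "A\t100xxx/100002/mc.new_sample.py",   # new DSID directory: allowed
        "M\t100xxx/100001/mc.existing.py",     # modification: already covered
    ]
    print(new_files_in_existing_dsids(diff, {"100xxx/100001"}))
```
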