MC Job Options issues: https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues (updated 2021-06-17T11:07:17+02:00)

---

**[#135 Sanity check for EVNT-to-EVNT transforms](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/135)**
Author: Christian Gutschow · Updated: 2021-06-17T11:07:17+02:00

Hi,
here's an [example JO](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/950xxx/950096/mc.Sh_2210_Zee_E2Etransform_valid.py) for an EVNT-to-EVNT transform.
This basically clones an input EVNT but only copies an event if it passes some Athena filter, which is why most of the logic is protected by the `if runArgs.trfSubstepName == 'afterburn':` statement.
Now, because it copies the original EVNT, the new EVNT would have the MC channel number (or run number in the HepMC GenEvent) set to the original DSID and not the new DSID (of the E2E transform JO).
This can now be patched using the `postSeq.CountHepMC.CorrectRunNumber = True` flag at the bottom of the example JO. Could we use the CI to catch cases where such a JO is added but the flag is missing?
(In principle, there is a printout in the `log.afterburn` produced by an E2E transform which one could grep for, but the CI doesn't handle jobs without input EVNT files yet.)
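A minimal sketch of such a CI check. It assumes (hypothetically) that any jO containing the `trfSubstepName == 'afterburn'` guard is an E2E transform, and that the fix appears as a `CountHepMC.CorrectRunNumber = True` assignment. A plain text search like this would still count a commented-out flag as present, so it is a starting point rather than a complete solution:

```python
"""Sketch: flag EVNT-to-EVNT (E2E) jOs that do not set CorrectRunNumber.

Hypothetical heuristic: a jO counts as an E2E transform if it contains the
`trfSubstepName == 'afterburn'` guard.
"""
import re
import sys
from pathlib import Path
from typing import List

# The afterburn guard, with either quote style
E2E_MARKER = re.compile(r"trfSubstepName\s*==\s*['\"]afterburn['\"]")
# The flag that resets the run number to the new DSID
FIX_FLAG = re.compile(r"CountHepMC\.CorrectRunNumber\s*=\s*True")


def missing_run_number_fix(jo_text: str) -> bool:
    """True if the jO looks like an E2E transform but never sets the flag."""
    return bool(E2E_MARKER.search(jo_text)) and not FIX_FLAG.search(jo_text)


def main(paths: List[str]) -> int:
    bad = [p for p in paths if missing_run_number_fix(Path(p).read_text())]
    for p in bad:
        print(f"ERROR: {p} looks like an E2E transform but does not set "
              "postSeq.CountHepMC.CorrectRunNumber = True")
    return 1 if bad else 0


# Only act as a CLI when jO paths are passed on the command line
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```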
Thoughts/ideas?

Milestone: S1.2021 · Assignee: Spyros Argyropoulos

---

**[#12 Dynamical job creation for multiple DSID commits](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/12)**
Author: Spyros Argyropoulos · Updated: 2021-01-10T17:33:29+01:00

When we run athena, the job checks whether the expected time is > 1 h and aborts if it is.
If several DSIDs are added simultaneously (e.g. 10-30 DSIDs for SUSY or EXO signal grids), the `run_athena` job runs for a very long time.
We should find a workaround for that. This could be handled by having the `commit_new_dsid.sh` script create several branches.
Some interesting material here: https://gitlab.com/gitlab-org/gitlab-ce/issues/45828
In particular there's a feature request for dynamic CI jobs: https://gitlab.com/gitlab-org/gitlab-ce/issues/44199 probably to be implemented in early 2020.
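A rough sketch of the dynamic-job idea using GitLab's dynamic child pipelines: one job generates a pipeline YAML as an artifact, and a second job starts it via `trigger:include:artifact`. Giving each new DSID its own job would keep each job short. The job names and the `DSID` variable read by `run_athena.sh` are hypothetical; the current script does not necessarily work this way:

```python
"""Sketch: write a child-pipeline YAML with one run_athena job per new DSID.

Hypothetical: assumes scripts/run_athena.sh can be told which DSID to run via
a DSID environment variable; the real script may need changes for that.
"""
import sys
from typing import Iterable


def child_pipeline_yaml(dsids: Iterable[str]) -> str:
    """Return child-pipeline YAML text with one job per DSID."""
    jobs = []
    for dsid in dsids:
        jobs.append(
            f"run_athena_{dsid}:\n"
            "  variables:\n"
            f'    DSID: "{dsid}"\n'
            "  script:\n"
            "    - ./scripts/run_athena.sh\n"
        )
    if not jobs:
        # GitLab rejects a pipeline without jobs, so emit a trivial one instead
        jobs.append("no_new_dsids:\n  script:\n    - echo 'No new DSIDs to run'\n")
    return "\n".join(jobs)


if __name__ == "__main__":
    # Usage: python generate_child_pipeline.py 500538 500539 > child.yml
    sys.stdout.write(child_pipeline_yaml(sys.argv[1:]))
```

The generator job would save `child.yml` as an artifact, and a trigger job would start it with `trigger: include: - artifact: child.yml` together with `job:` naming the generator job.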
**Update**: this seems to be exactly what we need: https://gitlab.com/gitlab-org/gitlab/issues/35632

Milestone: S1.2021 · Assignee: Spyros Argyropoulos

---

**[#101 Long compilation time when running MadGraph in atlas/slc6-atlasos causing CI timeouts](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/101)**
Author: Jason Robert Veatch · Updated: 2021-04-22T16:46:58+02:00

## How to reproduce the problem
```
# Mount cvmfs
sudo mkdir -p /cvmfs/atlas.cern.ch
sudo mkdir -p /cvmfs/atlas-condb.cern.ch
sudo mkdir -p /cvmfs/grid.cern.ch
sudo mkdir -p /cvmfs/sft.cern.ch
sudo mount -t cvmfs atlas.cern.ch /cvmfs/atlas.cern.ch
sudo mount -t cvmfs atlas-condb.cern.ch /cvmfs/atlas-condb.cern.ch
sudo mount -t cvmfs grid.cern.ch /cvmfs/grid.cern.ch
sudo mount -t cvmfs sft.cern.ch /cvmfs/sft.cern.ch
# Get the docker image
docker pull atlas/slc6-atlasos
# Run image in a container and mount cvmfs
docker run -it -v /cvmfs:/cvmfs atlas/slc6-atlasos
# Inside the docker container get the mcjoboptions repo (or alternatively you can copy it from your local area with docker cp)
kinit USER@CERN.CH
git clone https://:@gitlab.cern.ch:8443/atlas-physics/pmg/mcjoboptions.git
cd mcjoboptions
git checkout dsid_jveatch_500538
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
./scripts/run_athena.sh
```
## Debugging
#### Bottleneck: compilation time
Comparing the running times at several execution points on lxplus and in the container, the problem seems to lie in the compilation times:
```
Docker (running on a dual-core laptop with cvmfs mounted via fuse):
generate 19:24:54 INFO: Using LHAPDF v6.2.3 interface for PDFs
generate 19:26:19 INFO: Compiling source…
generate 19:31:53 INFO: ...done, continuing with P* directories => 334 sec
generate 19:31:53 INFO: Compiling StdHEP (can take a couple of minutes) ...
generate 19:45:23 INFO: …done. => 810 sec
generate 19:45:24 INFO: Compiling on 1 cores
generate 19:45:24 INFO: Compiling P0_gg_ttx...
generate 19:54:37 INFO: P0_gg_ttx done. => 553 sec
vs lxplus (interactive run)
10:15:08 INFO: Using LHAPDF v6.2.3 interface for PDFs
10:15:14 INFO: Compiling source...
10:15:26 INFO: ...done, continuing with P* directories => 12 sec
10:15:26 INFO: Compiling StdHEP (can take a couple of minutes) ...
10:16:04 INFO: …done. => 38 sec
10:16:05 INFO: Compiling on 1 cores
10:16:05 INFO: Compiling P0_gg_ttx...
10:16:45 INFO: P0_gg_ttx done. => 40 sec
```
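The durations quoted after `=>` can be recomputed from the timestamps, which also makes the per-step slowdown explicit. A small sketch, with the timestamps copied from the two logs above:

```python
"""Recompute the per-step durations from the two MadGraph logs above."""
from datetime import datetime
from typing import List, Tuple

# (step, start, end) timestamps copied from the logs above (same day assumed)
DOCKER = [("source", "19:26:19", "19:31:53"),
          ("StdHEP", "19:31:53", "19:45:23"),
          ("P0_gg_ttx", "19:45:24", "19:54:37")]
LXPLUS = [("source", "10:15:14", "10:15:26"),
          ("StdHEP", "10:15:26", "10:16:04"),
          ("P0_gg_ttx", "10:16:05", "10:16:45")]


def seconds(start: str, end: str) -> int:
    """Elapsed seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def durations() -> List[Tuple[str, int, int]]:
    """Return (step, docker_seconds, lxplus_seconds) for each compile step."""
    return [(step, seconds(d0, d1), seconds(l0, l1))
            for (step, d0, d1), (_, l0, l1) in zip(DOCKER, LXPLUS)]


if __name__ == "__main__":
    for step, dock, lx in durations():
        print(f"{step:10s} docker {dock:4d} s   lxplus {lx:3d} s   slowdown {dock / lx:5.1f}x")
```

This reproduces the 334/810/553 s versus 12/38/40 s quoted above, i.e. slowdowns of roughly 28x, 21x and 14x for the three steps.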
#### Size/memory
The container has 53 GB of available space, and at the point where compilation becomes slow the container occupies only ~230 MB, so **disk size does not seem to be causing the slowdown**.
Increasing the available memory from 1 GB to 8 GB had no effect on the compilation time in the container.
#### Reading from cvmfs
I ran a script that 1) reads all the lines of a file that lives on cvmfs and 2) copies this file to a local directory and removes it.
The local run on my laptop (with cvmfs mounted via fuse) gives:
```
Reading 500 times
real 0m21.504s
user 0m12.937s
sys 0m8.429s
Copying 500 times
real 0m4.993s
user 0m0.620s
sys 0m2.440s
```
Running the script from the container, where the locally available cvmfs directory (see above) is mounted to the container as a volume, gives this:
```
Reading 500 times
real 1m44.217s
user 0m18.329s
sys 0m20.376s
Copying 500 times
real 0m3.716s
user 0m0.570s
sys 0m0.981s
```
**So reading a file seems to be 5x slower when running from the docker container**
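The script used for this test is not part of the issue. A minimal Python equivalent, assuming the two operations described above (read every line of a file N times; copy it to a local directory and delete the copy N times), could look like this. It measures wall-clock time only, unlike the `real`/`user`/`sys` split reported by `time`; pass a file on cvmfs as the argument:

```python
"""Sketch of the read/copy micro-benchmark described above (not the original script)."""
import os
import shutil
import sys
import tempfile
import time
from typing import Tuple


def bench(path: str, n: int = 500) -> Tuple[float, float]:
    """Return (read_seconds, copy_seconds) for n repetitions on `path`."""
    t0 = time.perf_counter()
    for _ in range(n):
        with open(path, "rb") as f:
            for _line in f:  # read all lines, discarding them
                pass
    read_s = time.perf_counter() - t0

    dest_dir = tempfile.mkdtemp()
    t0 = time.perf_counter()
    for _ in range(n):
        dest = shutil.copy(path, dest_dir)  # copy to a local directory...
        os.remove(dest)                     # ...and remove the copy again
    copy_s = time.perf_counter() - t0
    os.rmdir(dest_dir)
    return read_s, copy_s


# Only run when a file to benchmark is given on the command line
if __name__ == "__main__" and len(sys.argv) > 1:
    read_s, copy_s = bench(sys.argv[1])
    print(f"Reading 500 times: {read_s:.3f} s\nCopying 500 times: {copy_s:.3f} s")
```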
#### Next steps
* [ ] To debug further we would need to know exactly how cvmfs is mounted in the GitLab runner
* [ ] Also need to check whether the slow reads from cvmfs are correlated with the slow MG compilation: does MG call the compilers from cvmfs or read any other input from cvmfs? Probably
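For the second item, a quick first look is to check where the compilers found on `PATH` actually live. A sketch, to be run in the same environment as the slow job; the default tool names are assumptions (MadGraph compiles Fortran code, so `gfortran` is the most relevant one):

```python
"""Sketch: report which of the given tools resolve to a path under /cvmfs."""
import os
import shutil
import sys
from typing import Dict, Iterable, Optional


def tools_on_cvmfs(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved real path, or None if not on PATH."""
    result = {}
    for name in names:
        found = shutil.which(name)
        result[name] = os.path.realpath(found) if found else None
    return result


if __name__ == "__main__":
    names = sys.argv[1:] or ["gfortran", "gcc", "g++"]
    for name, path in tools_on_cvmfs(names).items():
        if path is None:
            where = "not found"
        else:
            where = "cvmfs" if path.startswith("/cvmfs/") else "local"
        print(f"{name:10s} {where:9s} {path or ''}")
```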
---
Original report from Jason. Similar issues were observed with other processes that are apparently very different from this one (an NLO one and an LO one with a long decay chain).
Job [#7937441](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/jobs/7937441) failed for 9a6a4445a5bcf7ae08ac81888cccd79ef4cc4af3:
Dear experts,
The run_athena job for my branch times out. I have been trying to debug this from my side, but I am at a loss about how to proceed. The estimated execution time from each log.generate.short is ~0.1 hours, so I wouldn't expect this to be an issue. Could you please advise?
Thanks in advance,
Jason

Milestone: Future · Assignee: Spyros Argyropoulos

---

**[#65 Running pipelines in custom swarm runner](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/65)**
Author: Spyros Argyropoulos · Updated: 2021-06-17T13:04:50+02:00

Instructions from Lukas here: https://clouddocs.web.cern.ch/containers/tutorials/swarmgitlab.html
The idea is that, if we set this up, we could run with the full number of events, exactly as in production.
The instructions work. We have to understand whether this is what we want:
* [ ] does it have access to cvmfs? If not, how would we set it up so that it does?
* [ ] does it buy us anything from using the shared runners?
* [ ] how tough would the maintenance be?
* [ ] is it better to just set up a dedicated machine? Maybe we should ask someone from CERN IT to do it?

Milestone: Future

---

**[#48 Dynamically setting allowed to fail jobs](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/48)**
Author: Spyros Argyropoulos · Updated: 2022-05-17T11:05:37+02:00

Expected to come with GitLab 12.6: https://gitlab.com/gitlab-org/gitlab/issues/16733
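Recent GitLab versions support this through `allow_failure:exit_codes`, which treats a job as allowed to fail only for the listed exit codes. A minimal sketch of the check side, where exit code 3 for "nothing to check" is a hypothetical convention and the job would be configured with `allow_failure: {exit_codes: [3]}`:

```python
"""Sketch: a check that separates "failed" from "nothing to check".

Hypothetical convention: exit code 3 means "nothing to check"; with
allow_failure: {exit_codes: [3]} GitLab would show such a job as a warning.
"""
from typing import Callable, List

NOTHING_TO_CHECK = 3  # hypothetical exit code for "no input found"


def run_check(files: List[str], check: Callable[[str], bool]) -> int:
    """Return the exit code the CI job should finish with."""
    if not files:
        print("WARNING: nothing to check, please investigate")
        return NOTHING_TO_CHECK
    failed = [f for f in files if not check(f)]
    for f in failed:
        print(f"ERROR: check failed for {f}")
    return 1 if failed else 0
```

A CI script would end with `sys.exit(run_check(files, check))`; a real failure (exit code 1) would still fail the pipeline.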
An example usage in our case: CI jobs that don't find anything to check would dynamically set a "failed but allowed to fail" status, which shows up as a WARNING in the pipeline, so that people who approve MRs know that they have to investigate further.

Milestone: Future · Assignee: Spyros Argyropoulos

---

**[#34 Lower limit on amount of run_athena jobs that should be run](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/34)**
Author: Spyros Argyropoulos · Updated: 2022-02-18T15:50:50+01:00

As shown in [this pipeline](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/pipelines/1093023), we can completely miss cases where `log.generate` files are not provided.
Frank suggested the following check that can be implemented in e.g. the `run_athena` job:
```bash
# Fail unless log files were provided for at least ceil(1% of the new DSIDs)
if (( Nlogfiles < (N_newDSIDs + 99) / 100 )); then
  exit 1
fi
```

Milestone: Future

---

**[#7 Investigate usage of caches to speed up pipelines](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/7)**
Author: Spyros Argyropoulos · Updated: 2021-06-17T13:05:27+02:00

See here: https://docs.gitlab.com/ee/ci/caching/
Something to be investigated, e.g. with the `check_logParser` or `check_links` jobs.
This probably requires building a big image that contains all of the other images used in the CI jobs, so it is not trivial.

Milestone: Future

---

**[#194 Notify changes assigns MR to wrong person](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/194)**
Author: Spyros Argyropoulos · Updated: 2022-11-20T17:08:34+01:00

Need to assign to conveners.

Assignee: Spyros Argyropoulos

---

**[#193 Turn off branch pipelines](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/193)**
Author: Spyros Argyropoulos · Updated: 2023-01-03T08:22:29+01:00

If someone opens an MR while a branch pipeline is still running, two concurrent pipelines are created.
One of them will fail: the pipeline that finishes first pushes to the branch, and the last CI job of the other one then tries to push to a branch that is behind.
![Screenshot_2022-11-17_at_16.36.11](/uploads/afe31bf16cbf3496ccb1bf6f2703d7ac/Screenshot_2022-11-17_at_16.36.11.png)
We should turn off all branch pipelines.

Assignee: Spyros Argyropoulos

---

**[#192 setup_athena pipeline failing](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/192)**
Author: Spyros Argyropoulos · Updated: 2022-11-17T10:18:51+01:00

See !2152
The branch pipeline succeeds: https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/pipelines/4772445
Merge request pipeline succeeds when `log.generate.short` is present: https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/jobs/25883448
Merge request pipeline fails when `log.generate.short` is not present: https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/jobs/25883547
Branch pipeline also fails when `log.generate.short` is not present: https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/jobs/25883631

Assignee: Spyros Argyropoulos

---

**[#190 Remove possibility to skip all pipelines](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/190)**
Author: Spyros Argyropoulos · Updated: 2022-10-08T10:37:24+02:00

We should remove the `[skip all]` option since it is being abused for no reason.
We need to think about how to redesign the pipeline to make this happen.

Assignee: Spyros Argyropoulos

---

**[#180 Bug in check for non-reachable files](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/180)**
Author: Spyros Argyropoulos · Updated: 2022-06-29T10:45:47+02:00

ATLMCPROD-10024
```
grep include mc.Ph_PDF4LHC21_WpH125J_Wincl_MINLO_LHE.py
include("PowhegControl/PowhegControl_HWj_Common.py")
include("Pythia8_i/Pythia8_A14_NNPDF23LO_EvtGen_Common.py")
include("Pythia8_i/Pythia8_Powheg_Main31.py")
include("Pythia8_SMHiggs125_inc.py")
```
This shows up as a bug because `PowhegControl` and `Pythia8_i` are known to `Gen_tf` but are obviously not present in the DSID directory. To do this check properly one would actually need to run `Gen_tf`, since where `Gen_tf` looks for the jO is determined by what is in the cmake file, which might change.
So basically all of these tests should be removed. One that could perhaps stay is a check for includes pointing to `afs`, but this has only happened once in 1500 MRs, so I would prefer to remove this check completely.
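If only the `afs` check were kept, it could be reduced to scanning the `include(...)` calls of each jO. A sketch, assuming includes are written with string literals as in the example above (an include built from a variable would be missed):

```python
"""Sketch: flag jO include() calls that point to AFS."""
import re
import sys
from pathlib import Path
from typing import List

# Captures the string argument of include("...") or include('...')
INCLUDE = re.compile(r"""include\(\s*["']([^"']+)["']\s*\)""")


def afs_includes(jo_text: str) -> List[str]:
    """Return the include() targets that are absolute /afs/ paths."""
    return [p for p in INCLUDE.findall(jo_text) if p.startswith("/afs/")]


# Only act as a CLI when jO paths are passed on the command line
if __name__ == "__main__" and len(sys.argv) > 1:
    status = 0
    for path in sys.argv[1:]:
        for inc in afs_includes(Path(path).read_text()):
            print(f"ERROR: {path} includes a file from AFS: {inc}")
            status = 1
    sys.exit(status)
```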
@mgignac any objection?

Assignee: Spyros Argyropoulos

---

**[#167 Fix logParser bug when running in CI](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/167)**
Author: Spyros Argyropoulos · Updated: 2022-04-20T22:28:20+02:00

![Screenshot_2022-04-20_at_21.08.27](/uploads/ed6760a1cbe977212c0904faf484fd7a/Screenshot_2022-04-20_at_21.08.27.png)

Assignee: Spyros Argyropoulos

---

**[#166 Add checks for python2/3 compatibility of jO](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/166)**
Author: Spyros Argyropoulos · Updated: 2022-04-21T17:21:24+02:00

An example DSID is 830099: https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/blob/master/830xxx/830099/mc.H7EG_jetjet_72_Cluster_JZ1.py
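As a cheap first check, independent of the R21/R22 releases listed below, the CI could verify that each jO is valid Python 3 syntax. This only catches syntax-level problems (e.g. Python 2 `print` statements); runtime differences still require running the jO in a release. A sketch:

```python
"""Sketch: report jO files that are not valid Python 3 syntax."""
import sys
from typing import Optional


def py3_syntax_error(source: str, filename: str = "<jO>") -> Optional[str]:
    """Return 'file:line: message' if `source` does not compile as Python 3."""
    try:
        compile(source, filename, "exec")  # parses and compiles, executes nothing
    except SyntaxError as err:
        return f"{filename}:{err.lineno}: {err.msg}"
    return None


# Only act as a CLI when jO paths are passed on the command line
if __name__ == "__main__" and len(sys.argv) > 1:
    errors = []
    for path in sys.argv[1:]:
        with open(path) as f:
            msg = py3_syntax_error(f.read(), path)
        if msg:
            errors.append(msg)
    for msg in errors:
        print(f"ERROR: {msg}")
    sys.exit(1 if errors else 0)
```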
R21: 21.6.85
R22: You can try 22.6.13 (later releases have issues with EvtGen_i, which should be fixed soon).

Assignee: Spyros Argyropoulos · Due date: 2022-04-25

---

**[#32 Find remote name before doing a push from `commit_new_dsid.sh`](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/32)**
Author: Spyros Argyropoulos · Updated: 2019-09-13T14:39:36+02:00

Currently the script assumes that the remote is named `origin`, which might not be the case.
The script should find the name of the remote and use that for the push instead.

Assignee: Spyros Argyropoulos

---

**[#28 CI checks that evgenConfig.inputFilesPerJob is not too high for a grid node](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/28)**
Author: Spyros Argyropoulos · Updated: 2019-09-14T20:35:15+02:00

Limit: 100. But doesn't that have to be different for LHE vs. EVNT input, due to their file size and possibly different minevents? How to best take that into account, Misha and Dominic? Remember that we don't have access to the input dataset in the CI. Maybe for a start we use the limit of 100, which should be fine for LHE-based requests. For EVNT-based requests we at least won't need more, so there are no false rejections from the CI.

Assignee: Spyros Argyropoulos

---

**[#25 Modification check should also veto new files in existing DSID directories](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/25)**
Author: Frank Siegert · Updated: 2019-09-07T11:55:18+02:00

As far as I understand, we currently only check for *modified* files in
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/blob/master/scripts/check_modified_files.sh
But this should be extended to reject *new* files in existing DSID directories; otherwise we would allow someone to add a common base fragment like `Pythia8_i/Pythia8_..._Common.py` to an existing directory, overriding the one that was used from the release and thus changing the physics output.

Assignees: Mukesh Kumar, Spyros Argyropoulos

---

**[#18 Update README](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/18)**
Author: Spyros Argyropoulos · Updated: 2019-09-07T11:55:17+02:00

Assignee: Spyros Argyropoulos

---

**[#17 Short term solution for multiple DSID commits](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/17)**
Author: Spyros Argyropoulos · Updated: 2019-09-07T11:55:17+02:00

Use `log.generate.short` to determine whether athena should run in the CI:
* [x] Update run athena script to skip directories not containing log.generate.short
* [x] Update logParser job to skip directories not containing log.generate.short - print WARNING
* [x] Update automatic submission script to use a better syntax `-DSID=123456,123457,... -skipAthena=123456,123457 -h`
* [x] Update automatic submission script to skip athena running for directories not containing `log.generate`
* [x] Update instructions for people accepting MRs so that they make sure to look into the logs of the logParser jobs

Assignee: Spyros Argyropoulos

---

**[#16 Update run_athena CI job to use Gen_tf](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/16)**
Author: Spyros Argyropoulos · Updated: 2019-09-07T11:55:17+02:00

```
# assumes the ATLAS environment (setupATLAS) has already been set up
asetup 21.6.6,AthGeneration
Gen_tf.py --ecmEnergy=13000 --jobConfig=421002 --outputEVNTFile=test.EVNT.pool.root
```

Assignee: Spyros Argyropoulos
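A sketch of how the `Gen_tf.py` call above could be combined with the `log.generate.short` rule from #17: build the command only for DSID directories that contain `log.generate.short` and warn about the others. It only prints the commands, which would then be run after `asetup 21.6.6,AthGeneration`; the fixed `--ecmEnergy=13000` and taking the DSID from the directory name are simplifying assumptions, and the real `run_athena.sh` may differ:

```python
"""Sketch: Gen_tf.py commands for DSID directories that provide log.generate.short."""
import os
import shlex
import sys
from typing import List, Tuple


def plan(dsid_dirs: List[str]) -> Tuple[List[str], List[str]]:
    """Return (commands to run, directories skipped for lack of log.generate.short)."""
    commands, skipped = [], []
    for d in dsid_dirs:
        if not os.path.isfile(os.path.join(d, "log.generate.short")):
            skipped.append(d)
            continue
        dsid = os.path.basename(os.path.normpath(d))  # e.g. 950xxx/950096 -> 950096
        args = ["Gen_tf.py", "--ecmEnergy=13000", f"--jobConfig={dsid}",
                "--outputEVNTFile=test.EVNT.pool.root"]
        commands.append(shlex.join(args))
    return commands, skipped


if __name__ == "__main__":
    cmds, skipped = plan(sys.argv[1:])
    for d in skipped:
        print(f"WARNING: no log.generate.short in {d}, not running athena", file=sys.stderr)
    print("\n".join(cmds))
```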