MC Job Options issues
https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues
2019-10-06T17:12:35+02:00

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/40
Automatic script not picking up links to GRID files
2019-10-06T17:12:35+02:00, Spyros Argyropoulos

!60

Alpha

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/39
logParser switch from minEvents to nEventsPerJob
2019-10-25T12:20:28+02:00, Spyros Argyropoulos

From ewelina:
> starting from 21.6.12 nEventsPerJob should be obligatory (transform recognizes minevents i.e. does not crash when the parameter appears, but assigns no value or action to it).

Todo: give ERROR if minEvents is used

Alpha, Spyros Argyropoulos, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/38
run_athena job fails in jobs with GRID files
2019-10-14T15:12:52+02:00, Spyros Argyropoulos

Most probably due to an eos access problem
![Screenshot_2019-09-23_at_20.57.13](/uploads/2f0e729dc4a11b89f912b296458e7aba/Screenshot_2019-09-23_at_20.57.13.png)
which is not easily solved by using the eos image from the registry.
It will also most definitely appear when using links from cvmfs...

Alpha, Spyros Argyropoulos, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/37
Execute scripts from gitlab-ci.yml instead of sourcing them
2019-10-06T17:12:35+02:00, Spyros Argyropoulos

This will avoid strange differences in behaviour between zsh and bash, which cause some jobs to fail. An example is [here](https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/jobs/5559520).

Alpha, Spyros Argyropoulos, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/36
Handling failed pipelines in MR with multiple commits
2019-10-14T21:16:27+02:00, Spyros Argyropoulos

This happened in e.g. !54
If multiple commits are made, the first commit can trigger a CI job which fails, but in the second commit the failed job might not be triggered (e.g. a GRID file only added in the first commit and not changed in the second). In this case the overall pipeline status is green, which is not what we want.
See discussions in:
https://gitlab.com/gitlab-org/gitlab-foss/issues/53530
Options that could be tried:
* Try `only:merge_requests` (although this also has problems, see https://gitlab.com/groups/gitlab-org/-/epics/957)
* Remove `only:changes` and launch all jobs for all commits, possibly moving the `git diff` commands into `gitlab-ci.yml` so that if nothing is to be checked the job can exit before executing the time-consuming scripts

Alpha, Spyros Argyropoulos, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/46
Allow GRID files to point to other links
2019-11-04T18:15:00+01:00, Spyros Argyropoulos

Currently the CI will check if a GRID file (link) points to eos or cvmfs.
We want to allow the possibility that a link points to another link.
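```
# Follow the chain of links until something that is not a link is reached.
# Note: readlink returns relative targets relative to the link's directory.
while test -L "$GRID" ; do
  GRID=$(readlink "$GRID")
done
```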
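For reference, the same link-following logic could be sketched in Python, e.g. if the check ever moves into one of the Python CI scripts. This is only a sketch: `resolve_grid_link`, `points_to_allowed_area`, `MAX_HOPS` and the `/eos/` and `/cvmfs/` prefixes are assumptions for illustration, not names or paths taken from the existing scripts.

```python
import os

# Assumed path prefixes for "eos" and "cvmfs"; adjust to what the CI really checks.
ALLOWED_PREFIXES = ("/eos/", "/cvmfs/")
MAX_HOPS = 20  # hypothetical guard against circular links


def resolve_grid_link(path, max_hops=MAX_HOPS):
    """Follow a chain of symlinks hop by hop and return the final target path."""
    hops = 0
    while os.path.islink(path):
        if hops >= max_hops:
            raise RuntimeError("too many levels of links: %s" % path)
        target = os.readlink(path)
        # A relative target is relative to the directory holding the link
        if not os.path.isabs(target):
            target = os.path.join(os.path.dirname(path), target)
        path = os.path.normpath(target)
        hops += 1
    return path


def points_to_allowed_area(path):
    """True if the fully resolved link ends up under one of the allowed prefixes."""
    return resolve_grid_link(path).startswith(ALLOWED_PREFIXES)
```

Like the shell loop, this does not require the final target to exist; it only follows links until it reaches something that is not a link.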
Alternatively we can replace `readlink` with `readlink -e` (if we want to test for the existence of the final file) or `readlink -f` (if we don't care about testing that the final file exists). Need to see if `-e`/`-f` are available in the CI image.

Alpha, Spyros Argyropoulos, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/31
Move from python2 to python3
2019-10-06T17:12:35+02:00, Spyros Argyropoulos

Changes needed in
* [x] `logParser.py`
* [x] `check_jo_consistency.py`
* [x] `commit_new_dsid.sh`

Alpha, Spyros Argyropoulos, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/60
Fix behaviour of pipelines when [skip ci] is used
2020-01-15T13:14:35+01:00, Spyros Argyropoulos

Apparently there has been a change in the gitlab policy: if we enforce that pipelines must succeed, there must always be a CI job that runs successfully, so we might have to implement something like https://docs.gitlab.com/ee/user/project/merge_requests/merge_when_pipeline_succeeds.html (see Limitations).
https://gitlab.cern.ch/help/user/project/merge_requests/merge_when_pipeline_succeeds.md#only-allow-merge-requests-to-be-merged-if-the-pipeline-succeeds
Associated gitlab issues describing the policy change:
https://gitlab.com/gitlab-org/gitlab/issues/14791
https://gitlab.com/gitlab-org/gitlab-foss/issues/66271
Affects !180, see also !184

Beta, Spyros Argyropoulos, Spyros Argyropoulos, 2020-01-17

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/56
CI: check that mcgensvc has access to eos directories
2019-12-17T18:43:15+01:00, Spyros Argyropoulos

Currently in `check_grid_file_atlcvmfs.sh` we are only checking if `atlcvmfs` has access.
We also need to check if `mcgensvc` has access, since we are using the `mcgensvc` credentials when copying `GRID` files in `run_athena.sh`, so if `mcgensvc` does not have access to the eos directory the job will fail.
* [x] Add check for mcgensvc in script
* [x] Add `tags:cvmfs` to the `check_grid_*` jobs

Beta, Spyros Argyropoulos, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/52
Bug when running athena in CI
2019-12-05T10:45:53+01:00, Spyros Argyropoulos

So far we thought that the `run_athena` job was connecting to lxplus and running the `run_athena.sh` script over `ssh`. This was actually not the case, since the code immediately exits from the ssh shell and executes `run_athena.sh` locally in the gitlab runner.
There was a suggestion from Frank to try to do this over ssh; however, as explained below, the necessary developments do not easily fit with the CI workflow, so we decided to drop it.
When running athena in the gitlab runner, though, we are affected by this:
```
09:08:44 Domain[ROOT_All] Error > Access DbDomain UPDATE Domain[ROOT_All] (UNKNOWN) impossible. [ROOT_All]
09:08:44 Domain[ROOT_All] Fatal The requested persistent backend implementation
09:08:44 Domain[ROOT_All] Fatal for the storage type:ROOT_All cannot be loaded.
09:08:44 Are you sure you loaded you loaded the correct DLLs?
09:08:44 ERROR (pool):
09:08:44 POOL> Missing driver DLL.
09:08:44 StreamEVGEN FATAL Standard std::exception is caught
09:08:44 StreamEVGEN ERROR POOL> Missing driver DLL.
09:08:44 Traceback (most recent call last):
09:08:44 File "/cvmfs/atlas.cern.ch/repo/sw/software/21.6/AthGeneration/21.6.13/InstallArea/x86_64-slc6-gcc62-opt/jobOptions/AthenaCommon/runbatch.py", line 18, in <module>
09:08:44 theApp.run() # runs until theApp.EvtMax events reached
09:08:44 File "/cvmfs/atlas.cern.ch/repo/sw/software/21.6/AthGeneration/21.6.13/InstallArea/x86_64-slc6-gcc62-opt/python/AthenaCommon/AppMgr.py", line 663, in run
09:08:44 sc = self.getHandle()._evtpro.executeRun( nEvt )
09:08:44 Exception: StatusCode IEventProcessor::executeRun(int maxevt) =>
09:08:44 POOL> Missing driver DLL. (C++ exception of type runtime_error)
```
which seems to indicate that something is missing on the runner.
Until we have !106 merged, we could try to see if we can either fix this or somehow handle this check without `run_athena` exiting at this point.
> Alternative from Frank:
`singularity exec -e --no-home -B /cvmfs -B /var /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos6 /bin/bash -- setupATLASandAsetupandGentfScript.sh`

Beta, Spyros Argyropoulos, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/51
Bug: Handling of jO created with sherpaTarCreator
2019-12-25T13:00:59+01:00, Spyros Argyropoulos

In !109 the log.generate file contained the following line:
```
INFO -- this file was created using the PMG sherpaTarCreator. Please skip minEvents test in the gitlab CI.
```
This is not handled by logParser and instead we should have in the file:
```
sherpaTarCreator = True
```
In fact when I run `421305` with
```
Gen_tf.py --jobConfig=421305 --ecmEnergy=13000. --outputEVNTFile=EVNT.pool.root
```
I don't get either of the above lines, so it seems to me that there is a special way of running Gen_tf when using jO created with sherpaTarCreator??? If so we should document this so that we are able to debug and to understand if the CI jobs are affected in any other way.
--
Final suggestion: replace this check for all generators with:
> `t\*nEvtsPerJob/N<12h`
See also #45
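
A minimal Python sketch of the check suggested above, assuming `t` is the time the test job took to generate `N` events and `nEvtsPerJob` is the number of events requested per job. The function names and the choice of hours as the unit are hypothetical and not taken from `logParser.py`.

```python
MAX_JOB_HOURS = 12.0  # limit quoted in the suggestion above


def projected_job_hours(t_hours, n_generated, n_events_per_job):
    """Scale the time measured for n_generated events up to a full job."""
    if n_generated <= 0:
        raise ValueError("n_generated must be positive")
    return t_hours * n_events_per_job / float(n_generated)


def passes_time_check(t_hours, n_generated, n_events_per_job, limit=MAX_JOB_HOURS):
    """True if t * nEvtsPerJob / N stays below the limit."""
    return projected_job_hours(t_hours, n_generated, n_events_per_job) < limit
```

For example, a test run that needs 0.5 h for 100 events projects to 5 h for a job of 1000 events, which would pass, and to 50 h for a job of 10000 events, which would fail.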
* [x] Harmonise print-out in log.generate for jO created with sherpaTarCreator (not needed in final suggestion)
* [x] Document how we need to run a jO that has been created with sherpaTarCreator
* [x] Check if adjustments in CI are needed
* [x] Implement and test above suggestion in/out of CI
* [x] Perhaps `-t` in logParser is not needed any more
@cgutscho @fsiegert

Beta, Spyros Argyropoulos, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/50
Bug: `run_athena` job failing because of `check_jo_consistency` checks
2019-11-23T15:49:20+01:00, Spyros Argyropoulos

When running in CI `run_athena` gives the following:
```
18:39:53 Shortened traceback (most recent user call last):
18:39:53 File "/cvmfs/atlas.cern.ch/repo/sw/software/21.6/AthGeneration/21.6.13/InstallArea/x86_64-slc6-gcc62-opt/jobOptions/EvgenJobTransforms/skel.GENtoEVGEN.py", line 246, in <module>
18:39:53 include(check_jofiles)
18:39:53 File "/cvmfs/atlas.cern.ch/repo/sw/Generators/MC16JobOptions/scripts/check_jo_consistency.py", line 128, in <module>
18:39:53 main()
18:39:53 File "/cvmfs/atlas.cern.ch/repo/sw/Generators/MC16JobOptions/scripts/check_jo_consistency.py", line 123, in main
18:39:53 check_mc15includes(file)
18:39:53 File "/cvmfs/atlas.cern.ch/repo/sw/Generators/MC16JobOptions/scripts/check_jo_consistency.py", line 98, in check_mc15includes
18:39:53 with open(file) as f:
18:39:53 IOError: [Errno 2] No such file or directory: '421xxx/421305/mc.Sh_288_Wmunu_EnhLogPtV.py'
18:39:53 Py:Athena INFO leaving with code 8: "an unknown exception occurred"
```
This is because of /cvmfs/atlas.cern.ch/repo/sw/Generators/MC16JobOptions/scripts/check_jo_consistency.py L123.
`/cvmfs/atlas.cern.ch/repo/sw/Generators/MC16JobOptions/scripts/check_jo_consistency.py` returns in this case:
`421xxx/421999/mc.Sh_228_Wmunu_EnhLogPtV.py` so when running from `421xxx/tmp_421999` the python file can't locate the jo file.

Beta, Spyros Argyropoulos, Chen Peng (chen.peng@cern.ch), Spyros Argyropoulos, 2019-11-22

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/49
Take nEventsPerJob in run_athena.sh from jO instead of commit script
2019-11-17T14:46:24+01:00, Spyros Argyropoulos

* [x] Move decision of how many events to generate from commit script (https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/blob/master/scripts/commit_new_dsid.sh#L207) to `run_athena.sh` reading from jO
* [x] See if we can/should increase time-out for `run_athena` job to 2h or 5h (check with CERN IT) (moved to #45)

Beta, Spyros Argyropoulos, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/45
Improve time estimates, update scripts to allow for 20 new jO max and increase timeout of CI jobs
2019-12-25T12:23:03+01:00, Spyros Argyropoulos

* [x] When running `logParser` without `-t` the usual time limits should apply
* [x] When running `logParser` with `-t` the time limit should not be checked
* [x] In commit script and in `run_athena` implement check that < 20 jO are submitted
* [x] Increase timeout of jobs to 5h => need to see if this affects the running time of other jobs and perhaps set up a dedicated runner? (**we can't do this, we need a dedicated runner**)

Beta, Spyros Argyropoulos, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/35
Update logParser to handle jO created with sherpaTarCreator
2019-10-31T20:49:26+01:00, Spyros Argyropoulos

From @cgutscho
> log parser complains that the number of events from the log.generate is != to the minevents (which is expected for Sherpa samples that have been prepared with the tarCreator…).
An example is here /afs/cern.ch/user/c/cgutscho/public/forSpyros/

Beta, Spyros Argyropoulos, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/20
CI: diff filters checking HEAD instead of master
2019-11-11T20:31:33+01:00, Spyros Argyropoulos

The git diff checks the HEAD instead of the master branch. This leads to unexpected failures as described in !13 (same thing happening in !49)
We also need a way to handle the fact that `only:changes` seems to work relative to the branch from which a CI job is triggered. Apparently, if a user makes multiple commits that do not change the files for which a pipeline has previously failed, the corresponding jobs will not run at the final commit and we end up with a successful pipeline just because some jobs were omitted. This seems to have happened in !49

Beta, Spyros Argyropoulos, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/15
logParser: check that minEvents is not too low and that enough LHE inputs are used
2019-12-25T13:00:59+01:00, Spyros Argyropoulos

* If LHE inputs are used check that the number of files is maximised so that each job runs ~12h
* If no LHE inputs are used check that minEvents is maximised so that each job takes ~12h

Beta, Spyros Argyropoulos, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/41
Commit script picking up previous changes if run from dsid_USER branch
2019-11-04T13:53:45+01:00, Spyros Argyropoulos

Steps to reproduce:
1. Run script to create `dsid_USER_DSID1` which gets merged and remote branch gets deleted **but local copy does not**
2. Run script to create `dsid_USER_DSID2` while still being on `dsid_USER_DSID1` branch
* [ ] Are the changes to DSID2 propagated correctly, without picking simultaneously the changes from DSID1?
* [ ] Is it enough to `git checkout origin master` in the script before pushing `dsid_USER_DSID2`?

Beta, Spyros Argyropoulos, Spyros Argyropoulos

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/55
Running athena in CI
2019-12-05T10:45:52+01:00, Spyros Argyropoulos

* [x] Remove anything not associated to the changes in the CI (it seems like only `.gitlab-ci.yml` should stay) [**in new branch slc6-atlasos**]
* [x] Probably the correct execution relies on cvmfs being mounted in the gitlab runner. Check whether we need to use runners with cvmfs tag https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/settings/ci_cd (**we do**)
* [x] Check if we need to use `atlas_external_cvmfs` image or if it's enough to use `tags:cvmfs` (**in fact we don't need these, by default cvmfs tagged runners do the jobs**)
* [x] Check whether we need to do anything specific for GRID files that live on eos. Do we need the eos image to run the job? (**Here we need to add:** `before_script: - echo ${K8S_SECRET_SERVICE_PASSWORD} | kinit ${SERVICE_ACCOUNT}@CERN.CH`, **eos image is not needed**)
* [x] Build docker image from `atlas_external_cvmfs` and gitlab-registry.cern.ch/atlas-physics/pmg/mcjoboptions:eos_bash (**this is not needed now**)
* [x] Check whether slc6 or cc7 is used (most probably it's the former but good to check - then #33 can be closed) (**Runners use:** CentOS Linux release 7.6.1810 (Core), **so if slc6 is needed then we may use** image:atlas_external_cvmfs or image: atlas/slc6-atlasos)
* [x] Use this new image to test with (**new image not needed**)
* [x] 421001 (simple jO)(**tested with Runners/image: atlas/slc6-atlasos**)
* [x] 421003 (contains GRID file)(**tested with Runners/image: atlas/slc6-atlasos**)
* [x] Make sure that the final solution for merging also solves #52 (**checked with above two jobs with Runners/image: atlas/slc6-atlasos, no such problem exists!**)

Beta, Mukesh Kumar, Mukesh Kumar

https://gitlab.cern.ch/atlas-physics/pmg/mcjoboptions/-/issues/54
CI: compare with origin/master instead of HEAD
2019-12-21T15:10:24+01:00, Spyros Argyropoulos

Currently in many jobs we are comparing with `HEAD` instead of `origin/master` which is the target branch. In most cases this will need to change.
Actually we should do
`git diff origin/master...HEAD`
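
As a sketch of how a CI helper could use this comparison from Python: the three-dot form compares `HEAD` against the merge base of `origin/master` and `HEAD`, so only changes made on the branch are listed. The function names and the use of `--name-only` are assumptions for illustration, not part of the current scripts.

```python
import subprocess


def diff_command(target="origin/master", source="HEAD"):
    """Build the three-dot diff: changes on `source` since it forked from `target`."""
    return ["git", "diff", "--name-only", "%s...%s" % (target, source)]


def changed_files(target="origin/master", source="HEAD", cwd=None):
    """Return the list of files changed on `source` relative to the merge base."""
    out = subprocess.check_output(diff_command(target, source), cwd=cwd)
    return [line for line in out.decode().splitlines() if line]
```

A job could then exit early when `changed_files()` contains nothing it needs to check.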

Beta, Spyros Argyropoulos, Spyros Argyropoulos