Write versions to output directory, check git status (and enforce a policy) before running
A few changes combined:
- use setuptools_scm to track the exact bamboo version (plus a bit of setuptools maintenance: most options can be put in setup.cfg, which now seems to be the recommended approach, as it is a bit simpler)
- write a `version.yml` file to the output directory. PyYAML sorts the keys alphabetically, which somewhat hides that the `version` key is the most important one: the rest is additional information, mostly about whether the commit/tag has been pushed, and where. This could be reduced, but may be useful to have; when comparing versions, only the `version` string is used. When running one of the bamboo examples, the file looks like this:
    bambooRun_args:
    - --module=examples/nanozmumu.py:NanoZMuMu
    - examples/test1.yml
    - --plotIt=plotIt
    - --backend=dataframe
    bamboo_version: 0.1.0b4.dev2+g513249f
    config_version: &id001
      git_common: /home/users/p/d/pdavid/bamboodev/bamboo/.git
      is_dirty: false
      remote_branches:
      - origin/checkandstoregitversions
      remotes:
        origin:
          url: ssh://git@gitlab.cern.ch:7999/piedavid/bamboo.git
          url_push: ssh://git@gitlab.cern.ch:7999/piedavid/bamboo.git
        upstream:
          url: ssh://git@gitlab.cern.ch:7999/cp3-cms/bamboo.git
          url_push: ssh://git@gitlab.cern.ch:7999/cp3-cms/bamboo.git
      sha1: 261e77c
      tag: v0.1.0b3
      tag_remotes:
      - upstream
      untracked_files: []
      version: v0.1.0b3-7-g261e77c
    module_version: *id001
- added a `--git-policy` option to bambooRun (and a `policy` option in the `[git]` section of bamboorc), to specify how picky to be (`testing` will still retrieve the version information, but proceed independently of the outcome; this is the default, so enforcement is an opt-in feature)
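
As a rough illustration of how such a policy could be evaluated against the fields recorded in `version.yml`, here is a minimal Python sketch. Only `testing` is named in this description; the stricter policy names (`committed`, `tagged`, `pushed`), their exact meaning, and the function `check_git_policy` are assumptions made for the example, not necessarily what bamboo implements.

```python
# Hypothetical policy names, from least to most strict.
# Only "testing" appears in this MR; the others are assumed for illustration.
POLICIES = ("testing", "committed", "tagged", "pushed")


def check_git_policy(policy, info):
    """Return (ok, reason) for a dict shaped like config_version in version.yml."""
    if policy not in POLICIES:
        raise ValueError(f"Unknown git policy: {policy!r}")
    if policy == "testing":
        # The version is still recorded, but nothing is enforced.
        return True, "testing: no requirements"
    if info.get("is_dirty", True):
        return False, "working tree has uncommitted changes"
    if policy == "committed":
        return True, "all changes are committed"
    # Assumption: "tagged" means the current commit is exactly at a tag,
    # i.e. the describe-style version string equals the tag name.
    if not info.get("tag") or info.get("version") != info.get("tag"):
        return False, "current commit is not a tag"
    if policy == "tagged":
        return True, "current commit is a tag"
    if not info.get("tag_remotes"):
        return False, "tag has not been pushed to any remote"
    return True, "tag has been pushed"


# Values taken from the example version.yml above.
example = {
    "is_dirty": False,
    "tag": "v0.1.0b3",
    "tag_remotes": ["upstream"],
    "version": "v0.1.0b3-7-g261e77c",
}
ok, reason = check_git_policy("tagged", example)
print(ok, reason)
```

With the example values, the commit is seven commits past `v0.1.0b3`, so the assumed `tagged` policy would refuse to run while `committed` would pass.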
Still to do:
- improve handling of untracked config files and modules (current code assumes they are in a tracked directory, or otherwise in a package; if in the list of untracked files the status is always "dirty", so only `--git-policy=testing` will pass then; other untracked files will be listed, but ignored for the version information): done (18/05)
- documentation (including some recommendations for analysis packages, e.g. `git worktree` and an editable install for the common modules; the `version.yml` above intentionally does not include the worktree, the module and config paths are relative to the repository root): done (18/05)
- decide what to do when overwriting results (not `--onlypost`, really overwriting an output directory). Currently the versions file is left alone, but overwriting that too may be better: done (18/05), overwrite but add a flag
- should worker jobs also check for changes? This may become a mess, so I didn't put it for now, but could be done (maybe this would benefit from a "quick version check" that only runs `git describe` to get the version): done, checks that the version is equal if the policy is different from testing
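
The quick version check for worker jobs described in the last item could look roughly like the sketch below. The function names and the particular `git describe` flags are assumptions chosen for illustration; the actual implementation may differ.

```python
import subprocess


def quick_version(repo_dir):
    """Return the `git describe` string for the repository at repo_dir.

    The flags are an assumption: --tags also considers lightweight tags,
    --always falls back to the abbreviated hash, --dirty marks local changes.
    """
    result = subprocess.run(
        ["git", "describe", "--tags", "--always", "--dirty"],
        cwd=repo_dir,
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.strip()


def worker_version_ok(policy, recorded, current):
    """Workers only require an identical version when the policy is not 'testing'."""
    if policy == "testing":
        return True
    return recorded == current
```

A worker would then compare the `version` string stored in `version.yml` with `quick_version(...)` and stop early if `worker_version_ok` returns `False`.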
On the installed git versions: CentOS7 comes with 1.8.3.1 (which turns eight years old next month), but LCG_99 and higher include a recent one (2.29.2), so that is recommended. I still added support for pre-2.7 versions (which lack `git remote get-url`) and pre-2.5 versions (which lack `git rev-parse --git-common-dir`).
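
One way to support the older git versions is to pick the command based on the output of `git --version`. The sketch below shows that idea; the helper names and the fallback commands (`git config --get remote.<name>.url` and `git rev-parse --git-dir`) are assumptions, not necessarily the fallbacks used in this MR.

```python
import re


def parse_git_version(output):
    """Parse the output of `git --version`, e.g. 'git version 1.8.3.1'."""
    match = re.search(r"(\d+(?:\.\d+)+)", output)
    if match is None:
        raise ValueError(f"Cannot parse git version from {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def remote_url_command(git_version, remote):
    """Command to get the URL of a remote; `git remote get-url` needs git >= 2.7."""
    if git_version >= (2, 7):
        return ["git", "remote", "get-url", remote]
    # Assumed fallback: read the URL directly from the configuration.
    return ["git", "config", "--get", f"remote.{remote}.url"]


def common_dir_command(git_version):
    """Command to get the common git directory; --git-common-dir needs git >= 2.5."""
    if git_version >= (2, 5):
        return ["git", "rev-parse", "--git-common-dir"]
    # Assumed fallback: without linked worktrees, the git dir is the common dir.
    return ["git", "rev-parse", "--git-dir"]


old = parse_git_version("git version 1.8.3.1")
print(remote_url_command(old, "origin"))
print(common_dir_command(old))
```

Comparing version tuples keeps the check independent of how many components the version string has.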
In principle, if the analysis code is committed, the `version.yml` should include enough information to create a virtualenv that allows rerunning the exact same version that was used to create an output directory (assuming the ROOT, python, and python dependency versions do not affect the result; should those also be recorded?). If found useful I could add a script for that (this could also be left as a future extension).
Edited by Pieter David