Add a backend with fully compiled code (standalone executable)
As suggested in https://gitlab.cern.ch/cp3-cms/bamboo/-/issues/64 and the discussion in https://indico.cern.ch/event/963454/ .
Current status: still very early, but the following simple example works (without on-demand corrections, and unfortunately too minimal for benchmarking), so most of the basic parts should be there.
```shell
bambooRun -v --module=examples/nanozmumu.py:MinimalNanoZMuMu bamboo/examples/test1.yml --output=test_out/nanozmm_test2 --backend=compiled --threads=2
```
Assuming no other big issues are found, the main remaining challenge is getting C++ extensions included correctly as well. For the jet/MET and Rochester corrections, the methods used to configure them already get a reference to the backend, so there I can make sure anything necessary (including configuration) gets passed.
For the MVA libraries and user-added things, the most straightforward option seems to be asking the user to call a `loadXYZ(be)` method from `prepareTree` (which gets a backend reference from the base class method; `definePlots` does not have this by design, but `forceDefine` breaks this rule, so it is possible to work around that if really needed). For user extensions, this could be a `load` method in `bamboo.root` that takes keyword arguments for an include path, dynamic library path, headers, and shared libraries (and, if needed, lines of C++ code for initialisation). This is just a first idea, so opinions and feedback are most welcome.
Backwards-incompatible changes:
- `NanoAODModule.prepareTree` now takes a `backend` string argument (`"lazy"` or `"compiled"`; any other value selects the default), instead of a `lazyBackend` boolean
- installation of the non-python parts (headers, shared libraries, cmake modules) works differently: scikit-build, a small standalone package, is used to call cmake, and everything is put in the bamboo python package directory. This has two consequences: the safest is to recreate the virtualenv when updating, and `python setup.py build` no longer gives a fully working development install; `pip install -e` does, though, and it is more efficient and stable than before, so not a huge loss (the absence of scikit-build in the CI is what broke the related test; the easiest fix is probably to remove that option)
- as a consequence of the above, pip builds in an isolated environment, so picking up pytorch from the LCG distribution does not work as before. The two options are forcing a non-isolated build, or picking it up as an optional dependency (see the documentation)
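To illustrate the first change, a minimal sketch of the new backend selection (the helper name and the returned descriptions are placeholders, not bamboo's internal class names): `"lazy"` and `"compiled"` are recognised, anything else falls back to the default.

```python
# Sketch of the new `backend` string argument of prepareTree:
# "lazy" and "compiled" are recognised, any other value selects
# the default backend (names below are placeholders, not bamboo's).
def selectBackend(backend=None):
    if backend == "lazy":
        return "lazy RDataFrame backend"
    if backend == "compiled":
        return "fully compiled backend (standalone executable)"
    return "default backend"


print(selectBackend("compiled"))  # -> fully compiled backend (standalone executable)
```

Migration is then a one-line change in modules that used the old flag: `lazyBackend=True` becomes `backend="lazy"`.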
Remaining:

- pass variable and weight types to `HistoND`
- improve test coverage for different column and expression output types (or use cling to get/check the result type of expressions)
- avoid redefining existing columns wherever possible
- documentation (adding extensions after these changes, compiled backend specifics, loading ML frameworks)
- IMT with the lazy backend (regression) - probably make it a backend constructor parameter
- add `ROOT::Physics` and `ROOT::GenVector` to the default list of linked targets (pulled in by headers)
- propagate `includePath` and `dynamicPath` to `CMakeLists`
- keep the tmp directory in case of build errors (it is removed automatically by `tempfile.TemporaryDirectory`, and I don't know how to disable that)
- allow passing extra options to cmake
- skip dynamic compilation of helper functions with the compiled backend (this needs a distinction between declaring instances and functions; currently they are treated the same because both use `TInterpreter::Declare`)
- rebase after https://gitlab.cern.ch/cp3-cms/bamboo/-/merge_requests/141 is merged (and adapt a few things there to `RangeOp`)
- benchmarking for PR purposes 😎
- support `CutFlowReport`
- support skims, or at least print a clear error to show that they are not supported
- check if the limit on the number of input files also applies here (otherwise read from the text file directly)
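For the tmp-directory item, one possible workaround (a sketch, not bamboo's current code; the helper name is illustrative) is to use `tempfile.mkdtemp`, which never removes the directory itself, and only clean up when the build succeeds:

```python
import shutil
import subprocess
import tempfile


def buildInTmpDir(cmd):
    """Run a build command in a fresh directory; keep it on failure.

    Unlike tempfile.TemporaryDirectory, tempfile.mkdtemp never removes
    the directory itself, so it survives for debugging if cmd fails.
    """
    tmpDir = tempfile.mkdtemp(prefix="bamboo-build-")
    result = subprocess.run(cmd, cwd=tmpDir)
    if result.returncode == 0:
        shutil.rmtree(tmpDir)
    else:
        print(f"Build failed, directory kept for inspection: {tmpDir}")
    return result.returncode
```

(From Python 3.12 on, `tempfile.TemporaryDirectory(delete=False)` would give the same effect without the manual cleanup, but that is not available on older interpreters.)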
Afterwards:
- separate the compilation step (for all samples, or better: unique graphs) from the processing step (running the former locally and the latter distributed, or compiling in slurm jobs with a higher memory limit)