Add a backend with fully compiled code (standalone executable)

Pieter David requested to merge piedavid/bamboo:compiledbackend into master

As suggested in https://gitlab.cern.ch/cp3-cms/bamboo/-/issues/64 and the discussion in https://indico.cern.ch/event/963454/ .

Current status: still very early, but the following simple example works (without on-demand corrections, and unfortunately too minimal for benchmarking), so most of the basic parts should be there.

```shell
bambooRun -v --module=examples/nanozmumu.py:MinimalNanoZMuMu bamboo/examples/test1.yml --output=test_out/nanozmm_test2 --backend=compiled --threads=2
```

Assuming no other big issues are found, the main remaining challenge is including C++ extensions correctly as well. For the jet/MET and Rochester corrections, the methods used to configure them already receive a reference to the backend, so there I can make sure anything necessary (including configuration) gets passed along. For the MVA libraries and user-added code, the most straightforward approach seems to be asking the user to call a loadXYZ(be) method from prepareTree, which gets a backend reference from the base class method (definePlots does not, by design, although forceDefine already breaks that rule, so it could be worked around if really needed). For user extensions, this could be a load method in bamboo.root that takes keyword arguments for an include path, a dynamic library path, headers, and shared libraries (and, if needed, lines of C++ code for initialisation). This is just a first idea, so opinions and feedback are most welcome.
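A minimal sketch of what such a bamboo.root load helper could look like. Everything here is hypothetical: the Backend class is a stand-in for a real bamboo backend, and only the keyword names come from the proposal above.

```python
# Sketch of a hypothetical bamboo.root.load helper: it only collects the
# pieces a backend would need when compiling user extensions.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Backend:
    """Stand-in for a bamboo backend that records extension settings."""
    include_paths: List[str] = field(default_factory=list)
    dynamic_paths: List[str] = field(default_factory=list)
    headers: List[str] = field(default_factory=list)
    libraries: List[str] = field(default_factory=list)
    init_code: List[str] = field(default_factory=list)


def load(backend, includePath=None, dynamicPath=None,
         headers=(), libraries=(), initCode=()):
    """Register a user extension with the backend (sketch of the proposed API)."""
    if includePath:
        backend.include_paths.append(includePath)
    if dynamicPath:
        backend.dynamic_paths.append(dynamicPath)
    backend.headers.extend(headers)
    backend.libraries.extend(libraries)
    backend.init_code.extend(initCode)


# A user module's prepareTree would then call something like:
be = Backend()
load(be, includePath="myext/include",
     headers=["myhelpers.h"], libraries=["myhelpers"])
```

With this shape, the lazy backend can forward everything to TInterpreter/gSystem as it does now, while the compiled backend can translate the same information into CMake include paths and link targets.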

Backwards-incompatible changes:

  • NanoAODModule.prepareTree now takes a backend string argument ("lazy" or "compiled"; any other value selects the default), instead of a lazyBackend boolean
  • installation of the non-python parts (headers, shared libraries, CMake modules) works differently: scikit-build, a small standalone package, is used to call CMake, and everything is put in the bamboo python package directory. This has two consequences: the safest option is to recreate the virtualenv when updating, and python setup.py build no longer produces a fully working development install. pip install -e does, though, and it is more efficient and stable than before, so not a huge loss (the absence of scikit-build in the CI is what broke the related test; the easiest fix is probably to remove that option).
  • as a consequence of the above, pip builds in an isolated environment, so picking up pytorch from the LCG distribution does not work as before. The two options are forcing a non-isolated build, or picking it up as an optional dependency (see the documentation).
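To illustrate the prepareTree signature change, a sketch with a stand-in base class (the real NanoAODModule differs; only the backend keyword and its accepted values are taken from the description above):

```python
# Stand-in for bamboo's NanoAODModule, only to show the new keyword:
# the backend is now selected with a string instead of a lazyBackend boolean.
class NanoAODModule:
    def prepareTree(self, tree, backend=None):
        # "lazy" and "compiled" select those backends; anything else is the default
        self.backendName = backend if backend in ("lazy", "compiled") else "default"
        return tree


class MyModule(NanoAODModule):
    def prepareTree(self, tree, backend=None):
        # before this change, this would have been:
        #   super().prepareTree(tree, lazyBackend=True)
        return super().prepareTree(tree, backend="compiled")


mod = MyModule()
mod.prepareTree(tree=None)
```

A string keeps the call sites readable and leaves room for further backends later, which a boolean flag would not.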

Remaining:

  • pass variable and weight types to HistoND
  • improve test coverage for different column and expression output types (or use cling to get/check the result type of expressions)
  • avoid redefining existing columns wherever possible
  • documentation (adding extensions after these changes, compiled backend specifics, loading ML frameworks)
  • IMT with lazy backend (regression) - probably make it a backend constructor parameter
  • add ROOT::Physics and ROOT::GenVector to the default list of linked targets (they are pulled in by the headers)
  • propagate includePath, dynamicPath to CMakeLists
  • keep the tmp directory in case of build errors (it is currently removed automatically by tempfile.TemporaryDirectory; I don't know how to disable that)
  • allow passing extra options to cmake
  • skip dynamic compilation of helper functions with the compiled backend (needs a distinction between declaring instances and functions, currently they are treated the same because both use TInterpreter::Declare)
  • rebase after https://gitlab.cern.ch/cp3-cms/bamboo/-/merge_requests/141 is merged (and adapt a few things there to RangeOp)
  • benchmarking for PR purposes 😎
  • support CutFlowReport
  • support skims - or at least print a clear error to show that they're not supported
  • check if the limit on the number of input files also applies here (otherwise read from the textfile directly)
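For the tmp-directory item above, one possible approach (a sketch, not what this branch does) is to manage the directory with tempfile.mkdtemp instead of tempfile.TemporaryDirectory, and only clean it up when the build succeeds:

```python
import shutil
import subprocess
import tempfile


def run_build(cmd):
    """Run a build command in a fresh temporary directory.

    On success the directory is removed; on failure it is kept so the
    generated files (e.g. the CMakeLists and sources) can be inspected.
    Returns the directory path in either case.
    """
    tmpdir = tempfile.mkdtemp(prefix="bamboo-build-")
    try:
        subprocess.run(cmd, cwd=tmpdir, check=True)
    except subprocess.CalledProcessError:
        print(f"build failed, keeping {tmpdir} for inspection")
        raise
    shutil.rmtree(tmpdir)  # only remove on success
    return tmpdir
```

A call like run_build(["cmake", "--build", "."]) would then leave the directory in place exactly when something went wrong, unlike TemporaryDirectory, which removes it unconditionally on exit.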

Afterwards:

  • separate compilation (for all samples, or better: unique graphs) from processing step (running the former locally and the latter distributed, or compiling in slurm jobs with a higher memory limit)