Track all library versions
Currently, Darwin tracks a fixed set of versions:
std::map<std::string, std::string> MetaInfo::versions = {
    {"Darwin" , DARWIN_VERSION},
    {"gpp"    , Form("%d.%d.%d", __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__)},
    {"Cpp"    , Form("C++%ld", __cplusplus/100-2000)},
    {"ROOT"   , gROOT->GetVersion()},
    {"Boost"  , BOOST_LIB_VERSION},
    {"libgit2", libgit2_version()}
};
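As a side note, the two compiler-derived entries can be reproduced without ROOT's Form. The sketch below uses hypothetical helper names (cppStandard, dottedVersion) that are not part of Darwin, and assumes that __cplusplus has the usual YYYYMM form (e.g. 201703L):

```cpp
#include <string>

// Hypothetical helper: turns a __cplusplus value such as 201703L into "C++17",
// using the same arithmetic as the "Cpp" entry (value / 100 - 2000).
inline std::string cppStandard(long cplusplus = __cplusplus)
{
    return "C++" + std::to_string(cplusplus / 100 - 2000);
}

// Hypothetical helper: joins three components as "major.minor.patch",
// as the "gpp" entry does with __GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__.
inline std::string dottedVersion(int major, int minor, int patch)
{
    return std::to_string(major) + '.' + std::to_string(minor) + '.' + std::to_string(patch);
}
```

For example, cppStandard(201703L) gives "C++17", since 201703 / 100 - 2000 = 17 in integer arithmetic.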
In addition, the version of the repo from which the last executable was compiled is tracked in a different place. It is clear by now that this is not enough. We likely also want to enable version tracking for Eigen, correctionlib, Torch...
My goal when writing this issue is to require the user to provide version information for any loaded library they may have compiled themselves. The main ideas are:
- Enable users to provide version information as easily as possible
- Determine at runtime the list of all loaded libraries
- Record the version of all of them
- Don't set the reproducible flag when some info is missing
The basic structure I propose to use is as follows:
struct Version {
    using VersionData = std::variant<std::string_view, std::string (*)(), const char* (*)()>;
    /// The name to use in the metainfo, e.g. `"libgit2"`.
    std::string_view name;
    /// A regex for file names, e.g. `"libCore.*"`. Empty for libraries not linked to a file such as compiler versions.
    std::string_view regex;
    /// The version string or a function to extract it, e.g. `BOOST_LIB_VERSION` or `&libgit2_version`
    VersionData version;
};
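// The `sv` suffix requires `using namespace std::literals;`.
// The array size is deduced, since std::array<Version> alone lacks a size argument.
constexpr static std::array builtInVersions = {
    Version{"Boost"sv, "libboost_.*"sv, BOOST_LIB_VERSION},
    Version{"ROOT"sv, "libCore|libMath|libHist|..."sv,
            +[]() -> const char* { return gROOT->GetVersion(); }},
    // etc.
};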
I use std::string_view to avoid any initialization overhead. VersionData is designed to be flexible in the way the version is provided.
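To illustrate how the three alternatives could be reduced to a plain string when filling the metainfo, here is a minimal sketch based on std::visit. The helper name resolveVersion is hypothetical, and the sketch assumes that the stored function pointers are never null:

```cpp
#include <string>
#include <string_view>
#include <type_traits>
#include <variant>

// Same alternatives as Version::VersionData in the proposal.
using VersionData = std::variant<std::string_view, std::string (*)(), const char* (*)()>;

// Hypothetical helper: turns any of the three alternatives into an owning string.
inline std::string resolveVersion(const VersionData& data)
{
    return std::visit([](auto&& value) -> std::string {
        using T = std::decay_t<decltype(value)>;
        if constexpr (std::is_same_v<T, std::string_view>)
            return std::string(value);   // string known at compile time
        else
            return std::string(value()); // function called at run time
    }, data);
}
```

Function-based entries are only evaluated when resolveVersion is called, so a library such as ROOT does not need to be initialized before the table is built.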
We would ship predefined Version structures in a header distributed alongside Darwin. These would be picked up automatically by Darwin. In addition, we would provide an interface to let users add Version objects for their own dependencies.
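The user-facing interface is not designed yet. One possible shape, shown here only as a sketch, is a registry filled during static initialization. The names userVersions and registerVersion, as well as the library "MyAnalysis" and its version, are made up for the example:

```cpp
#include <string>
#include <string_view>
#include <variant>
#include <vector>

// Copy of the proposed structure, repeated to keep the example self-contained.
struct Version {
    using VersionData = std::variant<std::string_view, std::string (*)(), const char* (*)()>;
    std::string_view name;
    std::string_view regex;
    VersionData version;
};

// Hypothetical registry: a function-local static avoids initialization-order
// problems when registrations run before main().
inline std::vector<Version>& userVersions()
{
    static std::vector<Version> registry;
    return registry;
}

// Hypothetical entry point for user code. It returns true so that the call
// can be used to initialize a namespace-scope variable.
inline bool registerVersion(const Version& version)
{
    userVersions().push_back(version);
    return true;
}

// Example registration for a made-up user library. The string literals have
// static storage duration, so the string_view members never dangle.
static const bool myAnalysisRegistered =
    registerVersion({"MyAnalysis", "libMyAnalysis.*", "1.4.2"});
```

Since Version only stores std::string_view, anything registered this way has to refer to strings that outlive the registry, which is the case for literals and macros.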
Darwin would perform version checks at runtime. It would work as follows:
- Get a list of loaded shared libraries for the current process, using /proc/self/maps on Linux and the Mach kernel API on macOS.
- Match them to provided version info and fill the metainfo. Collect unmatched libraries for the next step.
- Ignore unmatched libs from known slow read-only locations, mainly /cvmfs.
- If any lib is left at this stage, it's a user-provided library for which we don't have version information. We assume that this is relevant for the physics and set the reproducible flag to false.
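The steps above could look roughly as follows on Linux. This is a sketch under several assumptions: the input follows the /proc/self/maps layout (address range, permissions, offset, device, inode, optional path), a shared library is recognized by ".so" in its file name, and each regex must match the whole file name. LibraryRule is a reduced, hypothetical stand-in for Version, and the macOS side is not covered.

```cpp
#include <filesystem>
#include <istream>
#include <regex>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Extracts the distinct shared-object paths from text in /proc/self/maps format.
inline std::set<std::string> loadedLibraries(std::istream& maps)
{
    std::set<std::string> libs;
    std::string line;
    while (std::getline(maps, line)) {
        std::istringstream fields(line);
        std::string range, perms, offset, device, inode, path;
        fields >> range >> perms >> offset >> device >> inode;
        std::getline(fields >> std::ws, path); // rest of the line, may be empty
        if (path.empty() || path.front() != '/') continue; // anonymous, [heap], ...
        const std::string file = std::filesystem::path(path).filename().string();
        if (file.find(".so") != std::string::npos) libs.insert(path);
    }
    return libs;
}

struct LibraryRule {   // hypothetical, reduced version of Version
    std::string name;  // name used in the metainfo
    std::string regex; // matched against the whole file name
};

struct Classification {
    std::set<std::string> matched;    // names of the rules that matched
    std::vector<std::string> unknown; // libraries without version information
    bool reproducible() const { return unknown.empty(); }
};

inline Classification classify(const std::set<std::string>& libs,
                               const std::vector<LibraryRule>& rules,
                               const std::vector<std::string>& ignoredPrefixes = {"/cvmfs/"})
{
    Classification result;
    for (const std::string& lib : libs) {
        const std::string file = std::filesystem::path(lib).filename().string();
        bool known = false;
        for (const LibraryRule& rule : rules) {
            // A real implementation would compile each regex only once.
            if (std::regex_match(file, std::regex(rule.regex))) {
                result.matched.insert(rule.name);
                known = true;
                break;
            }
        }
        if (known) continue;
        bool ignored = false;
        for (const std::string& prefix : ignoredPrefixes)
            if (lib.compare(0, prefix.size(), prefix) == 0) ignored = true;
        if (!ignored) result.unknown.push_back(lib);
    }
    return result;
}
```

In a real process, loadedLibraries would be fed a std::ifstream opened on /proc/self/maps. Taking a generic std::istream keeps the parsing testable with canned input.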
Caveats:
- There is no way to check if header-only or statically linked libraries are used. We need to trust the user on those.
- Ignoring read-only locations means that the check won't work when executing code from a container (whose filesystem is by construction read-only). Specific handling will be needed if and when this becomes a supported use case.