Before the VdM scan we should implement configuration of Allen using the metadata repository and store the respective TCK in the raw data.
I think the most straightforward way to implement this is to store the JSON files we currently have in the metadata repository and make them accessible by TCK. The JSON can then be obtained from the repository in Python when configuring the production application and passed as a string into the Allen event loop, where a C++ JSON object is parsed from the string.
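A minimal sketch of that flow, assuming a hypothetical `load_config_for_tck` helper (none of these names are a real API):

```python
import json


def load_config_for_tck(tck: int) -> str:
    """Hypothetical lookup: resolve a TCK to the JSON stored in the
    metadata repository and return it as a plain string."""
    raise NotImplementedError("to be backed by the metadata repository")


def allen_sequence_property(tck: int) -> str:
    sequence_json = load_config_for_tck(tck)
    json.loads(sequence_json)  # fail early if the payload is not valid JSON
    # The string itself is what gets handed to the Allen event loop,
    # where the C++ side parses it into its own JSON object.
    return sequence_json
```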
Registering a JSON file in the metadata repository should be very straightforward and only needs a bit of extra Python code to get the HLT1 decision IDs from the JSON and update the "tck" property of the algorithm that creates the DecReports.
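Something along these lines, purely as a sketch; the keys ("gather_selections", "names_of_active_lines", "dec_reporter", "tck") are assumptions about the JSON schema and would need to match the real one:

```python
import json


def register_json(sequence_json: str, tck: int):
    config = json.loads(sequence_json)
    # Derive the HLT1 decision IDs from the configured line names
    # (assumed here to live as a comma-separated property).
    line_names = config["gather_selections"]["names_of_active_lines"].split(",")
    decision_ids = {name + "Decision": i + 1 for i, name in enumerate(line_names)}
    # Update the "tck" property of the algorithm that creates the DecReports.
    config["dec_reporter"]["tck"] = tck
    return json.dumps(config), decision_ids
```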
@dcampora, do you think it's possible to make the JSON sequence generation available in Python in a stack environment, without wrapper scripts that mess with the PYTHONPATH? That would simplify creating the metadata and TCKs and would allow the Python configuration to be used to configure the production Allen. The JSON representation of the configuration would then be fully an implementation detail.
@dcampora @nnolte Any idea on how to make this happen? It would be beneficial for operational purposes to be able to generate Allen (JSON) configurations (and encoding metadata) outside of the Allen build. Tests that are already in MooreOnline would also benefit.
By Gaudi nodes I meant Gaudi CompositeNodes, but on second thought I actually meant the PyConf representation of LHCb algorithms and arguments. (I had a "My name is not ATMOS" moment.) Looking at Allen's JSON files, we need (see the sketch after this list):
- namespaces of algorithms
- types of algorithms (HostAlgorithm, DeviceAlgorithm, etc.)
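For illustration only (this is not Allen's actual schema), the per-algorithm information would be shaped roughly like:

```python
algorithm_entry = {
    "name": "host_init_event_list",       # instance name
    "namespace": "host_init_event_list",  # C++ namespace the algorithm lives in
    "kind": "HostAlgorithm",              # HostAlgorithm, DeviceAlgorithm, ...
    "properties": {},                     # property name -> value
}
```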
Wouldn't it be possible, with a bit of renaming of modules and different imports, to use things side by side? If there were a generate_json in the stack build, that would be fine.
@graven would this work from a metadata perspective? Could you provide some details on how the storage of the JSON in the metadata could be achieved, or an alternative? Did I forget any information that is needed by the decoders and needs to be registered/stored separately?
- add a command line option "--parse-sequence-as-json 1", which would trigger the parsing of the JSON configuration from the value of the "--sequence" property, or
- modify the "--run-from-json" property so that the value "parse" triggers it.
The metadata files are just JSON files, indexed by their SHA-1. So the short answer to your question is that the storage of JSON in the filecontent-metadata is achieved by just storing JSON ;-)
Convention (as implemented in the GitANNSvc) is that the 'outermost' data structure is a dictionary, as that makes it possible (and enforces) that any subsequent value has a (descriptive) key. Any lookup is done by specifying a pair of repository and commit-ish, i.e. a (partial) commit SHA-1, tag, ... -- whatever git rev-parse accepts as a version, but typically just plain 'master' -- and, in addition to the pair, the above dictionary key. Note that the filename + revision used in the lookup is computed from the tag, the above (major) key and the SHA-1, and is set by a property which defaults to "{0}:ann/json/{1:.2}/{1}.json" (0=tag/commit-ish, 1=SHA-1 key, 2=dictionary key),
i.e. ignoring the dictionary key (as that is in this case specified as part of the payload, allowing for multiple dictionaries to be stored behind a single key).
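In Python terms the default resolves like so (a direct transcription of the format string above, using the example key from the next paragraph):

```python
def lookup_path(tag: str, key: str) -> str:
    # {0} = tag/commit-ish, {1} = SHA-1 key; {1:.2} keeps the first two
    # characters, which become the subdirectory.
    return "{0}:ann/json/{1:.2}/{1}.json".format(tag, key)


assert lookup_path("master", "05483d47") == "master:ann/json/05/05483d47.json"
```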
So concretely, one has typically a file named something like ann/json/05/05483d47.json which then contains:
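An illustrative sketch of the payload (the line names and integer keys here are invented; per the note below, the tables are integer -> string mappings):

```json
{
  "Hlt1SelectionID": {
    "1": "Hlt1TrackMVADecision",
    "2": "Hlt1TwoTrackMVADecision"
  },
  "InfoID": {}
}
```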
This can then be used to look up the "Hlt1SelectionID" (and the empty "InfoID") dictionary provided the key "05483d47" is known, i.e. a query basically looks like: give me the table called 'MyTableName' for key "deadbeaf". And this then goes into a service which is configured with a list of (repository, tag) pairs which it walks until it can answer the query.
The 'ann/json/00/000000000.json' format for the file name is, as mentioned, set by a property, and could thus be modified for different instances of the service. Or one could write another service to provide a different interface but using the same 'backend' code (e.g. because you want to return something other than an integer -> string mapping).
And the key satisfies the following constraint:
git show master:ann/json/05/05248cf4.json | git hash-object --stdin | cut -c-8
should print 05248cf4... i.e. it is (still) a content-addressable database, just like in the past. Except now the payload is plain JSON, and the key is computed by running git. In case of a hash collision, a space is appended at the end of the file (until no collision occurs anymore) -- this changes the text representation and thus the SHA-1, but since the space is after the closing } it leaves the JSON payload invariant.
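The same computation in Python, for concreteness (mirroring what git hash-object does: SHA-1 over a "blob" header followed by the payload):

```python
import hashlib


def git_blob_sha1(payload: bytes) -> str:
    """SHA-1 of a git blob: the header "blob <size>\\0" plus the content."""
    return hashlib.sha1(b"blob %d\x00" % len(payload) + payload).hexdigest()


payload = b'{"Hlt1SelectionID": {}}'
key = git_blob_sha1(payload)[:8]
# Appending a space after the closing } changes the key but leaves the
# JSON payload invariant -- the collision-resolution trick described above.
assert git_blob_sha1(payload + b" ") != git_blob_sha1(payload)
```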
Thanks for the info, that clarifies quite a few things. Is there already a design for how the HLT2 configuration will be stored in the metadata? Perhaps under hlt2 instead of ann? In other words, would it make sense to store the HLT1 configurations under "{0}:hlt1/json/{1:.2}/{1}.json" with the same meaning of the format specifiers? With a separate mapping of TCK to hash somewhere in the repository, or the TCK instead of the hash?
@rmatev @graven @cburr we should decide what the main TCK repository that is replicated to cvmfs will be, i.e. the source of truth for TCKs. Please share your thoughts.
@graven and I figured that file-content-metadata is an obvious candidate because that way all information is available from a single repository. I think that would simplify the TCK release procedure.