Before the VdM scan we should implement configuration of Allen using the metadata repository and store the respective TCK in the raw data.
I think the most straightforward way to implement this is to store the JSON files we currently have in the metadata repository and make them accessible by TCK. The JSON can then be obtained from the repository in Python when configuring the production application and passed as a string into the Allen event loop, where a C++ JSON object is parsed from the string.
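A minimal sketch of that flow, assuming a hypothetical `load_config_for_tck` helper (none of these names are a real API):

```python
import json


def load_config_for_tck(tck: int) -> str:
    """Hypothetical lookup: resolve a TCK to the JSON stored in the
    metadata repository and return it as a plain string."""
    raise NotImplementedError("to be backed by the metadata repository")


def allen_sequence_property(tck: int) -> str:
    sequence_json = load_config_for_tck(tck)
    json.loads(sequence_json)  # fail early if the payload is not valid JSON
    # The string itself is what gets handed to the Allen event loop,
    # where the C++ side parses it into its own JSON object.
    return sequence_json
```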
Registering a JSON file in the metadata repository should be very straightforward and only needs a bit of extra Python code to get the HLT1 decision IDs from the JSON and update the "tck" property of the algorithm that creates the DecReports.
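Something along these lines, purely as a sketch; the keys ("gather_selections", "names_of_active_lines", "dec_reporter", "tck") are assumptions about the JSON schema and would need to match the real one:

```python
import json


def register_json(sequence_json: str, tck: int):
    config = json.loads(sequence_json)
    # Derive the HLT1 decision IDs from the configured line names
    # (assumed here to live as a comma-separated property).
    line_names = config["gather_selections"]["names_of_active_lines"].split(",")
    decision_ids = {name + "Decision": i + 1 for i, name in enumerate(line_names)}
    # Update the "tck" property of the algorithm that creates the DecReports.
    config["dec_reporter"]["tck"] = tck
    return json.dumps(config), decision_ids
```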
@dcampora, do you think it's possible to make the JSON sequence generation available in Python in a stack environment, without wrapper scripts that mess with the PYTHONPATH? That would simplify creating the metadata and TCKs and would allow the Python configuration to be used to configure the production Allen. The JSON representation of the configuration would then be fully an implementation detail.
@dcampora @nnolte Any idea on how to make this happen? It would be beneficial for operational purposes to be able to generate Allen (JSON) configurations (and encoding metadata) outside of the Allen build. Tests that are already in MooreOnline would also benefit.
By Gaudi nodes I meant Gaudi CompositeNodes, but on second thought I actually meant the PyConf representation of LHCb algorithms and arguments. (I had a "My name is not ATMOS" moment.) Looking at Allen's JSON files, we need (see the sketch after this list):
- namespaces of algorithms
- types of algorithms (HostAlgorithm, DeviceAlgorithm, etc.)
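For illustration only (this is not Allen's actual schema), the per-algorithm information would be shaped roughly like:

```python
algorithm_entry = {
    "name": "host_init_event_list",       # instance name
    "namespace": "host_init_event_list",  # C++ namespace the algorithm lives in
    "kind": "HostAlgorithm",              # HostAlgorithm, DeviceAlgorithm, ...
    "properties": {},                     # property name -> value
}
```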
Wouldn't it be possible, with a bit of renaming of modules and different imports, to use things side by side? If there were a generate_json in the stack build, that would be fine.
@graven would this work from a metadata perspective? Could you provide some details on how the storage of the JSON in the metadata could be achieved, or an alternative? Did I forget any information that is needed by the decoders and needs to be registered/stored separately?
- add a command line option "--parse-sequence-as-json 1", which would trigger the parsing of the JSON configuration from the value of the "--sequence" property, or
- modify the "--run-from-json" property so that the value "parse" triggers it.
The metadata files are just JSON files, indexed by their SHA-1. So the short answer to your question is that the storage of JSON in the filecontent-metadata is achieved by just storing JSON ;-)
Convention (as implemented in the GitANNSvc) is that the 'outermost' data structure is a dictionary, as that makes it possible (and enforces) that any subsequent value has a (descriptive) key. Any lookup is done by specifying a pair of repository and commit-ish, i.e. a (partial) commit SHA-1, tag, ... -- whatever git rev-parse accepts as a version, but typically just plain 'master' -- and, in addition to the pair, the above dictionary key. Note that the filename + revision used in the lookup is computed from the tag, the above (major) key and the SHA-1, and is set by a property which defaults to "{0}:ann/json/{1:.2}/{1}.json" (0=tag/commit-ish, 1=SHA-1 key, 2=dictionary key),
i.e. ignoring the dictionary key (as that is in this case specified as part of the payload, allowing for multiple dictionaries to be stored behind a single key).
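In Python terms the default resolves like so (a direct transcription of the format string above, using the example key from the next paragraph):

```python
def lookup_path(tag: str, key: str) -> str:
    # {0} = tag/commit-ish, {1} = SHA-1 key; {1:.2} keeps the first two
    # characters, which become the subdirectory.
    return "{0}:ann/json/{1:.2}/{1}.json".format(tag, key)


assert lookup_path("master", "05483d47") == "master:ann/json/05/05483d47.json"
```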
So concretely, one has typically a file named something like ann/json/05/05483d47.json which then contains:
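An illustrative sketch of the payload (the line names and integer keys here are invented; per the note below, the tables are integer -> string mappings):

```json
{
  "Hlt1SelectionID": {
    "1": "Hlt1TrackMVADecision",
    "2": "Hlt1TwoTrackMVADecision"
  },
  "InfoID": {}
}
```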
This can then be used to look up the "Hlt1SelectionID" (and the empty "InfoID") dictionary provided the key "05483d47" is known, i.e. a query basically looks like: give me the table called 'MyTableName' for key "deadbeaf". And this then goes into a service which is configured with a list of (repository, tag) pairs which it walks until it can answer the query.
The 'ann/json/00/000000000.json' format for the file name is, as mentioned, set by a property, and could thus be modified for different instances of the service. Or one could write another service to provide a different interface but using the same 'backend' code (e.g. because you want to return something other than an integer -> string mapping).
And the key satisfies the following constraint:
git show master:ann/json/05/05248cf4.json | git hash-object --stdin | cut -c-8
should print 05248cf4... i.e. it is (still) a content-addressable database, just like in the past. Except now the payload is plain JSON, and the key is computed by running git. In case of a hash collision, a space is appended at the end of the file (until no collision occurs anymore) -- this changes the text representation and thus the SHA-1, but since the space is after the closing } it leaves the JSON payload invariant.
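The same computation in Python, for concreteness (mirroring what git hash-object does: SHA-1 over a "blob" header followed by the payload):

```python
import hashlib


def git_blob_sha1(payload: bytes) -> str:
    """SHA-1 of a git blob: the header "blob <size>\\0" plus the content."""
    return hashlib.sha1(b"blob %d\x00" % len(payload) + payload).hexdigest()


payload = b'{"Hlt1SelectionID": {}}'
key = git_blob_sha1(payload)[:8]
# Appending a space after the closing } changes the key but leaves the
# JSON payload invariant -- the collision-resolution trick described above.
assert git_blob_sha1(payload + b" ") != git_blob_sha1(payload)
```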
Thanks for the info, that clarifies quite a few things. Is there already a design for how the HLT2 configuration will be stored in the metadata? Perhaps under hlt2 instead of ann? In other words, would it make sense to store the HLT1 configurations under "{0}:hlt1/json/{1:.2}/{1}.json" with the same meaning of the format specifiers? With a separate mapping of TCK to hash somewhere in the repository, or the TCK instead of the hash?
@rmatev @graven @cburr we should decide what the main TCK repository that is replicated to cvmfs will be, i.e. the source of truth for TCKs. Please share your thoughts.
@graven and I figured that file-content-metadata is an obvious candidate because that way all information is available from a single repository. I think that would simplify the TCK release procedure.