As of now we define the following explicit list of banks to persist in the various streams where they are expected.
DETECTOR_RAW_BANK_TYPES={'VPRetinaCluster','VP',# compatibility with data without VP clusters'UT','FTCluster','Rich','Muon','Plume','Calo','EcalPacked',# compatibility with old data, now under "Calo"'HcalPacked',# compatibility with old data, now under "Calo"}
We do not currently persist any banks that have "error" types. These can be generic DAQ error types (where not much can be done) but also detector specific error types such as MuonError. In the muon case, since recently the error banks are used in the decoding to recover the okay links and reduce inefficiency. Not propagating them in HLT2 means that one cannot fully reproduce the decoding offline.
In addition, there are non-error but non-standard types such as FTNZS and others.
A general solution (already used in Allen) would be to only deal with detector IDs on the configuration level and "route" (think UnpackRawEvent) banks according to their source ID. The fine selection on the type can be left to the algorithms, or if needed, the routing can also filter on bank type.
We now have an option for lines to decide what detector raw banks they would like to have. These are selected by bank type. I am afraid setting them by ID at stream level and by type at line level would create some confusion in configuration. Wouldn't be easy to add error bank types to the list? The list you give was meant to be used for MC tests only. For data, they should be set properly per stream.
I don't think we need to allow for much flexibility on the line level. When someone there specifies they want "Muon", they imply that they want "MuonError". Saving just one of the two makes little sense.
I meant at line level, it's easier to define banks by type than detector IDs. I guess I don't understand your general solution only deal with detector IDs on the configuration level.
We can in principle adapt UnpackRawEvent to produce error banks along with the detector data raw banks.
Not sure if this is completely related: we (SciFi) would like to keep the Tell40 error banks, or at least the future result of the FTErrorBankDecoder, in order to analyse the output offline. Ideally, we would write an EventFilter on those errors/counters, so that offline monitoring can focus on healthy events. Would this be made possible by fixing this issue?
@gvouters please post here the full list of error banks that are available for Run 3
The question on the usage of error banks offline for detector studies has been discussed today at DSO and there was no big need seen for normal data taking. However, it can be very useful for particular runs.
@sesen@graven@rmatev would it be possible to produce a configuration (read TCK) where we save all the error banks, which we could use to process specific runs, while not doing it in the default physics configuration?
Let me add a bit of context here. The RawBank::View exists for (at least) three reasons:
in order to be able, in the dataflow, track what algorithms really need, instead of some 'opaque' RawEvent that may or may not contain the RawBanks of the type needed, i.e. a dataflow may formally be correct while still wrong (something asked for a RawEvent in order to get some RawBanks, but then got a RawEvent which does not contain the actual RawBanks it needed)
In run1+run2, we 'scattered' several collections of rawbanks into different raw events, and every single component which needed rawbanks had to figure out the 'right' raw event to use -- which resulted in the horribly complex configuration dictionary indexed with RawEventFormat, and components walking search paths looking for raw events and picking up the first one which happens to contain the right type. Using the view is a cheap way of making sure that the 'consumers' of raw banks would instead get a single, stable TES location from which to obtain their banks (with the name of the type as part of the location) , and thus be invariant (ignorant) under changes to where their raw banks come from, and only have one bit of code (UnpackRawEvent) which has to be configured right, and which contains the relevant logic to decide what to use from where.
because requesting banks of a given type from a RawEvent is not thread safe, and the aim was to 'quarantaine' this bit of thread unsafe code into a single place, with the hope of eventually to get rid of it once everyone is using RawBank::View, and only a single (type) of algorithm has to figure out how to 'fan out' a RawEvent into a collection of RawBank::View so that that algorithm could do the work, instead of RawEvent (which is the reason it is thread unsafe).
So we should maintain the (easy to explain) invariant that a single RawBank::View contains RawBanks of a single type (as breaking this will negate 1), and not start mixing them (as then a 'consumer' of such a mixed view would have to start figuring out which elements are of what type -- i.e. this just pushes the problem 'downstream' to all consumers, instead of fixing it up-front, once and for all).
With the above, one may think than algorithm which may need certain types of bank (eg. an error bank) which are not always present is troublesome, as normally it would just query a RawEvent to see if that bank is present. But this is still trivially implemented by requesting the normal UnpackRawEvent algorithm to always create a view for that type - if no bank of that type is present an empty view will be produced, and the 'consumer' which always asks for that type of bank will sometimes just get an empty view. So any decoding algorithm that wants say error banks when they exist should always request the corresponding view (and be ready to check if the corresponding view is empty or not, and act accordingly).
So on the C++ side nothing should change structurally. Now, that leaves the python side, and there one can imagine that besides exposing the 'usual' RawBank types, one can add a view 'aliases' which represent sets of types - and when using those to configure things, they are recognized, and subsequently expanded into a set of types, which is then used to populate the properties of the relevant components.
This is not about decoding but rather raw bank persistency.
Just to add what we discussed with @rmatev At the moment, a stream/line can request a detector type for which we save all bank types listed for that detector Muon: [Muon, MuonError]. However it's very hard to keep track of all types for all detectors (not only error banks but special bank types such as NZS). What would be ideal is to get all banks corresponding to a detector source ID from UnpackRawEvent. This won't be possible if we can't have a RawBank::View with multiple bank types.
Could DataObjectReadHandle be used here ? I use this in a few places to access MCHits for spillover events, which may or may no exist in a given event. Sounds on paper to be a similar use case to these error banks.
The problem is not if they are there or not, as @graven said we can always have empty views if needed. The problem is listing all raw bank types that might be in the raw event for a detector. At the moment, the configuration is
We need to list all possible bank types for each detector. SciFi alone has 6 types but they should all have same detector source ID. So we are wondering if we can select banks based on source ID instead of type as an output data handle and pass it to the raw bank combiner.
I don't see how selecting by SourceID can work, as every tel40 has a unique ID. Do you expect each subsystem to have to list each and every SourceID it expects ???
Or do you actually mean 'select using the sub-system bits inside the SourceID' which isn't quite the same thing ?
What would be ideal is to get all banks corresponding to a detector source ID from UnpackRawEvent. This won't be possible if we can't have a RawBank::View with multiple bank types
As soon as you start mixing various types inside of a RawBank::View, the observable dataflow is no longer reliable, as there are no longer separate TES locations for different types (i.e. different views), and thus no separate data handles -- which just moves the problem into all the consumers of this view (which up to now all assume that a view only contains a single type, and now all of sudden have to be augmented to check for this and deal with the fall-out)
What I do not understand is the perceived problem of mapping detector -> bank types. There only has to be a single dictionary (in python) which does this. Whether any entry has 1,2,5 or 11 entries has no impact on the code complexity, and yes, this table should be updated if a new type of bank is added -- but that does not happen often, and it is an isolated action in a very specific place.
And when anything wants a set of banks, either it gets a list of concrete types, or one of the types is recognized as a placeholder (*) which is thrown at above dictionary and substituted by the corresponding set of the types. The only draw back is the maintenance of this one dictionary, and the fact that it duplicates information which is also known (implicitly) elsewhere in the system. Any alternative I've seen proposed so far only seems (IMHO) to make things more complicated than this.
(*) it would be nice if this was hidden/encapsulated into a dedicated type
For now, it is an excel file. We might want a list in a specific file format in your framework as that list can evolve while taking data. Probably, I'll add 2/3 new DAQ error banks during YETS from what we learned from 2023.
Also, we can discuss about which banks should be always written down and which ones should only when we enable an option.
I see from the definition DETECTOR_RAW_BANKS that it is not well defined. For example EcalPacked and HcalPacked should not be there. I'm also surprised that for VELO, 'VP' is used instead of 'VELO'. Anyway, we should discuss about it carefully.
This dictionary has to be backward compatible also for the simulation samples we have in tests. VP and Hcal/EcalPacked are there simply for that.
About making it more configurable, I might be wrong but I am not sure it's needed. Any special bank type would be only available in special runs that are not used for physics I assume. In a way, the configuration is more at the input data level.
Would it be useful to have something similar ? Or would it be better to have a file in your framework ?
I just need an acces to update it whenever I add bank TYPEs in the LHCb framework.
Better to have a single file that is used both by humans but also the s/w, so I would suggest just using the python dictionary @graven refers to above. To add new types to that you would simply then have to open a MR making your changes, which is trivial to do.
during some special runs, other types of banks need to be persisted to allow the subdetectors to do studies offline
Are the additional banks needed only for special runs typically empty during regular physics data taking? In that case, I understand we could add them to the same (single) dictionary with no extra information saved during routine data taking. If they are expected to be filled also during regular physics data taking, we indeed need a configuration option to tell the persistency which ones to save in each case.
Hi @cmarinbe, your summary is correct !
We just need to check with all SD groups if the current dictionary still fits with what they want for regular RUNs and what would be useful for them to keep during special RUNs.
The error banks are not empty, they contain more information about the error observed. I'm fine with keeping everything whatever the RUN but if you want to save bandwidth, it is better to have an option to choose the dictionary.
Laurent has brought this issue to my attention. It is too bad we were not brought into this discussion earlier. We have known for over a year that we need to decode 'UT' and 'UTError' for good hits to be used in tracking. Our situation appears to be similar to the Muon system.
UTError is a very misleading name for these raw banks. They should be called 'perfectly good hits from UT, in the same format as in "UT" raw bank, with a few extra words added with SALT asics reporting their status, including at least one not being able to send hits'. It is a mistake to think about them as belonging to error monitoring only. Their main function is to send good UT hits in parallel to additional short (!) info on the status of asics. If it was up to me there would be just one raw bank type, but I guess there are two because of firmware considerations. What happens is that if at least one, and it is usually just one, out of many asics served by the same TELL40 has problem sending hits (usually internal buffer is full), the entire Tell40 output format switches from UT to UTError. Discarding these banks would be very ill advised, and for no reason at all would kill entire TELL40 just because one asic has problems.
Frequency of UTErrors depends on status of the detector. They are present in real data (both local and global). Ideally we want it to be zero, and in the past their frequency was low. We are still commissioning and we have not faced high luminosity (SEUs?) and long runs. Their future frequency is unknown. UT stand alone monitoring always decodes UTErrors together with UT for all hit distributions. @hawu has been working on adding these raw banks to decoding in tracking on CPU side (it requires some minor code changes) - hopefully he will comment on the status of this work. @dtou is aware of need to add them to Allen decoding, though we would like his major merge request completed first Allen!1444 (merged) and we will make a separate merge request for adding UTError.
It is urgent that we add UTErrors to types of banks which are saved in streams which preserve raw data. This is about preserving good UT efficiency, not about ability to monitor errors which in fact is a separate info flow, which also uses UTSpecial raw banks, which contain no hits but more error messaging. A different decoder is used to extract errors from UTErrors than the UT hit decoder. The errors are not going to be decoded during decoding for tracking. The same hit decoder works with UT and UTError, thus no additional decoding software is needed except that both bank types have to be passed to the hit decoder.
Hi @decianm, could you please educate me if this requires a change in make_PrStoreUTHit_hits method in hlt2_tracking.py? Currently, it only takes input of "UT" type bank by default, i.e. RawBanks=make_raw("UT"). Does the change in !3263 (merged) actually override this default value?
Nonetheless, it would be good if we can change the default value, and I created Rec!3853 (merged) and !3268 (merged) for this. Please let me know if they are redundant.
This MR does not change anything for the reconstruction, it just defines which raw banks are persisted for offline (if requested). I'll comment more on the other MRs.
Hi @tskwarni, we should try and get UTError handling into Allen!1444 (merged). For Allen, L31-L67 of this code is reading the raw UT fragments (or bytes) into UTRawBank. We have to handle the raw bytes differently for UTError, right? Can you point me to where in the LHCb stack is the C++ code for handling this (and branch if it is not on master), so that I can implement it in Allen?
Hi @dtou , on CPU side UTDecoder.h decodes hits from both UT and UTError banks since the hit structure is the same in both bank types. Here is the link for the master version:
This code does not depend on the bank type which is fed to it. However, there was a problem with bank size checking - the length of UTError can be a bit longer if a very few hits are contained in the hit structure (In "Lanes"). If you do not perform bank length check before decoding then you don't need to worry about this at all. If you check bank size for consistency with the hit length info, then you need to follow the fix implemented by @wokrupa which is in private location here:
The part which changes size checking is lines 77-79.
UTDecoder.h is just an iterator over bank memory. Somebody else gets access to RawEvent and feeds RawBanks to UTDecoder.h. An example of the code which passes both types of banks to the same decoder is here:
and the rest of the code is the same for both bank types.
This is an example of decoding for UT monitoring. An example for decoding to UTHits used by tracking (with position information) is somewhere else, but again processing of both bank types for hit decoding is identical.
As discussed already in some UT merge request, please do NOT use RawEvent as input, but RawBank::View -- or a number of distinct RawBank::View, one for each type of bank, in case several types of RawBank are needed (and if a certain type is not present in the underlying RawEvent, the 'view' will exist, but just be empty).
The problem is that RawEvent may be in different locations depending on the input, and we want to avoid having to configure every consumer of RawEvent with this -- instead, there is one single step which takes a RawEvent and then produces RawBank::View in standardized locations, making sure that the downstream processing can be left invariant.