Tupling trigger and event information in FunTuple with ThOr functors

added DPA-WP3 Functors ThOr Tuples labels

This seems an important enough design discussion that would be work having in the DPA planning, such as https://gitlab.cern.ch/lhcb-dpa/project/-/issues/121. But fine to have it here and I linked to https://gitlab.cern.ch/lhcb-dpa/project/-/issues/121 already.

I think the space is multi-dimensional and I would not claim I have made my mind just yet. Let's think a bit and discuss viva voce on Friday, as it may be more effective. (Does not prevent comments and lists of use-cases to be added here, obviously.)

Thanks for writing this up @amathad !

Let me spell out one point in a bit more (probably tedious and pedantic) detail: "which trigger lines fired" is as a question completely decoupled from knowledge about signal candidates but TISTOS information isn't. So any functor which uses the SelReports to TISTOS by construction needs to know

What is your offline signal candidate
Where can it find the SelReports

A functor which stores trigger decisions on the other hand only needs to know where to find DecReports. This is logically the same as any global event info functor (event number, run number, etc).

TISTOS studies have always been a special beast. I'm now wondering off the top of my head if such studies wouldn't be easier to "prepare" in Sprucing jobs, and the ntupling in DaVinci would then be a simpler job based on what gets stored (and related) from that Sprucing "prep job". But I admit I need to massage this fresh thought properly ... I thought I would share it anyway, just in case ...

I think the idea of centrally preparing TISTOS information during sprucing jobs is a very interesting one and worth pursuing further.

Moving TISTOS to sprucing does mean that there needs to be thought on the flexibility of the TOS definitions still available downstream; in Run 1/2 we could, and ~~sometimes did~~ regularly did*, change the TOS detector fraction in the offline processing. I'm not sure you can get a solution easier than reading the reports.

*: looking at e.g. analysis productions

@ldufour, if you are worried about flexibility or missing info then the problem is only less or equal in Sprucing compared to DaVinci. In other words, having aside the work one would be doing in a special Sprucing job (work has to be somewhere, anyway), things can only be easier than trying to do various things when tupling in DaVinci.

I don't think it will be easier: these settings typically are developed over time in an analysis and really benefit from the very fast turnaround in the user processing. For the sake of offline analysis and physics output, I would avoid having more reasons for a resprucing and holding up analyses.

I don't see why this is an either/or situation? Having the ability to centrally prepare it doesn't stop analysts who need to do something special from doing it later, it just means there is a safe and well tested baseline for everyone else.

My earlier comment is exactly that: if one moves the preparation of data needed for TISTOS to the central production, this preparation must keep the functionality for "fractional TISTOS information" at the DaVinci level still available (and the functionality there, as it's expected to be used).

Exactly. And do not forget that we have signed for fast turn-around with analysis productions as well (if relevant - Castelao jobs have been run IIRC) ... I'm not saying I have the full picture of things (I don't .. but then who does ;-)?) but I often feel too much thinking in terms of what we did in the past decade. While we did loads of things really well, our base paradigms have been evolving, and we have to think outside the box as much as possible.

Anyway, the discussions just prove that we need some live chats at our joint meetings as there's a lot to benefit from those :).

... this preparation must keep the functionality for "fractional TISTOS information" at the DaVinci level still available

Fine. Just do it :-). There is no technical show-stopper and the prep job should be what experts needs. That's obvious, no? Am not saying the opposite in any way. My point is that we should consider a different workflow and then decide what is best based on concrete on-paper designs.

Indeed, TisTos is s special beast. Let's concentrate here on candidate-indepent functors. @gligorov mentioned elsewhere event and run number, which should always be there, but there is also gpsTime, detector multiplicity, polarity (actually coming from the conditions database), TCK, etc.

In the DecayTreeTuple suite we had three kinds of TupleTools: IParticleTupleTool, IMCPartcleTupleTool and IEventTupleTool, which does not take any input (TES locations were passed by options). So here we are discussing a replacement for that.

It's not only a FunTuple issue. Whatever can be plotted can also be cut on and will be used in selections. Typically detector multiplicities.

I did a small exercise yesterday and implemented the use of IEventTupleTool in FunTuple. It was relatively easy as the interface only requires a pointer to the tuple as you say and it was quite obvious where to fill the tuple.

Even if that is not the direction of travel, I think it highlighted already some questions and features which have to be addressed in the configuration. E.g. how are TES inputs provided to FunctorCollections and should the user take care of that. There basically need to be functions to get things like ODIN or functions which return the TES locations of DecReports are now in DaVinci but collections are defined in Analysis.

Speaking for myself I reckon it would be interesting and information to see that proof-of-concept exercise. Either share here or link to a branch or even prepare 1-2 tech slides for Friday or the following joint meeting?

(@nskidmor, cc'ing you in case you're missing these discussions.)

@rowina because she asked

No problem, here are the branches

Analysis@master...sstahl_testing

DaVinci@master...sstahl_testing

MooreAnalysis@master...sstahl_testing

The diff is bigger because I also tried to include a IParticleTupleTool in FunTuple.

On the question regarding functorcollections. The data dependency on functor would natural introduce data dependency on functorcollections. I think @gunther introduced such a thing for truth functors (Analysis!846 (merged)).  

Yes, it's more about what could be simplified in the user script.

Now a user has to do

mctruth = ConfiguredMCTruthAndBkgCatAlg(inputs=d02kpi_data)
mc_functors= functorcollections.MCKinematics(mctruth)

while it might be better if one could do

mc_functors = functorcollections.MCKinematics(d02kpi_data)

Then for something like getting the odin raw bank or the trigger reports which have almost always the same location, it might be good to be able to do

@configurable
def fill_event_info(make_odin=make_default_odin):
    return event_branches

@configurable
def fill_hlt1_trigger_info(trigger_list, make_decreports=make_hlt1_dec_reports):
    return trigger_branches

Some of these things are not possible because some functions are in DaVinci and can therefore not be imported in Analysis.

Remember we wanted to move the algorithm (in this instance mctruth) into functorcollections, but functorcollections would have to output both mc_functors and mctruth, where the latter gets appended to the "sequence", but now we can exactly do the suggested thanks to you and Niklas's recent fix to PyConf (LHCb!3391 (merged)).

Indeed, once the design/workflow of how we tuple the trigger info is fixed, we can think how the corresponding collection can built.

Thanks @sstahl for the links. Yes, not a production-style change but an exercice.

Some of these things are not possible because some functions are in DaVinci and can therefore not be imported in Analysis.

Great that more people are seeing this. You may remember that DPA has temporarily various functions and standard particle definitions copied over from Moore. But we always said that's not meant to stay as it will be a nightmare for bookkeeping. Recently we created Moore#372 (closed) to move in the direction of less duplication ... Yes, agreed that (at least IMO) plenty of filter/combiner helpers and alike should be moved upstream, most likely to Analysis.

I actually like the syntax

mctruth = ConfiguredMCTruthAndBkgCatAlg(inputs=d02kpi_data)
mc_functors= functorcollections.MCKinematics(mctruth)

because it is explicit on what one needs, and remains fully flexible on how a user does MC association - standard or dedicated, "cooked up" by the user. With your suggestion

mc_functors = functorcollections.MCKinematics(d02kpi_data)

one totally looses the possibility to specify to the functor collection how the association is to be set. It effectively couples the variables that get tupled to an input location, with no control over how to MC-associate. This buys one line of code (instantiation of mctruth) but is a price to pay for more obscure happenings in the back, since a functor collection would have to set up and run associators internally. I like functional and declarative programming but this seems like a step too far. (But point taken that it looks attractive.)

one totally looses the possibility to specify to the functor collection how the association is to be set.

For the few(?) use cases where a different MC association is needed, one could still use @configurable and bind to change the behaviour. I just wanted to point it out so that you have something to think about.

changed the description

mentioned in issue #245 (closed)

Sorry I really wanted to discuss this issue in today's meeting as its important to get it in!

So what is the conclusion? Build an upfront tool that builds a map or linker table e.g. of candidate -> ODIN, then pass F.MAP_INPUT(F.ODIN, “P2ODIN”) functor into FunTuple? (straight-foward option perhaps?)  

Or a different tool for event info and then join offline event level root file and candidate level root file?

I am a bit confused…

FYI: @pkoppenb , @graven , @gligorov , @sstahl

Or a different tool for event info and then join offline event level root file and candidate level root file?

I would not know what the problem of the Run1/2 way was. The event info variables are usually a small part of the full set of branches.

Build an upfront tool that builds a map or linker table.

This seems like overkill to me. The ODIN bank already provides easy access to the information.

I agree with @sstahl

Thanks...The run1/2 way I think is equivalent to Gerard’s suggestion to have a “wrapper” functor e.g. F.EventInfo (calling different functors inside) and returning std::map (takes a candidate as input but does nothing with it).

Some way to switch on or off internally the functors would be good since there are a lot of event-level info.

probably the most trivial (hachky?) way to implement this is to have a functor that given a candidate, or actually 'anything' just uses an embedded data handle to the ODIN bank and returns a reference to it (i.e. a relation that relates anything given to it to the one ODIN bank ;-). And then write a few trivial functors that, given an ODIN bank return eg. run number or event number or tCK or .. -- those latter ones are probably 3 lines of C++, eg:

struct RunNumber : public Function {
    auto operator()( const ODIN& odin ) const { return odin.runNumber(); }
};

-- which already exists actually, in Phys/FunctorCore/include/Functors/TES.h -- and then one should be able to do

F.MAP_INPUT( Functor = F.RUNNUMBER, Relations = JUST_RETURN_ODIN_NO_MATTER_WHAT_I_AM_GIVEN )

where JUST_RETURN_ODIN_NO_MATTER_WHAT_I_AM_GIVEN would use a DataDepWrapper to get ODIN, and ignore its single input.

An alternative would be to make these functors special with their own property and call them once for each candidate. To which branch would you add them anyhow? Then they would take nothing as input. It's basically what I did when I added the IEventTupleTool.

As this piece of the code is intended to be used by all analysts in LHCb, is it not an idea idea to follow what is done normally in software development, and first decide on the user-interface design? After that the technical implementation can follow.

The reason I say this is because it became hard to follow for me what parts are intended to be "user interface" and what parts are "technical implementation details"; as things such as "DataDepWrappers" are probably intended to be wrapped in another function such that the analyst does not have to deal with it directly? (short version of this comment: What is the intended concept option file which has this event tuple information?)

CANDIDATE_RUN_NUMBER = F.MAP_INPUT( Functor = F.RUNNUMBER, Relations = JUST_RETURN_ODIN_NO_MATTER_WHAT_I_AM_GIVEN )

and

struct JustReturnOdinNoMatterWhatIAmGiven {  
   // some magic that is constructed from a string representing a TES location,
   // and on 'prepare' returns a lambda which when invoked with anything returns 
   // LHCb::ODIN const& of the current event
};

are plumbing, and

CANDIDATE_RUN_NUMBER

is porcelain...

actually, F.MAP_INPUT isn't quite the right thing, as it assumes a relations table and not another functor -- instead, this is just simple composition:

  CANDIDATE_RUN_NUMBER = F.RUNNUMBER @ JUST_RETURN_ODIN_NO_MATTER_WHAT_I_AM_GIVEN

I think I'm thinking a bit more basic I mean, what would one add in the functor collection here to have access to eventNumber, runNumber and gpsTime? (I don't want polarity)

@graven that won't work since JUST_RETURN_ODIN_NO_MATTER_WHAT_I_AM_GIVEN will have to take candidate as input and return odin.

@sstahl : Was thinking of adding it to head branch only. I need to understand what "special with their own property" means.

Anyway, I think there are some ideas I can take from this discuss, mull it over and get back.

yes, I know, and it's easy to write that functor using DataDepWrapper...

you would add

    "RUN_NUMBER": F.CANDIDATE_RUN_NUMBER,
    "EVENT_NUMBER" : F.CANDIDATE_EVENT_NUMBER,
    "GPSTIME" : F.CANDIDATE_GPSTIME

Perfect, thanks. Yes, that looks accessible enough to me.

Sorry, I thought that part was sufficiently trivial that it didn't need spelling out ;-)

@sstahl : Was thinking of adding it to head branch only. I need to understand what "special with their own property" means.

I meant something like the following with Jpsi -> mumu

variables = {}
variables['Jpsi'] = {"RUN_NUMBER": F.CANDIDATE_RUN_NUMBER,
                     "PT" : F.PT}
variables['MuPlus'] = {"PT" : F.PT}

or

variables = {}
variables['Event'] = {"RUN_NUMBER": F.RUN_NUMBER,
variables['Jpsi'] = {"PT" : F.PT}
variables['MuPlus'] = {"PT" : F.PT}

In the second case you could add the event variables when looping over the candidates in the event, e.g. here. Then you would not need the fake dependency on particles. But I would understand if special things should be avoided.

@amathad remind me, is the key of a FunctorCollection a prefix for the branch name? So if one does variables['Jpsi'] = {"RUN_NUMBER": F.CANDIDATE_RUN_NUMBER, , the branch is called Jpsi_RUN_NUMBER?

The event info functors are themselves defined to be DataDepWrapper, which will cause problems for composition (I think). And also I am not sure when the composition MR will be in. So have gone for this: !2737 (closed) (can discuss there).   

@sstahl: Suggestion of a special branch “EVENT” is a good one. Achieved it in this MR (DaVinci!654 (merged)), but probably want to hide it from the users (i.e. make it a special branch like "ALL"). Lets discuss there with others.

mentioned in merge request !2737 (closed)

mentioned in merge request Analysis!867 (merged)

mentioned in merge request DaVinci!654 (merged)

mentioned in merge request !2747 (merged)

@pkoppenb : Since this issue was meant for

Indeed, TisTos is s special beast. Let's concentrate here on candidate-indepent functors.

This has been addressed. So closing.

closed

Tupling trigger and event information in FunTuple with ThOr functors

Designs

Child items ...

Activity