Configuration of persisted reconstruction for ion data

assigned to @mveghel

Hi @mveghel. Apologies for the wild ping, someone gave me your name. I hope my description is understandable; let me know if you need anything.

No worries! I'll be on it :)

@baudurie for the Velo tracks, do you need just forward ones or also backward?

One thing I should warn about: if we persist fitted Velo tracks, these tracks will be inconsistent with how the PVs are formed, since PV finding uses unfitted Velo tracks. Persisting both fitted and unfitted Velo tracks seems to me to go too far. So for consistency I would still advise unfitted Velo tracks + persisting VP hits. Should we go with that?

But these tracks are not used for refitting the PVs. At least this is what @baudurie told me explicitly (otherwise I agree it would not make sense). Also, they should be all Velo tracks, and not just the tracks used to make PVs (unless this changed lately?).

They are all Velo tracks (that was not clear indeed in the above message, adapted it).

@decianm, correct. These tracks are used only as a proxy for multiplicity. @mveghel, ideally both forward and backward, for physics reasons (correlations, anti-correlations, etc.).

Okay, but then the tracks that are used to build PVs are not persisted anymore, which is something we currently do persist for pp.

Yep, got it. Give me just the day to confirm with IFT analysts that we don't need these tracks (we do need their number, however; the PVNTracks variable, I assume).

@mveghel, thanks for insisting. After discussing with analysts working on multiplicity measurement (cc @oboenteg), they actually would favor your option: VELO tracks used to fit the PV are best for these kinds of analyses.

As I understand, the other VELO tracks (the so-called fitted ones?) are not pointing to the vertex for the most part and are probably not needed.

Just to repeat myself, we can't persist VP hits at the moment. Although there is ongoing work towards it, it will take a few weeks at least until it's ready. If you intend to refit Velo tracks offline, please add VP raw banks to your lines.

@sesen, ok. Indeed, we might not have weeks... The VP raw banks should be kept after HLT2 regardless. I guess we have a safety net in case something goes bad.

The fitted Velo tracks are just all Velo tracks, but fitted with a Kalman Filter. They are the same tracks as would be saved for the PV, but just fitted (i.e. they point back as much to the PV as the "unfitted" ones).

The only reason not to use the fitted Velo tracks is if you need them to refit the PVs with exactly the same conditions as it is done in HLT2. For anything physics, you probably want the fitted Velo tracks.

Thanks @decianm. Not sure if I am more or less confused... To summarize, the "unfitted" tracks are the ones used for the PV reco, but not reprocessed by a Kalman filter (or are they superseded by a fitted version?), while the "fitted" ones are just a better version of the "unfitted" ones. Am I understanding this correctly?

Sort of

The "unfitted tracks" are the tracks that come out of the Velo track finding. They are used to find PVs, and can be fed into a Kalman Filter, which then makes them "fitted" tracks. (I am using the terms "fitted" and "unfitted" a bit loosely here).

The only use for them offline I can think of is PV unbiasing: Assume one of your signal tracks from a displaced decay ended in the PV. You remove that track and refit the PV without it. But then you'd like to create a situation where this PV would have been found without this track in HLT2 in the first place. And in HLT2 the collection of "unfitted" tracks is used to find PVs.
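A toy sketch of the unbiasing step described above, under strong assumptions: the PV "fit" here is just an inverse-variance weighted mean of per-track z estimates, which is not what HLT2 does, and `fit_pv_z` and `unbiased_pv_z` are hypothetical names used only for illustration.

```python
# Toy illustration only: a real PV fit is far more involved than a weighted mean.
# Each track is a (z0, sigma) pair: its z estimate at the beamline and its uncertainty.

def fit_pv_z(tracks):
    """Toy PV 'fit': inverse-variance weighted mean of the per-track z estimates."""
    weights = [1.0 / (sigma * sigma) for _, sigma in tracks]
    weighted_sum = sum(w * z0 for w, (z0, _) in zip(weights, tracks))
    return weighted_sum / sum(weights)


def unbiased_pv_z(tracks, signal_index):
    """Refit the toy PV after removing the signal track at signal_index."""
    remaining = [t for i, t in enumerate(tracks) if i != signal_index]
    return fit_pv_z(remaining)


# Two prompt tracks at z = 0 and one displaced signal track at z = 3 that ended up in the PV.
tracks = [(0.0, 1.0), (0.0, 1.0), (3.0, 1.0)]
biased_z = fit_pv_z(tracks)            # pulled towards the signal track
unbiased_z = unbiased_pv_z(tracks, 2)  # signal track removed before refitting
```

In this toy the signal track pulls the vertex position away from the prompt tracks, and removing it restores the position they alone would give.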

The "fitted" ones are just the "unfitted" ones fitted with a Kalman Filter, meaning you get a proper chi2 value and a proper covariance matrix for them.
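To make "a proper chi2 value and a proper covariance matrix" concrete, here is a minimal sketch that uses a plain least-squares straight-line fit as a stand-in for the Kalman Filter. It assumes a single projection x = x0 + tx * z, equal hit uncertainties and no material effects; `fit_line` is a hypothetical helper, not part of any LHCb software.

```python
# Stand-in for a track fit: least-squares fit of x = x0 + tx * z to hits.
# Assumptions: one projection only, the same uncertainty sigma on every hit.

def fit_line(zs, xs, sigma):
    """Return ((x0, tx), 2x2 covariance matrix, chi2) of a straight-line fit."""
    w = 1.0 / (sigma * sigma)
    s = w * len(zs)
    sz = w * sum(zs)
    szz = w * sum(z * z for z in zs)
    sx = w * sum(xs)
    szx = w * sum(z * x for z, x in zip(zs, xs))
    det = s * szz - sz * sz
    # Solve the normal equations [[s, sz], [sz, szz]] (x0, tx)^T = (sx, szx)^T.
    x0 = (szz * sx - sz * szx) / det
    tx = (s * szx - sz * sx) / det
    # The covariance of (x0, tx) is the inverse of the normal-equation matrix.
    cov = [[szz / det, -sz / det], [-sz / det, s / det]]
    chi2 = sum(w * (x - (x0 + tx * z)) ** 2 for z, x in zip(zs, xs))
    return (x0, tx), cov, chi2


# Four hits lying exactly on x = 1 + 0.5 * z.
params, cov, chi2 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 1.5, 2.0, 2.5], sigma=0.1)
```

The "unfitted" tracks from pattern recognition carry only an approximate state, whereas a fit of this kind attaches parameter uncertainties and a goodness-of-fit value to each track.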

Given that you are not interested in PV refitting (if I understood correctly) and want to use these tracks for "physics measurements", I don't see a reason to not use Kalman fitted tracks, as they should offer a better precision.

Thanks for the explanation. Then I agree that fitted are better. I checked with the analysts: at the end of the day both options should work anyway, but it is better to use the best-quality tracks.

Alright. !2653 (merged) includes just the fitted Velo tracks, so it's done.

added ~48608 hlt2 important labels

As commented elsewhere, I don't think Ttracks have to be removed from the RecSummaryMaker, as it is fine if their properties are stored in there.

Gentle ping to ask if there are any updates on this issue; we can help if needed.

Some updates and thoughts:

First of all, I tried to resolve #622 simultaneously, but that turned out to be more involved than I thought (circular dependencies within Moore). So I'm dropping that now to get this issue fixed asap.

On a more important note, we had a bit of a discussion in the persistency meeting (WP1) on how to implement this, because technically we can avoid making a new persistreco version and the implementation would be rather trivial. We save an empty Ttrack container (we already have them for UT related tracks) to avoid reading issues. In addition, we'd save the fitted Velo tracks in the already available Velo track location (which currently holds unfitted Velo tracks). This avoids a lot of complications and unnecessary complexity. Then we can use the persistreco version for more fundamental changes (e.g. going to Selections/SharedObjectContainers), and these changes would not depend (so much) on specific reconstruction sequences.

This essentially would take us more in the direction of what I originally proposed with 'persistable_locations_version', and leave the actual reco and persistreco config to Moore (with more flexibility). I think this is more appropriate anyway. Take the long tracks, for example: while they would be in the same location, for PbPb they would be defined differently, as we're going to change the seeding for it. So I don't think this is a breach of how we treat these locations anyway.

By the way, regarding the fitted Velo tracks, could you elaborate on why this is needed? (One could also persist the VP hits and fit them offline).

fyi: @decianm @sesen @rmatev

Hi @mveghel. Thanks for the message and for looking into it. If you have an easier solution, no problem with me as long as we get the Velo tracks in as you mentioned. The tracks are used for multiplicity/elliptic flow measurements, where we analyze VeloTrack correlations. It is correct that it concerns mostly the analysis stage (HLT2/Sprucing does not necessarily need this information in the lines). However, it would be much easier for analysts to have the same output as Monte Carlo, for instance. If we can afford to keep this container, I would avoid adding extra complications for analysis, and keep your suggestion as a last resort.

Hope I answered your questions. Let me know otherwise.

@baudurie The question is more why you need Velo tracks to be fitted. Is it not enough to have Velo tracks?

@mveghel Thinking about it a bit more, I think it's clearer and more maintainable to have a special persistreco_version for this. In the long run, it will be easier to document what is in a given persistreco_version than what a given location means for a given data-taking period.

We don't persist VP hits at the moment. It needs some planning and discussion on how to do this consistently and properly for all sub-detectors. But any line can request to save Velo raw banks and then redo the Velo reconstruction.

I would suggest that if feasible (and with the VELO open it should be, since its contribution to the event size is much smaller than normal) then saving VELO raw banks and then redoing the reconstruction has a lot of advantages. The VELO reconstruction is also basically the fastest bit of the reconstruction, so it should be feasible to redo it even without a centralised reprocessing. Apologies if I'm missing an obvious drawback of this approach.

I agree with @sesen: it is much more maintainable in the long run to have a separate list than to "hack" it in. I don't think you save much time, and you would then always need to remember that for that special processing the container means something different than for the other data-taking periods. If you have a different list, this is clear by construction.

I know that you have a "default" version, but if I understood correctly, it just takes the last entry in the dictionary, so you could just put this one in "in the middle".
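A minimal sketch of that idea, assuming the versions live in an ordinary Python dictionary whose last entry is taken as the default. The dictionary name, version keys, and location names below are all hypothetical and are not taken from Moore.

```python
# Hypothetical version table: the keys and location names are placeholders.
# Python dicts preserve insertion order, so "the last entry" is well defined.
PERSISTRECO_VERSIONS = {
    "v0": ["LongTracks", "VeloTracks", "Ttracks"],
    # Special version for ion data, inserted "in the middle" so that it
    # never becomes the default.
    "ion_v0": ["LongTracks", "FittedVeloTracks"],
    "v1": ["LongTracks", "VeloTracks", "Ttracks", "UpstreamTracks"],
}


def default_version(versions):
    """Return the key of the last entry in the dictionary."""
    return list(versions)[-1]


def persisted_locations(versions, version=None):
    """Return the locations for an explicit version, or for the default one."""
    key = default_version(versions) if version is None else version
    return versions[key]


default_key = default_version(PERSISTRECO_VERSIONS)
ion_locations = persisted_locations(PERSISTRECO_VERSIONS, "ion_v0")
```

With this layout, pp processing that relies on the default is unaffected, while the ion configuration has to request its version explicitly.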

All raw banks will be saved anyway in HLT2. But I would still write out the Velo tracks that you actually need in the PersistReco list, because it should come at very little cost. If you then want to do something different offline, you still can using the raw banks, so I don't think with this approach you lose anything.

I'll proceed with a new persistreco version then.

But I have to clarify what I mean by unnecessary complexity. I don't mean the implementation, which is easy; I mean having a higher number of versions and how to deal with them. Consider, for example, what happens if we reprocess 2023 data with e.g. persistable selections. We'd then have to get two new versions out (one for pp, one for PbPb). What do we do with the older versions? Retire and remove them? So to me this doesn't necessarily make things clearer for the outsider either.

Hi @mveghel. Any advancement on this issue? I see you still have unanswered questions, hopefully, you get your answer somewhere... For the record, we will start to take collisions today.

We can worry about those things later. !2653 (merged) is ready.

mentioned in merge request LHCb!4295 (merged)

mentioned in merge request !2653 (merged)

closed with merge request LHCb!4295 (merged)

mentioned in commit LHCb@e55b35a5

mentioned in commit 1635ae5b
