FPGATrackSim: Create separate data prep algorithm to match FPGA pipeline
At the moment FPGATrackSim consists of one very large "main" algorithm, FPGATrackSimLogicalHitsProcessAlg. This algorithm does a number of different things:
- It loads the input file (either a ROOT wrapper generated by FPGATrackSim or an RDO file, in which case the structures in the wrapper are regenerated here) and processes the input data
- It runs clustering, hit filtering, mapping, and spacepoint creation on the raw hits stored in the input data
- It runs road-finding (pattern recognition) and track-fitting (ambiguity resolution) using a variety of different algorithms
- It does some monitoring of data flow and efficiencies, including preparing several bespoke output formats (most importantly, the "Hough ROOT output" used as an input to some of our machine learning algorithms)
- It can include a second stage processing step, in which layers unused in the "first stage" are added to the tracks and some of the track-finding steps are rerun, though this is vestigial and does not currently work.
- It writes out a ROOT output file containing "input" and "output" header structures for each event in a TTree.
This is a lot of stuff. We agreed that we want to split this up into multiple algorithms, and one natural way to do that is to carve the first two steps out into a separate "data preparation algorithm" that runs on RDO and outputs clustered/filtered hits (and spacepoints). This roughly matches the "FPGA Data Preparation Pipeline" being considered as one possible EF Tracking solution, so putting an algorithm boundary here makes the simulation look a lot more like the real firmware.
The logical hits processing algorithm can now take filtered/processed hits as input and only needs to do road-finding and track-fitting (and some of the monitoring steps). It then only needs to output roads and tracks (and some of the Hough ROOT stuff). The second stage processing can then become a third algorithm (though I haven't added that yet).
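To make the proposed boundary concrete, here is a minimal toy sketch of the two-stage split in plain Python. It is not FPGATrackSim code: `Hit`, `prepare_data`, `find_roads`, and the clustering and road criteria are all hypothetical stand-ins, chosen only to show that the second stage needs nothing but the prepared hits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical raw hit: a layer index and a strip index."""
    layer: int
    strip: int


def prepare_data(raw_hits, max_layer):
    """Stage 1 (toy): drop hits beyond max_layer, then merge adjacent strips into clusters."""
    kept = sorted(
        (h for h in raw_hits if h.layer <= max_layer),
        key=lambda h: (h.layer, h.strip),
    )
    clusters = []
    for hit in kept:
        last = clusters[-1] if clusters else None
        # Extend the current cluster if the hit is on the same layer and a neighbouring strip.
        if last and last[-1].layer == hit.layer and hit.strip - last[-1].strip <= 1:
            last.append(hit)
        else:
            clusters.append([hit])
    return clusters


def find_roads(clusters, min_layers):
    """Stage 2 (toy): report one 'road' if enough distinct layers contain a cluster."""
    layers = {cluster[0].layer for cluster in clusters}
    return [sorted(layers)] if len(layers) >= min_layers else []


raw = [Hit(0, 10), Hit(0, 11), Hit(1, 20), Hit(2, 30), Hit(7, 5)]
clusters = prepare_data(raw, max_layer=4)   # stage 1 sees the raw hits
roads = find_roads(clusters, min_layers=3)  # stage 2 sees only stage-1 output
print(len(clusters), roads)
```

The only thing crossing the boundary is the return value of `prepare_data`, which is the property the split is meant to enforce.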
This merge request implements this refactoring. I basically split LogicalHitsProcessAlg in half, cleaned some of it up, and moved the first steps into a new algorithm. Both configurations can be run independently:
python -m FPGATrackSimConfTools.FPGATrackSimDataPrepConfig
will only do the data preparation step, and
python -m FPGATrackSimConfTools.FPGATrackSimAnalysisConfig
will do both steps. The algorithms communicate with each other through StoreGate, which we were already using to send information to the monitoring/xAOD interface algorithms, so a lot of that was already in place and only required minor modifications. Notably, the "event header" objects are discarded after the data preparation algorithm, and the LogicalHitsProcessAlg no longer has direct access to the header structure.
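The hand-off described above can be illustrated with a toy event store. This is only a sketch in plain Python: `EventStore` is a hypothetical stand-in and not the StoreGate API, and the keys (`FilteredHits`, `TruthTracks`, `Roads`) and both algorithm classes are invented for illustration.

```python
class EventStore:
    """Toy per-event whiteboard (hypothetical; NOT the real StoreGate interface)."""

    def __init__(self):
        self._data = {}

    def record(self, key, obj):
        if key in self._data:
            raise KeyError(f"{key} already recorded")
        self._data[key] = obj

    def retrieve(self, key):
        return self._data[key]

    def keys(self):
        return sorted(self._data)


class ToyDataPrepAlg:
    def execute(self, store, raw_event):
        # The header exists only inside this algorithm and is never recorded.
        header = {"hits": raw_event["hits"], "truth": raw_event["truth"]}
        store.record("FilteredHits", [h for h in header["hits"] if h >= 0])
        # Truth must be pushed explicitly, or downstream monitoring cannot see it.
        store.record("TruthTracks", list(header["truth"]))


class ToyTrackFindingAlg:
    def execute(self, store):
        hits = store.retrieve("FilteredHits")
        truth = store.retrieve("TruthTracks")
        store.record("Roads", [hits] if len(hits) >= 3 else [])
        # Toy "efficiency": roads found per truth track.
        return len(store.retrieve("Roads")) / max(len(truth), 1)


store = EventStore()
ToyDataPrepAlg().execute(store, {"hits": [3, -1, 5, 8], "truth": ["t0"]})
efficiency = ToyTrackFindingAlg().execute(store)
print(store.keys(), efficiency)
```

Anything the downstream algorithm needs (here, the truth tracks) has to be recorded explicitly by the upstream one, since the header itself never reaches the store.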
There are a few limitations right now that should ideally be cleaned up before we merge this, so I'll leave it as a draft for now, but I thought I'd open it a little early for visibility.
- I deleted a lot of the second stage stuff and have not yet moved it into a new FPGATrackSimSecondStageAlg algorithm-- I'd like to do that so we at least still have the code around, even if we don't get it working, before merging this in. A future MR can focus on making that actually functional.
- The two algorithms use separate instances of the output header tool and so create two separate output files, with two separate trees, one with the input header branches and one with the output header branch. I need to think about the best way to fix this; it will be a problem if, for instance, we make yet another instance of the output header tool for the SecondStageAlg.
- Because LogicalHitsProcessAlg is communicating through StoreGate with the DataPrepAlg, we only have access to whatever information we get sent from StoreGate. Therefore, to do stuff like efficiency monitoring or run the Hough ROOT output tool here, we need to push truth and offline tracks through StoreGate as well. I added those, but I did not add clusters, which were used by the "output text file" option and the data flow tools. It sounded like we may no longer want those, so I just deleted the output text file option (I'm not aware of anyone using it, and in principle we can recreate this downstream as a separate algorithm) and commented out the data flow tool.
- It's somewhat silly, but I noticed in the Python we consistently refer to the LogicalHitsProcessAlg as the FPGATrackSimLogicalHist algorithm-- perhaps this is a good opportunity to clean that up.
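For the output-file limitation in the list above, one possible direction (purely a sketch, not something this MR implements) is a single shared writer that each algorithm registers a branch with, filled once per event. `SharedEventWriter` and the branch names below are hypothetical and do not correspond to the existing output header tool or to any ROOT API.

```python
class SharedEventWriter:
    """Toy single-tree writer (hypothetical stand-in for one shared output tool)."""

    def __init__(self):
        self._branches = []
        self._current = {}
        self.rows = []  # one dict per event, playing the role of tree entries

    def register_branch(self, name):
        if name in self._branches:
            raise ValueError(f"branch {name} already registered")
        self._branches.append(name)

    def set(self, name, value):
        if name not in self._branches:
            raise KeyError(f"unknown branch {name}")
        self._current[name] = value

    def fill(self):
        # Refuse to write an event unless every registered branch was set.
        missing = [b for b in self._branches if b not in self._current]
        if missing:
            raise RuntimeError(f"branches not set this event: {missing}")
        self.rows.append(dict(self._current))
        self._current.clear()


writer = SharedEventWriter()
writer.register_branch("input_header")   # would be owned by the data prep step
writer.register_branch("output_header")  # would be owned by the track-finding step

for event_number in range(2):
    writer.set("input_header", {"event": event_number, "n_hits": 4})
    writer.set("output_header", {"event": event_number, "n_roads": 1})
    writer.fill()  # exactly one entry per event, holding both branches

print(len(writer.rows), sorted(writer.rows[0]))
```

A third algorithm would then register one more branch instead of creating a third file, at the cost of needing some component to decide when the per-event `fill()` happens.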
Tagging @jahreda and @tbold, as well as @imaznas, @piazza, @sabidi, @kesedlac, and also @wcastigl since this MR will interfere with !73952 (as I moved at least some of the code that MR touches to a new algorithm).