Minimize OH cluster latency
- Modify the cluster sorting to take advantage of the inputs being 4 pre-sorted lists
- Use the "fast" bitslip module (-1.0 bx)
  - Timing was too tight in the default firmware, but if the "any alignment" clusterizer is used, the S-bits could be registered on the 160 MHz clock, for an effective improvement of -0.75 bx
- Allow the clusterizer to take S-bits with any alignment relative to the LHC clock
  - Right now the firmware assumes the inputs are synchronous to clock 40, so the inputs are registered on the 40 MHz clock before going into the clusterizer. If the firmware accepts a programmable offset (it could even be dynamic without any real penalty), the S-bits could come into the clusterizer with minimal latency, so that, for example, the S-bit remapping could be done in a 160 MHz clock cycle instead of a 40 MHz clock cycle
  - -0.75 bx
- Drop the DRU for the S-bits as you already proposed (would require a manual phase scan). Edit: not in the short term, due to the delay calibration requirements of the differential lines.
  - Need to investigate the latency gain
- Remove the "reverse partition" option (while it seemed useful, it was already taken into account when producing the OTMB LUT). Edit: actually, the option is still needed for some OTMB integration tests in b904. Considering its limited use, the option can be defined at build time.
  - -0.25 bx
- Optimize the clock domain crossing from the 160 MHz clock. Ideally it should not require a transition from 160 --> 40 --> 200 MHz. The issue is that the setup and hold times depend on which cycle the two clocks are in, but it should be possible to monitor those, transition on the edges with good setup and hold times, and set the maximum delays in that CDC path manually. Right now the 40 MHz transition eats up a lot of latency, since the latency gets rounded up to a whole number of bx instead of being fractional
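
To put numbers on the rounding cost described in the last item, here is a minimal Python sketch. It assumes 1 bx equals one 40 MHz clock period, so one 160 MHz cycle is 0.25 bx. The function names and the cycle counts in the loop are hypothetical and only illustrate the arithmetic; they are not measured values from the firmware.

```python
import math
from fractions import Fraction

# Assumption: 1 bx = one 40 MHz period, so one 160 MHz cycle = 1/4 bx.
CLK160_BX = Fraction(1, 4)

def latency_direct(cycles_160):
    """Latency in bx if the path stays fractional (no 40 MHz re-registering)."""
    return cycles_160 * CLK160_BX

def latency_via_40mhz(cycles_160):
    """Latency in bx if the path passes through the 40 MHz domain,
    which rounds the latency up to a whole number of bx."""
    return Fraction(math.ceil(latency_direct(cycles_160)))

def rounding_penalty(cycles_160):
    """Extra latency in bx caused only by the round-up."""
    return latency_via_40mhz(cycles_160) - latency_direct(cycles_160)

if __name__ == "__main__":
    # Hypothetical path lengths, in 160 MHz cycles.
    for n in range(1, 9):
        print(f"{n} cycles: direct {float(latency_direct(n))} bx, "
              f"via 40 MHz {float(latency_via_40mhz(n))} bx, "
              f"penalty {float(rounding_penalty(n))} bx")
```

Under these assumptions the round-up costs at most 0.75 bx per crossing, which matches the size of the gains quoted in the items above.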
Another to-do:
- Document the latency through the firmware and keep it up to date
(copied comments from below into the top level so GitLab tracks them)
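
For the cluster sorting item (inputs arriving as 4 pre-sorted lists), a software analogy may be useful: merging lists that are already sorted takes far fewer comparisons than sorting everything from scratch. The sketch below uses Python's `heapq.merge`. The cluster representation (plain integers standing in for cluster addresses), the ascending order, and the cut-off of 8 output clusters are all assumptions for illustration. A firmware implementation would presumably use a fixed merge network rather than a heap.

```python
import heapq
from itertools import islice

def merge_presorted(lists, n_out):
    """Return the first n_out entries of the merged order.

    Each input list must already be sorted in ascending order;
    heapq.merge relies on that and never re-sorts the inputs.
    """
    return list(islice(heapq.merge(*lists), n_out))

if __name__ == "__main__":
    # Hypothetical example: 4 pre-sorted lists of cluster addresses.
    inputs = [[1, 9, 20], [3, 4, 30], [2, 11], [5, 6, 7]]
    merged = merge_presorted(inputs, 8)
    # Cross-check against a full sort of all entries.
    assert merged == sorted(x for lst in inputs for x in lst)[:8]
    print(merged)
```

The point of the comparison is only that the pre-sorted structure lets the merge stop as soon as the required number of clusters is found, instead of ordering every input.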