Add minimum momentum cut in hybrid seeding

In short, I see about a factor of 4 speed-up in the seeding. The end result, in terms of the statistics used for the calibration, is essentially unchanged, as expected.

The cut I am using is 10 GeV. The impact could be even larger in the alignment (@jreich @pnaik @amarshal), which uses much tighter cuts.

Excellent! Are you interested in further speed-ups? For instance, can you try running with

L0_tolHp = [280, 540, 0]

?

I can try, but the seeding is no longer really dominant.

 | Name of Algorithm                                | Execution Count | Total Time / s  | Avg. Time / us   |
 | "PrKalmanFilterForward_944e4f66"                |     2.14519e+06 |        1912.032 |          891.310 |
 | "PrKalmanFilterMatch_5577cfdb"                  |     2.14519e+06 |        1908.046 |          889.452 |
 | "PrStoreSciFiHits_3cf8a836"                     |     2.14519e+06 |        1194.192 |          556.683 |
 | "FTRawBankDecoder"                              |     2.14519e+06 |        1125.662 |          524.737 |
 | "RichPhotonRecoLong_5c59f803"                   |     2.14519e+06 |         978.414 |          456.096 |
 | "PrHybridSeeding_f8d90f25"                      |     2.14519e+06 |         970.347 |          452.335 |
 | "RichPredPixelSignalLong_de24959d"              |     2.14519e+06 |         871.553 |          406.282 |
 | "RichSIMDPixels_4c6cdf50"                       |     2.14519e+06 |         698.652 |          325.682 |
 | "RichRawDecoder_6bf9d1d9"                       |     2.14519e+06 |         629.969 |          293.665 |
 | "VeloRetinaClusterTrackingSIMD_ceb40d92"        |     2.14519e+06 |         625.379 |          291.526 |
 | "PrForwardTrackingVelo_e05c8dc1"                |     2.14519e+06 |         620.011 |          289.023 |
 | "RichMassConesLong_cbc74925"                    |     2.14519e+06 |         390.147 |          181.870 |
 | "RichPixelClustering_1b99ed52"                  |     2.14519e+06 |         207.254 |           96.613 |
 | "RichTrackSegmentsLong_c41400dc"                |     2.14519e+06 |         138.636 |           64.626 |
 | "TBTCMatch_a0b610d8"                            |     2.14519e+06 |         101.735 |           47.424 |
<snip>
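
For anyone wanting to re-rank or cross-check a timing table like the one above, here is a minimal sketch. It assumes the three numeric columns are execution count, total time in seconds, and average time per execution in microseconds; that reading is an assumption, although the numbers are consistent with it (average ≈ total / count). `parse_timing_rows` is a hypothetical helper written for this illustration, not part of any LHCb tool.

```python
import re

# One table row: | "AlgName_hash" | count | total | average |
ROW = re.compile(
    r'\|\s*"(?P<name>[^"]+)"\s*'
    r'\|\s*(?P<count>\S+)\s*'
    r'\|\s*(?P<total>\S+)\s*'
    r'\|\s*(?P<avg>\S+)\s*\|'
)


def parse_timing_rows(text):
    """Return (name, count, total_s, avg_us) tuples for every matching row.

    Column meanings (count, seconds, microseconds) are assumed, see above.
    """
    rows = []
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            rows.append(
                (m["name"], float(m["count"]), float(m["total"]), float(m["avg"]))
            )
    return rows


# A few rows copied from the table above.
SAMPLE = '''
 | "PrKalmanFilterForward_944e4f66"                |     2.14519e+06 |        1912.032 |          891.310 |
 | "PrStoreSciFiHits_3cf8a836"                     |     2.14519e+06 |        1194.192 |          556.683 |
 | "FTRawBankDecoder"                              |     2.14519e+06 |        1125.662 |          524.737 |
 | "PrHybridSeeding_f8d90f25"                      |     2.14519e+06 |         970.347 |          452.335 |
'''

rows = parse_timing_rows(SAMPLE)

# Sort by total time and compare the quoted average with total / count.
for name, count, total_s, avg_us in sorted(rows, key=lambda r: r[2], reverse=True):
    derived_us = total_s / count * 1e6
    print(f"{name:36s} total={total_s:9.3f}  avg={avg_us:8.3f}  derived={derived_us:8.3f}")
```

Pasting the full log into `SAMPLE` gives the same ranking as quoted here and makes it easy to compare two runs side by side.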

If I want to improve further, it would be better to focus on e.g. the track fit, and then PrStoreSciFiHits and FTRawBankDecoder. How much CPU optimisation has been done for these last two ;) ?

Actually... the above times could be biased. I ran on a data set which has a lot of run changes, way more than normal, and if these algorithms have to re-cache conditions each time, this could really bias things...

The last two were never a bottleneck and are thus not heavily optimised :) The track fit has no low-hanging fruit, I am afraid.

Yeah, I figured the track fit had seen a lot of effort so there was likely nothing easy to gain there, but wasn’t sure about the other two. Good to hear that perhaps with a bit of profiling some gains might be possible there.

B.t.w. I ran a local test over a big sample from a single run, so without all the run changes. The results were not massively different from the above, so PrStoreSciFiHits and FTRawBankDecoder were still two of the main CPU contributors.

FTRawBankDecoder was supposed to be relatively optimised, although there is 'now' a sorting function that could actually be done another way. PrStoreSciFiHits is very non-optimal though.

/ci-test

[2023-05-16 21:04] Validation started with lhcb-master-mr#7882

added hlt2-throughput-increased label

approved this merge request

Merged by Sebastien Ponce (May 17, 2023 9:04am UTC)