WIP UTHitHandler - Improved CPU performance from better memory model (!1187) · Merge requests · LHCb / Rec

Christopher Rob Jones requested to merge UTHitHandler-MemoryAllocOpt into master Sep 04, 2018

Optimisation of UTHitHandler that gives a factor of 4 speed-up of PrStoreUTHit in MiniBrunel.

Running perf on MiniBrunel, I noticed that PrStoreUTHit had a very poor profile:

 12.27%  0.12%  python libPrAlgorithms.so [.]PrStoreUTHit::operator()

   - 12.15% PrStoreUTHit::operator()

      + 4.02% page_fault

        3.31% PrStoreUTHit::decodeBanks

        2.01% retint_userspace_restore_args

        1.90% error_swapgs

Note that decodeBanks accounts for only about 1/4 of the total time spent in operator().

The rest comes purely from the allocation of

  UT::HitHandler hitHandler;

The reason is that internally it allocates a beast of an object (98 × 3 × 2 × 2 = 1176 small vectors, each with inline storage for 10 hits)

    using HitVector = boost::container::small_vector<UT::Hit,10>;
    using HitsInRegion = std::array<HitVector,98>;
    using HitsInLayer = std::array<HitsInRegion,3>;
    using HitsInStation = std::array<HitsInLayer,2>;
    using HitsInUT = std::array<HitsInStation,2>; 

    std::unique_ptr<HitsInUT> m_allhits{new HitsInUT};

which performs very poorly.
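To give a feel for the scale, here is a rough sketch that mirrors the nested array layout above using only the standard library. `FakeHit` and `InlineVector` are hypothetical stand-ins (the real `UT::Hit` layout and the internals of `boost::container::small_vector` are not shown in this MR), so the byte count is only indicative. The point is that all inline hit slots exist up front, whether or not any hits are stored, which is at least consistent with the `page_fault` entry in the perf output.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for UT::Hit; the real class is not shown here.
struct FakeHit {
  float    x, z, w;
  unsigned id;
};

// Hypothetical stand-in for boost::container::small_vector<T, N>:
// some bookkeeping plus an inline buffer large enough for N elements.
template <typename T, std::size_t N>
struct InlineVector {
  T*          data;
  std::size_t size;
  std::size_t capacity;
  alignas(T) unsigned char buffer[N * sizeof(T)];
};

// Same nesting as in the MR description.
using HitVector     = InlineVector<FakeHit, 10>;
using HitsInRegion  = std::array<HitVector, 98>;
using HitsInLayer   = std::array<HitsInRegion, 3>;
using HitsInStation = std::array<HitsInLayer, 2>;
using HitsInUT      = std::array<HitsInStation, 2>;

// Number of small vectors held by one HitsInUT object.
constexpr std::size_t nVectors = 98 * 3 * 2 * 2;

// Number of hit slots reserved inline, before a single hit is stored.
constexpr std::size_t nInlineHitSlots = nVectors * 10;

// Lower bound on the object size: the inline hit storage alone.
constexpr std::size_t minBytes = nInlineHitSlots * sizeof(FakeHit);
```

With these stand-ins, `sizeof(HitsInUT)` is at least `minBytes`, and that whole block is allocated every time a `UT::HitHandler` is constructed.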

This MR changes the vector type back to a basic STL container

  using HitVector     = std::vector< UT::Hit >;

which is then allocated (reserved) only once it is known that at least one hit will be stored in it.
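A minimal sketch of this reserve-on-first-use idea is given below. It is not the code from this MR: `LazyHitStore`, `FakeHit`, `addHit`, the single level of 98 regions and the initial reserve of 10 are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for UT::Hit.
struct FakeHit {
  float    x;
  unsigned id;
};

// Hypothetical container: one std::vector per region, with heap storage
// requested only when the first hit for that region arrives.
class LazyHitStore {
public:
  static constexpr std::size_t kRegions        = 98;
  static constexpr std::size_t kInitialReserve = 10;  // assumed value

  void addHit(std::size_t region, const FakeHit& hit) {
    auto& hits = m_hits[region];
    // First hit in this region: reserve a small block once.
    if (hits.empty()) hits.reserve(kInitialReserve);
    hits.push_back(hit);
  }

  const std::vector<FakeHit>& hits(std::size_t region) const {
    return m_hits[region];
  }

private:
  // Default-constructed vectors are empty, so constructing the store is cheap.
  std::array<std::vector<FakeHit>, kRegions> m_hits;
};
```

Regions that never receive a hit stay empty, so the cost scales with the number of occupied regions rather than with the full detector granularity.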

CPU performance is almost exactly a factor of 4 better. Logs for the full MiniBrunel sequence are attached.

original.log.bz2 new.log.bz2

Edited Sep 05, 2018 by Christopher Rob Jones
