WIP UTHitHandler - Improved CPU performance from better memory model
Optimisation of UTHitHandler that results in a factor of 4 speed-up of PrStoreUTHit in MiniBrunel.
Running perf on MiniBrunel, I noticed that PrStoreUTHit had very poor performance characteristics:
12.27% 0.12% python libPrAlgorithms.so [.]PrStoreUTHit::operator()
- 12.15% PrStoreUTHit::operator()
+ 4.02% page_fault
3.31% PrStoreUTHit::decodeBanks
2.01% retint_userspace_restore_args
1.90% error_swapgs
Note that decodeBanks accounts for only about a quarter of the total time spent in operator() (3.31% out of 12.15%).
The rest in fact comes purely from the allocation of
UT::HitHandler hitHandler;
The reason is that, internally, it allocates a beast of an object:
using HitVector = boost::container::small_vector<UT::Hit,10>;
using HitsInRegion = std::array<HitVector,98>;
using HitsInLayer = std::array<HitsInRegion,3>;
using HitsInStation = std::array<HitsInLayer,2>;
using HitsInUT = std::array<HitsInStation,2>;
std::unique_ptr<HitsInUT> m_allhits{new HitsInUT};
which performed very poorly, since the whole structure is heap-allocated and constructed every time a HitHandler is created.
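To give a feel for the scale, the sketch below counts how many hit vectors the nested arrays contain and how many inline hit slots that implies. It only does the arithmetic from the type definitions above; `HitVectorStandIn` is a hypothetical placeholder used so that the snippet compiles without Boost, and it says nothing about the real size of `UT::Hit` or of `small_vector`.

```cpp
#include <array>
#include <cstddef>
#include <tuple>

// Hypothetical placeholder for boost::container::small_vector<UT::Hit,10>.
// Only the array dimensions matter for the counting below.
struct HitVectorStandIn {};

// Same nesting as in the MR description.
using HitsInRegion  = std::array<HitVectorStandIn, 98>;
using HitsInLayer   = std::array<HitsInRegion, 3>;
using HitsInStation = std::array<HitsInLayer, 2>;
using HitsInUT      = std::array<HitsInStation, 2>;

// Number of hit vectors constructed with every HitsInUT: 2 * 2 * 3 * 98.
constexpr std::size_t nHitVectors =
    std::tuple_size<HitsInUT>::value * std::tuple_size<HitsInStation>::value *
    std::tuple_size<HitsInLayer>::value * std::tuple_size<HitsInRegion>::value;

// Each small_vector<UT::Hit,10> carries inline storage for 10 hits.
constexpr std::size_t inlineCapacity = 10;
constexpr std::size_t nInlineHitSlots = nHitVectors * inlineCapacity;

static_assert(nHitVectors == 1176, "2 * 2 * 3 * 98 hit vectors");
static_assert(nInlineHitSlots == 11760, "10 inline slots per hit vector");
```

So each HitHandler carries storage for 11760 hits up front, regardless of how many hits the event actually contains.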
This MR changes the vector type back to a basic STL container:
using HitVector = std::vector< UT::Hit >;
which is then only allocated (reserved) once it is known that it needs to hold at least one hit.
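As a rough illustration of the reserve-on-first-hit idea (not the actual UTHitHandler code), the sketch below keeps one `std::vector` per region and only reserves storage when the first hit for that region arrives. `Hit`, `LazyHitStore`, `addHit`, `kRegions` and `kReserve` are hypothetical names chosen for the example, and the reserve size of 10 simply mirrors the old `small_vector` inline capacity.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for UT::Hit; the real class has more members.
struct Hit {
  int   channel;
  float x;
};

// Hypothetical container holding the hits of one layer, one vector per region.
class LazyHitStore {
public:
  static constexpr std::size_t kRegions = 98;  // regions per layer, as in the MR
  static constexpr std::size_t kReserve = 10;  // assumed initial capacity

  // Store a hit; heap storage for a region is requested only on its first hit.
  void addHit(std::size_t region, const Hit& hit) {
    auto& hits = m_hits.at(region);
    if (hits.empty()) hits.reserve(kReserve);
    hits.push_back(hit);
  }

  // Read-only access to the hits of one region.
  const std::vector<Hit>& hits(std::size_t region) const { return m_hits.at(region); }

private:
  // Default-constructed vectors are empty; common standard library
  // implementations do not allocate until reserve() or push_back() is called.
  std::array<std::vector<Hit>, kRegions> m_hits;
};
```

With this layout, constructing the store costs only the bookkeeping of 98 empty vectors, and regions without hits never touch the heap.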
CPU performance is pretty much exactly a factor of 4 better. Logs are attached for the full MiniBrunel sequence.
Edited by Christopher Rob Jones