Improvements to forward triplet search
This MR adds a linear search and optimizes the forward triplet search algorithm a bit.
- linear_search is a new algorithm that takes input similar to binary_search_leftmost. It accepts an optional starting index, from which the array is searched in either increasing or decreasing order as required. It outperforms binary search in specific circumstances.
- lf_triplet_seeding got the following improvements:
  - It now uses a shared memory array, shared_xs, which holds all x coordinate values of the three modules under consideration.
  - It additionally uses shared_indices and shared_number_of_elements to store the x0 and x2 indices that passed the cut.
  - The hit in the middle module, x1, is added afterwards by iterating over shared_indices, which improves thread usage.
  - __syncwarp() is used instead of __syncthreads(). This is more technically correct, although it does not seem to have an impact on performance.
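To make the linear_search idea concrete, here is a minimal C++ sketch of a linear search with an optional starting index. It assumes that, like a leftmost binary search, it returns the first index whose value is not less than the target (the same position std::lower_bound would give). The name linear_search_leftmost and its signature are hypothetical and are not Allen's actual interface. The cost of such a scan grows with the distance between the starting index and the result. That is one plausible reason it can beat a binary search when a good starting guess is available.

```cpp
#include <cassert>

// Hypothetical sketch, not Allen's actual linear_search.
// Returns the leftmost index i in the ascending array xs[0..size) such that
// xs[i] >= value, or size if every element is smaller (the same result as a
// leftmost binary search / std::lower_bound). The scan begins at `start` and
// moves up or down depending on how xs[start] compares to value.
int linear_search_leftmost(const float* xs, int size, float value, int start = 0)
{
  assert(size == 0 || xs != nullptr);
  if (size <= 0) return 0;

  // Clamp the starting guess into the valid range [0, size - 1].
  int i = start < 0 ? 0 : (start >= size ? size - 1 : start);

  if (xs[i] < value) {
    // Everything at or before i is too small: walk upwards.
    while (i < size && xs[i] < value) ++i;
  }
  else {
    // xs[i] already qualifies: walk downwards while the previous element does too.
    while (i > 0 && xs[i - 1] >= value) --i;
  }
  return i;
}
```

For example, with xs = {1, 3, 5, 7} and a target of 4, starting at index 3 walks down to index 2, while starting at index 0 walks up to index 2. Both agree with std::lower_bound.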
Physics efficiency is expected to improve ever so slightly due to the better load balancing of candidate analysis between threads.
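The restructuring of lf_triplet_seeding can be illustrated with a host-side C++ sketch of the general compact-then-distribute pattern. This is not Allen's kernel. The x-difference cut, the Candidate struct, and the functions compact_passing_pairs and distribute are all hypothetical. The sequential loops stand in for threads appending to shared_indices, with shared_number_of_elements acting as the counter. Compacting the passing (x0, x2) pairs first lets the subsequent search for x1 be spread evenly across lanes, whatever the number of pairs each x0 hit happens to produce. This is one way the better load balancing mentioned above could arise.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical host-side sketch of the compact-then-distribute pattern;
// it is not Allen's lf_triplet_seeding kernel.
struct Candidate {
  int i0; // index of the hit in the first module (x0)
  int i2; // index of the hit in the third module (x2)
};

// Phase 1: test every (x0, x2) pair against a placeholder cut and append the
// passing pairs to one compact list. On the GPU this list would live in shared
// memory, with an atomically incremented counter holding its length.
std::vector<Candidate> compact_passing_pairs(
  const std::vector<float>& x0s,
  const std::vector<float>& x2s,
  float max_dx)
{
  std::vector<Candidate> passing;
  for (int i0 = 0; i0 < static_cast<int>(x0s.size()); ++i0) {
    for (int i2 = 0; i2 < static_cast<int>(x2s.size()); ++i2) {
      // Placeholder cut: keep pairs whose x values are close enough.
      if (std::fabs(x2s[i2] - x0s[i0]) < max_dx) {
        passing.push_back({i0, i2});
      }
    }
  }
  return passing;
}

// Phase 2: hand out the compacted candidates round-robin, the way lane k of a
// warp would process candidates k, k + n_lanes, k + 2 * n_lanes, ...
// Each lane would then look for a compatible x1 hit in the middle module.
std::vector<std::vector<int>> distribute(int n_candidates, int n_lanes)
{
  assert(n_lanes > 0);
  std::vector<std::vector<int>> work(n_lanes);
  for (int k = 0; k < n_candidates; ++k) {
    work[k % n_lanes].push_back(k);
  }
  return work;
}
```

With x0s = {0, 1, 5}, x2s = {0.5, 1.2} and max_dx = 1, the three passing pairs come from only two of the x0 hits (the second x0 hit alone yields two). Distributing them over two lanes still gives one lane two candidates and the other one, regardless of which x0 hit produced them.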
- Throughput change: 162.35 kHz -> 165.71 kHz (1.02x)
- Physics change: https://gitlab.cern.ch/lhcb/Allen/-/jobs/9592025
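The quoted speedup is the ratio of the two throughput figures. It corresponds to an absolute gain of 3.36 kHz, or roughly 2.1%:

```latex
\frac{165.71\,\text{kHz}}{162.35\,\text{kHz}} \approx 1.021,
\qquad 165.71\,\text{kHz} - 162.35\,\text{kHz} = 3.36\,\text{kHz}
```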
Built on top of !414 (merged)
Edited by Roel Aaij
Activity
added enhancement label
mentioned in issue Moore#203 (closed)
mentioned in issue Moore#205 (closed)
mentioned in issue Moore#206 (closed)
unassigned @mstahl
Looks okay after a first check, but it will be easier to assess the changes once !414 (merged) is merged.
mentioned in issue Moore#212 (closed)
mentioned in issue Moore#213 (closed)
added 129 commits
- 38da5a63...457e721c - 60 commits from branch master
- 22b531f1 - Use CI to run VELO tests.
- 5a5e9a2d - Remove shared library.
- d608ea8f - Run hlt1 pp default.
- d1316580 - Better half code.
- 29bba5b9 - Updated copyright.
- 9880f33e - Updated to compile with ROOT.
- 20cdf49d - Commented out mep test.
- fa3be5e4 - Reverted change to Debug build.
- f8f2c107 - Run tests on hlt1_pp_default.
- a3dfc4fa - Added support for vectorization and dispatching.
- 0850d009 - Default to fast_atan2 function.
- f7d7adfc - Changed default backend to float.
- 21bf6b07 - First version that compiles.
- 06069041 - Stick to fastest version.
- 1109578d - Removed umesimd external.
- 160faa80 - Added umesimd submodule.
- 96fd8467 - Added hopefully submodules to CI.
- 82cf1c41 - Put ifdefs around deprecated copy warning.
- c334169c - Fixed formatting
- d65de97b - Use submodule environment only in build stage.
- d666133d - Working out vectorization of sbt.
- 06049538 - Better vectorization of calculate phi and sort and sbt.
- f14cc70e - Updated default version on CUDA and CPU to be more efficient both in physics and throughput.
- 60eaaf6c - Added const auto statements to avoid undefined in device code errors.
- 49030873 - Test throughput of hlt1_pp_default sequence.
- fba000e2 - Fixed formatting
- 4dd47df2 - Focus on VELO only.
- 9b6699fa - Cleaner code in hot section for vectorized version.
- 7a64b461 - Tested logic of Search by triplet vectorized segment.
- fdd62427 - Rewrite of logic.
- 79cecaaa - Default to hlt1 pp default tests.
- 5a172427 - Some changes for CUDACLANG.
- ded7e7bf - Make HIP compilation work.
- edb378cc - Fixed formatting
- 69e8548d - Made CPUBackend a cpp again.
- 39f587a2 - Converted all backend sources back to cpp.
- cc6e83f3 - Default to broadwell (instead of ivybridge) for CPU compilation. Use latest available compilers.
- c2632d6d - Cover includes of x86intrin in x86-64 macro definition.
- 5f8179af - Cover CPUID include in define.
- c30f77d6 - Updated throughput scaling plots.
- 5e93fa8e - Updated scripts to perform scans.
- 9b6b7aa5 - Updated scalability scripts.
- 112a96e3 - Updated script.
- cad5040d - Enable back MEP test.
- 5080db1e - Fixed MEP test.
- 012385af - Refactored sbt to be a single function for seeding.
- bd8fb9ec - Enable back compiling Allen as a shared library.
- b9483c1d - Disabled Tesla T4 tests.
- 4a14e0d7 - Updated efficiencies in reference files.
- c791c270 - Apply 1 suggestion(s) to 1 file(s)
- 0298c370 - Apply 1 suggestion(s) to 1 file(s)
- 271121f0 - Apply 1 suggestion(s) to 1 file(s)
- ebc0aae3 - Apply 1 suggestion(s) to 1 file(s)
- d74d47a8 - Fixed compilation after applying suggestions and reverted CMakeLists to proper HIP configuration.
- 32ed37d9 - Documented throughput scaling scripts. Removed VeloTools unnecessary comment....
- ac712bae - Fixed formatting
- 3b5e40fb - Added hack to not bother with AVX512 on gcc-8 onwards.
- 843f064a - Refactored cuda::span into Allen::device::span.
- af1b57aa - Moved around definitions in sbt to be more readable.
- 90191f71 - Removed common interface.
- c9027799 - Add back __AVX512F__ after inclusion of Vector.h
- 25f4b016 - Fixed copyright.
- f42144b7 - Fixed formatting
- 86cd453a - Removed wrong license statements.
- 4a3bed25 - Fixed build with Gaudi.
- c76c4386 - Improved the way forward triplet search works.
- 7d235874 - Applied formatting.
- e6623981 - Fix warnings.
- 35f79b17 - Updated efficiency reference files.
assigned to @ascarabo