UT sorted decoding
Closes https://gitlab.cern.ch/lhcb-parallelization/cuda_hlt/issues/42
A faster UT decoder has been implemented.
- It is composed of: `ut_calculate_number_of_hits`, `prefix_sum_reduce_ut_hits`, `prefix_sum_single_block_ut_hits`, `prefix_sum_scan_ut_hits`, `ut_pre_decode`, `ut_find_permutation` and `ut_decode_raw_banks_in_order`.
- Everything prior to `ut_pre_decode` already existed.
- `ut_pre_decode` is a lightweight decoding step that only decodes the `yBegin` parameter, and stores the `raw_bank` and `hit_index` of the current hit in a combined `int`.
- `ut_find_permutation` finds a permutation based on the newly created `yBegin` array.
- `ut_decode_raw_banks_in_order` decodes all raw banks according to the permutation established in the previous step.
- Moved `UTDefinitions` to `UT/common`.
- Created `UTDecoding` namespace for several `static constexpr` parameters.
- Performance of the entire UT decoding + sorting sequence has been improved by about `33%` (it now runs at about `334.7 kHz`).
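
The three new steps can be illustrated with a minimal host-side C++ sketch. This is not the code in this merge request: the function names, the 8/24-bit split of the combined integer, the use of an unsigned 32-bit type, and the choice of an ascending stable sort on `yBegin` are all assumptions made for illustration. The actual kernels run on the GPU and may pack and order hits differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical packing: upper 8 bits hold the raw bank number,
// lower 24 bits hold the hit index inside that raw bank.
constexpr std::uint32_t hit_index_bits = 24;
constexpr std::uint32_t hit_index_mask = (1u << hit_index_bits) - 1;

// Pre-decode step (sketch): remember where a hit lives in a single integer,
// so that the full decoding can be deferred until the order is known.
inline std::uint32_t pack_hit_location(std::uint32_t raw_bank, std::uint32_t hit_index) {
  assert(raw_bank < (1u << 8) && hit_index <= hit_index_mask);
  return (raw_bank << hit_index_bits) | hit_index;
}

inline std::uint32_t unpack_raw_bank(std::uint32_t packed) { return packed >> hit_index_bits; }
inline std::uint32_t unpack_hit_index(std::uint32_t packed) { return packed & hit_index_mask; }

// Permutation step (sketch): permutation[i] is the position, in pre-decode
// order, of the hit that ends up at sorted position i. Assumes ascending yBegin.
std::vector<std::uint32_t> find_permutation(const std::vector<float>& y_begin) {
  std::vector<std::uint32_t> permutation(y_begin.size());
  std::iota(permutation.begin(), permutation.end(), 0u);
  std::stable_sort(permutation.begin(), permutation.end(),
                   [&y_begin](std::uint32_t a, std::uint32_t b) { return y_begin[a] < y_begin[b]; });
  return permutation;
}

// In-order step (sketch): walk the permutation and fetch the packed location
// of each hit. A real decoder would fully decode the hit at this point and
// write it directly to its final, sorted position.
std::vector<std::uint32_t> gather_in_order(const std::vector<std::uint32_t>& packed_locations,
                                           const std::vector<std::uint32_t>& permutation) {
  assert(packed_locations.size() == permutation.size());
  std::vector<std::uint32_t> ordered(permutation.size());
  for (std::size_t i = 0; i < permutation.size(); ++i) {
    ordered[i] = packed_locations[permutation[i]];
  }
  return ordered;
}
```

In this sketch only `yBegin` and one packed integer per hit are produced before sorting, and each hit is fully decoded once, at its final position, so complete hits never have to be moved after decoding.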
Edited by Daniel Hugo Campora Perez