Separate control-flow / work in FTRawBankDecoder
- factor out control flow in two functions,
reverse_each_module
andfor_each_quadrant
. -
reverse_each_module
takes two iterators, and finds the partitions corresponding to a module, and the reverses each of these partitions partitions -
for_each_quadrant
takes a container and a container of offsets which defines partitions of that container. It calls, for each parittion, the callable with the first,last iterator of these partitions - replace the vector of partitionpoints with a boost::container::static_vector to avoid allocating memory on the heap
- add some lamdas which bind common function arguments, so they don't have to be repeated...
- move the verbose printout out of the main decoding loop (avoiding repeated checks on msgLevel) and instead provide a single loop which dumps the created clusters (note: this slightly changes the printout format!)
Merge request reports
Activity
@jvantilb , @sesen : FYI -- slightly different style, hope you like it... (this is an attempt of splitting the 'control/data flow' from the work in order to better understand whether the current raw bank format could be vectorized -- conclusion: not really, as the location where the clusters have to be written is only known once the preceding cluster is decoded, introducing a dependency which cannot really be expressed using SIMD types)
@cattanem : please unWIP/add labels once (if) @jvantilb , @sesen agree.
mentioned in commit Rec@7bdb9243
If I run the 223 tests in LHCb, then I get:
98% tests passed, 4 tests failed out of 223 The following tests FAILED: 22 - LHCbMath.testsimilarity_AVX2 (Not Run) 26 - LHCbMath.testvdtmath_AVX2 (Not Run) 27 - LHCbMath.testvdtmath_AVX2FMA (Not Run) 88 - XMLSummaryKernel.exit (Failed)
i.e. with the exception of XMLSummaryKernel.exit, all tests pass. So if there is a test which eg. prints all FT clusters, given rawbanks in various formats, and checks that the output is as expected, then the answer is 'yes'. -- otherwise it is unfortunately 'no'
(or to phrase it differently: could such a test please be added? Better to have an automated test than to rely on someone actively checking something -- eg. I don't know which input data to use for such a test)
Thanks @graven for these improvements. All looks ok from my side to merge it into master.
It is not surprising that you conclude that there is nothing to gain from vectorisation. There are currently two bank versions implemented (v2 and v3). In both versions I expect no gain from vectorisation. Nevertheless, we need a another new bank version in the near future (let's call it v4). @lohenry is working on this. In this new version the local headers are removed, and therefore the clusters can be processed in parallel more easily.
There is one caveat, occasionally we may want to merge two adjacent clusters, or we may want to insert some clusters. This is implemented in v3, and it does not happen so often. However, the question is how much CPU time these extra operations will costs (to be balanced with a slightly reduced tracking performance). And another question is whether these operations can be done on the Tell40 FPGA already. Both are open questions, and therefore I waw trying to find out how much we can gain from vectorisation of the v4 bank (always looking forward ;-)).
In addition, the FastClusterContainer should be replaced by a container that holds the partition points directlty. For instance, by an array of vectors (all other suggestions are welcome). These partition points are needed again in the conversion of FTLiteCluster to PrFTHit.
Concerning the tests, this would definitely be good to have. Maybe @lohenry can implement one with the data for his new bank version?
- [2018-04-11 00:50] Validation started with lhcb-gaudi-head#1829
- [2018-04-11 09:38] Validation started with lhcb-gaudi-head#1829
- [2018-04-12 00:15] Validation started with lhcb-gaudi-head#1830
- [2018-04-12 10:39] Validation started with lhcb-gaudi-head#1831
- [2018-04-12 17:24] Validation started with lhcb-gaudi-merge#481
- [2018-04-13 00:11] Validation started with lhcb-tdr-test#118
- [2018-04-13 00:11] Validation started with lhcb-clang-test#919
- [2018-04-13 00:12] Validation started with lhcb-head#1817
- [2018-04-13 00:13] Validation started with lhcb-gaudi-head-py3#122
- [2018-04-13 00:14] Validation started with lhcb-gaudi-head#1832
Edited by Software for LHCb- Resolved by Gerhard Raven
added 15 commits
-
145de99e...a159f674 - 14 commits from branch
master
- 28e0a868 - Separate control-flow / work in FTRawBankDecoder
-
145de99e...a159f674 - 14 commits from branch
@jvantilb : if you need a container with some extra indexing, you probably want to do something along the lines of IndexedHitContainer> -- as you really want a single, contiguous storage (*), with a separate 'index' (ie. basically what you now have with the partition points, but then hidden), and avoid the extra indirection you get from eg. an array of vectors...
(*) in the ideal case, we already know how many items that container should contain, so that this single block can be allocated 'just right' before filling it. If possible, any upstream processing (eg. event builder, TELL40, frontend, ... ) would, when it concatenates/amalgametes data from a number of sources, always keep a count of the sum of the number of data items from each of the direct input sources -- and then this principle should be applied recursively, i.e. at each point where data is amalgamated...
@graven, thanks for coming back on this. I was not sure which option is better:
IndexedHitContainer
orarray<vector<int>>
. For the latter, I do not know how to reserve a continuous block in memory. However, we first need to loop anyway on all banks in the FT and sum all the sizes before we know the total size. Therefore, all information is already present to reserve the memory for an array of vectors. Wouldn't this be better, if it is possible at all? Just wondering...mentioned in commit 204298bb