Separate control-flow / work in FTRawBankDecoder

@jvantilb , @sesen : FYI -- slightly different style, hope you like it... (this is an attempt of splitting the 'control/data flow' from the work in order to better understand whether the current raw bank format could be vectorized -- conclusion: not really, as the location where the clusters have to be written is only known once the preceding cluster is decoded, introducing a dependency which cannot really be expressed using SIMD types)

@cattanem : please unWIP/add labels once (if) @jvantilb , @sesen agree.

mentioned in commit Rec@7bdb9243

Thanks Gerhard. In general it looks good to me. Have you tested it for both bank versions?

@lohenry: Would this create any complications for you?

I don't think so. I did the replacing of the partitionpoint vector, and the rest seems easy to reimplement in my branch.

If I run the 223 tests in LHCb, then I get:

98% tests passed, 4 tests failed out of 223

The following tests FAILED:
	 22 - LHCbMath.testsimilarity_AVX2 (Not Run)
	 26 - LHCbMath.testvdtmath_AVX2 (Not Run)
	 27 - LHCbMath.testvdtmath_AVX2FMA (Not Run)
	 88 - XMLSummaryKernel.exit (Failed)

i.e. with the exception of XMLSummaryKernel.exit, all tests pass. So if there is a test which eg. prints all FT clusters, given rawbanks in various formats, and checks that the output is as expected, then the answer is 'yes'. -- otherwise it is unfortunately 'no'

(or to phrase it differently: could such a test please be added? Better to have an automated test than to rely on someone actively checking something -- eg. I don't know which input data to use for such a test)

Thanks @graven for these improvements. All looks ok from my side to merge it into master.

It is not surprising that you conclude that there is nothing to gain from vectorisation. There are currently two bank versions implemented (v2 and v3). In both versions I expect no gain from vectorisation. Nevertheless, we need a another new bank version in the near future (let's call it v4). @lohenry is working on this. In this new version the local headers are removed, and therefore the clusters can be processed in parallel more easily.

There is one caveat, occasionally we may want to merge two adjacent clusters, or we may want to insert some clusters. This is implemented in v3, and it does not happen so often. However, the question is how much CPU time these extra operations will costs (to be balanced with a slightly reduced tracking performance). And another question is whether these operations can be done on the Tell40 FPGA already. Both are open questions, and therefore I waw trying to find out how much we can gain from vectorisation of the v4 bank (always looking forward ;-)).

In addition, the FastClusterContainer should be replaced by a container that holds the partition points directlty. For instance, by an array of vectors (all other suggestions are welcome). These partition points are needed again in the conversion of FTLiteCluster to PrFTHit.

Concerning the tests, this would definitely be good to have. Maybe @lohenry can implement one with the data for his new bank version?

[2018-04-11 00:50] Validation started with lhcb-gaudi-head#1829
[2018-04-11 09:38] Validation started with lhcb-gaudi-head#1829
[2018-04-12 00:15] Validation started with lhcb-gaudi-head#1830
[2018-04-12 10:39] Validation started with lhcb-gaudi-head#1831
[2018-04-12 17:24] Validation started with lhcb-gaudi-merge#481
[2018-04-13 00:11] Validation started with lhcb-tdr-test#118
[2018-04-13 00:11] Validation started with lhcb-clang-test#919
[2018-04-13 00:12] Validation started with lhcb-head#1817
[2018-04-13 00:13] Validation started with lhcb-gaudi-head-py3#122
[2018-04-13 00:14] Validation started with lhcb-gaudi-head#1832

resolved all discussions

added 15 commits

145de99e...a159f674 - 14 commits from branch master
28e0a868 - Separate control-flow / work in FTRawBankDecoder

Compare with previous version

@jvantilb : if you need a container with some extra indexing, you probably want to do something along the lines of IndexedHitContainer> -- as you really want a single, contiguous storage (*), with a separate 'index' (ie. basically what you now have with the partition points, but then hidden), and avoid the extra indirection you get from eg. an array of vectors...

(*) in the ideal case, we already know how many items that container should contain, so that this single block can be allocated 'just right' before filling it. If possible, any upstream processing (eg. event builder, TELL40, frontend, ... ) would, when it concatenates/amalgametes data from a number of sources, always keep a count of the sum of the number of data items from each of the direct input sources -- and then this principle should be applied recursively, i.e. at each point where data is amalgamated...

@graven, thanks for coming back on this. I was not sure which option is better: IndexedHitContainer or array<vector<int>>. For the latter, I do not know how to reserve a continuous block in memory. However, we first need to loop anyway on all banks in the FT and sum all the sizes before we know the total size. Therefore, all information is already present to reserve the memory for an array of vectors. Wouldn't this be better, if it is possible at all? Just wondering...

unmarked as a Work In Progress

merged

mentioned in commit 204298bb

Separate control-flow / work in FTRawBankDecoder

Merged by Marco Cattaneo 7 years ago (Apr 13, 2018 7:03am UTC) 7 years ago

Activity

Separate control-flow / work in FTRawBankDecoder

Merge request reports

Merged by Marco Cattaneo 7 years ago (Apr 13, 2018 7:03am UTC) 7 years ago

Activity