Skip to content
Snippets Groups Projects

Separate control-flow / work in FTRawBankDecoder

Merged Gerhard Raven requested to merge alternate-ft-decoding into master
All threads resolved!
  • factor out control flow in two functions, reverse_each_module and for_each_quadrant.
  • reverse_each_module takes two iterators, and finds the partitions corresponding to a module, and the reverses each of these partitions partitions
  • for_each_quadrant takes a container and a container of offsets which defines partitions of that container. It calls, for each parittion, the callable with the first,last iterator of these partitions
  • replace the vector of partitionpoints with a boost::container::static_vector to avoid allocating memory on the heap
  • add some lamdas which bind common function arguments, so they don't have to be repeated...
  • move the verbose printout out of the main decoding loop (avoiding repeated checks on msgLevel) and instead provide a single loop which dumps the created clusters (note: this slightly changes the printout format!)
Edited by Marco Cattaneo

Merge request reports

Checking pipeline status.

Approval is optional

Merged by Marco CattaneoMarco Cattaneo 7 years ago (Apr 13, 2018 7:03am UTC)

Merge details

  • Changes merged into run2-patches with 204298bb.
  • Deleted the source branch.

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Author Developer

    @jvantilb , @sesen : FYI -- slightly different style, hope you like it... (this is an attempt of splitting the 'control/data flow' from the work in order to better understand whether the current raw bank format could be vectorized -- conclusion: not really, as the location where the clusters have to be written is only known once the preceding cluster is decoded, introducing a dependency which cannot really be expressed using SIMD types)

    @cattanem : please unWIP/add labels once (if) @jvantilb , @sesen agree.

  • mentioned in commit Rec@7bdb9243

  • Contributor

    Thanks Gerhard. In general it looks good to me. Have you tested it for both bank versions?

    @lohenry: Would this create any complications for you?

  • I don't think so. I did the replacing of the partitionpoint vector, and the rest seems easy to reimplement in my branch.

  • Author Developer

    If I run the 223 tests in LHCb, then I get:

    98% tests passed, 4 tests failed out of 223
    
    The following tests FAILED:
    	 22 - LHCbMath.testsimilarity_AVX2 (Not Run)
    	 26 - LHCbMath.testvdtmath_AVX2 (Not Run)
    	 27 - LHCbMath.testvdtmath_AVX2FMA (Not Run)
    	 88 - XMLSummaryKernel.exit (Failed)
    

    i.e. with the exception of XMLSummaryKernel.exit, all tests pass. So if there is a test which eg. prints all FT clusters, given rawbanks in various formats, and checks that the output is as expected, then the answer is 'yes'. -- otherwise it is unfortunately 'no'

    (or to phrase it differently: could such a test please be added? Better to have an automated test than to rely on someone actively checking something -- eg. I don't know which input data to use for such a test)

  • Thanks @graven for these improvements. All looks ok from my side to merge it into master.

    It is not surprising that you conclude that there is nothing to gain from vectorisation. There are currently two bank versions implemented (v2 and v3). In both versions I expect no gain from vectorisation. Nevertheless, we need a another new bank version in the near future (let's call it v4). @lohenry is working on this. In this new version the local headers are removed, and therefore the clusters can be processed in parallel more easily.

    There is one caveat, occasionally we may want to merge two adjacent clusters, or we may want to insert some clusters. This is implemented in v3, and it does not happen so often. However, the question is how much CPU time these extra operations will costs (to be balanced with a slightly reduced tracking performance). And another question is whether these operations can be done on the Tell40 FPGA already. Both are open questions, and therefore I waw trying to find out how much we can gain from vectorisation of the v4 bank (always looking forward ;-)).

    In addition, the FastClusterContainer should be replaced by a container that holds the partition points directlty. For instance, by an array of vectors (all other suggestions are welcome). These partition points are needed again in the conversion of FTLiteCluster to PrFTHit.

    Concerning the tests, this would definitely be good to have. Maybe @lohenry can implement one with the data for his new bank version?

  • Edited by Software for LHCb
  • Gerhard Raven resolved all discussions

    resolved all discussions

  • Gerhard Raven added 15 commits

    added 15 commits

    Compare with previous version

  • Author Developer

    @jvantilb : if you need a container with some extra indexing, you probably want to do something along the lines of IndexedHitContainer> -- as you really want a single, contiguous storage (*), with a separate 'index' (ie. basically what you now have with the partition points, but then hidden), and avoid the extra indirection you get from eg. an array of vectors...

    (*) in the ideal case, we already know how many items that container should contain, so that this single block can be allocated 'just right' before filling it. If possible, any upstream processing (eg. event builder, TELL40, frontend, ... ) would, when it concatenates/amalgametes data from a number of sources, always keep a count of the sum of the number of data items from each of the direct input sources -- and then this principle should be applied recursively, i.e. at each point where data is amalgamated...

  • @graven, thanks for coming back on this. I was not sure which option is better: IndexedHitContainer or array<vector<int>>. For the latter, I do not know how to reserve a continuous block in memory. However, we first need to loop anyway on all banks in the FT and sum all the sizes before we know the total size. Therefore, all information is already present to reserve the memory for an array of vectors. Wouldn't this be better, if it is possible at all? Just wondering...

  • Marco Cattaneo unmarked as a Work In Progress

    unmarked as a Work In Progress

  • Marco Cattaneo mentioned in commit 204298bb

    mentioned in commit 204298bb

  • Please register or sign in to reply
    Loading