Implementing bulk copying mechanisms for pixel clusters
Description
This MR implements bulk copying mechanisms for the creation of the pixel cluster containers. This approach operates on a variable-by-variable basis, as opposed to an element-by-element mechanism with per cluster calls. Since each variable is backed by a contiguous buffer, all elements of a single variable are copied in one memory operation, which reduces computational overhead when the number of clusters becomes large.
Key Changes
- The chosen mechanism to perform the copy is configurable via the
doBulkCopy
flag to allow for further testing and benchmarking against the previous per-cluster approach. - Pre-allocation of memory for the entire container and auxiliary store
- Use of
SG::Accessor
andCxxUtils::copy_bounded
for more efficient bulk transfer of the transient data - Added more explicit performance measurement for both methods
Testing
Running over 50 events using the following Test Vector file:
/eos/project/a/atlas-eftracking/TestVectors/FPGATrackSim_TVs/Test_Vectors_v0-6-2/ClusterEDM_PU200/pixelL2G_EDMoutput.txt
Element-wise:
ElementWiseMethod INFO Time User : Tot= 7.05 [s] Ave/Min/Max= 0.141(+- 0.0228)/ 0.11/ 0.24 [s] #= 50
Bulk-copy:
BulkCopyMethod INFO Time User : Tot= 5.18 [s] Ave/Min/Max= 0.104(+- 0.0121)/ 0.09/ 0.14 [s] #= 50
The use of the optimised copying method improves the time of the data transfer by approximately 30% on average. This test was not performed on a dedicated machine, but gives an indication of the improvement. More robust benchmarking will need to be performed.
Caveats and Next Steps
Currently, the bulk-copy can only be implemented for fixed-length variables. Nested, or jagged, vectors will require the use of a dedicated accessor and extensions to the auxiliary container class.
The next steps will be to implement this method analogously for the strip clusters. We could also use DataPool
in the pre-allocation to avoid individual allocations too, I'll look into this next in order to keep the updates incremental.
Some follow-ups after discussion
- Benchmarking with more events on the testbed.
- break down timing for better picture of aspects of the method.
These will be performed after merging.
Merge request reports
Activity
assigned to @leevans
- Resolved by Lucy Lewitt
- Resolved by Lucy Lewitt
- Resolved by Lucy Lewitt
- Resolved by Levi Evans
- Resolved by Levi Evans
- Resolved by Levi Evans
A couple of super small comments but this looks really nice.
One question regarding the
CxxUtils::copy_bounded
. I've not used this but I'm curious how it compares to move conventional copy methods. Has this been benchmarked in this use case?Also for the benchmarks probably we want to iterate a few more times, it should be possible to tell athena to run the same events multiple times, or just using more events would work too.
added 1 commit
- cc6d0f94 - feat: Replace at() bounds checking in both methods
This merge request affects 1 package:
- Trigger/EFTracking/EFTrackingFPGAIntegration
This merge request affects 4 files:
- Trigger/EFTracking/EFTrackingFPGAIntegration/python/FPGADataFormatter.py
- Trigger/EFTracking/EFTrackingFPGAIntegration/python/IntegrationConfigFlag.py
- Trigger/EFTracking/EFTrackingFPGAIntegration/src/xAODClusterMaker.cxx
- Trigger/EFTracking/EFTrackingFPGAIntegration/src/xAODClusterMaker.h
added EFTracking Trigger main review-pending-level-1 labels
CI Result SUCCESS (hash cc6d0f94)Athena externals cmake make tests Full details available on this CI monitor view. Check the JIRA CI status board for known problems
Athena: number of compilation errors 0, warnings 0
For experts only: Jenkins output (remote access info)