CPU Offload Algorithms (rebased)
Equivalent of !96 (closed) rebased on master
This MR adds an option to the sequence, --cpu-offload, that offloads part of the computation to the CPU. It is on by default. So far, this serves as a switch to run the following algorithms on the CPU:
- Global event cut
- Prefix sum
- Copy and prefix sum
When run on the CPU, the global event cut (GEC) produces results that are consistent between runs (i.e. events are always selected in the same order). Offloading the prefix sum to the CPU results in a 5-10% speedup of the entire sequence.
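For illustration only, here is a minimal sketch of what sequential CPU versions of the two kinds of algorithm could look like. It is not the code in this MR. The function names, the use of a per-event hit count, and the max_hits threshold are all assumptions made for the example. It shows why a single-threaded event cut always yields the same event order, and what an exclusive prefix sum computes.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sequential global event cut. The selection criterion
// (hit count <= max_hits) is an assumption for this sketch.
// Events are visited in index order on a single thread, so the
// selected list comes out in the same order on every run.
std::vector<std::uint32_t> select_events(
  const std::vector<std::uint32_t>& hits_per_event,
  std::uint32_t max_hits)
{
  std::vector<std::uint32_t> selected;
  for (std::uint32_t event = 0; event < hits_per_event.size(); ++event) {
    if (hits_per_event[event] <= max_hits) {
      selected.push_back(event);
    }
  }
  return selected;
}

// Hypothetical in-place exclusive prefix sum: each element is replaced
// by the sum of all elements before it. Returns the total sum.
std::uint32_t exclusive_prefix_sum(std::vector<std::uint32_t>& counts)
{
  std::uint32_t running_total = 0;
  for (auto& value : counts) {
    const std::uint32_t current = value;
    value = running_total;
    running_total += current;
  }
  return running_total;
}
```

With counts {3, 0, 2, 5}, exclusive_prefix_sum rewrites the vector to the offsets {0, 3, 3, 5} and returns the total 10. With hit counts {5, 120, 7, 300, 2} and max_hits = 100, select_events returns the event indices {0, 2, 4}.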