Optimise fragment offset calculation for GPU batches
Also add an application that benchmarks the fragment offset calculation.
The speed of the mep_offsets function improves from 14.3 Hz to 648 Hz (45x speedup). 4 threads are now more than enough to calculate the offsets; 2 would probably still be fine. It seems that EB::get_padding( s, 1 << align ) is very slow.
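For illustration only, here is a minimal sketch of one cheap way to compute padding when the alignment is a power of two. It assumes that "padding" means the number of bytes needed to round a fragment size up to the next multiple of 1 << align; that is a guess about what EB::get_padding returns, not a description of it. The names padding_pow2 and fragment_offsets are hypothetical and are not taken from this MR or from Allen, and the offset layout (a running sum of padded sizes) is likewise an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the MR): bytes needed to round `size`
// up to a multiple of (1 << align_bits). Because the alignment is a power
// of two, the modulo reduces to a mask, with no division or branch.
inline std::uint32_t padding_pow2(std::uint32_t size, unsigned align_bits) {
  const std::uint32_t mask = (std::uint32_t{1} << align_bits) - 1;
  // Unsigned subtraction wraps, so this is (-size) mod 2^align_bits.
  return (std::uint32_t{0} - size) & mask;
}

// Hypothetical offset calculation: offsets[i] is the start of fragment i,
// assuming each fragment is followed by its padding. The last entry is the
// total padded size.
inline std::vector<std::uint32_t> fragment_offsets(
    const std::vector<std::uint32_t>& sizes, unsigned align_bits) {
  std::vector<std::uint32_t> offsets(sizes.size() + 1, 0);
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    offsets[i + 1] = offsets[i] + sizes[i] + padding_pow2(sizes[i], align_bits);
  }
  return offsets;
}
```

With 4-byte alignment (align_bits = 2), fragment sizes of 5, 8 and 1 bytes give padded sizes of 8, 8 and 4, so the offsets come out as 0, 8, 16 and 20.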
Requires Allen!1127 (merged).
Edited by Roel Aaij