RichFutureKernel: Rework histo test algorithm for better stability
Hopefully addresses some differences seen in the ARM tests with respect to x86_64.
Also improves the test that filling a histogram directly, or via a local buffer object, gives the same results.
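
As an illustration of the direct-versus-buffered check described above, here is a minimal sketch. It does not use the real RichFutureKernel or Gaudi histogram classes: `Histo`, `LocalBuffer` and `sameResults` are hypothetical names invented for this example, and the binning and range are assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-range 1D histogram with integer bin counts.
// This is NOT the real histogram type used by the test.
struct Histo {
  static constexpr std::size_t nBins = 10;
  double                           lo = 0.0, hi = 1.0;
  std::array<unsigned long, nBins> counts{};

  // Map x to a bin index; out-of-range values are clamped to the edge bins.
  std::size_t bin( double x ) const {
    if ( x <= lo ) return 0;
    if ( x >= hi ) return nBins - 1;
    const auto i = static_cast<std::size_t>( ( x - lo ) / ( hi - lo ) * nBins );
    return std::min( i, nBins - 1 );
  }

  // Direct fill: update the histogram immediately.
  void fill( double x ) { ++counts[bin( x )]; }
};

// Hypothetical local buffer: accumulates fills privately and merges them
// into the target histogram on flush() or on destruction.
struct LocalBuffer {
  Histo&                                  target;
  std::array<unsigned long, Histo::nBins> pending{};

  explicit LocalBuffer( Histo& h ) : target( h ) {}
  ~LocalBuffer() { flush(); }

  void fill( double x ) { ++pending[target.bin( x )]; }

  void flush() {
    for ( std::size_t i = 0; i < Histo::nBins; ++i ) {
      target.counts[i] += pending[i];
      pending[i] = 0;
    }
  }
};

// Fill one histogram directly and another via a buffer with the same
// inputs, then compare the bin contents exactly.
bool sameResults( const std::vector<double>& xs ) {
  Histo direct, buffered;
  for ( double x : xs ) direct.fill( x );
  {
    LocalBuffer buf( buffered );
    for ( double x : xs ) buf.fill( x );
  } // buffer goes out of scope here and flushes into 'buffered'
  return direct.counts == buffered.counts;
}
```

The sketch uses integer counts so that the comparison can be exact. With floating-point weights the two paths may sum in a different order, so a real test would probably need a small tolerance instead of strict equality.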
Edited by Christopher Rob Jones