Do not use atomicAdd for floating point numbers
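Floating-point addition is commutative but not associative: (a + b) + c can differ from a + (b + c) because every intermediate result is rounded. If repeatable results are sought, FP operations must therefore always be done in the same order. With atomicAdd, the order in which threads update a given address is not defined and can change from run to run, so using atomicAdd on floating-point numbers is inherently a source of non-repeatable results. In addition, some hardware (for example, some AMD GPUs) lacks native support for floating-point atomic addition, which can make these operations slow.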
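To illustrate the point about ordering, here is a minimal host-side C++ sketch (not Allen code; `sum_in_order` is a hypothetical helper). It adds the same values in a caller-chosen order and rounds to `float` after every step, as a sequence of float atomicAdds on one address would. The `order` argument stands in for the unpredictable order in which GPU threads reach the atomic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum `values` in the sequence given by `order`, rounding to float after
// every addition (the same rounding each float atomicAdd performs).
float sum_in_order(const std::vector<float>& values,
                   const std::vector<std::size_t>& order) {
  float acc = 0.0f;
  for (std::size_t i : order) {
    acc = acc + values[i];  // result is rounded to float before the next step
  }
  return acc;
}
```

For example, with values `{1e8f, 1.0f, -1e8f}`, the order `{0, 1, 2}` gives `0.0f` (1e8 + 1 rounds back to 1e8 in single precision), while the order `{0, 2, 1}` gives `1.0f`.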
As far as I can tell, the following is the only use of atomicAdd on FP numbers in Allen. It would be preferable to express this logic without it:
https://gitlab.cern.ch/lhcb/Allen/-/blob/master/device/PV/beamlinePV/src/pv_beamline_histo.cu#L96
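One possible direction is to accumulate in fixed point. The following is a sketch under the assumption that the atomics at that line add floating-point weights into histogram bins. Each weight is converted to a 64-bit integer using a fixed scale factor. Because integer addition is associative, the final bin contents no longer depend on the order of the updates. The code below is host-side C++ that makes the update order explicit. `fill_histogram_fixed`, `to_fixed`, `from_fixed` and the 2^24 scale factor are hypothetical choices for illustration, not existing Allen code. On the GPU, the `+=` would become an integer atomicAdd on a 64-bit accumulator.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scale: 2^24 fixed-point units per 1.0 of weight.
constexpr double kFixedPointScale = 16777216.0;

// Convert a weight to fixed point once, before accumulation.
// Assumes |w| * scale fits in int64 (true for any realistic weight).
inline std::int64_t to_fixed(float w) {
  return static_cast<std::int64_t>(
      std::llround(static_cast<double>(w) * kFixedPointScale));
}

// Convert an accumulated fixed-point total back to float at the very end.
inline float from_fixed(std::int64_t acc) {
  return static_cast<float>(static_cast<double>(acc) / kFixedPointScale);
}

// Add (bin, weight) contributions into a histogram in the given order.
// Integer addition is associative, so the result does not depend on `order`
// as long as no bin total overflows int64.
std::vector<float> fill_histogram_fixed(const std::vector<int>& bins,
                                        const std::vector<float>& weights,
                                        const std::vector<std::size_t>& order,
                                        std::size_t n_bins) {
  std::vector<std::int64_t> acc(n_bins, 0);
  for (std::size_t i : order) {
    acc[bins[i]] += to_fixed(weights[i]);  // on a GPU: an integer atomicAdd
  }
  std::vector<float> out(n_bins);
  for (std::size_t b = 0; b < n_bins; ++b) {
    out[b] = from_fixed(acc[b]);
  }
  return out;
}
```

With bins `{0, 0, 0}` and weights `{1e8f, 1.0f, -1e8f}`, both update orders give `1.0f` in bin 0. There are two trade-offs. First, weights are quantized to multiples of 2^-24. Second, each bin total must stay within the int64 range, which limits it to about 5.5e11 in weight units. Both limits would need checking against the actual weights in the PV histogram.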