Draft: Extend float_v to work with arbitrary precision
Related to !4988 and #386 I'm trying to extend the float_v
class to be templated to any floating-point type. This will also require an extension of int_v to allow longs and possibly also mask_v.
Todo:
-
Extend float_v
to allow doubles -
Extend int_v
to allow longs -
Extend mask_v
to allow longs -
Extend functions elsewhere in LHCb
which requiredfloat_v
,int_v
andmask_v
arguments/return values to instead use the templated classes
Merge request reports
Activity
I would also like to know what @ahennequ thinks -- basically, one has to be very careful about the implications of this...
In the end, one must balance accuracy and performance, and where possible, first try to identify and avoid 'bad algorithms' which require an increased 'internal' accuracy to remain numerically stable. Often (but not always) those can be re-formulated in mathematically equivalent ones which have better behavior - and it would be a pity if one had to go to an 'excessive' precision every where when it is only needed in a few places.
Hence I do not think there is an a-priori clear answer, and thus I will refrain from a clearcut opinion, and just acknowledge that this is tricky...
One of the design principle of the library is that we have the same number of elements in every vectors of a type in a given backend. This helps write code that feel like scalar.
I see two different ways to tackle this issue:
- only allow doubles for the scalar backend
- use virtual vectors for doubles (eg for avx2 one double_v would contain 2 4-elements registers to match the size required by the backend (8 elements))
it would be a pity if one had to go to an 'excessive' precision every where when it is only needed in a few places
So that was my logic for making this templated, is that we can specify double-precision only when necessary and use floats otherwise. For the use case specifically highlighted in !4988, it turns out that there may be a work-around possible using the values that are already
double
s (i.e. |p| and then constraining the PDG mass, roughly as @graven suggested)
For the scalar issue, that's I think why we'd need to also extend
int_v
andmask_v
to allow size-8 elements (i.e.long
s), or is this not what you mean @ahennequ?no this not what I meant. You want to add a new type double_v to each backend. But you don't want to modify the existing types since that would result in loosing paralelism in the default case.
For example, avx2 as 3 types:
- float_v: 8 x float
- int_v: 8 x int
- mask_v: 8 x bool
To avoid breaking the logic, the double_v type you want to add should be: 8 x double
The way to do that is to make it as:
class double_v { private: __m256 data{}; __m256 data2{}; };
That way every vector in a given backend have the same number of elements and you don't break the other types, the loop_masks, etc...
added RTA label
requested review from @ahennequ
260 260 261 261 class int_v; 262 262 263 class float_v; 263 template <typename T, typename = std::enable_if_t<std::is_floating_point_v<T>>> Sorry for not seeing this at first. I've added it to the agenda