Clang 11 appears to optimize the scalar loop so aggressively that it runs faster than the vectorized loop, which makes the test fail.
This is a temporary workaround.
The issue is tracked in #139. It would be worth figuring out what clang does here, as it might point to a possible improvement to the SIMD wrappers.
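
One unconfirmed explanation is that clang auto-vectorizes the scalar baseline. As a rough sketch for investigating #139 (not the workaround used in this PR), a scalar reference loop can be pinned to scalar code with clang's loop-hint pragma. The function name `add_scalar` and its signature are hypothetical and not taken from this project:

```cpp
#include <cstddef>

// Hypothetical scalar baseline: element-wise addition of two arrays.
// The pragma asks clang not to auto-vectorize or interleave this loop,
// so the comparison against the hand-written SIMD path stays meaningful.
// Other compilers never see the pragma because of the #if guard.
void add_scalar(const float* a, const float* b, float* out, std::size_t n) {
#if defined(__clang__)
#pragma clang loop vectorize(disable) interleave(disable)
#endif
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

If the scalar loop is no longer faster with the pragma in place, that would suggest auto-vectorization is the cause; comparing the generated assembly with and without it should show what clang emits that the wrappers do not.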