Concerning Vectorclass: vectorization was only beneficial because the underlying math had not been simplified. With the simplified equations, not only is vectorization no longer worth it, but the non-vectorized code gains another 30% in throughput over the vectorized code.
It now seems that this header, although still in the repo and modified a few days ago, is no longer in use, so I've just dropped it.
For the record, the simplification is based on replacing:
w1 = sqrt(u1 + i*u2), w2 = sqrt(u1 - i*u2), v = w1*w2
with:
v = sqrt(u1^2 + u2^2)
and replacing:
partres = real(w1) + real(w2)
with:
partres = sqrt(2*(v+u1))
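
To make the equivalence easy to check, here is a minimal sketch that compares both forms numerically. It assumes real-valued u1 and u2 and the principal branch of the complex square root, so that w2 is the complex conjugate of w1, w1*w2 = |u1 + i*u2|, and real(w1) = sqrt((v + u1)/2). The function names (partres_complex, partres_simplified, forms_agree, self_check) are made up for this illustration and are not taken from the repo.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>

// Original form: two complex square roots, then the sum of their real parts.
double partres_complex(double u1, double u2) {
    const std::complex<double> w1 = std::sqrt(std::complex<double>(u1, u2));
    const std::complex<double> w2 = std::sqrt(std::complex<double>(u1, -u2));
    return w1.real() + w2.real();
}

// Simplified form: two real square roots and no complex arithmetic.
double partres_simplified(double u1, double u2) {
    const double v = std::sqrt(u1 * u1 + u2 * u2);  // equals w1*w2 = |u1 + i*u2|
    return std::sqrt(2.0 * (v + u1));
}

// Hypothetical helper: true if both forms agree within a relative tolerance.
bool forms_agree(double u1, double u2, double rel_tol = 1e-12) {
    const double a = partres_complex(u1, u2);
    const double b = partres_simplified(u1, u2);
    return std::abs(a - b) <= rel_tol * std::max(1.0, std::abs(a));
}

// Spot check on a few arbitrary points; aborts if the two forms disagree.
void self_check() {
    assert(forms_agree(3.0, 4.0));
    assert(forms_agree(-3.0, 4.0));
    assert(forms_agree(0.5, -2.0));
}
```

One caveat on the simplified form: when u1 is negative and |u2| is much smaller than |u1|, v + u1 subtracts two nearly equal numbers and loses precision, so that corner may deserve its own check if it matters for the inputs in question.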