AVX2/AVX512 wrapper fixes + new AVX256 backend
- A few fixes to the SIMD wrapper.
- Introduction of an AVX256 backend: it uses the new AVX-512 instructions but with a vector size of 256 bits. This lets it benefit from the larger register count (32 vector registers instead of 16), the new mask registers and new instructions (compressstore) without the CPU frequency scaling that 512-bit vectors can trigger.
- Added masked gathers.
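As an illustration of what compressstore and masked gathers do on a 256-bit vector, here is a minimal portable scalar sketch. It is not the wrapper's API: `Vec`, `Mask`, `compress_store` and `masked_gather` are hypothetical names, and the sketch assumes 8 lanes of 32-bit floats with one mask bit per lane, modelling the behaviour of the AVX-512VL intrinsics `_mm256_mask_compressstoreu_ps` and `_mm256_mmask_i32gather_ps`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of the AVX256 backend's masked operations.
constexpr std::size_t kLanes = 8;   // 256-bit vector of 32-bit floats

using Vec = std::array<float, kLanes>;
using Idx = std::array<std::int32_t, kLanes>;
using Mask = std::uint8_t;          // one bit per lane, like an AVX-512 k-register

// Compress-store: write only the active lanes of v, packed contiguously at out.
// The real instruction returns nothing; the count returned here equals popcount(m).
std::size_t compress_store(float* out, Mask m, const Vec& v) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < kLanes; ++i)
        if (m & (1u << i)) out[n++] = v[i];
    return n;
}

// Masked gather: active lanes load base[idx[i]]; inactive lanes keep src[i]
// and never touch memory, so their indices may be out of range.
Vec masked_gather(const Vec& src, Mask m, const Idx& idx, const float* base) {
    Vec r = src;
    for (std::size_t i = 0; i < kLanes; ++i)
        if (m & (1u << i)) r[i] = base[idx[i]];
    return r;
}
```

Because masked-off gather lanes are never dereferenced, a masked gather can handle the tail of a loop in one vector step, and compressstore can then write out only the surviving elements.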
Edited by Marco Cattaneo