Use only SSE2 in prefix sum
This MR restricts the manually vectorised implementation of the prefix sum to SSE2 instructions at most. SSE2 is part of the x86_64 baseline, so the code runs on any x86_64 CPU.
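
For reviewers unfamiliar with the technique, here is a minimal sketch of an inclusive prefix sum that stays within SSE2. It is an illustration only and not the code changed in this MR: the function name `prefix_sum_sse2`, its signature, and the choice of unsigned 32-bit elements are assumptions. It uses only intrinsics from `<emmintrin.h>`: a two-step shift-and-add scan inside each 128-bit register, plus a broadcast of the last lane to carry the running total into the next block.

```cpp
#include <emmintrin.h> // SSE2 intrinsics only
#include <cstddef>
#include <cstdint>

// Hypothetical helper: inclusive prefix sum of n unsigned 32-bit values.
// Sums wrap around on overflow, as usual for unsigned arithmetic.
inline void prefix_sum_sse2(const std::uint32_t* in, std::uint32_t* out, std::size_t n)
{
  __m128i carry = _mm_setzero_si128(); // running total, broadcast to all four lanes
  std::size_t i = 0;

  for (; i + 4 <= n; i += 4) {
    __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + i));

    // In-register scan: add the vector shifted up by one lane, then by two lanes.
    // [a, b, c, d] -> [a, a+b, b+c, c+d] -> [a, a+b, a+b+c, a+b+c+d]
    x = _mm_add_epi32(x, _mm_slli_si128(x, 4));
    x = _mm_add_epi32(x, _mm_slli_si128(x, 8));

    // Add the total of all previous blocks.
    x = _mm_add_epi32(x, carry);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), x);

    // Broadcast the highest lane (the total so far) for the next block.
    carry = _mm_shuffle_epi32(x, _MM_SHUFFLE(3, 3, 3, 3));
  }

  // Scalar tail for the remaining 0 to 3 elements.
  std::uint32_t acc = (i > 0) ? out[i - 1] : 0u;
  for (; i < n; ++i) {
    acc += in[i];
    out[i] = acc;
  }
}
```

Calling `prefix_sum_sse2(in, out, n)` on `{1, 2, 3, 4, 5}` fills `out` with `{1, 3, 6, 10, 15}`. Unaligned loads and stores are used so the sketch makes no assumption about buffer alignment.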
Potentially solves MooreAnalysis#30 (closed)
Edited by Daniel Hugo Campora Perez