[WIP] VecCore UME::SIMD backend
This is a draft implementation of a UME::SIMD backend in VecCore.
I can successfully compile the quadratic example using it, but the backend test still fails, so please do not merge this yet.
For now, this merge request is intended only to gather feedback.
I ran the example with both GCC-5.3.0 and ICC-16.0.1 (results below).
Although the results are correct, the UME::SIMD backend is much slower than the scalar code in the run below (376 ms versus 51 ms, roughly 7x slower). We need to understand why.
$ ./examples/quadratic
5279500: a = -2.746, b = -3.296, c = 23.284, roots = 2, x1 = 2.373, x2 = -3.573
5279501: a = -3.086, b = 0.943, c = 18.966, roots = 2, x1 = 2.637, x2 = -2.331
5279502: a = -4.336, b = 4.720, c = 18.768, roots = 2, x1 = 2.695, x2 = -1.606
5279503: a = 2.029, b = -0.900, c = -9.999, roots = 2, x1 = -2.009, x2 = 2.453
5279504: a = 4.279, b = -4.054, c = -7.248, roots = 2, x1 = -0.911, x2 = 1.859
5279505: a = -0.953, b = 4.933, c = -16.047, roots = 0, x1 = 0.000, x2 = 0.000
5279506: a = 3.004, b = -4.545, c = -6.640, roots = 2, x1 = -0.912, x2 = 2.425
5279507: a = -1.478, b = 3.167, c = -12.018, roots = 0, x1 = 0.000, x2 = 0.000
5279508: a = -4.345, b = 0.019, c = 19.198, roots = 2, x1 = 2.104, x2 = -2.100
5279509: a = -0.438, b = 0.699, c = 7.513, roots = 2, x1 = 5.017, x2 = -3.421
elapsed time = 51.000ms (scalar code)
5279500: a = -2.746, b = -3.296, c = 23.284, roots = 2, x1 = 2.373, x2 = -3.573
5279501: a = -3.086, b = 0.943, c = 18.966, roots = 2, x1 = 2.637, x2 = -2.331
5279502: a = -4.336, b = 4.720, c = 18.768, roots = 2, x1 = 2.695, x2 = -1.606
5279503: a = 2.029, b = -0.900, c = -9.999, roots = 2, x1 = -2.009, x2 = 2.453
5279504: a = 4.279, b = -4.054, c = -7.248, roots = 2, x1 = -0.911, x2 = 1.859
5279505: a = -0.953, b = 4.933, c = -16.047, roots = 0, x1 = 0.000, x2 = 0.000
5279506: a = 3.004, b = -4.545, c = -6.640, roots = 2, x1 = -0.912, x2 = 2.425
5279507: a = -1.478, b = 3.167, c = -12.018, roots = 0, x1 = 0.000, x2 = 0.000
5279508: a = -4.345, b = 0.019, c = 19.198, roots = 2, x1 = 2.104, x2 = -2.100
5279509: a = -0.438, b = 0.699, c = 7.513, roots = 2, x1 = 5.017, x2 = -3.421
elapsed time = 376.000ms (vector backend)
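
For reviewers who want to sanity-check the numbers, the rows printed above are consistent with the textbook formula x1 = (-b - sqrt(d)) / (2a), x2 = (-b + sqrt(d)) / (2a), with d = b*b - 4ac and both roots reported as 0.000 when d < 0. The scalar sketch below only illustrates that arithmetic. It is not the code in examples/quadratic, and the names QuadraticRoots and SolveQuadraticSketch are hypothetical. The single-root case (d == 0) does not appear in the output above, so its handling here is an assumption.

```cpp
#include <cmath>

// Hypothetical result type, not part of VecCore or UME::SIMD.
struct QuadraticRoots {
    int count;   // number of real roots found
    double x1;   // (-b - sqrt(d)) / (2a), or 0 when there is no real root
    double x2;   // (-b + sqrt(d)) / (2a), or 0 when there is no real root
};

// Illustrative scalar solver for a*x^2 + b*x + c = 0.
// Precondition (assumed): a != 0.
QuadraticRoots SolveQuadraticSketch(double a, double b, double c)
{
    const double d = b * b - 4.0 * a * c;  // discriminant

    if (d < 0.0) {
        // No real roots: matches the "roots = 0, x1 = 0.000, x2 = 0.000" rows.
        return {0, 0.0, 0.0};
    }

    const double s = std::sqrt(d);
    const double x1 = (-b - s) / (2.0 * a);
    const double x2 = (-b + s) / (2.0 * a);

    // Assumption: a zero discriminant is reported as one (double) root.
    return {d == 0.0 ? 1 : 2, x1, x2};
}
```

Calling SolveQuadraticSketch(-2.746, -3.296, 23.284) with the rounded coefficients from row 5279500 gives roots that agree with the printed x1 and x2 to about three decimal places. The plain formula can lose precision when b*b is much larger than 4ac, so a real implementation may prefer a numerically more stable variant.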