Optimize data handling between threads

See the discussion that started here:

AlignAlgorithm performs a lot of data locking to allow running in multi-threading mode. However, the most time-consuming code is now inside the mutex locks. Therefore, some of us expect that multi-threading will not actually be very efficient.

One quick solution is to split the mutable data again into 'accumulations' (called 'equations') and 'alignables' (called 'ElementsToAlign'), and have the latter (or both) use SynchronizedValue<Data,std::shared_mutex>, adding the necessary const specifiers in the with_lock calls.
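To make the idea concrete, here is a minimal sketch of the reader/writer pattern. `SharedSynchronized` is a hypothetical stand-in written for this example, not the actual SynchronizedValue class, and the layouts of `ElementsToAlign` and `Equations` are invented for illustration. The sketch assumes that a const `with_lock` call can take a shared lock, so that the expensive read-only work runs concurrently and only the short update needs exclusive access.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for SynchronizedValue<Data, std::shared_mutex>.
// The real class may differ in interface and behaviour.
template <typename Data>
class SharedSynchronized {
  Data                      m_data;
  mutable std::shared_mutex m_mutex;

public:
  explicit SharedSynchronized( Data d ) : m_data( std::move( d ) ) {}

  // Non-const access: exclusive lock, one writer at a time.
  template <typename F>
  decltype( auto ) with_lock( F&& f ) {
    std::unique_lock<std::shared_mutex> lock{ m_mutex };
    return std::forward<F>( f )( m_data );
  }

  // Const access: shared lock, many readers at the same time.
  template <typename F>
  decltype( auto ) with_lock( F&& f ) const {
    std::shared_lock<std::shared_mutex> lock{ m_mutex };
    return std::forward<F>( f )( m_data );
  }
};

// Invented layouts, for illustration only.
struct ElementsToAlign { std::vector<double> parameters; };
struct Equations       { std::vector<double> sums; };

double run_shared_lock_demo( unsigned nThreads, unsigned nEvents ) {
  // Declared const, so every with_lock call on it takes the shared lock.
  const SharedSynchronized<ElementsToAlign> elements{ ElementsToAlign{ { 1.0, 2.0, 3.0 } } };
  SharedSynchronized<Equations>             equations{ Equations{ std::vector<double>( 3, 0.0 ) } };

  std::vector<std::thread> workers;
  for ( unsigned t = 0; t < nThreads; ++t ) {
    workers.emplace_back( [&] {
      for ( unsigned ev = 0; ev < nEvents; ++ev ) {
        // The (here trivial) expensive part only reads the alignables.
        const std::vector<double> contribution =
            elements.with_lock( []( const ElementsToAlign& e ) { return e.parameters; } );
        // Only the short update of the accumulations is exclusive.
        equations.with_lock( [&]( Equations& eq ) {
          assert( eq.sums.size() == contribution.size() );
          for ( std::size_t i = 0; i < eq.sums.size(); ++i ) eq.sums[i] += contribution[i];
        } );
      }
    } );
  }
  for ( auto& w : workers ) w.join();

  // std::as_const selects the const overload, i.e. the shared lock.
  return std::as_const( equations ).with_lock( []( const Equations& eq ) {
    double total = 0.0;
    for ( double s : eq.sums ) total += s;
    return total;
  } );
}
```

Note that a lambda taking `const Equations&` is not enough on its own: on a non-const object the non-const overload (exclusive lock) is still chosen, which is why the object itself has to be const or wrapped in `std::as_const`.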

A more invasive solution is to give each thread its own temporary data. The simplest version of this would be to copy the mutable data at the start of the event loop, and 'add' the 'accumulators' back to the algorithm accumulator at the end of the event loop. The 'adding' operation is already implemented and relatively cheap.
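A rough sketch of this per-thread approach is given below. The `Equations` struct, its `add` method and the function name are hypothetical and only mimic the existing 'adding' operation. For simplicity each thread starts from an empty local accumulator rather than a copy of the full mutable data. The point is that the event loop itself runs without any locking, and the mutex is taken once per thread for the final merge.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical accumulator, standing in for the algorithm's 'equations'.
struct Equations {
  std::vector<double> sums;
  std::size_t         nEvents = 0;

  explicit Equations( std::size_t nParams ) : sums( nParams, 0.0 ) {}

  // Stand-in for the existing 'adding' operation.
  void add( const Equations& other ) {
    assert( sums.size() == other.sums.size() );
    for ( std::size_t i = 0; i < sums.size(); ++i ) sums[i] += other.sums[i];
    nEvents += other.nEvents;
  }
};

Equations accumulate_with_local_copies( unsigned nThreads, std::size_t eventsPerThread,
                                        std::size_t nParams ) {
  Equations  global{ nParams };
  std::mutex globalMutex;

  std::vector<std::thread> workers;
  for ( unsigned t = 0; t < nThreads; ++t ) {
    workers.emplace_back( [&] {
      // Thread-private accumulator: no locking inside the event loop.
      Equations local{ nParams };
      for ( std::size_t ev = 0; ev < eventsPerThread; ++ev ) {
        // Dummy per-event contribution; the real work would go here.
        for ( std::size_t i = 0; i < nParams; ++i ) local.sums[i] += static_cast<double>( i + 1 );
        ++local.nEvents;
      }
      // Merge once at the end of this thread's event loop.
      std::lock_guard<std::mutex> lock{ globalMutex };
      global.add( local );
    } );
  }
  for ( auto& w : workers ) w.join();
  return global;
}
```

The cost of this variant is one extra accumulator per thread in memory plus one merge per thread, so it pays off when the per-event work dominates.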

A more efficient solution may be to use the concept of a Gaudi::Accumulator.

@graven, @sponce