Running ACTS FW FittingAlgorithms in a parallel environment brings no performance improvement
The Intel TBB library is used to parallelise the main loop in the ACTS FW FittingAlgorithm.cpp.
Observations:
- The number of TBB threads can be configured only once, when the TBB scheduler is first initialised. Later or local configurations are ignored (unlike with OpenMP or UNIX Pthreads).
- Since the Sequencer already uses as many threads as there are physical cores, the processor appears to be fully busy already, so a further level of parallelisation inside the Fitter seems to bring no benefit.
- The parallel threads in the Sequencer do not need any synchronisation because their data are independent. The threads in the Fitter need to synchronise on the output result, the trajectories. Parallelising only in the Fitter would therefore be less beneficial than parallelising in the Sequencer.
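
The difference between the two synchronisation patterns in the last observation can be sketched as follows. This is only an illustration, not the code in FittingAlgorithm.cpp: it uses plain std::thread instead of TBB so that it stays self-contained, and Trajectory, fitTrack, fitWithSharedOutput and fitWithIndependentOutput are hypothetical names that do not exist in ACTS.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a fitted trajectory; not an ACTS type.
struct Trajectory {
  std::size_t trackIndex;
  double chi2;
};

// Hypothetical stand-in for the per-track fit; the real fit is far more costly.
Trajectory fitTrack(std::size_t trackIndex) {
  return Trajectory{trackIndex, 0.5 * static_cast<double>(trackIndex)};
}

// Pattern 1: all workers append to one shared container, so every write
// has to take a lock. The output order depends on thread scheduling.
std::vector<Trajectory> fitWithSharedOutput(std::size_t nTracks,
                                            unsigned nThreads) {
  assert(nThreads > 0);
  std::vector<Trajectory> trajectories;
  std::mutex outputMutex;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nThreads; ++t) {
    workers.emplace_back([&, t] {
      // Each worker handles the tracks t, t + nThreads, t + 2 * nThreads, ...
      for (std::size_t i = t; i < nTracks; i += nThreads) {
        Trajectory result = fitTrack(i);
        std::lock_guard<std::mutex> lock(outputMutex);
        trajectories.push_back(result);
      }
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return trajectories;
}

// Pattern 2: the output is pre-sized and each worker writes only its own
// slots, so the data are independent and no lock is needed.
std::vector<Trajectory> fitWithIndependentOutput(std::size_t nTracks,
                                                 unsigned nThreads) {
  assert(nThreads > 0);
  std::vector<Trajectory> trajectories(nTracks);
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nThreads; ++t) {
    workers.emplace_back([&, t] {
      for (std::size_t i = t; i < nTracks; i += nThreads) {
        trajectories[i] = fitTrack(i);
      }
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return trajectories;
}
```

Both functions return the same set of trajectories; only the first one serialises the workers on the output container. How much that lock costs in practice depends on how long a single fit takes relative to the write, which the sketch does not model.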
See the attached PDF with time measurements, which seem to support the above observations: TBBFittingParallelExecutionTimes.pdf
The code is committed in the ACTSFW/parallel-fitter branch.