Add option to use modularL / MultiProcess-based fits
With MultiProcess, fits can be sped up by parallelizing them over multiple CPU cores. The parallelization works mainly by distributing the calculation of the partial derivatives that make up the gradient used in Minuit. It can also split the likelihood, either over components (in the case of simultaneous PDFs) or over events, to speed up the line search phase of Minuit.
The useModularL option activates the basic framework. Further options then switch on individual parts: the gradient parallelization (on by default) and the parallelization of the "descent" / line search phase of the fit (off by default). In addition, the NumCPU option is hijacked to specify the number of workers for the parallelization backend. This is unproblematic, because the legacy NumCPU behaviour does not work together with the new modularL-based setup anyway.
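As a rough illustration of how these options relate to each other, here is a minimal sketch. The struct and function names (`FitParallelOptions`, `ResolvedSetup`, `resolve`) and the field names are hypothetical and not taken from the patch; only the defaults and the reinterpretation of NumCPU follow the description above.

```cpp
// Hypothetical container for the options described above; the names are
// illustrative only and do not come from the actual patch.
struct FitParallelOptions {
    bool useModularL = false;      // master switch for the modularL framework
    bool parallelGradient = true;  // gradient parallelization, on by default
    bool parallelDescent = false;  // line search parallelization, off by default
    int numCPU = 1;                // value given via the NumCPU option
};

// What the fit would end up using (again, a hypothetical structure).
struct ResolvedSetup {
    bool gradientParallel;  // parallelize the partial derivatives
    bool descentParallel;   // parallelize the likelihood in the line search
    int workers;            // MultiProcess workers; 0 means "not used"
    int legacyNumCPU;       // legacy NumCPU value; 0 means "not used"
};

inline ResolvedSetup resolve(const FitParallelOptions& o) {
    if (!o.useModularL) {
        // Without modularL, NumCPU keeps its legacy meaning and the
        // sub-options have no effect.
        return {false, false, 0, o.numCPU};
    }
    // With modularL, NumCPU is reinterpreted as the worker count and the
    // legacy path is bypassed.
    return {o.parallelGradient, o.parallelDescent, o.numCPU, 0};
}
```

With `useModularL` set and `numCPU = 4`, this sketch yields four workers with gradient parallelization on and descent parallelization off, matching the defaults stated above.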
Also adds a short line of documentation on speeding up fits in general.
These options are available from ROOT version 6.27.02 onwards, so the command line options are wrapped in ROOT version #ifdef guards.
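For reference, a version guard of this kind could look roughly like the sketch below. It relies on the `ROOT_VERSION` and `ROOT_VERSION_CODE` macros from ROOT's `RVersion.h`; the fallback definitions, the `HAVE_MODULAR_L` macro and the `availableFitOptions` helper are assumptions added here so that the example compiles without ROOT, and they are not part of the patch.

```cpp
#include <string>
#include <vector>

// Use ROOT's own version macros when the header is available; otherwise
// fall back to stand-ins so this sketch compiles without ROOT installed.
#if defined(__has_include)
#  if __has_include(<RVersion.h>)
#    include <RVersion.h>
#  endif
#endif
#ifndef ROOT_VERSION
#  define ROOT_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#endif
#ifndef ROOT_VERSION_CODE
#  define ROOT_VERSION_CODE ROOT_VERSION(6, 27, 2)  // assumed, illustration only
#endif

// Hypothetical feature macro: 1 if the modularL options can be offered.
#if ROOT_VERSION_CODE >= ROOT_VERSION(6, 27, 2)
#  define HAVE_MODULAR_L 1
#else
#  define HAVE_MODULAR_L 0
#endif

// Hypothetical helper listing the option names that would be registered.
inline std::vector<std::string> availableFitOptions() {
    std::vector<std::string> opts = {"NumCPU"};
#if HAVE_MODULAR_L
    // Only registered when the ROOT version is recent enough.
    opts.push_back("useModularL");
#endif
    return opts;
}
```

Collecting the check in a single feature macro keeps the version comparison in one place, so each option only needs a short `#if HAVE_MODULAR_L` guard.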