Multithreading! (+ cleanup)
- `cout` is not thread-safe, so it has been replaced with MessageLogger everywhere (except in rare cases hidden behind preprocessor flags). The "TreeMaker" category is used everywhere, with `LogInfo` or `LogWarning` as appropriate. This will be enforced for future development.
- Legacy modules (`EDProducer`, `EDFilter`, `EDAnalyzer`) are no longer used. All existing modules have been moved to thread-aware classes (see FWMultithreadedFrameworkModuleTypes). Most could be promoted to `global`. The few exceptions:
  - `stream`: `JetProperties`, `SusyScanProducer` (unavoidable modifications of class members in the `produce` function)
  - `one`: `NeffFinder`, `TreeMaker` (aggregate over all events; `TreeMaker` is now an analyzer, since it doesn't produce anything)

  This will also be enforced for future development.
- Various cleanup: removing blank functions, pointless commands, unneeded includes, code duplication, etc.
A quick test with 1000 events from the Summer16 `SMS-T1tttt_mGluino-1500_mLSP-100_TuneCUETP8M1_13TeV-madgraphMLM-pythia8` sample gave the following results:
threads | mem (peak) [MB] | time [min] |
---|---|---|
1 | 870 | 4.53 |
4 | 980 | 1.73 |
That's a 2.6x speedup with only a 13% increase in memory usage, which is not too shabby.
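Both figures follow directly from the table. Here is a trivial Python check (the variable names are only illustrative):

```python
# Numbers copied from the 1000-event test table (1 vs. 4 threads).
t_1thread_min = 4.53   # wall time with 1 thread [min]
t_4thread_min = 1.73   # wall time with 4 threads [min]
mem_1thread_mb = 870   # peak memory with 1 thread [MB]
mem_4thread_mb = 980   # peak memory with 4 threads [MB]

speedup = t_1thread_min / t_4thread_min             # ratio of wall times
mem_increase = mem_4thread_mb / mem_1thread_mb - 1  # fractional memory growth

print(f"speedup: {speedup:.1f}x")               # speedup: 2.6x
print(f"memory increase: {mem_increase:.0%}")   # memory increase: 13%
```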
todo: propagate to Condor scripts (requires some more thought/testing)
Note: the most recent commit a908e4f8 resolves a reproducibility issue for MC samples with multithreading (MT). Random numbers were used in the SmearedPATJetProducer, leading to differences in the output jet pT, etc. Now a deterministic seed (based on run number + event number) is used, and results are consistent with single-threaded running.
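The idea behind this fix can be illustrated with a small Python sketch. This is not the SmearedPATJetProducer code, which is C++ in CMSSW. The packing in `event_seed`, the Gaussian `smear_pt`, and its 10% resolution are all hypothetical stand-ins. The point is that each event gets its own generator, seeded only from its run and event numbers, so the output cannot depend on which thread processes the event or in what order:

```python
import random
from concurrent.futures import ThreadPoolExecutor


def event_seed(run: int, event: int) -> int:
    # Hypothetical scheme: pack run and event numbers into one integer,
    # so every (run, event) pair maps to its own fixed seed.
    return (run << 32) | (event & 0xFFFFFFFF)


def smear_pt(run: int, event: int, pt: float, resolution: float = 0.1) -> float:
    # A private generator per event: the result depends only on (run, event, pt),
    # not on which thread handles the event or when.
    rng = random.Random(event_seed(run, event))
    return pt * rng.gauss(1.0, resolution)


# Toy input: (run, event, jet pT) for 100 events in run 1.
events = [(1, evt, 100.0 + evt) for evt in range(1, 101)]

# "Single-threaded": process events sequentially.
sequential = {(r, e): smear_pt(r, e, pt) for r, e, pt in events}

# "Multithreaded": process the same events concurrently, in reverse order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = pool.map(lambda ev: ((ev[0], ev[1]), smear_pt(*ev)), reversed(events))
    threaded = dict(results)

print(sequential == threaded)  # True
```

The two numbers are packed bitwise here rather than simply added, so that (run 1, event 2) and (run 2, event 1) get different seeds. The actual commit may combine them differently. By contrast, drawing from one shared generator would make each event's smearing depend on how many draws other threads had already made.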
Also, to add another data point to the table:
threads | mem (peak) [MB] | time [min] |
---|---|---|
2 | 960 | 2.70 |

Latest commits:
- suppress thread-unsafe `cout` output from Heppy tools by setting the stream buffer to null
- add option to activate TimeMemoryInfo for simple profiling
- add multithreading option for Condor
Results from running over 2938 events, with the CMSSW Timing Service and SimpleMemoryCheck used to measure the quantities of interest:
threads | time [s] | speedup | RSS [MB] | mem increase (ratio) |
---|---|---|---|---|
1 | 752.1 | 1.00 | 951.5 | 1.00 |
2 | 412.0 | 1.83 | 977.0 | 1.03 |
3 | 298.3 | 2.52 | 994.3 | 1.04 |
4 | 244.8 | 3.07 | 1007.7 | 1.06 |
5 | 213.9 | 3.52 | 1018.1 | 1.07 |
6 | 188.6 | 3.99 | 1033.3 | 1.09 |
7 | 170.6 | 4.41 | 1049.6 | 1.10 |
8 | 150.7 | 4.99 | 1052.8 | 1.11 |
9 | 142.4 | 5.28 | 1073.1 | 1.13 |
10 | 138.4 | 5.43 | 1080.3 | 1.14 |
11 | 129.2 | 5.82 | 1080.1 | 1.14 |
12 | 130.1 | 5.78 | 1093.9 | 1.15 |

Fitting the speedup to Amdahl's law (`Sp = 1/(S + (1-S)/N)`, with `N` the number of threads), I find `S = 9.5%` (the serial component of the CPU usage). This implies `Sp(N→∞) = 1/S ≈ 10.5` as the ultimate possible speedup.
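One simple way to do such a fit is shown in the Python sketch below, which uses only the standard library. Rearranging Amdahl's law gives `1/Sp - 1/N = S * (1 - 1/N)`, a straight line through the origin, so `S` has a closed-form least-squares solution. This is an assumed method, not necessarily how the quoted number was obtained. A nonlinear fit of `Sp` itself (e.g. with `scipy.optimize.curve_fit`) weights the points differently and can give a slightly different `S`.

```python
# Wall times [s] vs. number of threads, copied from the 2938-event table.
times = {
    1: 752.1, 2: 412.0, 3: 298.3, 4: 244.8, 5: 213.9, 6: 188.6,
    7: 170.6, 8: 150.7, 9: 142.4, 10: 138.4, 11: 129.2, 12: 130.1,
}


def fit_serial_fraction(times: dict[int, float]) -> float:
    """Least-squares estimate of the serial fraction S in Amdahl's law.

    Amdahl: 1/Sp = S + (1 - S)/N, which rearranges to
        (1/Sp - 1/N) = S * (1 - 1/N),
    i.e. a straight line through the origin, y = S * x.
    """
    t1 = times[1]
    num = den = 0.0
    for n, tn in times.items():
        inv_speedup = tn / t1  # 1/Sp = t_N / t_1
        x = 1.0 - 1.0 / n      # zero for N = 1, so that point adds nothing
        y = inv_speedup - 1.0 / n
        num += x * y
        den += x * x
    return num / den           # slope of the best-fit line through the origin


S = fit_serial_fraction(times)
print(f"serial fraction S = {S:.1%}")
print(f"max speedup 1/S  = {1.0 / S:.1f}")
```

With these timings, the linearized fit comes out at roughly `S ≈ 9.5%`, corresponding to a limiting speedup of about 10.5.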