Multithreading! (+ cleanup)

Merged Kevin Pedro requested to merge github/fork/kpedro88/upgrade2017 into Run2
  1. cout is not thread-safe, so it has been replaced with MessageLogger everywhere (except rare cases hidden behind preprocessor flags). The "TreeMaker" category is used everywhere, with LogInfo or LogWarning as appropriate. This will be enforced for future development.
  2. Legacy modules (EDProducer, EDFilter, EDAnalyzer) are no longer used. All existing modules have been moved to thread-aware classes (see FWMultithreadedFrameworkModuleTypes). Most could be promoted to global. The few exceptions:
       • stream: JetProperties, SusyScanProducer (unavoidable modifications of class members in the produce function)
       • one: NeffFinder, TreeMaker (these aggregate over all events; TreeMaker is now an analyzer, since it doesn't produce anything)
     This will also be enforced for future development.
  3. Various cleanup, removing blank functions/pointless commands/unneeded includes/code duplication/etc.

A quick test with 1000 events from the Summer16 SMS-T1tttt_mGluino-1500_mLSP-100_TuneCUETP8M1_13TeV-madgraphMLM-pythia8 sample gave the following results:

threads   mem (peak) [MB]   time [min]
1         870               4.53
4         980               1.73

That's a 2.6x speedup with only a 13% increase in memory usage, which is not too shabby.
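For reference, the quoted figures follow directly from the two table rows. The short Python sketch below redoes the arithmetic; the variable names are illustrative only, and the parallel efficiency (speedup divided by thread count) is an extra derived quantity that the description does not quote.

```python
# Values copied from the table above (1000-event test).
runs = {
    1: {"mem_mb": 870, "time_min": 4.53},
    4: {"mem_mb": 980, "time_min": 1.73},
}

# Speedup is the ratio of single-threaded to multithreaded wall-clock time.
speedup = runs[1]["time_min"] / runs[4]["time_min"]

# Fractional increase in peak memory relative to the single-threaded run.
mem_increase = runs[4]["mem_mb"] / runs[1]["mem_mb"] - 1.0

# Parallel efficiency: fraction of the ideal 4x speedup actually achieved.
efficiency = speedup / 4

print(f"speedup:      {speedup:.1f}x")
print(f"mem increase: {mem_increase:.0%}")
print(f"efficiency:   {efficiency:.0%}")
```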

todo: propagate to Condor scripts (requires some more thought/testing)

Activity

  • Note: the most recent commit, a908e4f8, resolves a reproducibility issue for MC samples when running multithreaded. Random numbers were used in the SmearedPATJetProducer, leading to differences in the output jet pT, etc. Now a deterministic seed is used (based on run number + event number), and results are consistent with single-threaded running.

    Also, to add another data point to the table:

    threads   mem (peak) [MB]   time [min]
    2         960               2.70
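The deterministic seeding described above can be illustrated with a minimal Python sketch. It is not the SmearedPATJetProducer code: the seed formula in `event_seed`, the Gaussian smearing, and the 10% resolution are all assumptions made for illustration, since the comment only says the seed is based on run number and event number. The point is that each event's random stream depends only on its own identifiers, so the order in which threads process events no longer matters.

```python
import random

def event_seed(run, event):
    # Hypothetical way to combine the two numbers into a 32-bit seed;
    # the actual formula used in the commit is not given in the source.
    return (run * 1_000_003 + event) % (2 ** 32)

def smear_jets(run, event, jet_pts, resolution=0.1):
    # A fresh generator per event, seeded only from the event's identity.
    rng = random.Random(event_seed(run, event))
    # Illustrative Gaussian smearing factor, clipped at zero.
    return [pt * max(0.0, rng.gauss(1.0, resolution)) for pt in jet_pts]

# Toy events: (run, event, jet pT values in GeV). All values are made up.
events = [
    (1, 101, [250.0, 120.0]),
    (1, 102, [310.0]),
    (1, 103, [90.0, 45.0, 30.0]),
]

# Process the same events in two different orders, as could happen with threads.
in_order = {(r, e): smear_jets(r, e, pts) for r, e, pts in events}
reordered = {(r, e): smear_jets(r, e, pts) for r, e, pts in reversed(events)}

print("identical output:", in_order == reordered)
```

With a single shared generator, by contrast, the numbers drawn for a given event would depend on how many events were processed before it.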
  • Latest commits:

    • suppress thread-unsafe couts from Heppy tools by setting buffer to null
    • add option to activate TimeMemoryInfo for simple profiling
    • add multithreading option for Condor

    Running with 2938 events, using the CMSSW Timing Service and SimpleMemoryCheck to measure quantities of interest:

    threads   time [s]   speedup   RSS [MB]   mem increase
    1         752.1      1.00       951.5     1.00
    2         412.0      1.83       977.0     1.03
    3         298.3      2.52       994.3     1.04
    4         244.8      3.07      1007.7     1.06
    5         213.9      3.52      1018.1     1.07
    6         188.6      3.99      1033.3     1.09
    7         170.6      4.41      1049.6     1.10
    8         150.7      4.99      1052.8     1.11
    9         142.4      5.28      1073.1     1.13
    10        138.4      5.43      1080.3     1.14
    11        129.2      5.82      1080.1     1.14
    12        130.1      5.78      1093.9     1.15

    Fitting the speedup to Amdahl's law (Sp = 1/(S + (1-S)/N)), I find S = 9.5% (serial component of CPU usage), implying Sp(N→∞) = 1/S = 10.5 as the ultimate possible speedup.
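A fit of this kind can be sketched in a few lines of Python. The comment does not say how the fit was done, so the approach here is an assumption: an unweighted least-squares fit to the speedup column, found by a brute-force scan over S. Because of that choice, the fitted value may differ slightly from the quoted 9.5%.

```python
# Wall-clock times [s] copied from the table above, keyed by thread count.
times = {1: 752.1, 2: 412.0, 3: 298.3, 4: 244.8, 5: 213.9, 6: 188.6,
         7: 170.6, 8: 150.7, 9: 142.4, 10: 138.4, 11: 129.2, 12: 130.1}

def amdahl(n, s):
    """Predicted speedup on n threads for serial fraction s."""
    return 1.0 / (s + (1.0 - s) / n)

def sum_sq_residuals(s, speedups):
    """Unweighted sum of squared differences between data and model."""
    return sum((sp - amdahl(n, s)) ** 2 for n, sp in speedups.items())

# Measured speedup relative to the single-threaded run.
speedups = {n: times[1] / t for n, t in times.items()}

# Brute-force scan over S in steps of 1e-4 (chosen for simplicity, not speed).
candidates = [i / 10000 for i in range(1, 10000)]
s_fit = min(candidates, key=lambda s: sum_sq_residuals(s, speedups))

print(f"S = {s_fit:.1%}, asymptotic speedup 1/S = {1 / s_fit:.1f}")
```

A scan is used instead of a library optimizer only to keep the sketch free of dependencies; with one free parameter and twelve points, it is cheap enough.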
