PerfMonMT: Update Event Loop Monitoring and add various improvements

Hasan Ozturk requested to merge haozturk/athena:PerfMonMTSvc into master

Hi,

This MR includes changes implemented in response to the feedback received at one of the Athena Core Software Meetings in August 2019. Development is done on top of this MR. Here is the list of changes:

  • Event Loop Monitoring: The previous version relied on the assumption that each event runs on a single thread for its entire lifetime. This version eliminates that assumption. The service now captures CPU and memory measurements at certain checkpoints based on event numbers. These measurements are read at the process level, so the approach is thread-safe. This gives an overall picture of the event loop.
  • Updated the plots to make them more readable.
  • Added peak values for vmem, rss, pss in the summary result.
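The checkpoint scheme described in the first bullet can be sketched roughly as follows. This is only an illustration in Python, not the PerfMonMTSvc code itself: the class `EventLoopMonitor`, the method `on_event_start`, the helper `read_memory_kb`, and the use of `/proc/self/status` are all assumptions made for the example (PSS is left out for brevity). The point it shows is that a shared event counter decides when to take a snapshot, and each snapshot is process-wide, so it does not matter which thread handles which event.

```python
import threading
import time


def read_memory_kb():
    """Process-wide memory in KB, read from /proc/self/status (Linux only)."""
    fields = {"VmSize": "vmem", "VmRSS": "rss", "VmSwap": "swap"}
    values = {name: 0 for name in fields.values()}
    try:
        with open("/proc/self/status") as status:
            for line in status:
                key, _, rest = line.partition(":")
                if key in fields:
                    values[fields[key]] = int(rest.split()[0])
    except OSError:
        pass  # /proc not available: keep zeros
    return values


class EventLoopMonitor:
    """Hypothetical monitor: snapshots every `check_interval` events."""

    def __init__(self, check_interval=5):
        self.check_interval = check_interval
        self._lock = threading.Lock()
        self._event_count = 0
        self._offset = None      # first snapshot, i.e. start of the event loop
        self.checkpoints = {}    # event number -> values relative to the offset

    def on_event_start(self):
        # The lock protects the shared counter; the measurements themselves
        # are process-level, so no per-thread bookkeeping is needed.
        with self._lock:
            count = self._event_count
            self._event_count += 1
            if count % self.check_interval != 0:
                return
            snapshot = {
                "cpu_ms": time.process_time() * 1000.0,   # CPU time of all threads
                "wall_ms": time.perf_counter() * 1000.0,
                **read_memory_kb(),
            }
            if self._offset is None:
                self._offset = snapshot
            self.checkpoints[count] = {
                key: snapshot[key] - self._offset[key] for key in snapshot
            }


def worker(monitor, n_events):
    """Stand-in for a thread that processes `n_events` events."""
    for _ in range(n_events):
        monitor.on_event_start()
```

With, say, four threads each calling `worker(monitor, 5)`, the monitor records checkpoints at events 0, 5, 10 and 15 regardless of how the events are distributed over the threads.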

Here is a portion of the output for event loop monitoring. The job was run with 1000 events on 5 threads. Measurements are captured every 5 events. The beginning of the event loop is used as the offset, so all values are relative to it:

INFO =======================================================================================
INFO                              CPU & Wall Time Monitoring                                
INFO                                     (Event Loop)                                       
INFO =======================================================================================
INFO Event CheckPoint             CPU Time [ms]       Wall Time [ms]
INFO 0                            0.00                0 
INFO 5                            11970.00            11997 
INFO 10                           37460.00            17103 
INFO 15                           43980.00            19397 
INFO 20                           52640.00            21124 
INFO 25                           63700.00            23341 
INFO ...                          ...
INFO 975                          2835070.00          583306 
INFO 980                          2842570.00          584807 
INFO 985                          2855510.00          587405 
INFO 990                          2864330.00          589169 
INFO 995                          2880150.00          592350 
INFO =======================================================================================
INFO =======================================================================================
INFO                                   Memory Monitoring                                    
INFO                                     (Event Loop) 
INFO                                       Unit: KB                                      
INFO =======================================================================================
INFO Event CheckPoint           Vmem      Rss       Pss       Swap      
INFO 0                          0         0         0         0
INFO 5                          1075748   1100404   1100404   0
INFO 10                         1075748   1103136   1103136   0
INFO 15                         1075748   1127732   1127732   0
INFO 20                         1075748   1127928   1127928   0
INFO 25                         1075748   1127956   1127956   0
INFO ...                        ...
INFO 975                        1318096   1359096   1359096   0
INFO 980                        1318096   1358584   1358584   0
INFO 985                        1318096   1358588   1358588   0
INFO 990                        1318096   1358600   1358600   0
INFO 995                        1318096   1358612   1358612   0
INFO =======================================================================================
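One way to read the CPU and wall time table above is to divide cumulative CPU time by cumulative wall time at each checkpoint. The short Python sketch below does this for four rows copied from the output; the helper names `parse_rows` and `cpu_to_wall_ratio` are made up for the example, and interpreting the ratio as the average number of busy threads is my reading of the numbers rather than something the service reports.

```python
# Four rows copied verbatim from the CPU & Wall Time table above.
LOG_ROWS = """\
INFO 5                            11970.00            11997
INFO 25                           63700.00            23341
INFO 975                          2835070.00          583306
INFO 995                          2880150.00          592350
"""


def parse_rows(text):
    """Turn 'INFO <checkpoint> <cpu_ms> <wall_ms>' lines into tuples."""
    rows = []
    for line in text.splitlines():
        _, checkpoint, cpu_ms, wall_ms = line.split()
        rows.append((int(checkpoint), float(cpu_ms), float(wall_ms)))
    return rows


def cpu_to_wall_ratio(rows):
    """Cumulative CPU time divided by cumulative wall time per checkpoint."""
    return {cp: cpu / wall for cp, cpu, wall in rows if wall > 0}


ratios = cpu_to_wall_ratio(parse_rows(LOG_ROWS))
for checkpoint, ratio in ratios.items():
    print(f"checkpoint {checkpoint:4d}: CPU/wall = {ratio:.2f}")
```

For these rows the ratio is just under 1 at checkpoint 5 and about 4.86 at checkpoint 995, which is close to the 5 threads the job was configured with.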

cc: @amete

Best, Hasan

Edited by Hasan Ozturk
