Introduce timing information in logParser [skip modfiles]
This MR introduces timing information according to what was discussed in #45 (closed) #51 (closed) and will allow to reactivate the athena CI job.
CountHepMC == nEventsPerJob
Handling of The check that CountHepMC == nEventsPerJob
has not been removed, because I think it might still be a good handle to catch unwanted bugs. Instead I added a flag in the commit script that can bypass this check. Here is what happens when trying to add 421305
, run with --maxEvents=10
:
mcjoboptions_3 sargyrop$ ./scripts/commit_new_dsid.sh -d=421305 -n
INFO: will use following remote for pushing: origin
New DSID directory: 421xxx/421305 ...
OK: log.generate file found.
...
- CountHepMC Events passing all checks and written = 10 <-- ERROR: This is not equal to nEventsPerJob=1000
...
---------------------
Summary:
---------------------
Errors : 1 , Warnings : 5 -> Errors encountered! Not ready for production!
ERROR: log.generate contains errors
Fix them before committing anything!
In this case users should make sure that disabling the CountHepMC
check is what they really want and run commit_new_dsid.sh
with -t
:
./scripts/commit_new_dsid.sh -d=421305 -t -n
INFO: will use following remote for pushing: origin
New DSID directory: 421xxx/421305 ...
OK: log.generate file found.
OK: log.generate file contains no errors
OK: CI job expected to last less than 1h - time estimate: 0.17 hours
Will now add files to git commit
File: log.generate cannot be added to the commit. Skipping.
Skipping: log.generate.short since the logParser CI job will not be run
Added: mc.Sh_228_Wmunu_EnhLogPtV.py
Added: mc_13TeV.Sh_228_Wmunu_EnhLogPtV.GRID.tar.gz
How timing works in logParser
There are 3 timing informations provided:
-
if
CountHepMC == nEventsPerJob
: it just prints the timing information from log.generate (same as what we had before) -
if
CountHepMC != nEventsPerJob
: it will extrapolate the timing information usingextrapCPU=float(generateDict['nEventsPerJob'])*cpuPerJob/float(CountHepMC)
-
timing for CI job: this is something extra to help debugging and have more meaningful printouts, it will just extrapolate the timing information to
max(10,0.01*nEventsPerJob)
, i.e. the number of events we will run in the CI
Timing limits
The actual (case 1) or extrapolated CPU time (case 2 above) lead to the following behaviour:
- CPU < 1 h OR CPU > 18 h => ERROR
- CPU between 8-12 h => GOOD
-
CPU=1-8 OR 12-18 h => WARNING
i.e. currently the limits are quite loose and we would only cut jobs which last less than 1h or more than 18h.
For the CI (case 3) above we cut everything that is expected to last more than 1h.
Here is an example from logParser
run on the above 421305 DSID with 10 events (nEventsPerJob=1000 in this case):
---------------------
Performance metrics:
---------------------
- actual CPU (10 events) = 0.17 hrs
- CPU extrapolated to 1000 events = 16.6 hrs <-- WARNING: CPU time not optimal - should be between 8-12h. Adjust nEventsPerJob!
- estimated CPU for CI job = 0.17 hrs
- Virtual memory = 2957.352 Mb
Notice that the first line is the actual time it took to run the job (with --maxEvents=10
), the 2nd is the extrapolation to nEventsPerJob=1000
and the 3rd is the extrapolation for the max(10,0.01*1000)=10
events that would run in the CI.
Note that logParser
in CI is always run with -t
so there should be no extra change needed in order to re-activate the athena CI jobs.
Closes #51 (closed) I think this also closes #15 (closed)