Introduce timing information in logParser [skip modfiles]
CountHepMC == nEventsPerJob
The check that
CountHepMC == nEventsPerJob has not been removed, because I think it might still be a good handle to catch unwanted bugs. Instead I added a flag in the commit script that can bypass this check. Here is what happens when trying to add
421305, run with
mcjoboptions_3 sargyrop$ ./scripts/commit_new_dsid.sh -d=421305 -n INFO: will use following remote for pushing: origin New DSID directory: 421xxx/421305 ... OK: log.generate file found. ... - CountHepMC Events passing all checks and written = 10 <-- ERROR: This is not equal to nEventsPerJob=1000 ... --------------------- Summary: --------------------- Errors : 1 , Warnings : 5 -> Errors encountered! Not ready for production! ERROR: log.generate contains errors Fix them before committing anything!
In this case users should make sure that disabling the
CountHepMC check is what they really want and run
./scripts/commit_new_dsid.sh -d=421305 -t -n INFO: will use following remote for pushing: origin New DSID directory: 421xxx/421305 ... OK: log.generate file found. OK: log.generate file contains no errors OK: CI job expected to last less than 1h - time estimate: 0.17 hours Will now add files to git commit File: log.generate cannot be added to the commit. Skipping. Skipping: log.generate.short since the logParser CI job will not be run Added: mc.Sh_228_Wmunu_EnhLogPtV.py Added: mc_13TeV.Sh_228_Wmunu_EnhLogPtV.GRID.tar.gz
How timing works in logParser
There are 3 timing informations provided:
CountHepMC == nEventsPerJob: it just prints the timing information from log.generate (same as what we had before)
CountHepMC != nEventsPerJob: it will extrapolate the timing information using
timing for CI job: this is something extra to help debugging and have more meaningful printouts, it will just extrapolate the timing information to
max(10,0.01*nEventsPerJob), i.e. the number of events we will run in the CI
The actual (case 1) or extrapolated CPU time (case 2 above) lead to the following behaviour:
- CPU < 1 h OR CPU > 18 h => ERROR
- CPU between 8-12 h => GOOD
CPU=1-8 OR 12-18 h => WARNING
i.e. currently the limits are quite loose and we would only cut jobs which last less than 1h or more than 18h.
For the CI (case 3) above we cut everything that is expected to last more than 1h.
Here is an example from
logParser run on the above 421305 DSID with 10 events (nEventsPerJob=1000 in this case):
--------------------- Performance metrics: --------------------- - actual CPU (10 events) = 0.17 hrs - CPU extrapolated to 1000 events = 16.6 hrs <-- WARNING: CPU time not optimal - should be between 8-12h. Adjust nEventsPerJob! - estimated CPU for CI job = 0.17 hrs - Virtual memory = 2957.352 Mb
Notice that the first line is the actual time it took to run the job (with
--maxEvents=10), the 2nd is the extrapolation to
nEventsPerJob=1000 and the 3rd is the extrapolation for the
max(10,0.01*1000)=10 events that would run in the CI.
logParser in CI is always run with
-t so there should be no extra change needed in order to re-activate the athena CI jobs.