# lhcb-benchmark-scripts issues
https://gitlab.cern.ch/lhcb-core/lhcb-benchmark-scripts/-/issues

## Issue #8: Move to VTune 2020 and associated Problems (2020-03-18, Christoph Hasse)
https://gitlab.cern.ch/lhcb-core/lhcb-benchmark-scripts/-/issues/8

Thanks to @maszyman the nightlies will now run a fixed version (2019) of VTune.
The automatic change to the 2020 version broke the tests: attaching to a job causes the vtune process to segfault and leaves the Moore job paused.
I manually reproduced this in a dual-terminal setup.
Terminal 1 was running the job and, upon attaching, prints this to `stdout`:
```
AMPLXE_TPSSCOLLECTOR: pytrace_tracewriter:341: (jitWriterReady) :
Assertion failed: pytrace_tracewriter:341: (jitWriterReady) : . Please contact the technical support.
*** Break *** segmentation violation
```
Meanwhile, terminal 2, from which I attached vtune, prints this:
```
vtune: Warning: Function 'PyEval_EvalFrameEx' can be analyzed incorrectly because it uses indirect branch instructions.
vtune: Error: Assertion failed: pytrace_tracewriter:341: (jitWriterReady) : . Please contact the technical support.
vtune: Collection started. To stop the collection, either press CTRL-C or enter from another console window: vtune -r /home/chasse/lhcb-benchmark-scripts/profile_out -command stop.
```
Thanks to suggestions from @ahennequ I also investigated starting vtune directly together with the job, using `-start-paused -resume-after 60 -d 120`: this starts vtune paused, resumes collection after 60 seconds, and stops after an additional 60 seconds (`-d 120` means stop 120 s after launch).
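The timing logic above can be sketched as a small command builder (a hedged sketch: the `vtune_cmd` helper, the `hotspots` collection type, and the result-dir name are my assumptions, not the actual scripts; the `-start-paused`/`-resume-after`/`-d` options are standard vtune CLI flags):

```python
# Sketch of how a benchmark wrapper might assemble the vtune invocation.
def vtune_cmd(job_cmd, warmup_s=60, measure_s=60, result_dir="profile_out"):
    """Start vtune paused, resume after warmup_s seconds, and stop
    warmup_s + measure_s seconds after launch (-d counts from launch)."""
    return (
        f"vtune -collect hotspots -r {result_dir} "
        f"-start-paused -resume-after {warmup_s} -d {warmup_s + measure_s} "
        f"-- {job_cmd}"
    )

print(vtune_cmd("./Moore.sh"))
```

With the defaults this reproduces the `-start-paused -resume-after 60 -d 120` combination from above.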
Running vtune like this, I did observe a throughput reduction of O(30%). Is this a problem for the accuracy of the actual measurement? I don't know; I suggest you ask your favorite priest...
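For reference, the overhead figure is just the relative drop in event rate (the numbers below are illustrative, not the measured ones):

```python
# Relative throughput overhead from a baseline and a profiled event rate.
def overhead(baseline_evts_per_s, profiled_evts_per_s):
    return 1.0 - profiled_evts_per_s / baseline_evts_per_s

# e.g. 1000 evt/s unprofiled vs 700 evt/s under vtune
print(f"{overhead(1000.0, 700.0):.0%}")  # -> 30%
```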
I'm attaching the flamegraph from this run: ![flamy.svg](/uploads/78b1e54db8c1b1e733c5810f4d0e6f53/flamy.svg)
Note that for some reason there are many more stacks and more detail (good! :thumbsup:), but on the other hand various algorithms are missing the stack frame of the form `classname::operator()`, which makes parsing for the `FlameBars` a pain.
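To illustrate why the missing frames hurt: a parser looking for the algorithm name has to find the `classname::operator()` frame in each collapsed stack, and returns nothing when that frame is absent (a minimal sketch; the `algorithm_of` helper and the collapsed-stack input format, frames separated by `;` with a trailing sample count, are assumptions about how `FlameBars` consumes the data):

```python
import re

# Innermost `ClassName::operator()` frame identifies the algorithm.
FRAME_RE = re.compile(r"([A-Za-z_]\w*)::operator\(\)")

def algorithm_of(collapsed_line):
    stack, _, _count = collapsed_line.rpartition(" ")
    for frame in reversed(stack.split(";")):  # innermost frame last
        m = FRAME_RE.search(frame)
        if m:
            return m.group(1)
    return None  # the failure mode described above: no operator() frame

print(algorithm_of("main;Scheduler::run;VeloTracking::operator() 42"))
```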
**So for now I suggest we continue with the 2019 version, which works and requires no further intervention right now.**
How do we proceed in the future? Who knows...
I still think that if we want to get more serious about these kinds of measurements, we need a stack built with :tada: frame pointers :tada:
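For frame-pointer-based unwinding to work, every object in the stack must be compiled with the standard GCC/Clang flag `-fno-omit-frame-pointer` (and not have it overridden later on the command line). A tiny sketch of such a sanity check (the `has_frame_pointers` helper is hypothetical, not part of the scripts):

```python
# Check a CFLAGS string for frame pointers; a later -fomit-frame-pointer
# overrides an earlier -fno-omit-frame-pointer, as GCC/Clang process
# flags left to right.
def has_frame_pointers(cflags: str) -> bool:
    keep = omit = -1
    for i, flag in enumerate(cflags.split()):
        if flag == "-fno-omit-frame-pointer":
            keep = i
        elif flag == "-fomit-frame-pointer":
            omit = i
    return keep > omit

print(has_frame_pointers("-O2 -g -fno-omit-frame-pointer"))  # -> True
```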
cc @sstahl @rmatev @apearce

## Issue #7: Dynamic profiling (2020-03-18, Marian Stahl, marian.stahl@cern.ch)
https://gitlab.cern.ch/lhcb-core/lhcb-benchmark-scripts/-/issues/7

The following discussion from !41 should be addressed:
- [ ] @rmatev started a [discussion](https://gitlab.cern.ch/lhcb-core/lhcb-benchmark-scripts/merge_requests/41#note_3022237): (+3 comments)
  > Another thing to try is to check if the dynamic enabling and disabling of the profiling from C++ works now. Yesterday I got a mail "Intel Parallel Studio 2019 Update 5 available on CVMFS" on [intel-tools-announcements@cern.ch](mailto:intel-tools-announcements@cern.ch), so maybe this is finally fixed and we don't need this crazy fine tuning of when to start and stop.

## Issue #4: Migrate repo to PRConfig (2020-04-30, Rosen Matev)
https://gitlab.cern.ch/lhcb-core/lhcb-benchmark-scripts/-/issues/4

## Issue #3: Review samples (especially event count) used in throughput tests (2019-03-18, Sascha Stahl; assignee: Mika Anton Vesterinen)
https://gitlab.cern.ch/lhcb-core/lhcb-benchmark-scripts/-/issues/3

Many of our throughput tests use a limited number of events, most often fewer than 40k, which are then repeated over and over again. This might not be enough to sample the occupancy distribution. In non-standard tests this number is sometimes reduced even further, e.g. to fit all events into memory. This could make results less comparable.
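A back-of-envelope way to see the sampling concern (illustrative numbers, not taken from the tests): repeating the same file adds throughput but no new samples of the occupancy distribution, so a rare occupancy bin is only sampled by the distinct events.

```python
import math

# Expected count and relative statistical uncertainty for a rare
# occupancy bin hit with probability tail_prob, given n distinct events.
def tail_stats(n_events, tail_prob):
    expected = n_events * tail_prob
    rel_unc = 1.0 / math.sqrt(expected) if expected > 0 else float("inf")
    return expected, rel_unc

expected, rel_unc = tail_stats(40_000, 1e-3)
print(f"{expected:.0f} tail events, ~{rel_unc:.0%} relative uncertainty")
```

With 40k distinct events, a per-mille tail is sampled only a few dozen times, however long the repeated job runs.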