
Replace VTune with Linux perf in throughput tests

Christoph Hasse requested to merge chasse_move_to_perf into master

VTune is replaced by Linux perf.

Linux perf is supported in two modes:

  • fp mode, which requires frame pointers to be available, e.g. by using a platform like x86_64_v3-centos7-gcc10fp-opt
  • dwarf mode (the default), which uses DWARF-based stack unwinding and therefore needs debug symbols, i.e. a platform that compiles with -g
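A minimal Python sketch of how a test script might assemble the `perf record` invocation for either mode. The helper name `perf_record_cmd` and the choice to attach to an already running process via `-p` are assumptions for illustration, not the exact options this MR passes; `--call-graph fp|dwarf`, `-p` and `-o` are standard `perf record` options.

```python
from typing import List


def perf_record_cmd(pid: int, mode: str = "dwarf", output: str = "perf.data") -> List[str]:
    """Build a `perf record` argv that samples call stacks of a running process.

    Hypothetical helper, for illustration only.
    mode="fp":    frame-pointer unwinding (needs a *fp platform build)
    mode="dwarf": DWARF unwinding from a copied user stack (needs -g debug info)
    """
    if mode not in ("fp", "dwarf"):
        raise ValueError(f"unsupported call-graph mode: {mode!r}")
    return ["perf", "record", "--call-graph", mode, "-p", str(pid), "-o", output]


print(perf_record_cmd(12345))
print(perf_record_cmd(12345, mode="fp"))
```

The returned list could be handed to `subprocess.Popen` and the recording stopped with SIGINT once the measurement window ends; keeping the mode a plain string mirrors perf's own `--call-graph` values.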

Both modes have very low overhead compared to VTune. For example, hlt1_pp_default runs at ~13.5 kHz on a single NUMA node while being profiled, compared to 13.65 kHz without profiling.
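To put the numbers above in perspective, the relative throughput loss is simply the rate difference divided by the unprofiled rate. A small Python sketch of that arithmetic (the function name is ours):

```python
def relative_overhead(baseline_hz: float, profiled_hz: float) -> float:
    """Fractional throughput lost while profiling, relative to the unprofiled rate."""
    return (baseline_hz - profiled_hz) / baseline_hz


# hlt1_pp_default on one NUMA node: 13.65 kHz unprofiled vs ~13.5 kHz profiled
overhead = relative_overhead(13.65e3, 13.5e3)
print(f"{overhead:.1%}")  # 1.1%
```

So profiling costs roughly 1% of throughput in this example.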

Benefits:

  • Linux perf is available by default; there is no need to install VTune
  • low overhead, and thus a more realistic measurement
  • easier to use: no need for hacks like attaching late to the job, and no segfaults caused by profiling

Some caveats:

  • "fp": a few stack frames are broken. This can't be avoided, because those samples are taken while we are in libc, which is built without frame pointers
  • "dwarf": very nice, detailed flame graphs, but with some inconsistencies, because some algorithms don't seem to show their complete ancestry (maybe broken DWARF info, or an old kernel 🤷?!)

EDIT:
The dwarf-mode caveat is solved if we run with a stack-size setting of 64 kB 🎉
However, this mode then dumps about 5 GB of data, so we should make sure the handler doesn't try to back that up.
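The "64 kB" stack size is passed to perf as the size after the comma in `--call-graph dwarf,<size>`. As far as we know, perf rounds this size up to a multiple of 8 bytes and rejects anything above 65528 bytes (USHRT_MAX rounded down to a multiple of 8), so the effective maximum is just under 64 KiB. The Python sketch below mirrors that check; the helper name and constant are hypothetical, and the limit should be verified against the installed perf version.

```python
# Hypothetical constant: largest user-stack dump perf accepts, as we understand it
# (USHRT_MAX = 65535 rounded down to a multiple of 8 bytes).
PERF_MAX_DWARF_STACK = 65528


def dwarf_call_graph(stack_bytes: int = 8192) -> str:
    """Return a `--call-graph` value for DWARF unwinding with the given stack dump size.

    8192 bytes is perf's default dump size; larger values capture deeper stacks.
    """
    size = -(-stack_bytes // 8) * 8  # round up to a multiple of 8 bytes, like perf
    if size <= 0 or size > PERF_MAX_DWARF_STACK:
        raise ValueError(f"stack dump size must be between 1 and {PERF_MAX_DWARF_STACK} bytes")
    return f"dwarf,{size}"


print(dwarf_call_graph(PERF_MAX_DWARF_STACK))  # dwarf,65528
```

Each sample carries a copy of the user stack of this size, which is why the output file grows so large in this mode.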

I don't think this needs any changes on the database side, but at some point we could remove the sourcing of the Intel (VTune) setup scripts.
cc @maszyman

Edited by Christoph Hasse
