Use proper memory metrics for GPU dashboard

Diogo Filipe Tomas Guerra requested to merge gpu-dashboard into master

Some improvements:

  • Use the correct memory metrics (instead of bandwidth)
  • Replace the fan speed graph with the SM metrics
  • Add profiling metrics for FP64, FP32 and FP16, together with Tensor Core

The added and refactored panels require a custom dcgmExporter configMap that exports the following metrics in addition to the defaults:

  • DCGM_FI_PROF_PIPE_FP64_ACTIVE
  • DCGM_FI_PROF_PIPE_FP32_ACTIVE
  • DCGM_FI_PROF_PIPE_FP16_ACTIVE
  • DCGM_FI_PROF_SM_ACTIVE
  • DCGM_FI_PROF_SM_OCCUPANCY
  • DCGM_FI_DEV_FB_TOTAL
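
For reference, dcgm-exporter reads its counter list from a CSV file in which each row has the form `DCGM field, Prometheus metric type, help message`. The snippet below is a minimal sketch that renders the six extra rows in that format so they can be appended to the default counters file stored in the configMap. The help strings and the names `EXTRA_COUNTERS` and `render_counters` are illustrative assumptions, not part of this merge request; the configMap name and key depend on the deployment and are not shown.

```python
# Extra DCGM fields listed in this merge request.
# Assumption: every field is exported as a Prometheus gauge.
# The help strings are illustrative wording, not copied from a shipped file.
EXTRA_COUNTERS = [
    ("DCGM_FI_PROF_PIPE_FP64_ACTIVE", "gauge", "Ratio of cycles the FP64 pipe is active"),
    ("DCGM_FI_PROF_PIPE_FP32_ACTIVE", "gauge", "Ratio of cycles the FP32 pipe is active"),
    ("DCGM_FI_PROF_PIPE_FP16_ACTIVE", "gauge", "Ratio of cycles the FP16 pipe is active"),
    ("DCGM_FI_PROF_SM_ACTIVE", "gauge", "Ratio of cycles an SM has at least one warp assigned"),
    ("DCGM_FI_PROF_SM_OCCUPANCY", "gauge", "Ratio of resident warps per SM"),
    ("DCGM_FI_DEV_FB_TOTAL", "gauge", "Total framebuffer memory (in MiB)"),
]


def render_counters(counters):
    """Render (field, type, help) tuples as dcgm-exporter CSV rows."""
    rows = [f"{field}, {kind}, {help_text}" for field, kind, help_text in counters]
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    # Print the rows so they can be appended to the default counters CSV.
    print(render_counters(EXTRA_COUNTERS), end="")
```

The help strings avoid commas on purpose, so each row keeps exactly three comma-separated fields.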

Closes: #37 (closed)

Edited by Diogo Filipe Tomas Guerra