Use proper memory metrics for GPU dashboard
Improvements:
- Use the correct memory metrics (instead of bandwidth metrics)
- Replaced the fan speed graph with the SM metrics
- Added profiling metrics for FP64, FP32, and FP16, together with Tensor Core
The added and refactored metrics require a custom dcgmExporter configMap that exports the following metrics in addition to the defaults:
- DCGM_FI_PROF_PIPE_FP64_ACTIVE
- DCGM_FI_PROF_PIPE_FP32_ACTIVE
- DCGM_FI_PROF_PIPE_FP16_ACTIVE
- DCGM_FI_PROF_SM_ACTIVE
- DCGM_FI_PROF_SM_OCCUPANCY
- DCGM_FI_DEV_FB_TOTAL
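
As a rough aid for reviewers, the sketch below checks whether a custom dcgm-exporter metrics CSV lists every field named above. It assumes the usual dcgm-exporter CSV layout (`field, Prometheus type, help text`, with `#` starting a comment line). The helper names and the `SAMPLE` contents, including the help texts, are hypothetical and only for illustration; they are not taken from this merge request.

```python
# Fields this merge request expects on top of the dcgm-exporter defaults.
REQUIRED_FIELDS = [
    "DCGM_FI_PROF_PIPE_FP64_ACTIVE",
    "DCGM_FI_PROF_PIPE_FP32_ACTIVE",
    "DCGM_FI_PROF_PIPE_FP16_ACTIVE",
    "DCGM_FI_PROF_SM_ACTIVE",
    "DCGM_FI_PROF_SM_OCCUPANCY",
    "DCGM_FI_DEV_FB_TOTAL",
]


def exported_fields(csv_text):
    """Return the set of DCGM field names listed in a metrics CSV."""
    fields = set()
    for raw in csv_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # The field name is the first comma-separated column.
        fields.add(line.split(",", 1)[0].strip())
    return fields


def missing_fields(csv_text, required=REQUIRED_FIELDS):
    """Return the required fields that the CSV does not export, in order."""
    present = exported_fields(csv_text)
    return [name for name in required if name not in present]


# Hypothetical, deliberately incomplete CSV (help texts are placeholders).
SAMPLE = """\
# Format: DCGM field, Prometheus metric type, help message
DCGM_FI_PROF_PIPE_FP64_ACTIVE, gauge, FP64 pipe activity.
DCGM_FI_PROF_PIPE_FP32_ACTIVE, gauge, FP32 pipe activity.
DCGM_FI_PROF_SM_ACTIVE, gauge, SM activity.
DCGM_FI_DEV_FB_TOTAL, gauge, Total framebuffer memory.
"""

if __name__ == "__main__":
    for name in missing_fields(SAMPLE):
        print("missing:", name)
```

Running the same check against the CSV stored in the configMap before deploying would flag any dashboard panel that is going to show no data.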
Closes: #37
Edited by Diogo Filipe Tomas Guerra