[Misc] Add opentelemetry metrics to CTA

Niels Alexander Buegel requested to merge 1135-add-telemetry-to-cta into main

Description

This MR adds OpenTelemetry metrics to CTA.

  • Some small abstractions have been introduced that allow developers to:
    • Create a telemetry configuration
    • Initialise telemetry in a consistent manner using this configuration
    • Globally (uniquely) define instruments per library. Duplicate instrument registrations are considered a semantic error
  • A number of different metrics:
    • DB metrics:
      • Histogram of query duration (includes possible errors)
      • UpDownCounter for number of active DB connections
    • Frontend metrics:
      • Histogram of the processing duration of frontend requests
    • Scheduler metrics:
      • Histogram of the latency of lock acquisitions (i.e. the time spent waiting for the lock to be acquired)
      • Counter for number of requests enqueued
    • Taped metrics:
      • Counter for number of files transferred
      • Counter for number of mounts
      • UpDownCounter for number of tape read/write threads
      • UpDownCounter for number of disk read/write threads
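
To illustrate the "define once per library" rule and the UpDownCounter usage listed above, here is a minimal standard-library sketch. It is not the code in this MR and does not use the opentelemetry-cpp API: `InstrumentRegistry`, `UpDownCounter`, `ActiveGuard`, and the names `"rdbms"` and `"active_connections"` are all hypothetical stand-ins chosen for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for an UpDownCounter instrument: a value that can rise and fall.
struct UpDownCounter {
  std::int64_t value = 0;
  void add(std::int64_t delta) { value += delta; }
};

// Hypothetical registry: each (library, instrument) pair may be defined exactly once.
// A second definition of the same pair is treated as a programming error.
class InstrumentRegistry {
public:
  UpDownCounter& define(const std::string& library, const std::string& name) {
    auto result = m_counters.emplace(std::make_pair(library, name), UpDownCounter{});
    if (!result.second) {
      throw std::logic_error("duplicate instrument: " + library + "/" + name);
    }
    // References into std::map stay valid when other entries are inserted.
    return result.first->second;
  }

private:
  std::map<std::pair<std::string, std::string>, UpDownCounter> m_counters;
};

// RAII guard: adds 1 on construction and subtracts 1 on destruction, so the
// counter tracks how many guarded resources (e.g. connections) are alive.
class ActiveGuard {
public:
  explicit ActiveGuard(UpDownCounter& counter) : m_counter(counter) { m_counter.add(1); }
  ~ActiveGuard() { m_counter.add(-1); }
  ActiveGuard(const ActiveGuard&) = delete;
  ActiveGuard& operator=(const ActiveGuard&) = delete;

private:
  UpDownCounter& m_counter;
};
```

Tying the decrement to a destructor keeps the count balanced on early returns and exceptions, which is one plausible way to keep an "active connections" or "active threads" metric from drifting.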

Note that any histogram can also be used to extract counts and rates.
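
The reason a histogram can double as a counter is that it tracks the number of recorded values (and their sum) alongside the bucket counts. The sketch below shows the idea with a hand-rolled `Histogram` class and a `ratePerSecond` helper; both are hypothetical illustrations, not the opentelemetry-cpp types or anything from this MR, and the bucket bounds are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical explicit-bucket histogram. Bucket i counts values <= bounds[i];
// the final bucket counts everything above the last bound.
class Histogram {
public:
  explicit Histogram(std::vector<double> upperBounds)
    : m_bounds(std::move(upperBounds)), m_buckets(m_bounds.size() + 1, 0) {}

  void record(double value) {
    std::size_t i = 0;
    while (i < m_bounds.size() && value > m_bounds[i]) {
      ++i;
    }
    ++m_buckets[i];
    ++m_count;     // total number of recordings: usable as a plain counter
    m_sum += value;
  }

  std::uint64_t count() const { return m_count; }
  double sum() const { return m_sum; }
  double mean() const { return m_count ? m_sum / static_cast<double>(m_count) : 0.0; }
  const std::vector<std::uint64_t>& buckets() const { return m_buckets; }

private:
  std::vector<double> m_bounds;          // declared first: initialised before m_buckets
  std::vector<std::uint64_t> m_buckets;
  std::uint64_t m_count = 0;
  double m_sum = 0.0;
};

// Rate derived from two readings of the histogram's count taken intervalSeconds apart.
inline double ratePerSecond(std::uint64_t countBefore, std::uint64_t countAfter,
                            double intervalSeconds) {
  return static_cast<double>(countAfter - countBefore) / intervalSeconds;
}
```

For example, a query-duration histogram read twice, a known interval apart, gives the query rate from the difference in `count()` without a separate counter instrument.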

Overview of the changes in the MR:

  • .gitlab/ci: manually triggered job for testing telemetry correctness
  • common/telemetry/: basic code to provide telemetry functionality
  • continuousintegration/: add cta-dependencies as a repo, since it contains the opentelemetry-cpp RPMs. Update the pods to use a flexible telemetry configuration and add a basic system test for telemetry correctness. Add the option --enable-telemetry to build_deploy.sh
  • The remainder is for the initialisation of telemetry and the usage of instruments

Checklist

References

Closes #1135

Edited by Niels Alexander Buegel
