[Misc] Add OpenTelemetry metrics to CTA
Description
This MR adds OpenTelemetry to CTA.
- Some small abstractions have been introduced that allow developers to:
  - Create a telemetry configuration
  - Initialise telemetry in a consistent manner using this configuration
  - Globally (uniquely) define instruments per library. Duplicate instrument registrations are considered a semantic error
- A number of new metrics:
  - DB metrics:
    - Histogram of query duration (including queries that end in an error)
    - UpDownCounter for the number of active DB connections
  - Frontend metrics:
    - Histogram of the processing duration of frontend requests
  - Scheduler metrics:
    - Histogram of the latency of lock requests (i.e. the time spent waiting for the lock to be acquired)
    - Counter for the number of requests enqueued
  - Taped metrics:
    - Counter for the number of files transferred
    - Counter for the number of mounts
    - UpDownCounter for the number of tape read/write threads
    - UpDownCounter for the number of disk read/write threads
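Two of the metric shapes listed above map naturally onto RAII in C++: a duration histogram that should also capture failed operations (as with the DB query duration), and an UpDownCounter that tracks how many things are currently active (DB connections, tape/disk threads). The following is a minimal sketch under stated assumptions: `UpDownCounter`, `ActiveGuard` and `ScopedTimer` are hypothetical names, the counter is a plain stand-in rather than an opentelemetry-cpp type, and this is not the code in this MR.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical stand-in for an UpDownCounter instrument (not opentelemetry-cpp).
class UpDownCounter {
public:
  void add(long delta) { m_value.fetch_add(delta, std::memory_order_relaxed); }
  long value() const { return m_value.load(std::memory_order_relaxed); }

private:
  std::atomic<long> m_value{0};
};

// RAII guard: +1 while something is active (e.g. a DB connection or a
// tape/disk thread), -1 when the guard goes out of scope.
class ActiveGuard {
public:
  explicit ActiveGuard(UpDownCounter& counter) : m_counter(counter) { m_counter.add(1); }
  ~ActiveGuard() { m_counter.add(-1); }
  ActiveGuard(const ActiveGuard&) = delete;
  ActiveGuard& operator=(const ActiveGuard&) = delete;

private:
  UpDownCounter& m_counter;
};

// RAII timer: passes the elapsed time in seconds to `record` when the scope
// ends, whether it ends normally or through an exception.
// `record` must not throw, since it is called from a destructor.
class ScopedTimer {
public:
  explicit ScopedTimer(std::function<void(double)> record)
      : m_record(std::move(record)), m_start(std::chrono::steady_clock::now()) {
    assert(m_record);  // an empty callback would be a usage error
  }
  ~ScopedTimer() {
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - m_start;
    m_record(elapsed.count());
  }
  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
  std::function<void(double)> m_record;
  std::chrono::steady_clock::time_point m_start;
};
```

Declaring the timer at the start of the function that runs a query means a query that throws still records its duration, which is one way to obtain the "including queries that end in an error" behaviour listed for the DB histogram.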
Note that histogram data can also be used to derive counts and rates, since each histogram also records the number of observations.
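To illustrate how counts and rates fall out of a histogram: each histogram keeps a total observation count alongside its bucket counts and sum, so the difference between two cumulative counts divided by the time between them is an observation rate. The class below is a simplified, hypothetical stand-in for an explicit-bucket histogram (upper bounds inclusive), not the opentelemetry-cpp SDK type; `DurationHistogram` and `ratePerSecond` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified explicit-bucket histogram (hypothetical, not the SDK type).
// Bucket i counts values in (bounds[i-1], bounds[i]]; the final bucket
// counts values above the largest bound.
class DurationHistogram {
public:
  explicit DurationHistogram(std::vector<double> bounds)
      : m_bounds(std::move(bounds)), m_buckets(m_bounds.size() + 1, 0) {
    assert(std::is_sorted(m_bounds.begin(), m_bounds.end()));
  }

  void record(double seconds) {
    // lower_bound finds the first bound >= seconds, which is the bucket index.
    const auto it = std::lower_bound(m_bounds.begin(), m_bounds.end(), seconds);
    ++m_buckets[static_cast<std::size_t>(it - m_bounds.begin())];
    ++m_count;  // total number of observations, usable as a plain counter
    m_sum += seconds;
  }

  std::uint64_t count() const { return m_count; }
  double sum() const { return m_sum; }
  const std::vector<std::uint64_t>& buckets() const { return m_buckets; }

private:
  std::vector<double> m_bounds;
  std::vector<std::uint64_t> m_buckets;
  std::uint64_t m_count = 0;
  double m_sum = 0.0;
};

// Observation rate between two snapshots of a cumulative histogram count.
double ratePerSecond(std::uint64_t earlierCount, std::uint64_t laterCount,
                     double intervalSeconds) {
  assert(intervalSeconds > 0.0 && laterCount >= earlierCount);
  return static_cast<double>(laterCount - earlierCount) / intervalSeconds;
}
```

For example, a cumulative count of 100 at one export and 160 at an export 60 seconds later corresponds to one observation per second.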
Overview of the changes in the MR:
- `.gitlab/ci`: manually triggered CI jobs for testing telemetry correctness
- `common/telemetry/`: basic code providing the telemetry functionality
- `continuousintegration/`: add `cta-dependencies` as a repo, as it contains the opentelemetry-cpp RPMs. Update the pods to use a flexible telemetry configuration and add a basic system test for telemetry correctness. Add the option `--enable-telemetry` to `build_deploy.sh`
- The remainder is for the initialisation of telemetry and the usage of instruments
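The rule that instruments are defined once per library, with duplicate registrations treated as a semantic error, could be enforced along the lines of the following minimal sketch. It assumes the uniqueness key is the pair (library name, instrument name) and reports a duplicate by throwing `std::logic_error`; `InstrumentRegistry`, `InstrumentKind` and `registerInstrument` are hypothetical names, not the API in `common/telemetry/`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Instrument kinds used by the metrics in this MR.
enum class InstrumentKind { Counter, UpDownCounter, Histogram };

// Hypothetical registry enforcing that each instrument is defined once per
// library. Keying on (library, instrument name) is an assumption of this sketch.
class InstrumentRegistry {
public:
  void registerInstrument(const std::string& library, const std::string& name,
                          InstrumentKind kind) {
    assert(!library.empty() && !name.empty());
    // emplace() leaves the map untouched and reports false if the key exists.
    const bool inserted =
        m_instruments.emplace(std::make_pair(library, name), kind).second;
    if (!inserted) {
      // A second registration is treated as a programming (semantic) error.
      throw std::logic_error("duplicate instrument registration: " + library +
                             "/" + name);
    }
  }

  bool contains(const std::string& library, const std::string& name) const {
    return m_instruments.count(std::make_pair(library, name)) > 0;
  }

  std::size_t size() const { return m_instruments.size(); }

private:
  std::map<std::pair<std::string, std::string>, InstrumentKind> m_instruments;
};
```

Failing loudly on the second registration, rather than silently returning the existing instrument, surfaces accidental double definitions at startup instead of producing ambiguous metric streams.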
Checklist
- Documentation reflects the changes made. See https://gitlab.cern.ch/cta/eoscta-docs/-/merge_requests/73
- Merge Request title is clear, concise, and suitable as a changelog entry. See this link
- Create an operations ticket for setting this up and testing it in preproduction. Done, see https://gitlab.cern.ch/cta/operations/-/issues/1784
References
Closes #1135
Edited by Niels Alexander Buegel