metrics: implement two-layer (disk/mem) storage
This changeset exposes a new ephemeral volume (sketched after the list below) to store Prometheus-scraped metrics when the in-memory storage gets full. The goal is to:
- Prevent the pod from being killed by the controller if it fails to flush metrics to the outputs for too long (memory limit exceeded).
- Allow the ingestion component to recover from outages by keeping some metrics on disk so they can be sent later on.
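As a rough illustration of what the volume could look like in the pod spec. This is a sketch only, assuming an emptyDir volume capped with sizeLimit; the container name, volume name, and mount path are placeholders, not necessarily what this changeset uses:

```yaml
# Sketch only: assumes an emptyDir ephemeral volume capped at 5 GiB,
# mounted where Fluent Bit's filesystem buffer (storage.path) points.
spec:
  containers:
    - name: fluent-bit                       # placeholder container name
      volumeMounts:
        - name: metrics-buffer               # placeholder volume name
          mountPath: /var/fluent-bit/buffer  # placeholder path
  volumes:
    - name: metrics-buffer
      emptyDir:
        sizeLimit: 5Gi                       # matches the 5 GiB default mentioned below
```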
The configuration sets a hard limit for memory storage (max_chunks_up) that is slightly lower than the pod's hard memory limit. A 5 GiB on-disk storage is configured by default as well.
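Roughly, the relevant Fluent Bit settings look like the sketch below (classic config syntax). The values, paths, and the choice of input plugin are illustrative assumptions, not the exact ones from this changeset:

```
[SERVICE]
    # Directory backing the on-disk buffer (the ephemeral volume mount).
    storage.path              /var/fluent-bit/buffer
    # Max number of chunks kept "up" in memory; extra chunks stay on disk.
    # Chosen so the memory footprint stays below the pod's memory limit.
    storage.max_chunks_up     128
    # Cap on memory used when replaying backlog chunks found on disk at startup.
    storage.backlog.mem_limit 64M

[INPUT]
    # Assumed input for the sketch; the actual ingestion component may differ.
    Name            prometheus_scrape
    Host            127.0.0.1
    Port            9100
    # Buffer this input through the filesystem instead of memory only.
    storage.type    filesystem
```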
Once the memory and the disk storage are both full, Fluent Bit behaves as follows:
> If one of the destinations reaches the configured storage.total_limit_size, the oldest Chunk from its queue for that logical output destination will be discarded to make room for new data.
So metrics will be lost at that point, but the pod (and the logs!) should always stay alive.
https://docs.fluentbit.io/manual/administration/buffering-and-storage#buffering-and-memory
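For completeness, the per-destination cap that triggers this discard behavior is set on each output. A hedged sketch only; the output plugin, endpoint, and size are illustrative:

```
[OUTPUT]
    # Assumed output for the sketch; the real destination may differ.
    Name                      prometheus_remote_write
    Match                     *
    Host                      metrics.example.internal
    Port                      443
    # Once this destination's buffered data reaches the limit, Fluent Bit
    # drops the oldest chunk in its queue to make room for new data.
    storage.total_limit_size  5G
```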