cern-it-monitoring-kubernetes
Helm Chart provided by the IT Monitoring Service to install and configure the components required to gather monitoring data from Kubernetes clusters and send it to the central service.
Overview
The CERN IT Monitoring Kubernetes Helm Chart provides a solution for monitoring Kubernetes clusters at CERN. It collects metrics and logs, with support for traces planned, and forwards them to the central CERN monitoring infrastructure. From there, users can consume them with the day-to-day tools they already use, such as Grafana or OpenSearch.
This Helm chart simplifies the deployment and configuration of the components needed for observability, making it easier to manage monitoring across various Kubernetes clusters and their applications.
Quick Start
See getting started.
Values
Key | Type | Default | Description |
---|---|---|---|
crds.enabled | bool | true | whether to install Prometheus operator's CRDs |
fluentbit.image.imagePullPolicy | string | "IfNotPresent" | image pull policy applied to all Fluent Bit instances |
fluentbit.image.repository | string | "registry.cern.ch/monit/cern-it-monitoring-fluent-bit" | image repository applied to all Fluent Bit instances |
fluentbit.image.tag | string | "3.2.6" | image tag applied to all Fluent Bit instances |
kubernetes.clusterName | string | "" | name of the Kubernetes cluster to monitor. This value is added to every metric and log via the k8s_cluster_name label. Required if fluentbit is enabled (the default) |
logs.enabled | bool | false | indicates whether the logs components should be enabled. If set to false, no logs component will be installed or configured |
logs.fluentbit.customParsers | string | "" | |
logs.fluentbit.enabled | bool | false | indicates whether the fluentbit logs component should be installed |
logs.fluentbit.extraVolumeMounts | list | [] | |
logs.fluentbit.extraVolumes | list | [] | |
logs.fluentbit.filters | string | partly autogenerated -- see values.yaml | fluentbit filters as a yaml list in a multiline string |
logs.fluentbit.image.imagePullPolicy | string | "" | image pull policy for Fluent Bit (logs) |
logs.fluentbit.image.repository | string | "" | repository to use for Fluent Bit (logs) |
logs.fluentbit.image.tag | string | "" | tag to use for Fluent Bit (logs) |
logs.fluentbit.inputs | string | partly autogenerated -- see values.yaml | fluentbit inputs as a yaml list in a multiline string |
logs.fluentbit.luaScripts | object | {} | extra Lua scripts for user-provided transformations |
logs.fluentbit.outputs | string | partly autogenerated -- see values.yaml | fluentbit outputs as a yaml list in a multiline string |
logs.fluentbit.resources.limits.cpu | string | "20m" | |
logs.fluentbit.resources.limits.memory | string | "25Mi" | |
logs.fluentbit.resources.requests.cpu | string | "5m" | |
logs.fluentbit.resources.requests.memory | string | "15Mi" | |
logs.fluentbit.service | string | partly autogenerated -- see values.yaml | fluentbit service configuration options in a multiline string |
metrics.alertmanager.enabled | bool | false | if true, alertmanager will be installed and prometheus reconfigured to use it as the alerting endpoint |
metrics.alertmanager.image | string | "registry.cern.ch/monit/cern-it-monitoring-alertmanager" | alertmanager image used by the local cluster alertmanager |
metrics.alertmanager.ingress.className | string | "" | class name to be used by the alertmanager ingress |
metrics.alertmanager.ingress.enabled | bool | false | if set to true, an ingress will be created for the alertmanager service |
metrics.alertmanager.ingress.hosts | list | [] | list of hosts for the alertmanager ingress |
metrics.alertmanager.ingress.path | string | "/" | entry path for the alertmanager ingress |
metrics.alertmanager.ingress.pathType | string | "ImplementationSpecific" | path type for the alertmanager ingress |
metrics.alertmanager.ingress.tls | object | {} | tls configuration for the alertmanager ingress |
metrics.alertmanager.nodeSelector | object | {} | node selector configuration for the alertmanager |
metrics.alertmanager.pullPolicy | string | "IfNotPresent" | pull policy for the alertmanager image |
metrics.alertmanager.replicas | int | 3 | number of replicas for the alertmanager deployment |
metrics.alertmanager.tag | string | "v0.27.0" | alertmanager image tag to be used when pulling it |
metrics.alertmanager.volumeMounts | list | [] | list of volumes to be mounted |
metrics.alertmanager.volumes | list | [] | list of volumes to be declared |
metrics.apiServer.serviceMonitor.relabelings | list | [] | |
metrics.coredns.serviceMonitor.relabelings | list | [] | |
metrics.defaultNodeSelector | object | {} | the default node selector is applied, when possible, to the following components: metrics collectors (prometheus and fluentbit) and metrics exporters (kube state) |
metrics.enabled | bool | true | indicates whether all metrics components should be enabled. If set to false, no metrics component will be installed or configured |
metrics.etcd.serviceMonitor.relabelings | list | [] | |
metrics.fluentbit.diskMaxCache | string | "5G" | max size of the on-disk storage for fluent-bit |
metrics.fluentbit.enabled | bool | true | if true, the fluentbit metrics forwarder will be installed |
metrics.fluentbit.filters | string | partly autogenerated -- see values.yaml | fluentbit filters as a yaml list in a multiline string |
metrics.fluentbit.image.imagePullPolicy | string | "" | image pull policy for Fluent Bit (metrics) |
metrics.fluentbit.image.repository | string | "" | repository to use for Fluent Bit (metrics) |
metrics.fluentbit.image.tag | string | "" | tag to use for Fluent Bit (metrics) |
metrics.fluentbit.inputs | string | partly autogenerated -- see values.yaml | fluentbit inputs as a yaml list in a multiline string |
metrics.fluentbit.luaScripts | object | {} | extra Lua scripts for user-provided transformations |
metrics.fluentbit.nodeSelector | object | {} | |
metrics.fluentbit.outputs | string | partly autogenerated -- see values.yaml | fluentbit outputs as a yaml list in a multiline string |
metrics.fluentbit.prometheusRemoteWriteInputConfig.bufferChunkSize | string | "128M" | |
metrics.fluentbit.prometheusRemoteWriteInputConfig.bufferMaxSize | string | "2G" | |
metrics.fluentbit.prometheusRemoteWriteInputConfig.listen | string | "0.0.0.0" | |
metrics.fluentbit.prometheusRemoteWriteInputConfig.port | int | 8080 | |
metrics.fluentbit.prometheusRemoteWriteInputConfig.successfulResponseCode | int | 201 | |
metrics.fluentbit.prometheusRemoteWriteInputConfig.tag | string | "monit.prom.k8s" | |
metrics.fluentbit.prometheusRemoteWriteInputConfig.tagFromUri | bool | false | |
metrics.fluentbit.prometheusRemoteWriteInputConfig.threaded | bool | false | |
metrics.fluentbit.prometheusRemoteWriteInputConfig.uri | string | "/api/prom/push" | |
metrics.fluentbit.replicas | int | 2 | |
metrics.fluentbit.resources.limits.cpu | string | "1" | |
metrics.fluentbit.resources.limits.memory | string | "1Gi" | |
metrics.fluentbit.resources.requests.cpu | string | "1" | |
metrics.fluentbit.resources.requests.memory | string | "512Mi" | |
metrics.fluentbit.service | string | partly autogenerated -- see values.yaml | fluentbit service configuration options in a multiline string |
metrics.ingress.nginx.serviceMonitor.relabelings | list | [] | |
metrics.kubeProxy.serviceMonitor.relabelings | list | [] | |
metrics.kubeState.enabled | bool | true | if true, kube state will be installed together with a service monitor |
metrics.kubeState.nodeSelector | object | {} | |
metrics.kubeState.resources.limits.cpu | string | "20m" | |
metrics.kubeState.resources.limits.memory | string | "25Mi" | |
metrics.kubeState.resources.requests.cpu | string | "5m" | |
metrics.kubeState.resources.requests.memory | string | "15Mi" | |
metrics.kubeState.scrapeInterval | string | "30s" | indicates how often this exporter will be scraped by the local prometheus |
metrics.kubeState.serviceMonitor.relabelings | list | [] | |
metrics.kubecontroller.serviceMonitor.relabelings | list | [] | |
metrics.kubelet.serviceMonitor.relabelings | list | [] | |
metrics.nodeExporter.enabled | bool | true | if true, node exporter will be installed as a daemon set together with a pod monitor |
metrics.nodeExporter.resources.limits.cpu | string | "20m" | |
metrics.nodeExporter.resources.limits.memory | string | "25Mi" | |
metrics.nodeExporter.resources.requests.cpu | string | "5m" | |
metrics.nodeExporter.resources.requests.memory | string | "15Mi" | |
metrics.nodeExporter.scrapeInterval | string | "" | indicates how often this exporter will be scraped by the local prometheus |
metrics.nodeExporter.serviceMonitor.relabelings | list | [] | |
metrics.prometheus.enabled | bool | true | if true, prometheus operator and a prometheus server will be installed |
metrics.prometheus.operator.nodeSelector | object | {} | |
metrics.prometheus.operator.resources.limits.cpu | string | "100m" | |
metrics.prometheus.operator.resources.limits.memory | string | "100Mi" | |
metrics.prometheus.operator.resources.requests.cpu | string | "5m" | |
metrics.prometheus.operator.resources.requests.memory | string | "25Mi" | |
metrics.prometheus.server.extraLabelsForMetrics | object | {} | set of static labels and values to add to all the metrics gathered by the in-cluster prometheus when exported to central monitoring |
metrics.prometheus.server.image | string | "registry.cern.ch/monit/cern-it-monitoring-prometheus:v2.53.3" | prometheus image used by the local cluster prometheus |
metrics.prometheus.server.nodeSelector | object | {} | prometheus operator node selectors. If set, it will override metrics.defaultNodeSelector |
metrics.prometheus.server.relabelings | list | [] | allows dropping or relabeling node exporter metrics |
metrics.prometheus.server.remoteWrite | object | {} | remote write prometheus configuration |
metrics.prometheus.server.resources.limits.cpu | string | "500m" | |
metrics.prometheus.server.resources.limits.memory | string | "5Gi" | |
metrics.prometheus.server.resources.requests.cpu | string | "100m" | |
metrics.prometheus.server.resources.requests.memory | string | "2Gi" | |
metrics.prometheus.server.retention | string | "24h" | interval during which the local cluster prometheus will store metrics |
metrics.prometheus.server.scrapeInterval | string | "10s" | interval used to self scrape metrics |
metrics.prometheus.server.scrapeTimeout | string | "5s" | timeout for self scraped metrics |
metrics.prometheus.server.serviceMonitors | list | [] | service monitors to be created |
metrics.prometheus.server.version | string | "v2.53.3" | prometheus version used by the local cluster prometheus |
metrics.pushgateway.enabled | bool | false | pushgateway allows you to send metrics to the monitoring infrastructure by pushing them to the local cluster service it-monit-metrics-collector-pushgateway |
metrics.pushgateway.image.pullPolicy | string | "IfNotPresent" | |
metrics.pushgateway.image.repository | string | "registry.cern.ch/monit/cern-it-monitoring-pushgateway" | |
metrics.pushgateway.image.tag | string | "v1.10.0" | |
metrics.pushgateway.ingress.className | string | "" | |
metrics.pushgateway.ingress.enabled | bool | false | if set to true, registers a new ingress with the given configuration |
metrics.pushgateway.ingress.hosts | list | [] | |
metrics.pushgateway.ingress.path | string | "/" | |
metrics.pushgateway.ingress.pathType | string | "ImplementationSpecific" | |
metrics.pushgateway.ingress.tls | object | {} | |
metrics.pushgateway.nodeSelector | object | {} | if given, overrides the defaultNodeSelector and installs the component only on the nodes that match the given condition |
metrics.pushgateway.resources.limits.cpu | float | 0.2 | |
metrics.pushgateway.resources.limits.memory | string | "100Mi" | |
metrics.pushgateway.resources.requests.cpu | float | 0.2 | |
metrics.pushgateway.resources.requests.memory | string | "100Mi" | |
metrics.scheduler.serviceMonitor.relabelings | list | [] | |
otlp.endpoint | string | "monit-otlp.cern.ch" | otlp endpoint where the otlp receivers are listening |
otlp.port | int | 4319 | otlp port where the otlp receivers are listening |
tenant.name | string | "" | username used for authenticating in the MONIT infrastructure |
tenant.password | string | "" | password (plain) used for authenticating in the MONIT infrastructure |
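
As an illustration of how the values above fit together, the following is a minimal sketch of an override file, not an official recommended configuration. The file name, cluster name, tenant credentials and the extra label are hypothetical placeholders; only the keys themselves come from the table above.

```yaml
# values-example.yaml (hypothetical file name)
kubernetes:
  # Required while fluentbit is enabled (the default); placeholder value.
  clusterName: "my-cluster"

tenant:
  # Placeholder credentials; replace with the ones issued for your MONIT tenant.
  name: "my-tenant"
  password: "changeme"

metrics:
  enabled: true              # chart default
  prometheus:
    server:
      retention: "24h"       # chart default
      extraLabelsForMetrics:
        environment: "qa"    # hypothetical static label

logs:
  enabled: true              # chart default is false
  fluentbit:
    enabled: true            # chart default is false
```

A file like this can be passed to Helm with the standard `-f` / `--values` flag at install or upgrade time. Because tenant.password is stored in plain text, it is probably safer to keep such a file out of version control.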
Contributing
We welcome contributions! If you're interested in helping improve this project, please review our contribution guidelines. In brief:
- Fork the repository.
- Create a feature branch.
- Implement your changes, add tests, and validate them.
- Submit a Merge Request (MR) to the master branch.
For a full contribution workflow, visit the contribution guide.
Documentation
Complete documentation for this chart, including setup and configuration details, is available:
License
This repository is licensed under the Apache License 2.0. See the LICENSE file for more information.