From 1a53fbbecc658be11e8649ab779ee09c3cf9ef01 Mon Sep 17 00:00:00 2001
From: Nacho Barrientos <nacho.barrientos@cern.ch>
Date: Tue, 11 Mar 2025 11:33:11 +0100
Subject: [PATCH 1/7] Add some notes about configuring Prometheus' memory limits

---
 docs/getting_started.md | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/docs/getting_started.md b/docs/getting_started.md
index c5fd364..bb12f09 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -87,6 +87,43 @@ helm uninstall cern-it-monitoring-kubernetes
 ```
 This will remove most of the associated monitoring components deployed by this Helm chart. The CRDs will be left behind, though.
 
+## Resource limits
+
+The most memory-hungry component deployed by this chart is
+Prometheus. The amount of memory consumed by this element is normally
+a function of the targets to be scraped and the volume of data
+collected from them.
+
+The predominant scaling variables are typically the number of
+(Kubernetes) nodes and the number of Kubernetes resources, as there's one
+node exporter being installed on each node and the data generated by
+the Kubernetes API exporter grows as the number of Kubernetes
+resources installed increases. However, the system allows extending
+the scraped endpoints by providing custom service and pod monitors so
+these also play a role when scaling the cluster-local Prometheus.
+
+Therefore, it's tricky to provide sensible defaults covering lots of
+cases so the provided resource limits might be too lax for a seasoned
+cluster. The typical symptom of resource exhaustion you might observe
+is pod(s) deployed by the Prometheus deployment being OOMKilled or
+even buried in an init crash loop if the memory pressure is very high,
+leading to gaps in the data at visualisation time.
+
+The chart allows increasing the memory limit of the Prometheus server
+via [user-provided values](values.md), for example:
+
+```yaml
+metrics:
+  prometheus:
+    server:
+      resources:
+        limits:
+          memory: 8G
+```
+
+There are several ways to consult the current value: by looking at the
+default values of the chart, by looking at the output of `helm get
+values` or by inspecting the `Prometheus` resource in your cluster.
 
 ## Additional Resources
 
-- 
GitLab


From 70f71691b5d6c3a15e9be1697240bf35f958cf23 Mon Sep 17 00:00:00 2001
From: Nacho Barrientos <nacho.barrientos@cern.ch>
Date: Tue, 11 Mar 2025 11:53:42 +0100
Subject: [PATCH 2/7] Remove documentation about non-existent key path

---
 docs/values.md | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/docs/values.md b/docs/values.md
index 4da872d..726395a 100644
--- a/docs/values.md
+++ b/docs/values.md
@@ -42,12 +42,6 @@ This file contains the markdown version of the default values that this chart ta
 | metrics.kubeState.resources.requests.cpu | string | `"5m"` | |
 | metrics.kubeState.resources.requests.memory | string | `"15Mi"` | |
 | metrics.kubeState.scrapeInterval | string | `"15s"` | indicates how often kube state will be scraped by the local prometheus |
-| metrics.metricsserver.enabled | bool | `true` | if true metrics server will be installed |
-| metrics.metricsserver.nodeSelector | hash | `"nil"` | metricsserver node selectors |
-| metrics.metricsserver.resources.limits.cpu | string | `"100m"` | |
-| metrics.metricsserver.resources.limits.memory | string | `"200Mi"` | |
-| metrics.metricsserver.resources.requests.cpu | string | `"100m"` | |
-| metrics.metricsserver.resources.requests.memory | string | `"200Mi"` | |
 | metrics.nodeExporter.enabled | bool | `true` | if true node exporter will be installed as a daemon set together with a pod monitor |
 | metrics.nodeExporter.resources.limits.cpu | string | `"20m"` | |
 | metrics.nodeExporter.resources.limits.memory | string | `"25Mi"` | |
-- 
GitLab


From 9bf3957b9471197d43e9ca89e410b7b8c1314cb2 Mon Sep 17 00:00:00 2001
From: Nacho Barrientos <nacho.barrientos@cern.ch>
Date: Tue, 11 Mar 2025 11:59:38 +0100
Subject: [PATCH 3/7] Avoid duplicating installation instructions

---
 README.md | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 1a41dfd..c343b5f 100644
--- a/README.md
+++ b/README.md
@@ -8,12 +8,7 @@ This Helm chart simplifies the deployment and configuration of necessary compone
 
 ## Quick Start
 
-To get started with deploying the chart, follow the detailed installation instructions available in our [documentation](https://monit-docs.web.cern.ch).
-
-```bash
-helm install cern-it-monitoring-kubernetes oci://registry.cern.ch/monit/cern-it-monitoring-kubernetes --version <gitlab-tag>
-```
-For detailed configuration options and values, refer to the [values.yaml](values.yaml) file and the [documentation page](https://monit-docs.web.cern.ch).
+See [getting started](docs/getting_started.md).
 
 ## Contributing
 
-- 
GitLab


From 9626b453e1f94cea86281e356e2b58025fed4bd7 Mon Sep 17 00:00:00 2001
From: Nacho Barrientos <nacho.barrientos@cern.ch>
Date: Tue, 11 Mar 2025 12:00:12 +0100
Subject: [PATCH 4/7] Encourage writing unit tests

---
 CONTRIBUTING.md | 11 +++++------
 README.md | 2 +-
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 2a0c60a..7ec0d44 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -21,15 +21,14 @@ git checkout -b feature/my-awesome-feature
 ```
 
 ### 3. Implement and Test Your Changes
-Develop your changes locally. Before submitting your contribution, please ensure that you test your changes thoroughly by running the chart locally:
+Develop your changes locally. Please include all the necessary unit tests covering your new feature. Before submitting your contribution, please ensure that you validate your changes thoroughly by running the chart locally.
 
-```bash
-helm install . cern-it-monitoring-kubernetes -f values.yaml -f my-values.yaml
-```
-This ensures that your changes do not introduce any unintended issues.
+See [getting started](docs/getting_started.md) for hints on how to install the chart.
+
+This procedure ensures that your changes do not introduce any unintended issues.
 
 ### 4. Update Documentation
-If your changes modify or extend the functionality of the chart, do not forget to update the relevant documentation (such as `README.md` or any related configuration files) to reflect these modifications.
+If your changes modify or extend the functionality of the chart, do not forget to update the relevant documentation (such as `README.md`, `docs/values.md` or any related configuration files) to reflect these modifications.
 
 ### 5. Submit a Merge Request
 Once your changes are ready, push your branch to your fork and create a Merge Request (MR) targeting the `master` branch of the main repository.
diff --git a/README.md b/README.md
index c343b5f..6e28435 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,7 @@ We welcome contributions! If you're interested in helping improve this project,
 
 1. **Fork** the repository.
 2. Create a **feature branch**.
-3. Implement and **test** your changes.
+3. Implement, provide tests and validate your changes.
 4. Submit a **Merge Request (MR)** to the `master` branch.
 
 For a full contribution workflow, visit the [contribution guide](CONTRIBUTING.md).
-- 
GitLab


From e00dff4f644f87271b7b431d65159c2697e6fb3c Mon Sep 17 00:00:00 2001
From: Nacho Barrientos <nacho.barrientos@cern.ch>
Date: Tue, 11 Mar 2025 13:44:11 +0100
Subject: [PATCH 5/7] Remove duplicated information

---
 README.md | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 6e28435..ca7b904 100644
--- a/README.md
+++ b/README.md
@@ -25,13 +25,9 @@ For a full contribution workflow, visit the [contribution guide](CONTRIBUTING.md
 
 Complete documentation for this chart, including setup and configuration details, is available:
 
-- Project Documentation: [link](https://monit-docs.web.cern.ch)
 - GitLab Repository: [link](docs)
-- License: Apache-2.0. See the [LICENSE](LICENSE) file for details.
+- Project Documentation: [link](https://monit-docs.web.cern.ch)
 
 ## License
 
 This repository is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). See the [LICENSE](LICENSE) file for more information.
-
----
-For more details, visit the [CERN Monitoring Documentation](https://monit-docs.web.cern.ch) or explore the [docs](docs) folder.
\ No newline at end of file
-- 
GitLab


From 330d14c199e206599ee765336937abd328856eb0 Mon Sep 17 00:00:00 2001
From: Nacho Barrientos <nacho.barrientos@cern.ch>
Date: Tue, 11 Mar 2025 14:08:27 +0100
Subject: [PATCH 6/7] Ask to delete existing CRDs before installing the chart

---
 docs/getting_started.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/getting_started.md b/docs/getting_started.md
index bb12f09..4062559 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -14,7 +14,15 @@ Before you begin, ensure you have the following:
 
 ## 1. Install the CERN IT Monitoring Helm Repository
 
-First, install the official **CERN IT Monitoring** Helm chart repository from your Helm client. Remember to substitute the `<gitlab-tag>` with the version you want to install (ex. `1.0.0-rc1`).
+If the Kubernetes cluster you're targeting has any CRD of the
+Prometheus operator ecosystem already installed please first delete
+them all before proceeding by executing:
+
+```bash
+kubectl get customresourcedefinitions -ojson | jq '.items[] | select(.spec.group | test("monitoring\\.coreos\\.com$")) | .metadata.name' | xargs -n 1 kubectl delete customresourcedefinitions
+```
+
+Secondly, install the official **CERN IT Monitoring** Helm chart repository from your Helm client. Remember to substitute the `<gitlab-tag>` with the version you want to install (ex. `1.0.0-rc1`).
 We recommend installing the chart in the `monitoring` namespace. If it does not exists you can create it.
 
 ```bash
-- 
GitLab


From 8b75fadfa2ce5a5d25116cf18bae6c274bace299 Mon Sep 17 00:00:00 2001
From: Nacho Barrientos <nacho.barrientos@cern.ch>
Date: Tue, 11 Mar 2025 14:19:25 +0100
Subject: [PATCH 7/7] Fix typos

---
 docs/getting_started.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/getting_started.md b/docs/getting_started.md
index 4062559..2152f06 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -23,7 +23,7 @@ kubectl get customresourcedefinitions -ojson | jq '.items[] | select(.spec.grou
 ```
 
 Secondly, install the official **CERN IT Monitoring** Helm chart repository from your Helm client. Remember to substitute the `<gitlab-tag>` with the version you want to install (ex. `1.0.0-rc1`).
-We recommend installing the chart in the `monitoring` namespace. If it does not exists you can create it.
+We recommend installing the chart in the `monitoring` namespace. If it does not exist you can create it.
 
 ```bash
 helm install cern-it-monitoring-kubernetes oci://registry.cern.ch/monit/cern-it-monitoring-kubernetes --version <gitlab-tag> -f my-values.yaml -n monitoring
@@ -68,7 +68,7 @@ kubectl logs it-monit-metrics-collector-fluentbit-0 -n monitoring
 
 ## 4. Updating the Chart
 
-You might need to update the existing configuration of your chart. For that the easiest solution will be to apply again a set of values. Foe example:
+You might need to update the existing configuration of your chart. For that the easiest solution will be to apply again a set of values. For example:
 
 ```bash
 # 1. Get the existing values
-- 
GitLab