[MONIT-4187] Allow specifying a namespace filter for logs

Merged: Nacho Barrientos requested to merge `monit4187` into `master`
# Logs Configuration Guide
This guide provides instructions on configuring and managing logs
collection in the **CERN IT Monitoring Kubernetes Helm Chart**. Logs
provide valuable insights into the operation and health of both the
Kubernetes cluster and the applications running within it. This chart
uses **Fluent Bit** to gather and forward logs to the central
monitoring infrastructure.
## Enabling logs collection
The logs collection and processing system is controlled via the
`logs.enabled` flag. By default, this system is disabled.
To **enable** logs collection and send the logs over to MONIT:
```yaml
logs:
  enabled: true
  fluentbit:
    enabled: true
```
If `logs.enabled` is set to `false`, no logging components (such as
Fluent Bit) will be installed or configured.
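Conversely, to switch the whole logging stack off again (the default
behaviour), a values override like the following is enough:

```yaml
logs:
  enabled: false
```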
## Configuring what logs are collected
⚠️ Without further user-provided configuration, only logs generated by
pods in the `default` namespace are collected. To augment the set of
observed namespaces, please configure the desired list of namespaces
via the `logs.collectedNamespaces` setting, for example:
```yaml
logs:
  collectedNamespaces:
    - myapp
    - user.+
  fluentbit:
    enabled: true # Enable/disable Fluent Bit for log collection
    resources:
      requests:
        cpu: "5m"
        memory: "15Mi"
      limits:
        cpu: "20m"
        memory: "25Mi"
```
Please keep in mind that, as described in more detail below, logs
collection implies queries to the Kubernetes API to enrich log data
with cluster state data, so the more logs you collect, the more
pressure is put on the API. Therefore, please be reasonable and only
process what you actually care about.
## Implementation details

All the raw Fluent Bit configuration can be found in the default
`values.yaml`; here is a high-level description of the mechanics,
together with the relevant excerpts from that file.

The Fluent Bit service itself is configured with the following options:

```yaml
service: |
  [SERVICE]
      Daemon          Off
      Flush           1
      Log_Level       INFO
      HTTP_Server     On
      HTTP_Listen     0.0.0.0
      HTTP_Port       2020
      Health_Check    On
```
### Inputs

Fluent Bit collects logs from the `/var/log/containers/` directory in
all nodes using a DaemonSet. Logs are parsed using the [CRI (Container
Runtime Interface) logging format
parser](https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/multiline-parsing#built-in-multiline-parsers),
which also handles multiline log entries.

The default input configuration is:

```yaml
inputs: |
  [INPUT]
      Name              tail
      Path              /var/log/containers/*.log
      multiline.parser  cri
      Tag               kube.*
      Mem_Buf_Limit     20MB
      Skip_Long_Lines   Off
```
### Filters

After the initial log gathering, Fluent Bit applies several filters to
enrich logs with Kubernetes metadata (e.g., labels and annotations)
using the so-called [Kubernetes
filter](https://docs.fluentbit.io/manual/pipeline/filters/kubernetes),
and organizes the structure of the log records.

The filters are also in charge of discarding logs generated in
namespaces that are not being observed.

The Kubernetes filter is configured as follows:

```yaml
filters: |
  [FILTER]
      Name                kubernetes
      Match               kube.*
      Kube_URL            https://kubernetes.default.svc:443
      Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
      Merge_Log           On
      K8S-Logging.Parser  On
      K8S-Logging.Exclude Off
```
### Outputs

Finally, enriched logs are forwarded to MONIT using the OpenTelemetry
protocol to the specified endpoint, adding custom labels (such as the
cluster name) to the logs. Events are shipped with `monit_type` set to
`kubernetes` for easy filtering.

The OpenTelemetry output is configured as follows:

```yaml
outputs: |
  [OUTPUT]
      name              opentelemetry
      match             *
      add_label         job kubernetes
      add_label         k8s_cluster_name {{ .Values.kubernetes.clusterName }}
      host              {{ .Values.otlp.endpoint }}
      port              {{ .Values.otlp.port }}
      logs_uri          /v1/logs
      tls               on
      tls.verify        off
      http_user         {{ .Values.tenant.name }}
      http_passwd       {{ .Values.tenant.password }}
```
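The endpoint, cluster name and tenant credentials referenced in the
output above come from the chart values. As a sketch, an override
could look like the following; the hostnames and credentials are
purely illustrative placeholders:

```yaml
# Illustrative values only: use the endpoint and tenant credentials
# provided for your MONIT tenant.
kubernetes:
  clusterName: my-cluster
otlp:
  endpoint: monit-otlp.example.ch
  port: 4318
tenant:
  name: mytenant
  password: changeme
```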
## Fluent Bit Custom Lua Scripts
Fluent Bit allows using [Lua
scripts](https://docs.fluentbit.io/manual/pipeline/filters/lua) to
create processors that transform the records. These scripts are
available in the Fluent Bit container under the path
`/fluent-bit/etc/scripts`. To create new Lua scripts, just add them
under the `logs.fluentbit.luaScripts` values key.
For example, the following configuration will create two files in
`/fluent-bit/etc/scripts`: one named `my_lua_script.lua` and another
one named `my_other_lua_script.lua`:
```yaml
logs:
  fluentbit:
    luaScripts:
      my_lua_script.lua: |
        function my_function(tag, timestamp, record)
          -- Do something...
          return 2, timestamp, record
        end
      my_other_lua_script.lua: |
        ...
```
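Note that creating the file alone does not run it: the script still
needs to be referenced from a [Lua
filter](https://docs.fluentbit.io/manual/pipeline/filters/lua). As a
sketch, assuming the extra filter is appended to the `filters` section
shown earlier, the stanza could look like this:

```yaml
filters: |
  # ... keep the existing [FILTER] entries from the default values.yaml ...
  [FILTER]
      Name   lua
      Match  kube.*
      script /fluent-bit/etc/scripts/my_lua_script.lua
      call   my_function
```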
## Customising Fluent Bit for Additional Log Sources
You can extend the logging configuration with custom Fluent Bit
**inputs**, **filters**, or **outputs** on top of the defaults
described above. This is especially useful if you want to gather logs
from additional sources or apply different processing to specific
logs.
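As a sketch, and assuming the raw `inputs` section sits under
`logs.fluentbit` as in the excerpts above, an additional `tail` input
could be declared as follows. Overriding `inputs` replaces the whole
string, so the default input is repeated; the extra path and tag are
purely illustrative:

```yaml
logs:
  fluentbit:
    inputs: |
      [INPUT]
          Name              tail
          Path              /var/log/containers/*.log
          multiline.parser  cri
          Tag               kube.*
          Mem_Buf_Limit     20MB
          Skip_Long_Lines   Off

      # Illustrative extra input reading application logs from a host
      # path mounted into the Fluent Bit pods (see the next subsection
      # for how to mount extra volumes).
      [INPUT]
          Name  tail
          Path  /mnt/logs/*.log
          Tag   myapp.*
```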
### Extra Volumes and Volume Mounts
You can also configure **extra volumes** and **volume mounts** for
Fluent Bit to collect logs from different paths or persistent volumes:
```yaml
logs:
  fluentbit:
    # The value names below follow the upstream Fluent Bit chart
    # (extraVolumes/extraVolumeMounts); check the default values.yaml
    # of this chart if your version differs.
    extraVolumes:
      - name: extra-logs
        hostPath:
          path: /mnt/logs
    extraVolumeMounts:
      - name: extra-logs
        mountPath: /mnt/logs
```
## Considerations for Large Deployments
Be cautious when enabling Fluent Bit for large clusters (100+
nodes). The **Kubernetes filter** generates API requests to the
Kubernetes API server, and these requests may become significant in
large-scale deployments. In such cases, you may need to:
- Adjust Fluent Bit resource requests and limits (via `logs.fluentbit.resources`, see the sketch after this list).
- Tune the API request rate by modifying the filters.
- Consider other optimization techniques, such as excluding specific log sources.
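For the first point, a minimal override could look like the following;
the figures are purely illustrative and should be sized to your
cluster:

```yaml
logs:
  fluentbit:
    resources:
      requests:
        cpu: "10m"
        memory: "30Mi"
      limits:
        cpu: "50m"
        memory: "60Mi"
```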
## Kubernetes events
This chart also allows sending Kubernetes events as log entries to
MONIT; however, this feature is disabled by default. To enable it,