Commit 68211790 authored by Alex Iribarren

Merge branch 'logsf' into 'master'

[LOS-1359] fluentbit docs

See merge request !128
parents 2d618f18 95ab5373
Pipeline #10276914 passed
# Monitoring aims.cern.ch
The aims.cern.ch service is configured via the [aims hostgroup](https://gitlab.cern.ch/ai/it-puppet-hostgroup-aims) in Puppet.
The aims logs can be found under the `monit_private_aims*` index in [Opensearch](https://os-linux.cern.ch/).
The fluentbit configuration for `aims` can be found [here](https://gitlab.cern.ch/ai/it-puppet-hostgroup-aims/-/blob/qa/data/hostgroup/aims.yaml?ref_type=heads#L29).
We use the `systemd` fluentbit input to get the logs from the following systemd units:
- httpd.service that includes reboot.cgi and server.cgi
- dnsmasq.service
- xinetd.service that includes in.tftp
We are applying `modify` and `lua` filters to get the following info:
- facility
- logsource
- message
- pid
- priority
- program
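To illustrate the kind of renaming these filters perform, the sketch below maps a journald-style record to the field names listed above. The left-hand keys are standard systemd journal fields, but the mapping itself is an assumption made for illustration; it is not copied from the hostgroup's `modify` or `lua` configuration.

```python
# Hypothetical mapping from systemd journal fields to the field names listed above.
JOURNAL_TO_FIELD = {
    "SYSLOG_FACILITY": "facility",
    "_HOSTNAME": "logsource",
    "MESSAGE": "message",
    "_PID": "pid",
    "PRIORITY": "priority",
    "SYSLOG_IDENTIFIER": "program",
}


def rename_journal_record(record: dict) -> dict:
    """Keep only the mapped journal keys and rename them."""
    return {
        new: record[old]
        for old, new in JOURNAL_TO_FIELD.items()
        if old in record
    }


# Made-up sample record, shaped like a journald entry for dnsmasq.
sample = {
    "SYSLOG_FACILITY": "3",
    "_HOSTNAME": "aims-node.example",
    "MESSAGE": "DHCPACK(eth0) 192.0.2.15",
    "_PID": "1234",
    "PRIORITY": "6",
    "SYSLOG_IDENTIFIER": "dnsmasq",
    "_TRANSPORT": "syslog",  # unmapped keys are dropped
}
renamed = rename_journal_record(sample)
```

Unmapped journal keys such as `_TRANSPORT` are simply dropped in this sketch; the real filters may keep or transform more fields.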
# Log monitoring
The Linux team is responsible for 3 main services:
- The main distribution server: linuxsoft.cern.ch
- The installation infrastructure servers: aims.cern.ch
- The RPM build service: koji.cern.ch
Each service is composed of test and prod virtual machines configured via Puppet.
These machines run different services, for instance httpd, kojira, dnsmasq, etc., that we need to monitor to ensure the health of the machines.
For this purpose, we use [Fluentbit](https://fluentbit.io/).
> Fluent Bit enables you to collect logs and metrics from multiple sources, enrich them with filters, and distribute them to any defined destination.
In our case the destination of the logs is [OpenSearch](https://opensearch.org/).
> OpenSearch is a community-driven, Apache 2.0-licensed open source search and analytics suite that makes it easy to ingest, search, visualize, and analyze data.
I recommend you read the [Fluentbit documentation](https://docs.fluentbit.io/manual), but in summary: the input configuration defines which logs we are looking for, the filters modify them, and the output configuration sends them to Opensearch.
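The input, filter and output stages can be pictured with a small conceptual model. This is only a sketch of the data flow, not how Fluent Bit is implemented; the function and variable names are hypothetical.

```python
from typing import Callable, Iterable, List

Record = dict
Filter = Callable[[Record], Record]


def run_pipeline(
    source: Iterable[Record],
    filters: List[Filter],
    output: Callable[[Record], None],
) -> None:
    """Read records from the input, pass each through every filter in order,
    then hand the result to the output."""
    for record in source:
        for apply_filter in filters:
            record = apply_filter(record)
        output(record)


def add_producer(record: Record) -> Record:
    """Example filter: return a copy of the record with one extra key."""
    return {**record, "producer": "lsb"}


# Example: one input record, one filter, and a list standing in for the output.
delivered: List[Record] = []
run_pipeline([{"message": "httpd started"}], [add_producer], delivered.append)
```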
We are using the `http` output plugin to send the logs to monit-logs.cern.ch. The MONIT team generated an http_password that can be found here: `tbag show --hg lsb monit_logs_tenant_lsb_password`.
The [fluentbit module](https://gitlab.cern.ch/ai/it-puppet-module-fluentbit) is maintained by MONIT and contains the fluentbit installation and all the different inputs, parsers, filters and outputs that can be used.
The [linux-monitoring module](https://gitlab.cern.ch/ai/it-puppet-module-linux_monitoring) provides a common structure to:
- Define the fluent-bit configuration constants.
- Instantiate a fluent-bit service with the hostgroup name.
- Create the required configuration files and the service instance.
- Enable the debug in `/var/log/messages` using [stdout filter](https://docs.fluentbit.io/manual/pipeline/filters/standard-output).
- Accept more than one input, filter and output, and handle their properties.
This module uses the fluentbit module.
All the fluentbit configuration needed in each hostgroup is done by calling the `linux-monitoring` module.
One of the mandatory fields is `agent_name`, which is used to create the fluentbit service `"fluent-bit@${agent_name}.service"`.
The other mandatory fields are `input_plugins`, `output_plugins` and `filters`.
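As a quick sanity check of these rules, the sketch below builds the service name from `agent_name` and reports which mandatory fields are missing from a set of parameters. The helper names are hypothetical, and treating an empty list as "missing" is an assumption; the Puppet module may validate differently.

```python
MANDATORY_FIELDS = ("agent_name", "input_plugins", "output_plugins", "filters")


def service_name(agent_name: str) -> str:
    """Name of the systemd unit created for a given agent."""
    return f"fluent-bit@{agent_name}.service"


def missing_fields(params: dict) -> list:
    """Return the mandatory fields that are absent or empty in params."""
    return [field for field in MANDATORY_FIELDS if not params.get(field)]


# Example: only agent_name is set, so the three plugin lists are reported.
unit = service_name("lsb")
missing = missing_fields({"agent_name": "lsb"})
```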
So, let's take a look at each main service and understand which logs we are interested in and how they are processed by Fluentbit.
- [LXSOFT](./lxsoft.md)
- [AIMS](./aims.md)
- [KOJI](./lsb.md)
# Monitoring koji.cern.ch
The koji.cern.ch service is configured via the [lsb hostgroup](https://gitlab.cern.ch/ai/it-puppet-hostgroup-lsb) in Puppet.
It is composed of the `web`, `hub` and `builder` subhostgroups.
The koji logs can be found under the `monit_private_lsb_logs*` index in [Opensearch](https://os-lsb.cern.ch/).
There are 4 different logtypes: `httpd_access`, `httpd_error`, `kojira` and `kojid`.
The `kojid` and `kojira` logtypes use the same parser, `koji`.
### lsb/web
The fluentbit configuration for `web` nodes can be found [here](https://gitlab.cern.ch/ai/it-puppet-hostgroup-lsb/-/blob/qa/data/hostgroup/lsb/web.yaml?ref_type=heads#L17).
We use the `tail` fluentbit input to get the Apache access and error logs, and apply the `modify`, `lua`, `geoip2` and `type_converter` filters to get the following info:
- httpd_access:
    - cluster
    - referer
    - agent
    - method
    - latitude
    - continent_code
    - time_zone
    - roger_appstate
    - apache_response
    - country_code
    - path
    - logtype
    - size
    - clientip
    - country_name
    - user
    - longitude
- httpd_error:
    - cluster
    - level
    - latitude
    - continent_code
    - pid
    - time_zone
    - message
    - roger_appstate
    - country_code
    - logtype
    - clientip
    - country_name
    - longitude
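To make the access-log fields above more concrete, here is a minimal sketch that parses one line into `clientip`, `user`, `method`, `path`, `apache_response`, `size`, `referer` and `agent`. It assumes the standard Apache "combined" log format; the actual parser is defined in the fluentbit module and may differ. The integer conversion only mimics what a `type_converter` filter is typically used for.

```python
import re

# Assumed Apache "combined" log format; not the parser shipped in the Puppet module.
COMBINED = re.compile(
    r'(?P<clientip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<apache_response>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line: str):
    """Return the parsed fields as a dict, or None if the line does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert numeric fields from strings to integers.
    fields["apache_response"] = int(fields["apache_response"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# Made-up sample line using a documentation IP address.
sample_line = (
    '192.0.2.10 - alice [10/Oct/2024:13:55:36 +0200] '
    '"GET /koji/index HTTP/1.1" 200 2326 "-" "curl/8.0"'
)
parsed = parse_access_line(sample_line)
```

The geographic fields (`latitude`, `country_code`, and so on) are not derived here; in the real pipeline they come from the `geoip2` filter applied to the client IP.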
### lsb/hub
The fluentbit configuration for `hub` nodes can be found [here](https://gitlab.cern.ch/ai/it-puppet-hostgroup-lsb/-/blob/qa/data/hostgroup/lsb/hub.yaml?ref_type=heads#L10).
As on the web nodes, the `tail` fluentbit input is used to get the Apache access and error logs.
In addition, we process the `kojira` logs from `/var/log/kojira.log` using the `tail` input and the `modify` filter to get the following info:
- cluster
- level
- logtype
- message
- roger_appstate
### lsb/builder
The fluentbit configuration for `builder` nodes can be found [here](https://gitlab.cern.ch/ai/it-puppet-hostgroup-lsb/-/blob/qa/data/hostgroup/lsb/builder.yaml?ref_type=heads#L19).
The `kojid` logs from `/var/log/kojid.log` are processed using the `tail` input and the `modify` and `lua` filters to get the same info as listed above for `kojira`.
# Monitoring linuxsoft.cern.ch
The linuxsoft.cern.ch service is configured via the [lxsoft hostgroup](https://gitlab.cern.ch/ai/it-puppet-hostgroup-lxsoft) in Puppet.
It is composed of 4 subhostgroups: `web`, `rsync`, `legacy` and `nomad`.
### lxsoft/web
The web logs can be found under the `monit_private_lxsoft_logs_web*` index in [Opensearch](https://os-linux.cern.ch/).
We want to have access to all the httpd accesses made to [linuxsoft.cern.ch](https://linuxsoft.cern.ch).
The logs are written to `/var/log/httpd/*ssl_access*.log` on each `web` node, and we use the `cci_apache2` parser to parse them with fluentbit.
The fluentbit configuration for `web` can be found [here](https://gitlab.cern.ch/ai/it-puppet-hostgroup-lxsoft/-/blob/qa/data/hostgroup/lxsoft/web.yaml?ref_type=heads#L99).
We use the `tail` fluentbit input and apply the `modify`, `lua`, `geoip2` and `type_converter` filters to get the following info:
- acctype - `mirror`, `snap`, `upstream`, `cern`
- action - `GET`
- apache_response
- host
- logsource
- path
- os_arch
- os_flavor
- os_tag
- os_version
- os_dist
- os_source
- repository
- puppet_managed
- response_time
- return_data_size
- kibibytes_per_second
- megabytes_per_second
- client_ip
- city_name
- continent_code
- country_code
- country_name
- latitude
- longitude
- region_name
- timezone
- protocol
- agent
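The two throughput fields can be derived from `return_data_size` and `response_time`. The sketch below shows one way to compute them, assuming the size is in bytes and the response time is in microseconds (as with Apache's `%D`); both units are assumptions, and the real calculation lives in the hostgroup's filters.

```python
def throughput(return_data_size_bytes: int, response_time_us: int) -> dict:
    """Compute transfer rates from a response size and duration.

    Assumes bytes and microseconds; returns zero rates when the
    duration is not positive, to avoid dividing by zero.
    """
    if response_time_us <= 0:
        return {"kibibytes_per_second": 0.0, "megabytes_per_second": 0.0}
    seconds = response_time_us / 1_000_000
    bytes_per_second = return_data_size_bytes / seconds
    return {
        "kibibytes_per_second": bytes_per_second / 1024,       # 1 KiB = 1024 bytes
        "megabytes_per_second": bytes_per_second / 1_000_000,  # 1 MB = 10^6 bytes
    }


# Example: 1,024,000 bytes served in exactly one second.
rates = throughput(1_024_000, 1_000_000)
```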
There are [lua unit tests](https://gitlab.cern.ch/ai/it-puppet-hostgroup-lxsoft/-/tree/qa/code/files/fluentbit/tests?ref_type=heads) that check the OS information for different kinds of URLs. Please keep them updated.
The geoip2 filter is using a **static** MaxMind GeoLite2-City.mmdb database.
> **_NOTE:_** The geoip2 database needs to be updated - to be discussed with the monit team.
### lxsoft/rsync
The rsync logs can be found under the `monit_private_lxsoft_logs_rsync*` index in [Opensearch](https://os-linux.cern.ch/).
The fluentbit configuration for `rsync` can be found [here](https://gitlab.cern.ch/ai/it-puppet-hostgroup-lxsoft/-/blob/qa/data/hostgroup/lxsoft/rsync.yaml?ref_type=heads#L6).
We use the `systemd` fluentbit input and apply the `modify` and `lua` filters to get the following info:
- facility
- logsource
- message
- pid
- priority
- program
### lxsoft/nomad
The nomad logs can be found under the `monit_private_lxsoft_logs_nomad*` index in [Opensearch](https://os-linux.cern.ch/).
The fluentbit configuration for `nomad` can be found [here](https://gitlab.cern.ch/ai/it-puppet-hostgroup-lxsoft/-/blob/qa/data/hostgroup/lxsoft/adm/nomad.yaml#L118).
The containers use the fluentd Docker log driver, and fluent-bit receives their logs through the `forward` input.
We apply the `modify` and `parser` filters to get the following info:
- NOMAD_ALLOC_ID
- NOMAD_ALLOC_INDEX
- NOMAD_ALLOC_NAME
- NOMAD_GROUP_NAME
- NOMAD_JOB_NAME
- NOMAD_TASK_NAME
- container_name
- message
- source
- tag
> **_NOTE:_** The NOMAD fields, tag, container_name, message, host, and timestamp are also sent via the fluent-bit `opensearch` output to the `linux_private-nomad` index. This is done to avoid losing our dashboards. It can be deprecated after we change our dashboards.
# Steps to get fluentbit to work
The goal is to use fluentbit to send our logs to monit-agent.cern.ch.
These logs can be accessed through Opensearch.
### 1) Created the internal user in Opensearch for MONIT with all_access roles: `linux_private_monit` (os-linux.cern.ch) and `lsb_private_monit` (os-lsb.cern.ch)
- Security -> Internal users
- Security -> Roles -> all_access -> Mapped users
### 2) Created the PWN service to share the Opensearch user/password between MONIT and Linux team
```
$ ai-pwn set service fluentbit-lsb --owners monit-support linux-team --hg lsb
$ ai-pwn show service fluentbit-lsb
$ ai-pwn set service fluentbit-lxsoft --owners monit-support linux-team --hg lsb
$ ai-pwn show service fluentbit-lxsoft
```
### 3) Add the user and password to tbag hg and service (to share with MONIT)
```
$ tbag show --hg lsb ospassword_monit
$ tbag show --hg lsb osuser_monit
$ tbag set --service fluentbit-lsb osuser_monit
$ tbag set --service fluentbit-lsb ospassword_monit
$ tbag showkeys --service fluentbit-lsb
```
The same applies to `fluentbit-lxsoft`.
### 4) Created a SNOW ticket request
[Request registration of a new producer in MONIT: lsb](https://cern.service-now.com/nav_to.do?uri=%2Fu_request_fulfillment.do%3Fsys_id%3Dd12bc0674786d210ff618f1c736d4384) where:
- the producer is the hostgroup
- the data type is logs
- the data visibility is private
- the data access is opensearch
- document schema is json
- the cluster is os-lsb.cern.ch or os-linux.cern.ch
More information in [MONIT documentation](https://monit-docs.web.cern.ch/logs/http/)
### 5) Work on the fluentbit inputs, filters and output in each hostgroup/subhostgroup
#### Inputs
The input name defines what kind of logs you are interested in. For instance, to get the httpd logs, we need to use the `tail` input.
```
hg_lsb::input_plugins:
  - name: tail
    properties:
      path: '/var/log/httpd/*error*.log'
      storage_type: 'filesystem'
      tag: 'httpd_error'
      parsers: ["apache_error"]
```
Which properties need to be defined depends on the input you choose.
The `tag` property, however, is common to all input types.
The `tag` is used to route messages: if you define more than one input and need to apply different filters to each one, use the `match` property of the filters to select records by tag.
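The routing idea can be sketched in a few lines: each filter declares a `match` pattern, and it is applied only to records whose tag matches. The example uses Python's `fnmatchcase` as a stand-in for the `*` wildcard; note that `fnmatch` also gives `?` and `[...]` special meaning, which is an approximation of Fluent Bit's matching rather than a copy of it. The filter list is made up for illustration.

```python
from fnmatch import fnmatchcase


def filters_for(tag: str, filters: list) -> list:
    """Return the names of the filters whose match pattern accepts the tag."""
    return [f["name"] for f in filters if fnmatchcase(tag, f["match"])]


# Hypothetical filter definitions; only the name and match pattern matter here.
example_filters = [
    {"name": "modify", "match": "*"},
    {"name": "geoip2", "match": "httpd_access"},
    {"name": "lua", "match": "httpd_*"},
]

error_filters = filters_for("httpd_error", example_filters)
access_filters = filters_for("httpd_access", example_filters)
kojira_filters = filters_for("kojira", example_filters)
```

With these patterns, records tagged `httpd_error` skip the `geoip2` filter, while `kojira` records only go through the catch-all `modify` filter.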
#### Filters
There are some mandatory fields requested by MONIT, like `producer` and `type`.
More information in [MONIT documentation](https://monit-docs.web.cern.ch/logs/http/)
To accomplish that we created a modify filter, for instance:
```
hg_lxsoft::filters:
  - name: modify
    properties:
      alias: 1
      set:
        producer: 'lxsoft'
        type: 'rsync'
        hostname: "%{facts.networking.fqdn}"
        hostgroup: "%{facts.hostgroup}"
        environment: "%{::environment}"
```
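The effect of the `set` operation on a record can be modelled as adding the listed keys and overwriting any that already exist. The sketch below shows that behaviour on a made-up record; the hostname and hostgroup values are placeholders for what Hiera would interpolate.

```python
def apply_set(record: dict, values: dict) -> dict:
    """Return a copy of the record with every key in values added or overwritten."""
    enriched = dict(record)
    enriched.update(values)
    return enriched


# Placeholder values standing in for the interpolated Hiera facts.
mandatory = {
    "producer": "lxsoft",
    "type": "rsync",
    "hostname": "node.example",
    "hostgroup": "lxsoft/rsync",
}

original = {"message": "rsync connection accepted", "type": "unknown"}
enriched = apply_set(original, mandatory)
```

The input record is left untouched, and the pre-existing `type` value is replaced by the one from the filter.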
Setting `debug_log_files` to true enables the `stdout` filter and sends all the fluentbit logs to `/var/log/messages`.
#### Output
Logs can be sent to the monit-logs HTTP endpoint.
> **_NOTE:_** The endpoint http://monit-logs.cern.ch:10012 is getting deprecated.
We need to send the logs to https://monit-logs.cern.ch:10013/\<producer\>.
More information in [MONIT documentation](https://monit-docs.web.cern.ch/logs/http/#sending-data).
```
hg_lsb::output_plugins:
  - name: 'http'
    properties:
      host: 'monit-logs.cern.ch'
      port: 10013
      uri: '/lsb'
      http_passwd: 'monit_logs_tenant_lsb_password'
      http_user: 'lsb'
      format: 'json'
      match: '*'
      log_level: 'debug'
      json_date_key: 'timestamp'
      json_date_format: 'epoch'
      tls:
        enabled: true
```
The http_password can be found here: `tbag show --hg lsb monit_logs_tenant_lsb_password`.
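For debugging it can help to see roughly what such a request looks like. The sketch below only builds the request object and does not send anything. It assumes HTTP basic authentication, a JSON array body, and an integer epoch timestamp under the `timestamp` key, mirroring the settings above; the exact payload produced by fluent-bit may differ, and the password shown is a placeholder.

```python
import base64
import json
import time
import urllib.request


def build_request(record, producer, user, password, now=None):
    """Build (but do not send) a POST request carrying one log record."""
    body = dict(record)
    # Assumed: epoch seconds stored under the key named by json_date_key.
    body["timestamp"] = int(time.time()) if now is None else now
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"https://monit-logs.cern.ch:10013/{producer}",
        data=json.dumps([body]).encode(),  # assumed: records sent as a JSON array
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Placeholder credentials; the real password comes from tbag.
request = build_request(
    {"producer": "lsb", "type": "httpd_error", "message": "test"},
    producer="lsb",
    user="lsb",
    password="example-password",
    now=1700000000,
)
```

Passing the result to `urllib.request.urlopen` would actually send it, which should only be done from a host allowed to reach the endpoint.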
......@@ -91,6 +91,12 @@ nav:
- 'Useful references': aims2/references.md
- 'Remote console tricks': aims2/remoteconsole.md
- 'locmap': locmap/locmap.md
- 'Log monitoring':
- 'Introduction': logs/introduction.md
- 'Linuxsoft': logs/lxsoft.md
- 'AIMS': logs/aims.md
- 'Koji': logs/lsb.md
- 'Procedures': logs/procedures.md
- 'Lxsoft-Alerts': lxsoft_alerts/introduction.md
- 'Community':
- 'CentOS': community/centos.md
......