Review and tune fluentd configuration
Current fluentd configuration is:
requests:
  cpu: 5m
  memory: 32Mi
If fluentd pods are started on a new cluster, there is no problem. If they get restarted on a cluster that is already highly utilized, the pods take a long time to start and get killed because the readiness probe fails.
- We might need to increase the requests: cpu.requests=50m, memory.requests=200Mi to make sure the pod comes up.
- Specify an upper limit (5G?) on the logs that can be stored in the fluentd buffer, so that even if there are problems with flushing the logs, the node doesn't go into a disk pressure state. The trade-off is that logs will be lost once the upper limit is reached.
- In the future, if we see that flushing of logs is very slow, we might need to tune some additional buffer values.
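
A rough sketch of what the first two proposals could look like, to make the discussion concrete. The request values and the 5G limit come from the points above. Everything else is an assumption: where the `resources` block lives depends on how fluentd is deployed (chart values or DaemonSet manifest), and the ConfigMap name, file key, match pattern and buffer path are hypothetical placeholders. `total_limit_size` and `overflow_action` are standard fluentd buffer parameters. The limit applies per buffered output, so with several outputs the total disk usage can exceed 5G.

```yaml
# Proposed container resources for the fluentd pods (values from this issue).
resources:
  requests:
    cpu: 50m
    memory: 200Mi
---
# Hypothetical ConfigMap; name, key, match pattern and path are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-buffer-example
data:
  buffer.conf: |
    <match **>
      # Keep the existing @type and output settings here.
      <buffer>
        @type file
        # Assumed buffer location on the node.
        path /var/log/fluentd-buffers/example.buffer
        # Upper limit for this buffer (the "5G?" suggested above).
        total_limit_size 5G
        # When the limit is reached, drop the oldest chunks instead of blocking.
        overflow_action drop_oldest_chunk
      </buffer>
    </match>
```

With `drop_oldest_chunk` the oldest logs are the ones lost when the limit is hit. The other `overflow_action` values (`throw_exception`, `block`) trade that for errors or back-pressure on the input side. If flushing turns out to be slow, `flush_interval` and `flush_thread_count` in the same `<buffer>` section would be the likely candidates for the additional tuning.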
Edited by Diana Gaponcic