Liveness and Readiness Probes
Problem
Currently we have liveness and readiness probes. These check the endpoint /user/login to validate that the php-fpm container is ready.
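As a sketch, probes of this kind usually look like the following in the deployment spec. Only the /user/login path comes from the description above; the container name layout, port, and timing values here are assumptions:

```yaml
# Hypothetical sketch of the probe configuration on the php-fpm container.
# Port and timing values are assumed, not taken from the actual deployment.
containers:
  - name: php-fpm
    livenessProbe:
      httpGet:
        path: /user/login   # endpoint mentioned above
        port: 8080          # assumed port
      initialDelaySeconds: 30
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /user/login
        port: 8080
      periodSeconds: 10
```

Note that a failing readiness probe only removes the pod from service endpoints, while a failing liveness probe restarts the container, so the two behave quite differently under sustained failure.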
On 12-07-2021, the CephFS team raised the point:
We are noticing some strange IO patterns on Drupal CephFS shares starting around one week ago.
I had a look at one very busy client and it was a Drupal node:
"entity_id": "pvc-eba0d367-6b5d-46bc-89d7-d0667357b26a",
"hostname": "standard-avz-c-l4zwm",
"kernel_version": "5.10.19-200.fc33.x86_64",
"root": "/volumes/_nogroup/97cb7c2f-325b-4572-add8-fed3dfa9f72e"
From what I can tell, it has a share with ~250k files, which is very small.
But it is generating a huge amount of metadata ops to slowly write new files there.
Can you explain a bit what that node is doing now, for the past week?
After inspection, we saw this PVC was being used by a website's deployment, alice-figure. The serving pod of this deployment was stuck at 2/3 containers ready, with the readiness probe failing on php-fpm.
Another website, alice-conferences, was creating a high load on CephFS.
Preview of loads generated by the alice-figure deployment:
Solution
Deleting the probes allowed the container to start, which stopped the load on CephFS (seen in yellow on the plot).
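For reference, dropping the probes without editing the manifest by hand can be done with a JSON patch of roughly this shape. The namespace and the container index are assumptions; only the deployment name alice-figure comes from the incident above:

```shell
# Hypothetical sketch: remove both probes from the first container of the
# alice-figure deployment. Namespace and container index are assumed.
kubectl -n drupal patch deployment alice-figure --type=json -p='[
  {"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"},
  {"op": "remove", "path": "/spec/template/spec/containers/0/readinessProbe"}
]'
```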
How should we proceed with this?