Liveness and Readiness Probes
Problem
Currently we have liveness and readiness probes. These check the endpoint /user/login to validate that the php-fpm container is ready.
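As a sketch, probes of this kind usually look like the following in the deployment spec. Only the /user/login path comes from the description above; the container name layout, port, and timing values here are assumptions:

```yaml
# Hypothetical sketch of the probe configuration on the php-fpm container.
# Port and timing values are assumed, not taken from the actual deployment.
containers:
  - name: php-fpm
    livenessProbe:
      httpGet:
        path: /user/login   # endpoint mentioned above
        port: 8080          # assumed port
      initialDelaySeconds: 30
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /user/login
        port: 8080
      periodSeconds: 10
```

Note that a failing readiness probe only removes the pod from service endpoints, while a failing liveness probe restarts the container, so the two behave quite differently under sustained failure.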
On 12-07-2021, the CephFS team raised the point:
We are noticing some strange IO patterns on Drupal CephFS shares starting around one week ago.
I had a look at one very busy client and it was a Drupal node:
"entity_id": "pvc-eba0d367-6b5d-46bc-89d7-d0667357b26a",
"hostname": "standard-avz-c-l4zwm",
"kernel_version": "5.10.19-200.fc33.x86_64",
"root": "/volumes/_nogroup/97cb7c2f-325b-4572-add8-fed3dfa9f72e"
From what I can tell, it has a share with ~250k files, which is very small.
But it is generating a huge amount of metadata ops to slowly write new files there.
Can you explain a bit what that node is doing now, for the past week?
After inspection, we saw this PVC was being used by a website's deployment, alice-figure. The serving pod of this deployment was stuck at 2/3 containers ready, with the readiness probe failing on php-fpm.
Another website, alice-conferences, was creating a high load on CephFS.
Preview of loads generated by the alice-figure deployment:
Solution
Deleting the probes allowed the container to start, which stopped the load on CephFS (seen in yellow on the plot).
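For reference, dropping the probes without editing the manifest by hand can be done with a JSON patch of roughly this shape. The namespace and the container index are assumptions; only the deployment name alice-figure comes from the incident above:

```shell
# Hypothetical sketch: remove both probes from the first container of the
# alice-figure deployment. Namespace and container index are assumed.
kubectl -n drupal patch deployment alice-figure --type=json -p='[
  {"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"},
  {"op": "remove", "path": "/spec/template/spec/containers/0/readinessProbe"}
]'
```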
How should we proceed with this?