OpenShift image triggers causing indefinite rollouts
Following our discussions on moving away from DeploymentConfigs, we started using the image.openshift.io/triggers annotation to trigger new deployments when the image stream gets updated.
This worked fine in most cases. But on some sites (maybe all, we don't know), the triggers cause indefinite rollouts, continuously creating new ReplicaSets. We saw something similar in the prod clusters a while back, when Ceph was being overloaded.
The annotation used was:
image.openshift.io/triggers: '[{"from":{"kind":"ImageStreamTag","name":"nginx-drupalsite-sample:8.9.13-dev-a920c1e7","namespace":"ravineet-1"},"fieldPath":"spec.template.spec.containers[?(@.name==\"nginx\")].image","pause":"false"}]'
This issue was found while testing the merge of php-fpm with sitebuilder, drupal-runtime#6 (closed). Therefore, the annotation doesn't contain a trigger for the php container.
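For reference, the annotation value is a JSON array serialized into a single string, with one entry per container to update. The following is a minimal Python sketch that rebuilds the value quoted above. It assumes only the fields that appear in that annotation; the helper name build_trigger_annotation is hypothetical and is not part of our operator.

```python
import json


def build_trigger_annotation(namespace, container_tags):
    """Serialize one trigger entry per (container, ImageStreamTag) pair.

    Hypothetical helper for illustration only. The field names mirror the
    annotation quoted in this issue and nothing else.
    """
    triggers = [
        {
            "from": {
                "kind": "ImageStreamTag",
                "name": image_stream_tag,
                "namespace": namespace,
            },
            # JSONPath selecting the image field of the named container
            "fieldPath": f'spec.template.spec.containers[?(@.name=="{container}")].image',
            "pause": "false",
        }
        for container, image_stream_tag in container_tags
    ]
    # Compact separators produce the single-line form used in the manifest
    return json.dumps(triggers, separators=(",", ":"))


annotation = build_trigger_annotation(
    "ravineet-1",
    [("nginx", "nginx-drupalsite-sample:8.9.13-dev-a920c1e7")],
)
print(annotation)
```

Generating the string this way avoids hand-escaping the quotes inside fieldPath, which is easy to get wrong when editing the manifest directly.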
Investigation overview
- Tried to create a site without the extraConfigRepo field, with the image.openshift.io/triggers annotation in place on the deployment. Upon site provisioning, the builds were created, and when the builds succeeded, a new rollout was created successfully. Everything went as expected.
- Tried to provision a site with the extraConfigRepo field and with the image.openshift.io/triggers annotation. The extraConfigRepo field creates a new imagestream and buildconfig to build the custom sitebuilder image. Upon provisioning, all required resources are created. Once the custom sitebuilder image is built, the other builds that depend on it are triggered (following our buildconfig setup). When those builds completed, rollouts started happening continuously. This only stopped when the deployment was deleted.
  - The same happened when the operator was stopped after all the resources had been created. This confirms that the problem does not come from the operator.
- Upon doing the whole provisioning again without the image.openshift.io/triggers annotation, we don't see the problem.
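One way to confirm the symptom when repeating these steps is to compare the ReplicaSet revisions of the deployment at two points in time: if the highest revision keeps climbing with no new build in between, the rollout is looping. The following is a rough Python sketch, assuming the input is the output of `oc get replicasets -o json` in the site's namespace. The function name, the deployment name site-a and the sample data are made up for illustration.

```python
import json

REVISION_ANNOTATION = "deployment.kubernetes.io/revision"


def replicaset_revisions(rs_list_json, deployment_name):
    """Return the sorted revision numbers of ReplicaSets owned by a deployment."""
    revisions = []
    for item in json.loads(rs_list_json).get("items", []):
        meta = item.get("metadata", {})
        owned = any(
            owner.get("kind") == "Deployment" and owner.get("name") == deployment_name
            for owner in meta.get("ownerReferences", [])
        )
        revision = meta.get("annotations", {}).get(REVISION_ANNOTATION)
        if owned and revision is not None:
            revisions.append(int(revision))
    return sorted(revisions)


def is_still_rolling(earlier_json, later_json, deployment_name):
    """True if the highest revision increased between the two snapshots."""
    before = replicaset_revisions(earlier_json, deployment_name)
    after = replicaset_revisions(later_json, deployment_name)
    return bool(after) and (not before or after[-1] > before[-1])


def _rs(owner, revision):
    # Build a minimal, made-up ReplicaSet entry for the example below
    return {
        "metadata": {
            "ownerReferences": [{"kind": "Deployment", "name": owner}],
            "annotations": {REVISION_ANNOTATION: str(revision)},
        }
    }


snapshot_1 = json.dumps({"items": [_rs("site-a", 41), _rs("other", 3)]})
snapshot_2 = json.dumps({"items": [_rs("site-a", 41), _rs("site-a", 47), _rs("other", 3)]})
print(is_still_rolling(snapshot_1, snapshot_2, "site-a"))
```

Comparing the highest revision rather than the number of ReplicaSets matters because old ReplicaSets are pruned according to the deployment's revision history limit, so the count alone can stay flat while rollouts continue.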
We have decided to temporarily remove the annotation and set imagePullPolicy to Always (to make sure the deployments don't run with older images present on the node). The issue needs to be looked into at a later point.
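The temporary workaround amounts to two changes on the deployment manifest. The following is a minimal Python sketch that operates on the manifest as a plain dictionary. The function name apply_workaround, the deployment name and the image references in the sample are hypothetical; the real change is made wherever the deployment is generated.

```python
import copy

TRIGGER_ANNOTATION = "image.openshift.io/triggers"


def apply_workaround(deployment):
    """Return a copy of the manifest without the trigger annotation and
    with imagePullPolicy set to Always on every container."""
    patched = copy.deepcopy(deployment)
    # Drop the image trigger so no automatic rollouts are started
    patched.get("metadata", {}).get("annotations", {}).pop(TRIGGER_ANNOTATION, None)
    # Force a pull on pod start so a node-cached image is not reused
    for container in patched["spec"]["template"]["spec"]["containers"]:
        container["imagePullPolicy"] = "Always"
    return patched


# Made-up manifest fragment for illustration
sample = {
    "metadata": {
        "name": "drupalsite-sample",
        "annotations": {TRIGGER_ANNOTATION: "[...]", "other": "kept"},
    },
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "nginx", "image": "example/nginx:tag"},
                    {"name": "php", "image": "example/php:tag", "imagePullPolicy": "IfNotPresent"},
                ]
            }
        }
    },
}

patched = apply_workaround(sample)
print(patched["metadata"]["annotations"])
print([c["imagePullPolicy"] for c in patched["spec"]["template"]["spec"]["containers"]])
```

Note that imagePullPolicy only takes effect when a pod starts. Without the trigger, an updated image stream no longer causes a rollout by itself, so existing pods keep running the image they started with until they are restarted.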