Skip to content

Follow-up from "Refactor of code, single GVK, minimal status"

The following discussion from !9 (merged) should be addressed:

  • @alossent started a discussion: (+1 comment)

    I think this use of requeue is incorrect. There should be no requeue in either of the following cases:

    1. the reconcile loop returns a non-nil err: the default behavior is already to retry after a backoff period. So reconcile.Result{Requeue: true}, err does not make much sense IMO
    2. the primary resource (or any secondary resource being watched) was modified (even if it's only status change), since the change will already trigger another run of the reconcile loop

    Excessive use of requeue may cause infinite reconcile loop. This happened to the webeossite-operator, so I suggest to review the use of Requeue everywhere.

    I don't think you have any case where the requeue is useful, regardless of error type: transient errors are case 1, permanent errors are case 2 (but it's OK to treat all errors as transient for now)

    See how the upstream Openshift operators do (this is one example but most of the ones I've seen are similar):

    They typically use a syncXXX or ensureXXX method instead to create/update the secondary resources, which simplifies the task and the logic in the reconcile loop.

    See example at https://github.com/openshift/master-dns-operator/blob/master/pkg/operator/masterdns/operator.go#L233 for a DNSEndpoint resource

    Also notice how the primary resource status is updated in the sync functions and error handling in this case: https://github.com/openshift/master-dns-operator/blob/master/pkg/operator/masterdns/operator.go#L329

    There's still one use case for Requeue: true in their approach: when they wait for something external to happen: we see this in https://github.com/openshift/master-dns-operator/blob/master/pkg/operator/masterdns/operator.go#L207

    Of course, we're not going to change the whole logic of this operator now but I suggest to give this sync-function approach a try at the next opportunity, like upstreams operators do. For now, I think you should remove the Requeue: true whenever there's also a reconcile error or whenever you modified the primary resource (and that probably covers all the cases where it's been used).