
Added 301 case for health probes

Closed Francisco Borges Aurindo Barros requested to merge fix-probe into v9.4-1
2 unresolved threads

The IT website suffered intermittent downtime over the past few days (02/09 to 09/09), which was initially attributed to an upstream bug (bug link).

Although the instance was indeed affected by that bug, the real cause of the downtime was the livenessProbe, which failed because the base path returned 301. On Friday (02/09/2022), Catharine Noble made changes (confirmed by her) so that the default endpoint permanently redirects to /welcome.

This MR changes the probe to accept 301 as a valid response code from the base path (/).

This MR will also solve the problem reported in the OTG permanently.

NB: The livenessProbe for the IT website is disabled until this MR is accepted and the changes are propagated to the cluster. Once the changes reach production, the annotation that blocks updates from the Operator must be removed.
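
As a rough illustration of the change described above, the accepted-code check could be factored into a small helper. This is only a sketch: the function name `is_accepted_code` is hypothetical and does not appear in the probe script, and the commented usage assumes the site is served on localhost:8080, as in the probe's other curl calls.

```shell
#!/usr/bin/env bash
# Sketch: decide whether an HTTP status code from the base path counts as healthy.
# The accepted codes mirror the list in this MR; the helper name is hypothetical.
is_accepted_code() {
  case "$1" in
    200|301|302|403|503) return 0 ;;  # expected responses, now including 301
    *) return 1 ;;                    # anything else fails the probe
  esac
}

# Hypothetical usage inside the probe:
# HTTP_CODE_BASE=$(curl --silent --output /dev/null --write-out '%{http_code}' localhost:8080/)
# is_accepted_code "${HTTP_CODE_BASE}" || { echo "Probe failed: ${HTTP_CODE_BASE}"; exit 1; }
```

A `case` statement keeps the list of accepted codes in one place, so adding another code later means editing a single pattern rather than extending a long `&&` chain.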

Edited by Francisco Borges Aurindo Barros


Activity

 # Expected responses:
 # 200: normally working base URL
+# 301: Moved Permanently, this happens when the website's homepage is not on the default path, `/`
 # 302: redirection (NOTE: not sure if there's a legitimate case to expect this)
 # 403: fully private websites give this response
 # 503: high load
-if [[ "${HTTP_CODE_BASE}" -ne "200" && "${HTTP_CODE_BASE}" -ne "302" && "${HTTP_CODE_BASE}" -ne "403" && "${HTTP_CODE_BASE}" -ne "503" ]]; then
+if [[ "${HTTP_CODE_BASE}" -ne "200" && "${HTTP_CODE_BASE}" -ne "301" && "${HTTP_CODE_BASE}" -ne "302" && "${HTTP_CODE_BASE}" -ne "403" && "${HTTP_CODE_BASE}" -ne "503" ]]; then
  • Carina Antunes approved this merge request


  • added 1 commit

    • 0b2024fd - Updated probe endpoint to <SITE>/_site/_php-fpm-status


  • added 1 commit

    • 0799525e - Added check on number of active PHP processes


  • -# 503: high load
    -if [[ "${HTTP_CODE_BASE}" -ne "200" && "${HTTP_CODE_BASE}" -ne "302" && "${HTTP_CODE_BASE}" -ne "403" && "${HTTP_CODE_BASE}" -ne "503" ]]; then
    +# 200: php_fpm is reporting it's status, therefore should be working as expected
    +if [[ "${HTTP_CODE_BASE}" -ne "200" ]]; then
     echo "Probe failed" >> $FILE
    -echo "Probe failed. Endpoint / responds with code: $HTTP_CODE_BASE"
    +echo "Probe failed. Endpoint / responds with code: $HTTP_CODE_BASE" >> $FILE
    +echo "PHP-FPM Output" $(curl localhost:8080/_site/_php-fpm-status --silent --insecure) >> $FILE
    +exit 1
    +fi
    +
    +# We can retrieve the number of active PHP processes from the endpoint,
    +# This is a variable described here: https://www.php.net/manual/en/fpm.status.php
    +# If the value is '0', that means there will be no processes processing requests
    +# In such cases the probe will fail and force a restart of the container
    +ACTIVE_PHP_PROCESSES=$(curl --max-time 200 --silent --fail --insecure localhost:8080/_site/_php-fpm-status?json | jq -r '."active processes"')
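
The diff above is cut off before the comparison on `ACTIVE_PHP_PROCESSES`, so the following is only a sketch of how such a check might look, not the code from the commit. The helper name `has_active_processes` is hypothetical; it assumes the value has already been extracted from the `"active processes"` field, and it treats empty or non-numeric input (for example `null` from jq, or an empty string after a failed curl) as a failure.

```shell
#!/usr/bin/env bash
# Sketch: fail the probe when php-fpm reports zero active processes.
# has_active_processes is a hypothetical helper, not part of the original script.
has_active_processes() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;  # empty or non-numeric value: treat as a failure
  esac
  [ "$1" -gt 0 ]              # succeed only when at least one process is active
}

# Hypothetical usage, reusing the command from the diff (requires jq in the container):
# ACTIVE_PHP_PROCESSES=$(curl --max-time 200 --silent --fail --insecure \
#   'localhost:8080/_site/_php-fpm-status?json' | jq -r '."active processes"')
# has_active_processes "${ACTIVE_PHP_PROCESSES}" || { echo "No active PHP processes" >> "$FILE"; exit 1; }
```

Validating the value before the numeric comparison avoids an "integer expression expected" error when the status endpoint is unreachable and the variable ends up empty.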
  • The same issue was handled with a different approach in !162 (merged).

    Monitoring /_site/_php-fpm-status could be considered a more accurate trigger for restarting the container (a failure there would mean php-fpm is not running), but restarting on 50x codes is reasonable too.

    Closing this MR.
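
For comparison, a probe that restarts on 50x codes could be sketched as below. This is not the code from !162, whose contents are not shown here; the helper name `is_server_error` is hypothetical, and the pattern matches only 500 to 509, reading "50x" literally.

```shell
#!/usr/bin/env bash
# Sketch: treat only 50x responses as a reason to fail the probe.
# is_server_error is a hypothetical helper; widen the pattern to 5[0-9][0-9]
# if every 5xx code should count.
is_server_error() {
  case "$1" in
    50[0-9]) return 0 ;;  # 500-509: server-side failure
    *) return 1 ;;        # redirects, 2xx and 4xx are left alone
  esac
}

# Hypothetical usage inside the probe (localhost:8080 as in the diffs above):
# HTTP_CODE_BASE=$(curl --silent --output /dev/null --write-out '%{http_code}' localhost:8080/)
# if is_server_error "${HTTP_CODE_BASE}"; then echo "Probe failed: ${HTTP_CODE_BASE}"; exit 1; fi
```

With this shape, a 301 from the base path no longer fails the probe, because only server errors are treated as unhealthy.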
