[CI] Add Helm support for SSD-based stress tests and clean up charts

Description

  • This MR introduces a number of values.yaml files for Helm to use for stress tests when the fast-ssd storage class is available.
  • Allowed users to create extra K8s objects via the values file.
  • Added the option to add a custom preRun and postRun script to the frontend and taped pods. This lets us, for example, keep a pod alive for debugging in a dev environment without making that a hard requirement.
  • Removed the init_pod.sh script. This script did 3 things:
    1. Mount the logs of each container to a specific directory on a shared log volume -> this was only necessary for the stress test, so that fluentd can consume the logs from one place. This is now handled in a preRun section, so it no longer pollutes the "normal" runs and is not baked into the Docker image.
    2. Fix some XRootD-related DNS issues -> no longer necessary, so removed
    3. Redirect core dumps to /var/log/tmp -> already part of the minikube CI setup
  • Removed sqlite from the Helm setup as this would not work anyway
  • The VFS scheduler now relies on a shared persistent volume instead of /shared. This requires a local-path provisioner to be available. For minikube, this is as simple as enabling an addon. With K3s, we get this for free.
    • To offer a migration path, there is now a vfsDeprecated config, which create_instance.sh switches to automatically if it detects that no local storage provisioner is available. Users then get a warning explaining how to migrate.
  • Removed some of the /shared mounts in the pods. Once vfsDeprecated is removed, the remaining /shared mounts can be removed from all pods.
  • Removed the frontend backtrace code, as I don't think it is terribly useful and it just adds more code to maintain. If we want to keep it, we can add this logic to a postRun script.
  • Removed tailing of the rmcd logs in the system tests as this relies on the hacky /mnt/logs part from init_pod.sh and it only pollutes the test output
  • Removed client_zero_length_copy.sh test as this is not used anywhere
  • Removed the VERBOSE option from the stress test scripts
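
For reviewers, the vfsDeprecated fallback described above can be pictured roughly as follows. This is only a sketch under assumptions, not the actual code in create_instance.sh: the function name choose_scheduler_config, the config name "vfs" for the new setup, the warning text, and the idea of matching the substring "local-path" in the storage class listing are all hypothetical.

```shell
#!/bin/sh
# Sketch only: choose_scheduler_config is a hypothetical helper, not the
# actual logic in create_instance.sh.
#
# $1: a storage class listing, one "NAME PROVISIONER" pair per line.
# Prints "vfs" (assumed name of the new config) if any line mentions a
# local-path provisioner. Otherwise prints "vfsDeprecated" and writes a
# warning to stderr.
choose_scheduler_config() {
  if printf '%s\n' "$1" | grep -q 'local-path'; then
    echo "vfs"
  else
    echo "WARNING: no local-path provisioner found, falling back to vfsDeprecated" >&2
    echo "vfsDeprecated"
  fi
}

# Example call against a live cluster (commented out so that sourcing this
# sketch has no side effects):
# classes=$(kubectl get storageclass --no-headers \
#   -o custom-columns=NAME:.metadata.name,PROVISIONER:.provisioner)
# choose_scheduler_config "$classes"
```

Taking the listing as an argument, rather than calling kubectl inside the function, keeps the decision easy to check without a cluster.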

Other:

  • Fixed a small bug where the CI job dependencies of the stress test were incorrect
  • Improved naming consistency in a few places

Checklist

  • Documentation reflects the changes made.
  • Merge Request title is clear, concise, and suitable as a changelog entry.

References

Closes #1191 (closed) Closes #1197 (closed)

Edited by Niels Alexander Buegel
