[CI] Add Helm support for SSD-based stress tests and clean up charts
Description
- This MR introduces a number of `values.yaml` files for Helm to use for stress tests when the `fast-ssd` storage class is available.
- Allow the user to create extra K8s objects in the values file.
- Added the option to add a custom `preRun` and `postRun` script to the frontend and taped pods. This lets us, e.g., keep the pod alive for debugging in a dev environment without making that a hard constraint.
- Removed the `init_pod.sh` script. This script did 3 things:
  - Mount logs for each container to a specific directory on a shared log volume -> this was only necessary for the stress test so that fluentd can consume the logs from one place. This scenario is now covered by doing this in a `preRun` section; that way we don't pollute the "normal" runs with this and don't bake it into the Docker image.
  - Fix some XRootD-related DNS issues -> no longer necessary, so removed
  - Redirect core dumps to `/var/log/tmp` -> already part of the minikube CI setup
- Removed sqlite from the Helm setup as this would not work anyway
- Instead of the VFS scheduler relying on `/shared`, it now relies on a shared persistent volume. Note that for this, a local-path provisioner should be available. For minikube, this is as simple as enabling an addon. With K3s, we get this for free.
  - To offer a migration path, there is now a config `vfsDeprecated` to which `create_instance.sh` will automatically switch if it detects no local storage provisioner is available. A warning is produced to users showing how they can migrate.
- Removed some of the `/shared` mounts in the pods. Once `vfsDeprecated` is removed, this can go in all the pods
- Removed the frontend backtrace code as I don't think this is terribly useful and just adds more code to maintain. If we want to keep this, we can add this logic to the `postRun`
- Removed tailing of the rmcd logs in the system tests as this relies on the hacky `/mnt/logs` part from `init_pod.sh` and it only pollutes the test output
- Removed `client_zero_length_copy.sh` test as this is not used anywhere
- Removed the VERBOSE option from the stress test scripts
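
For reviewers who want a feel for the `vfsDeprecated` fallback described above, here is a minimal sketch of how such a check could look. It is not the code in `create_instance.sh`: the function names, the `vfs` value for the non-deprecated setup, the warning text, and the idea of matching on the string `local-path` in the provisioner name (as in `rancher.io/local-path`, which K3s ships by default) are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only; the real create_instance.sh may detect the provisioner differently.

# Hypothetical helper: reads storage class provisioner names (one per line)
# on stdin and prints the scheduler type to use on stdout.
choose_scheduler_type() {
  _provisioners="$(cat)"
  # Assumption: a local-path provisioner has "local-path" in its name,
  # e.g. rancher.io/local-path.
  if printf '%s\n' "$_provisioners" | grep -q 'local-path'; then
    # "vfs" is a placeholder name for the new, persistent-volume-based setup.
    echo "vfs"
  else
    # The warning goes to stderr so that stdout carries only the chosen value.
    echo "WARNING: no local-path storage provisioner found, falling back to vfsDeprecated" >&2
    echo "vfsDeprecated"
  fi
}

# Hypothetical wrapper that feeds the cluster's provisioners into the helper.
# It is defined but not called here, so the sketch can be sourced without a cluster.
detect_scheduler_type() {
  kubectl get storageclass \
    -o jsonpath='{range .items[*]}{.provisioner}{"\n"}{end}' | choose_scheduler_type
}
```

Keeping the decision in a function that reads from stdin makes it possible to exercise the fallback without a running cluster, for example with `printf 'rancher.io/local-path\n' | choose_scheduler_type`.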
Other:
- Fixed tiny bug where the CI job dependencies of the stress test were incorrect
- Improved some naming consistencies here and there
Checklist
- Documentation reflects the changes made.
- Merge Request title is clear, concise, and suitable as a changelog entry. See this link
References
Closes #1191 (closed)
Closes #1197 (closed)
Edited by Niels Alexander Buegel