The work done by Damian with Helm was a nice step forward: it improved our CI, brought some interesting features into it, and simplified it. But there is still a lot of scripting and logic that could be moved into the charts to simplify things even further; we still have sharp edges around.
Now that we are done with all the Alma9 migration related changes, it is time to step up our CI game. >:)
High Level Targets // Wishlist
The following list is a set of requirements that our CI should be able to meet. They do not all need to be addressed at the same time, but they should be taken into account from the beginning of the iterative process of improving our CI and its integration with Helm, so that we do not implement X by doing Y only to find later that Y is a problem for the integration of Z and has to be thrown away. Please feel free to add any other requirements.
Cluster Configuration: the Helm charts should be able to (re)configure a deployed cluster in a trivial way from the developer/user perspective. Providing new values to the chart should be enough to swap between EOS versions, CTA versions, scheduler type, mhVTL configurations and disk buffer type (EOS/dCache); see the sketch after this list.
Secret Management: credentials should be inserted as "secrets" into the cluster when creating it (and updated when needed); no more passing around some credentials-path .yaml file.
Monitoring Integration: at some point it would be interesting to integrate our monitoring setup as well.
CTA Public Ops: the public tools have grown in size since their conception, and relying on manual testing of the commands is not sustainable in the long run. The tools should have unit tests, and we should also run integration tests in the CTA repo to check that they work correctly (with triggered pipelines, like the EOS setup).
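To make the Cluster Configuration target concrete, here is a sketch of what such a values file could look like (all key names below are hypothetical placeholders, not the real chart values):

```yaml
# Sketch only -- every key here is a hypothetical placeholder.
global:
  ctaVersion: "x.y.z"       # CTA version to deploy
  eosVersion: "x.y.z"       # EOS version for the disk buffer
scheduler:
  backend: objectstore      # or: postgres
catalogue:
  backend: oracle           # or: postgres
diskBuffer:
  type: eos                 # or: dcache
mhvtl:
  libraryConfig: default    # which virtual library configuration to load
```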
Notes
Study and understand the capabilities and design of Helm before going forward. We should be able to get rid of much more of the scripting logic.
All setup configurations MUST be tested before merging into main.
The biggest first step is to clean up create_instance.sh, delete_instance.sh, and run_systemtest.sh
During clean-up, the biggest action points are as follows:
Stop relying on search/replace and do this properly with Helm values or --set flags where necessary (see the override sketch after this list)
Remove the kubectl exec commands and do this via the Helm chart as well. These commands are not reliable: if Kubernetes restarts a pod, everything breaks, because the exec commands are not re-run on restart. We should also try to clean up the init scripts as much as possible.
Ensure we rely as little as possible on the environment in which the setup is run. This means we shouldn't expect particular files in particular directories. Ideally, the only thing we should rely on is a small collection of secrets needed to access e.g. the registry. This will make the setup more portable.
Once this is done, it should be a lot easier to integrate our chart with EOS or dCache charts
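To illustrate the override workflow mentioned in the first action point (key names hypothetical): any deviation from the chart defaults lives in a small values file passed with -f, or in individual --set flags, so redeploying becomes a plain helm upgrade rather than sed-ing files in place:

```yaml
# overrides.yaml -- hypothetical keys, only to illustrate the workflow.
# Applied with `helm upgrade --install cta ./chart -f overrides.yaml`
# (or equivalent --set flags) instead of search/replace on rendered files.
catalogue:
  backend: postgres
tapeServers:
  count: 4
image:
  tag: "my-dev-tag"
```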
As I reported yesterday, we have various configuration files in CI that are pulled from various places and do not yet come from configmaps: they should all come from configmaps.
Helm should generate these from values plus a base file that Helm itself can include: Helm should filter out of the example file the configuration entries that are populated from values.
This will allow us to consume the same config file both for RPM inclusion (for example) and in CI.
The EOS side is OK, but there is nothing for the CTA side for now. We need to be able to set the CTA instance name and all the other parameters from Helm.
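A minimal sketch of the generation idea, assuming a hypothetical conf/cta-taped.conf.example shipped with the chart in a simple KEY VALUE format and a tapedConfig map in the values: lines whose key is overridden in values are filtered out of the example file and the value-driven entries are appended.

```yaml
# Sketch only -- file name, key format and values layout are assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cta-taped-conf
data:
  cta-taped.conf: |-
    {{- $overrides := .Values.tapedConfig | default dict }}
    {{- range splitList "\n" (.Files.Get "conf/cta-taped.conf.example") }}
    {{- $key := . | splitList " " | first }}
    {{- if not (hasKey $overrides $key) }}
    {{ . }}
    {{- end }}
    {{- end }}
    {{- range $key, $value := $overrides }}
    {{ $key }} {{ $value }}
    {{- end }}
```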
Now that we are using Helm it should be easy to define how many tape servers we want to run: there is no reason to keep the count statically at 2. Increasing this would be interesting in the context of stress tests.
So here is an overview of the things we want in our Helm setup:
Must have
#901 (closed) Move the generation of the *.conf files in the startup scripts to config maps and allow the user to pass these config maps.
#890 (closed) Make the number of tape servers configurable
#888 (closed) A separate chart for the Catalogue DB (this only does something in case the postgres catalogue is used)
#900 (closed) A separate chart for the Scheduler DB (only for configmaps)
Should have
Reduce the number of kubectl exec commands to ensure we can smoothly redeploy/upgrade charts
Allow the EOS Helm charts to be used in our setup (see the dependency sketch after this list)
Allow the dCache Helm charts to be used in our setup
In general, reduce the amount of required scripting as much as possible
Stop relying on sed and overwrite any required values using --set (with sed it's not trivial to redeploy things and it makes things difficult to read)
Improve logging in the scripts (e.g. create_instance.sh) to make it more readable (it has been improved moderately, but is not yet up to standard)
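For the EOS/dCache chart items, the end state would presumably be plain chart dependencies in our umbrella chart, conceptually something like the following (repository URLs, versions and condition flags are placeholders, not the real ones):

```yaml
# Chart.yaml of the umbrella chart -- all values below are placeholders.
dependencies:
  - name: eos
    version: ">=0.1.0"
    repository: "https://example.invalid/eos-charts"      # placeholder URL
    condition: eos.enabled
  - name: dcache
    version: ">=0.1.0"
    repository: "https://example.invalid/dcache-charts"   # placeholder URL
    condition: dcache.enabled
```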
Nice to have
Integrate the monitoring Helm chart with the Helm charts in the CTA repo
#898 (closed) Simplify the create_instance.sh script so that it doesn't do as many things at once (things such as running particular tests should be separate)
Improve readability of the run_systemtest.sh script
Add proper labels to each pod
Be very explicit about what external resources create_instance.sh might be relying on by passing them as parameters throughout the entire chain (so also in run_systemtest.sh)
We can (and probably should) shift around what belongs in what category, so feel free to do that or add more things
On install or upgrade(?) of the charts, the currently available virtual hardware resources should be scanned and loaded into the cluster, to be consumed via lookup from the templating system. The user should then only need to specify the number of tape servers and the number of tape drives per server to deploy, with an error if the request exceeds the available resources.
In the future we could go for fancier stuff like configuring multiple libraries. This would allow us to reproduce production setups, which would be useful to easily test the supply logic scripts and probably other things.
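A rough sketch of the lookup idea, assuming the scan publishes its result as a ConfigMap in the release namespace before the chart is installed (the ConfigMap name, its keys and the values layout are all hypothetical):

```yaml
{{- /*
  Sketch only: "mhvtl-scanned-hardware" and its keys are made-up names.
  Note that `lookup` returns an empty map under `helm template`, so this
  check only runs on a real install/upgrade.
*/}}
{{- $hw := lookup "v1" "ConfigMap" .Release.Namespace "mhvtl-scanned-hardware" }}
{{- if $hw }}
{{- $available := int (get $hw.data "driveCount") }}
{{- $requested := int (mul .Values.tapeServers.count .Values.tapeServers.drivesPerServer) }}
{{- if gt $requested $available }}
{{- fail (printf "Requested %d drives but only %d are available" $requested $available) }}
{{- end }}
{{- end }}
```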
To follow up a little bit on my last comment: this is not something to do in this refactor, just context to be taken into account for the way we set up tape-server-related things.
The new mhVTL RPMs we are generating via the pipelines in https://gitlab.cern.ch/cta/cta-dependencies do not come with any configuration. The installation of the RPM expects some config files to already exist in /etc/mhvtl (this is on the host machine, not in the cluster). This is because we already provide them in https://gitlab.cern.ch/cta/minikube_cta_ci and they get placed in the config directory before the RPM installation.
I closed #177 (closed) as we should not couple the RPM to a specific library configuration. That is a problem to solve somewhere between the creation of the machine and the installation of the Helm charts. As of now, I am not sure how this would play out for changing the virtual tape hardware deployment on a cluster recreation.
Actually I think atm the refactor already "solves" this issue (correct me if I'm wrong). Essentially, the tape configuration can either be manually provided to the Helm installation or it can be automatically generated (similar to how the minikube setup generates it atm), in both cases in the form of a values.yaml file. This gives the flexibility of providing it yourself, but it also does not require the user to do additional setup beforehand if they don't want to.
This is an example output of the create_instance script:
Of course this still assumes the tape servers use only a single library configuration (i.e. we can't have multiple tape servers each with different configurations), but improving that is for later.
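For reference, an automatically generated (or hand-written) library values file could look roughly like the following; all names and devices below are made up for illustration:

```yaml
# Hypothetical example of a library configuration passed as values.
library:
  type: mhvtl
  name: VLSTK10              # made-up library name
  device: /dev/sg0           # placeholder changer device
  drives:
    - name: VDSTK01
      device: /dev/nst0
    - name: VDSTK02
      device: /dev/nst1
  tapes:
    - V01001
    - V01002
```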
So what do we mean by "Integrate with the Helm charts of EOS"? Is this about integrating their k8s deployment [1] into ours, or about extending and refactoring our ctaeos? The former could have the advantage (to be confirmed) that there should already be an easy option to scale the number of FSTs and mount them to different disks on the host node, which is what we need for scaling tests later as well.
This is about integrating their Helm charts into ours so that we indeed don't need to manage our own ctaeos anymore. I've reworded it now to be a bit more clear
I have checked the boxes that the new Helm setup in !660 (merged) covers. The MR is still WIP (need to iron out a few wrinkles) but the vast majority of what I want to do is there. I will try to get all the CI tests to pass properly today and then tomorrow I will work on the procedure of doing the catalogue upgrade in this dev setup (not yet with the production clone).
The only thing I foresee I might still have to do is make all the subcharts in the CTA chart top-level charts so that it is easy to individually redeploy them.
Btw, I know it's a (too) large MR, but given the deadline I had to do quite a few things fast and did not have time to wait for individual MR merges. Anyway, I provided a description of what I did and would be happy to sit down and explain everything if someone is interested.
In general, I could do a presentation during a dev meeting at some point to highlight the new structure of everything; it should be a lot more understandable now that everything has been split up in a (hopefully) better way.
Any remarks on the MR are welcome; I won't merge it yet until we (I) have ironed out the wrinkles, so that will probably take a little while still.
Alright, the MR is ready for review: !660 (merged). It should pass all the tests, but I will link specific pipelines below to show that it passes both the normal pipeline and the no-Oracle pipeline.
@poliverc @afonso can I assign you to the MR? Again, I know it's a large MR; if you have any questions or if you want to sit down and walk through it together, please let me know.
I'm thinking a bit about how I did the tape server scalability and can't help but feel that the current approach with the range loop is rather hacky: it manually does what Kubernetes should be doing natively with e.g. deployments (which would also be more intuitive and easier to maintain/extend).
I have a few questions though to clarify:
Is there always a 1:1 ratio between the number of taped and rmcd processes? I.e. does every taped process need a corresponding rmcd process?
In our setup, is a tape server physically connected to its (and only its) drive(s) somehow? In other words, does it matter where the taped/rmcd processes run (my guess is that the answer is yes, but I don't know the details)? The reason I am asking has to do with the concept of "having a single tape server be responsible for multiple drives". I want to know whether we would have a "pod responsible for multiple drives" (which would require spawning additional containers within a pod) or whether each pod can only ever be responsible for a single drive (in which case pure replication is enough to manage all drives).
When spawning the CTA chart, you provide a library configuration containing the library device, drive names, tapes etc. (this is not different from before)
The tape servers are now handled by a statefulset. This ensures we have a stable DNS entry for each of them and we can map them easily to drives. Each tpsrv pod will automatically get the name tpsrv-xx.
Based on the drives provided in the library configuration, a configmap will be generated for each drive.
For each drive, we will then have a replica in the statefulset responsible for that particular drive. It will do so by mounting the configmap(s) of the drive corresponding to its index.
This essentially accomplishes the same thing as before, but it uses Kubernetes as it is meant to be used, makes it easy to redeploy tape servers, and it is trivial to change which drives should be used now (simply update the library configuration you pass in).
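Not the literal templates, but a condensed sketch of the mechanism described above; names are simplified and, for brevity, the per-drive configmaps are bundled into a single one here:

```yaml
# Sketch only -- the real chart differs in naming and generates one configmap per drive.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: tpsrv
spec:
  serviceName: tpsrv                          # stable DNS: tpsrv-0, tpsrv-1, ...
  replicas: {{ len .Values.library.drives }}  # one replica per drive in the library config
  selector:
    matchLabels:
      app: tpsrv
  template:
    metadata:
      labels:
        app: tpsrv
    spec:
      containers:
        - name: taped
          image: "{{ .Values.image }}"
          command: ["/bin/sh", "-c"]
          # Each pod derives its ordinal from its own name and picks the matching
          # drive configuration before starting cta-taped (startup command simplified).
          args:
            - >
              ORDINAL=$(hostname) && ORDINAL=${ORDINAL##*-} &&
              cp /drive-conf/drive-${ORDINAL}.conf /etc/cta/cta-taped.conf &&
              exec /usr/bin/cta-taped
          volumeMounts:
            - name: drive-conf
              mountPath: /drive-conf
      volumes:
        - name: drive-conf
          configMap:
            name: tpsrv-drive-conf            # bundles drive-0.conf, drive-1.conf, ...
```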
Is there always a 1:1 ratio between the number of taped and rmcd processes? I.e. does every taped process need a corresponding rmcd process?
No, one rmcd per tape server. A tape server needs: 1 rmcd and [1-*] taped processes.
In our setup is a tape server physically connected to its (and only its) drive(s) somehow? [...]
Physically, they are connected via a QLogic HBA card: Drive <-> optic fibre <-> PCIe card with optic fibre ports. For the virtual setup we don't care that much. The taped process must be able to connect to the rmcd; if you look into the cta-taped.conf.example file you can see that taped needs the rmc port to contact the rmc daemon (rmcd).
"having a single tape server be responsible for might drives"
Imo, if we don't want to shuffle things too much, 1 pod must follow the same logic as a tape server: 1 rmcd container and 1 or many taped containers. As long as they connect to the same rmcd, that should be enough, although I will think about this again next week.
Okay thanks for clearing that up. Then the only limitation of the current implementation is that a tape server pod always has 1 rmcd process and 1 taped process.
To solve this we should probably be able to spawn multiple taped containers in each tpsrv pod, but then the mapping of drives to pods gets a bit more complex. I.e. how do we cleanly specify/determine which drives a pod should be responsible for?
We could do this automatically if we say that all tpsrv pods get the same number of taped processes (minus some remainder for a given set of pods). For example, with 7 drives available and 2 taped processes per pod -> 4 pods, 3 of which will have 2 taped processes and 1 of which will have 1 taped process. That would work, but I'm not sure whether this is a valid constraint to have...
The more flexible alternative to this is to force the user to input a configuration with a list of drive sets. Each set of drives is then assigned to a particular pod (and 1 rmcd + x taped processes are within said pod, where x depends on how many drives were assigned).
I think this configuration is the way to go. I'm working on that now; it should actually simplify things a bit and make them more flexible and configurable (even allowing things like multiple library devices to be used).
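A rough sketch of what such a drive-set configuration could look like as values (key names purely illustrative); each entry becomes one tpsrv pod with one rmcd container and one taped container per listed drive:

```yaml
# Illustrative only -- the final key names and structure may differ.
tapeServers:
  - name: tpsrv01
    library: VLSTK10          # library device this pod talks to
    drives:
      - VDSTK01
      - VDSTK02               # -> pod tpsrv01: 1 rmcd + 2 taped containers
  - name: tpsrv02
    library: VLSTK10
    drives:
      - VDSTK03               # -> pod tpsrv02: 1 rmcd + 1 taped container
```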
Just for reference, below is a list of items that we should implement at one point or another to have a nice consistent setup. This is in no particular order, although the EOS chart has priority:
#960 (closed) Move from pod definitions to deployments/statefulsets
#1008 (closed) Improve naming consistency. Prefix all cta-related pods with cta- (ensure consistency with EOS chart and allows for easy distinction between cta pods and other pods running in the same namespace)
#933 (closed) Mount the init scripts as volumes in the relevant pods instead of baking them into the docker image
#1009 (closed) Move all of the cta/ subcharts into their own top-level charts. We keep cta as an umbrella chart, but the subcomponents can now also exist as their own charts
#1007 (closed) Remove split between registry and repository when specifying image
Less important
#1011 (closed) Define proper readiness probes instead of relying on *_READY files (see the sketch after this list)
#1012 (closed) Move the permission/ownership modifications of keytabs into init containers
Get rid of the init_pod.sh script:
Remove claimLogs volume. This requires an update to the monitoring chart in the stress test repo.
Move the setting of the kernel proc pattern to a Daemonset
Change the way in which the reverse DNS fix is done for xrootd
Stop running pods in privileged mode wherever possible. Requires the removal of the init_pod.sh script
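For the readiness-probe item above, the shape would be roughly the following (the probe command is a placeholder; the real check should probe whatever actually signals that the daemon is ready):

```yaml
# Sketch: replaces the *_READY marker files; the command below is a placeholder.
readinessProbe:
  exec:
    command: ["/bin/sh", "-c", "pgrep -x cta-taped > /dev/null"]
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 6
```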
After the EOS chart is complete, the dCache integration can also start.