Update collectd alarms - freeze, oldtmp, notupdated
@@ -19,30 +19,59 @@ Last update for [lxsoft](https://gitlab.cern.ch/ai/it-puppet-hostgroup-lxsoft/-/
This alarm is triggered when the `.8-latest` (or `.s8-latest`) symlink hasn't been updated in over 30 hours.
The symlink is normally updated every day, so if it hasn't been, something is most likely wrong with the [centos8_snapshots](https://gitlab.cern.ch/linuxsupport/cronjobs/centos8_snapshots) or [stream8_snapshots](https://gitlab.cern.ch/linuxsupport/cronjobs/stream8_snapshots)
Nomad job. Check the logs in the ES dashboard for [CS8](https://es-linux.cern.ch/kibana/app/dashboards?security_tenant=internal#/view/6b8102c0-51cb-11eb-932f-51687e53f66a) or [CS9](https://es-linux.cern.ch/kibana/app/dashboards?security_tenant=internal#/view/25061790-0bf6-11ed-8484-13abadf100a6).
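To confirm the alarm by hand, you can compare the symlink's age against the 30-hour threshold. The Python sketch below is minimal and rests on an assumption: that the alarm uses the symlink's own modification time, which changes whenever the link is recreated. It is not the actual collectd plugin, and the function names are hypothetical.

```python
import os
import sys
import time

# Threshold taken from the alarm description: 30 hours.
MAX_AGE_SECONDS = 30 * 60 * 60


def symlink_age_seconds(link_path, now=None):
    """Return how many seconds ago the symlink itself was last modified.

    Assumption: the alarm looks at the link's own mtime, not its target's.
    """
    if now is None:
        now = time.time()
    # lstat() inspects the link itself rather than following it to the target.
    return now - os.lstat(link_path).st_mtime


def is_stale(link_path, now=None, max_age=MAX_AGE_SECONDS):
    """True if the symlink is older than max_age seconds."""
    return symlink_age_seconds(link_path, now) > max_age


if __name__ == "__main__":
    now = time.time()
    for path in sys.argv[1:]:
        age = symlink_age_seconds(path, now)
        status = "STALE" if age > MAX_AGE_SECONDS else "ok"
        print(f"{path}: last updated {age / 3600:.1f} h ago ({status})")
```

Pass the path of the `.8-latest` or `.s8-latest` symlink as an argument. The script prints the link's age in hours and marks it `STALE` once it passes 30 hours.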
Before deleting _anything_, make sure you know what you're doing. If in doubt, double-check with the rest of the team.
This alarm indicates that there are old `.tmp.*` directories in `/mnt/data1/dist/cern/centos/*-snapshots/`.
These directories are created while a snapshot runs and are renamed when it finishes.
Leftover directories mean that something interrupted that day's snapshot, and the cause needs to be investigated.
If the snapshots are currently failing, don't delete today's `.tmp.*` directory, and **never** delete the `.*-latest` symlink.
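Before deleting anything, it helps to list the candidate directories first. The read-only sketch below treats a directory as "today's" if its modification time falls after local midnight. That rule is an assumption, and the snapshot naming may be a more reliable signal. The sketch skips symlinks and never deletes anything. `leftover_tmp_dirs` is a hypothetical helper.

```python
import glob
import os
from datetime import datetime

# Location taken from the alarm description.
SNAPSHOT_GLOB = "/mnt/data1/dist/cern/centos/*-snapshots/.tmp.*"


def leftover_tmp_dirs(pattern=SNAPSHOT_GLOB, now=None):
    """List .tmp.* directories last modified before today's local midnight.

    Read-only: this only reports candidates and never deletes anything.
    Assumption: anything modified since midnight belongs to today's run.
    """
    if now is None:
        now = datetime.now()
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0).timestamp()
    old = []
    for path in sorted(glob.glob(pattern)):
        # Skip symlinks and anything that isn't a real directory.
        if os.path.islink(path) or not os.path.isdir(path):
            continue
        if os.stat(path).st_mtime < midnight:
            old.append(path)
    return old


if __name__ == "__main__":
    for path in leftover_tmp_dirs():
        print(path)
```

Review each directory the sketch reports against that day's snapshot logs before deciding what to do with it.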
This alarm indicates a problem with the named repo, as reported by `/usr/bin/repoquery --repofrompath=<repoid>,<repopath> -qa`.
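To reproduce the failure, run the same `repoquery` command by hand. The Python wrapper below is a sketch that assumes a failure means a non-zero exit status or an empty package list. The alarm's actual check may differ. `repoquery_cmd` and `check_repo` are hypothetical helpers. The `run` parameter is injectable, so you can exercise the logic without `repoquery` installed.

```python
import subprocess


def repoquery_cmd(repoid, repopath):
    """Build the repoquery invocation quoted in the alarm description."""
    return ["/usr/bin/repoquery", f"--repofrompath={repoid},{repopath}", "-qa"]


def check_repo(repoid, repopath, run=subprocess.run):
    """Run repoquery against one repo and return (ok, output).

    Assumption: ok is False when repoquery exits non-zero or lists no packages.
    output combines stdout and stderr so error messages are not lost.
    """
    result = run(repoquery_cmd(repoid, repopath), capture_output=True, text=True)
    output = result.stdout + result.stderr
    ok = result.returncode == 0 and bool(result.stdout.strip())
    return ok, output
```

For example, `check_repo("<repoid>", "<repopath>")` returns `(False, ...)` with repoquery's error text included in the output when the repo metadata can't be read.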