From 69da24ef537e934219e93d6fe6562d5d328e8b2b Mon Sep 17 00:00:00 2001
From: Maksim Melnik Storetvedt <maksim.melnik.storetvedt@cern.ch>
Date: Wed, 29 Jan 2025 16:38:41 +0100
Subject: [PATCH] Add doc on cgroups v2

---
 docs/site/cgroupsv2.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
 create mode 100644 docs/site/cgroupsv2.md

diff --git a/docs/site/cgroupsv2.md b/docs/site/cgroupsv2.md
new file mode 100644
index 0000000..2409e0e
--- /dev/null
+++ b/docs/site/cgroupsv2.md
@@ -0,0 +1,21 @@
+Enabling cgroups v2
+
+
+The JAliEn job-pilot can use cgroups v2 to confine each job more tightly, preventing misbehaving jobs from overusing resources and interrupting other payloads. Support for this feature depends on the OS/distribution and the LRMS, but generally requires:
+
+* **EL9**
+and
+* **HTCondor 23.1+**
+or
+* **Slurm 22.05+***
+
+*Slurm requires a workaround, which involves whole-node scheduling and enabling *lingering* on the WNs. Lingering can be enabled by running `touch /var/lib/systemd/linger/$USER`, where `$USER` is replaced with the user associated with ALICE, e.g. `aliprod`.
+
+**NOTE**: EL9 delegates the resource controllers for *memory* and *pids* by default, but **not** those for *cpu*, *cpuset* and *io*. For JAliEn to access these, the following must be added to the file `/etc/systemd/system/user@.service.d/delegate.conf`:
+
+```
+[Service]
+Delegate=cpu cpuset io memory pids
+```
+
+Then reboot the node for the change to take effect.
\ No newline at end of file
-- 
GitLab
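
One way to confirm after the reboot that the controllers named in the `Delegate=` line are actually available to the user's systemd instance is to read its `cgroup.controllers` file. The following is a minimal sketch and not part of JAliEn: the function names are hypothetical, and the cgroup path is an assumption based on the usual systemd layout under the cgroups v2 unified hierarchy, so it may need adjusting per site.

```python
import os
from pathlib import Path

# Controllers listed in the Delegate= line of the doc above.
REQUIRED = ("cpu", "cpuset", "io", "memory", "pids")


def missing_controllers(controllers_text, required=REQUIRED):
    """Return the required controllers absent from a cgroup.controllers listing."""
    available = set(controllers_text.split())
    return [name for name in required if name not in available]


def user_cgroup_controllers_path(uid):
    """Assumed location of cgroup.controllers for the user's systemd instance."""
    return Path(
        f"/sys/fs/cgroup/user.slice/user-{uid}.slice/"
        f"user@{uid}.service/cgroup.controllers"
    )


if __name__ == "__main__":
    # Run as the ALICE user (e.g. aliprod) on the worker node.
    path = user_cgroup_controllers_path(os.getuid())
    if not path.exists():
        print(f"{path} not found: is cgroups v2 in use and a user session active?")
    else:
        missing = missing_controllers(path.read_text())
        if missing:
            print("not delegated: " + " ".join(missing))
        else:
            print("all required controllers are delegated")
```

If the script reports `cpu`, `cpuset` or `io` as not delegated, the drop-in file was most likely not picked up, or the node has not been rebooted since it was added.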