diff --git a/docs/site/cgroupsv2.md b/docs/site/cgroupsv2.md
new file mode 100644
index 0000000000000000000000000000000000000000..2409e0e0875eed561183b9138e440b559ac75430
--- /dev/null
+++ b/docs/site/cgroupsv2.md
@@ -0,0 +1,21 @@
+Enabling cgroups v2
+
+
+The JAliEn job-pilot can use cgroups v2 to confine each job more tightly, preventing misbehaving jobs from overusing resources and disrupting other payloads. Support for this feature depends on the OS/distribution and the LRMS, but generally requires:
+
+* **EL9**
+and
+* **HTCondor 23.1+**
+or
+* **Slurm 22.05+**\*
+
+\*Slurm requires a workaround that involves whole-node scheduling and enabling *lingering* on the WNs. Lingering can be enabled by running `touch /var/lib/systemd/linger/$USER`, where `$USER` must be replaced with the user associated with ALICE, e.g. `aliprod`.
+
+**NOTE**: EL9 delegates the resource controllers for *memory* and *pids* by default, but **not** those for *cpu*, *cpuset* and *io*. For JAliEn to access these, the following must be added to the file `/etc/systemd/system/user@.service.d/delegate.conf`:
+
+```
+[Service]
+Delegate=cpu cpuset io memory pids
+```
+
+Then reboot the node.
\ No newline at end of file
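
Not part of the patch: a minimal sketch of how the delegation described in the NOTE could be verified on a worker node after the reboot. It assumes the usual systemd layout, in which the per-user manager cgroup lives under `/sys/fs/cgroup/user.slice/user-<uid>.slice/user@<uid>.service/`, and it reads the kernel's `cgroup.controllers` file there. The function names `missing_controllers` and `user_controllers_path` are hypothetical helpers for illustration, not JAliEn code.

```python
import os
from pathlib import Path

# Controllers listed in the Delegate= line of delegate.conf.
REQUIRED = {"cpu", "cpuset", "io", "memory", "pids"}


def missing_controllers(controllers_text, required=REQUIRED):
    """Return the required controllers absent from a cgroup.controllers string."""
    available = set(controllers_text.split())
    return sorted(set(required) - available)


def user_controllers_path(uid):
    """Path to cgroup.controllers for the user's systemd manager.

    Assumption: standard systemd cgroup layout on a cgroups v2 host.
    """
    return Path(
        f"/sys/fs/cgroup/user.slice/user-{uid}.slice/"
        f"user@{uid}.service/cgroup.controllers"
    )


if __name__ == "__main__":
    path = user_controllers_path(os.getuid())
    if not path.exists():
        print(f"{path} not found: cgroups v2 may not be mounted, "
              "or no user manager is running for this account")
    else:
        missing = missing_controllers(path.read_text())
        if missing:
            print("controllers not delegated: " + " ".join(missing))
        else:
            print("all required controllers are delegated")
```

Run it as the ALICE account (e.g. `aliprod`). If *cpu*, *cpuset* or *io* are reported as not delegated, the `delegate.conf` drop-in is probably missing or the node has not been rebooted since it was added.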