Revisions to Slurm Configuration v1.0.7 on DARWIN
This document summarizes alterations to the Slurm job scheduler configuration on the DARWIN cluster.
Issues
See this document discussing swap limits in Slurm jobs.
Implementation
- The `dynamic_swap_limits` SPANK plugin will be compiled and installed
- The `plugstack.conf` configuration file will be modified to require the `dynamic_swap_limits` plugin
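For reference, a sketch of what the `plugstack.conf` entry might look like is shown below; the plugin path and the way the limit string is passed as an argument are assumptions for illustration, not the exact values that will be deployed:

```
# /etc/slurm/plugstack.conf  (path assumed; use the cluster's Slurm sysconfdir)
# "required" causes job launch to fail if the plugin cannot be loaded.
# The argument sketched here carries the limit string documented below.
required  /usr/lib64/slurm/dynamic_swap_limits.so  partition(extended-mem)=none,host(r2v[00-02])=1.5%/cpu,host(r0m01)=0.25%/cpu,default()=1%/cpu
```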
The plugin configuration takes the following aspects of the cluster's hardware into account:
- In all cases, the active swap present on the node will be used as the `max_swap` value
- Most nodes consist of 64 cores; 1/64 = 1.5625%, so by default jobs will be granted 1% of `max_swap` per CPU
- The V100 nodes consist of 48 cores; 1/48 = 2.0833%, so by default jobs will be granted 1.5% of `max_swap` per CPU
- The Mi100 node consists of 128 cores; 1/128 = 0.78125%, so by default jobs will be granted 0.25% of `max_swap` per CPU
- Jobs running under the extended-mem partition on the extended memory node will have no swap limits enforced (jobs are scheduled user-exclusive on this node, so any failures will not impact other users' jobs)
These details produce the following configuration string:
`partition(extended-mem)=none,host(r2v[00-02])=1.5%/cpu,host(r0m01)=0.25%/cpu,default()=1%/cpu`
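As an illustration of the arithmetic behind the per-CPU percentages (not the plugin's exact enforcement logic, which applies limits to the job cgroup), the following sketch computes the swap cap a job would receive under the `default()=1%/cpu` rule; the node and job sizes are hypothetical:

```bash
#!/bin/bash
# Hypothetical node and job sizes, for illustration only.
max_swap=$((32 * 1024 * 1024 * 1024))   # node's active swap: 32 GiB
pct_per_cpu=1                            # default rule: 1% of max_swap per CPU
job_cpus=8                               # CPUs allocated to the job

# 8 CPUs x 1% = 8% of the node's swap for this job
job_swap_limit=$(( max_swap * pct_per_cpu * job_cpus / 100 ))
echo "swap limit: ${job_swap_limit} bytes"   # 8% of 32 GiB ~= 2.56 GiB
```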
Impact
No downtime is expected. The slurmd daemon must be restarted on all compute nodes, but currently executing jobs and job steps should not be affected (they will reconnect to the new slurmd as necessary to communicate job status, etc.). The slurmctld daemons do not use the SPANK plugin, so they do not need to be restarted.
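One possible rollout sequence, assuming the compute nodes run slurmd under systemd and a parallel shell such as pdsh is available (both assumptions about tooling, not a statement of DARWIN's actual procedure):

```bash
# Restart slurmd on every compute node after installing dynamic_swap_limits
# and updating plugstack.conf; running jobs reconnect to the restarted slurmd.
# <compute-nodelist> is a placeholder for the cluster's node list.
pdsh -w '<compute-nodelist>' 'systemctl restart slurmd'

# Spot-check that nodes are still reporting to the controller afterwards.
sinfo --Node --long | head
```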
Timeline
| Date | Time | Goal/Description |
|---|---|---|
| 2021-11-19 | | Authoring of this document |
| 2021-11-24 | 09:00 | Implementation |
| 2021-12-01 | 14:54 | Update: appropriate aggregate limits on job cgroup |