====== The job queues (partitions) on Caviness ======

The Caviness cluster has several kinds of partition (queue) available in which to run jobs:

^Kind^Description^Nodes^
|standard|The default partition, used if no ''--partition'' option is specified when submitting a job| |
|devel|A partition with very short runtime limits and small resource limits; important to use for any development using compilers| |
|workgroup-specific|Partitions associated with specific kinds of compute equipment in the cluster purchased by a research group (workgroup)| |

===== The standard partition =====

This partition is the default when no ''--partition'' option is specified at job submission.

Conceptually, the standard partition combines the standby and spillover queue concepts from the earlier clusters.

Limits on jobs submitted to this partition (see the example after this list):
  * a maximum runtime of 7 days (default is 30 minutes)
  * a maximum of 360 CPUs per job
  * a maximum of 720 CPUs per user
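
For instance, a batch job that uses the full limits could be submitted as follows (''myjob.qs'' is a placeholder job script name):

<code bash>
$ workgroup -g it_css
$ sbatch --partition=standard --time=7-00:00:00 --ntasks=360 myjob.qs
</code>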
| + | |||
| + | The standard partition is subject to job preemption (killed) because it allows a job submitted to a workgroup-specific partition to release resources tied-up by jobs in the standard partition. In summary, jobs in the standard partition will be preempted (killed with 5 minute grace period) to release resources for the workgroup-specific partition job. For more information on how to handle your job if it is preempted, please refer to [[abstract: | ||
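
Because a preempted job receives a termination signal and then has the 5-minute grace period before it is killed, one common pattern is to trap that signal in the job script and save partial results. The sketch below is only an illustration; ''my_long_running_program'' and ''save_checkpoint.sh'' are placeholders, not part of the Caviness configuration:

<code bash>
#!/bin/bash
#SBATCH --partition=standard
#SBATCH --ntasks=1
#SBATCH --time=7-00:00:00

# Placeholder cleanup: save whatever state the program supports before
# the grace period expires.
cleanup() {
    echo "Termination signal received; checkpointing before the job is killed"
    ./save_checkpoint.sh   # placeholder for your own state-saving step
    exit 0
}
trap cleanup SIGTERM

# Run the real work in the background so the trap can fire promptly.
./my_long_running_program &
wait
</code>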
| + | |||
| + | ===== The devel partition ===== | ||
| + | |||
| + | This partition is used for short-lived jobs with minimal resource needs. | ||
| + | * Performing compiles of code for projects that otherwise can't be done on the login (head) node and to make sure you are allocated a compute node with the development tools, libraries, etc. which are needed for compilers. | ||
| + | * Running test jobs to vet programs or changes to programs | ||
| + | * Testing correctness of program parallelization | ||
| + | * Interactive sessions | ||
| + | * Removing files especially if cleaning up many files and directories in '' | ||
| + | Because performance is not critical for these use cases, the nodes serviced by the '' | ||
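
As a sketch of the compile use case above, a build can be run on a ''devel'' node instead of the login node with ''srun'' (the ''make -j 4'' command is just an example):

<code bash>
$ workgroup -g it_css
$ srun --partition=devel --cpus-per-task=4 make -j 4
</code>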
| + | |||
| + | Limits to jobs submitted to this partition are: | ||
| + | * a maximum runtime of 2 hours (default is 30 minutes) | ||
| + | * each user can submit up to 2 jobs | ||
| + | * each job can use up to 4 cores on a single node | ||
| + | |||
| + | For example: | ||
| + | <code bash> | ||
| + | [traine@login01 ~]$ workgroup -g it_css | ||
| + | [(it_css: | ||
| + | Mon Jul 23 15:25:07 EDT 2018 | ||
| + | </ | ||
| + | |||
| + | One copy of the '' | ||
<code bash>
[traine@login01 ~]$ workgroup -g it_css
[(it_css:traine)@login01 ~]$ salloc --partition=devel --ntasks=2
salloc: Granted job allocation 940
salloc: Waiting for resource configuration
salloc: Nodes r00n56 are ready for job
[traine@r00n56 ~]$ echo $SLURM_CPUS_ON_NODE
2
</code>
| + | |||
| + | ===== The workgroup-specific partitions ===== | ||
| + | |||
| + | The use of // | ||
| + | |||
| + | Limits to jobs submitted to workgroup-specific partitions: | ||
| + | * a maximum runtime of 7 days (default is 30 minutes) | ||
| + | * per-workgroup resource limits (QOS) based on | ||
| + | * how many nodes your research group (workgroup) purchased (node=#) | ||
| + | * how many cores your research group (workgroup) purchased (cpu=#) | ||
| + | * how many GPUs your research group (workgroup) purchased (gres/ | ||
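
Assuming the QOS carrying these limits is named after the workgroup (for example ''it_nss''), the configured caps can be inspected with the standard Slurm ''sacctmgr'' command:

<code bash>
$ sacctmgr show qos format=Name,GrpTRES | grep it_nss
</code>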
| + | |||
| + | For example: | ||
| + | |||
| + | <code bash> | ||
| + | $ workgroup -g it_nss | ||
| + | $ sbatch --verbose --partition=_workgroup_ … | ||
| + | : | ||
| + | sbatch: partition | ||
| + | : | ||
| + | Submitted batch job 1234 | ||
| + | $ scontrol show job 1234 | egrep -i ' | ||
| + | | ||
| + | | ||
| + | </ | ||
| + | |||
| + | Job 1234 is billed against the it_nss account because it is in the it_nss workgroup partition. | ||
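
The account and partition a job actually ran under can also be confirmed after the fact with ''sacct'':

<code bash>
$ sacct -j 1234 --format=JobID,Partition,Account,State
</code>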
| + | |||
| + | To check what your workgroup has access to and the guaranteed resources on the Caviness refer to [[abstract: | ||
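
A quick look is also possible from the command line; assuming the workgroup partition carries the workgroup's name (as ''it_nss'' does above), ''sinfo'' and ''scontrol'' show the nodes and limits it covers:

<code bash>
$ sinfo --partition=it_nss
$ scontrol show partition it_nss
</code>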