====== The job queues (partitions) on DARWIN ======

The DARWIN cluster has several partitions (queues) available to specify when running jobs. These partitions correspond to the various node types available in the cluster:

^Partition Name^Description^Node Names^
|standard|Contains all 48 standard memory nodes (64 cores, 512 GiB memory per node)|r1n00 - r1n47|
|large-mem|Contains all 32 large memory nodes (64 cores, 1024 GiB memory per node)|r2l00 - r2l31|
|xlarge-mem|Contains all 11 extra-large memory nodes (64 cores, 2048 GiB memory per node)|r2x00 - r2x10|
|extended-mem|Contains the single extended memory node (64 cores, 1024 GiB memory + 2.73 TiB NVMe swap)|r2e00|
|gpu-t4|Contains all 9 NVIDIA Tesla T4 GPU nodes (64 cores, 512 GiB memory, 1 T4 GPU per node)|r1t00 - r1t07, r2t08|
|gpu-v100|Contains all 3 NVIDIA Tesla V100 GPU nodes (48 cores, 768 GiB memory, 4 V100 GPUs per node)|r2v00 - r2v02|
|gpu-mi50|Contains the single AMD Radeon Instinct MI50 GPU node (64 cores, 512 GiB memory, 1 MI50 GPU)|r2m00|
|gpu-mi100|Contains the single AMD Radeon Instinct MI100 GPU node (64 cores, 512 GiB memory, 1 MI100 GPU)|r2m01|
|idle|Contains all nodes in the cluster; jobs on this partition can be preempted but are not charged against your allocation|all of the above|
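
If you want to confirm how a partition is laid out before submitting, Slurm's ''sinfo'' command can report its nodes, CPU counts, and time limit directly. The following is a minimal sketch that assumes only the partition names listed above and the standard Slurm client tools on a DARWIN login node:

<code bash>
# Summarize every partition: name, node count, CPUs per node, time limit, node list.
sinfo --format="%P %D %c %l %N"

# Show the individual nodes of a single partition, e.g. the standard partition.
sinfo --partition=standard --Node --long
</code>
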
===== Requirements for all partitions =====

===== Defaults and limits for all partitions =====

All partitions on DARWIN, except the ''idle'' partition, have the following defaults:
  * Default run time of 30 minutes
  * Default resources of 1 node, 1 CPU, and 1 GiB memory
  * Default **no** preemption

All partitions on DARWIN, except the ''idle'' partition, have the following limits:
  * Maximum run time of 7 days
  * Maximum of 400 jobs per user per partition

The ''idle'' partition has the following limits:
  * **Preemption is enabled for all jobs**
  * Maximum of 320 jobs per user
  * Maximum of 640 CPUs per user (across all jobs in the partition)
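
Most production jobs will need to override these small defaults explicitly. The batch script below is a minimal sketch of doing so with standard Slurm directives; the partition, time, CPU, and memory values are illustrative only and must stay within the limits above, and ''my_program'' stands in for your own executable:

<code bash>
#!/bin/bash
#SBATCH --partition=standard     # any partition from the table above
#SBATCH --time=1-00:00:00        # 1 day, well under the 7-day maximum
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8        # overrides the 1-CPU default
#SBATCH --mem=32G                # overrides the 1 GiB default

srun ./my_program
</code>
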
==== Maximum Requestable Memory ====

Each type of node (and thus, partition) has a limited amount of memory available for jobs. A small amount of memory must be subtracted from the nominal size listed in the table above for the node's operating system and Slurm; the remainder is the maximum a job can request on that partition.

^Partition Name^Maximum (by node)^Maximum (by core)^
|standard|''…''|''…''|
|large-mem|''…''|''…''|
|xlarge-mem|''…''|''…''|
|extended-mem|''…''|''…''|
|gpu-t4|''…''|''…''|
|gpu-v100|''…''|''…''|
|gpu-mi50|''…''|''…''|
|gpu-mi100|''…''|''…''|

Please see details for [[abstract:…]].
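
The per-node and per-core maxima correspond to Slurm's two memory-request flags, ''--mem'' and ''--mem-per-cpu''. The sketch below shows both forms; the numeric values are placeholders rather than DARWIN's actual limits, and only one of the two flags should be active in a real script because Slurm treats them as mutually exclusive:

<code bash>
#!/bin/bash
#SBATCH --partition=large-mem
#SBATCH --ntasks=16

# Request a total amount of memory for each allocated node (placeholder value);
# it must not exceed the partition's per-node maximum.
#SBATCH --mem=64G

# Alternatively, request memory per allocated CPU instead (shown disabled here,
# since --mem and --mem-per-cpu cannot be combined); it must not exceed the
# partition's per-core maximum.
##SBATCH --mem-per-cpu=4G

srun ./my_program
</code>
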
===== The extended-mem partition =====

Because access to the swap cannot be limited via Slurm, the ''extended-mem'' partition is configured to run all jobs in exclusive user mode. This means only a single user can be on the node at a time, but that user can run one or more jobs on the node. All jobs on the node will have access to the full amount of swap available, so care must be taken in usage of swap when running multiple jobs.
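
A minimal sketch of targeting this partition follows; the time limit is illustrative and ''my_memory_hungry_program'' is a placeholder. Note that the exclusive-user behaviour comes from the partition configuration itself, so the script does not need any extra flag for it:

<code bash>
#!/bin/bash
#SBATCH --partition=extended-mem
#SBATCH --time=04:00:00          # illustrative value
#SBATCH --cpus-per-task=64

# Print memory and swap usage before and after the workload to keep an
# eye on how much of the NVMe swap is being consumed.
free -h
srun ./my_memory_hungry_program
free -h
</code>
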
===== The GPU partitions =====

Jobs that will run in one of the GPU partitions must request GPU resources using one of the following flags:

^Flag^Description^
|''--gres=gpu:<count>''|<count> GPUs per node|
|''--gpus=<count>''|<count> GPUs total for the job|
|''--gpus-per-node=<count>''|<count> GPUs per node|
|''--gpus-per-socket=<count>''|<count> GPUs per socket|
|''--gpus-per-task=<count>''|<count> GPUs per task|

If you do not specify one of these flags, your job will not be permitted to run in the GPU partitions.
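
For example, a batch job that needs a single GPU in the ''gpu-t4'' partition could combine the partition request with one of the flags above. This is only a sketch; the time value is illustrative and ''my_gpu_program'' is a placeholder:

<code bash>
#!/bin/bash
#SBATCH --partition=gpu-t4
#SBATCH --gres=gpu:1             # one GPU on the allocated node
#SBATCH --cpus-per-task=8
#SBATCH --time=02:00:00

# Slurm typically exports CUDA_VISIBLE_DEVICES for the granted GPU(s),
# so the program only sees the device(s) assigned to this job.
srun ./my_gpu_program
</code>
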
===== The idle partition =====

The ''idle'' partition contains all nodes in the cluster.

Jobs in the ''idle'' partition can be preempted at any time when the resources they are using are required by jobs submitted to the other partitions.

Jobs that execute in the ''idle'' partition are not charged against your allocation.

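Work submitted to the ''idle'' partition should therefore be able to tolerate being interrupted. The sketch below targets the partition and asks Slurm to requeue the job if it is preempted; whether preempted jobs are actually requeued or simply cancelled depends on the cluster's preemption settings, so treat ''--requeue'' as an assumption, and ''my_program'' with its checkpoint-resume option is a placeholder:

<code bash>
#!/bin/bash
#SBATCH --partition=idle
#SBATCH --time=08:00:00          # illustrative value
#SBATCH --requeue                # assumption: put the job back in the queue if preempted

# Placeholder workload: a program that writes periodic checkpoints so a
# requeued run can resume instead of starting over.
srun ./my_program --resume-from-latest-checkpoint
</code>
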
==== Requesting a specific resource type in the idle partition ====

Since the ''idle'' partition contains every type of node in the cluster, a job that needs a particular kind of GPU must request that resource type explicitly. The available types are:
^Type^Description^
|''…''|…|
|''…''|…|
|''…''|…|

To request a specific GPU type while using the ''idle'' partition, include one of the type names above in your GPU request flags.
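
As a sketch of what such a request can look like, the example below combines the ''idle'' partition with a typed GPU request using Slurm's ''--gres=gpu:<type>:<count>'' syntax. The type string ''tesla_t4'' is an assumed placeholder for one of the names in the table above, not a confirmed DARWIN value:

<code bash>
#!/bin/bash
#SBATCH --partition=idle
#SBATCH --gres=gpu:tesla_t4:1    # assumed type name; substitute one from the table above
#SBATCH --time=02:00:00          # illustrative value

srun ./my_gpu_program
</code>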