Differences
This shows you the differences between two versions of the page.
| abstract:darwin:runjobs:queues [2021-08-27 17:42] – pdw | abstract:darwin:runjobs:queues [2025-04-01 12:08] (current) – [Maximum Requestable Memory] bdeng | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ====== The job queues (partitions) on DARWIN ====== | ||
| + | |||
| + | The DARWIN cluster provides several partitions (queues) that you can specify when running jobs. These partitions correspond to the various node types available in the cluster: | ||
| + | |||
| + | ^Partition Name^Description^Node Names^ | ||
| + | |standard|Contains all 48 standard memory nodes (64 cores, 512 GiB memory per node)|r1n00 - r1n47| | ||
| + | |large-mem|Contains all 32 large memory nodes (64 cores, 1024 GiB memory per node)|r2l00 - r2l31| | ||
| + | |xlarge-mem|Contains all 11 extra-large memory nodes (64 cores, 2048 GiB memory per node)|r2x00 - r2x10| | ||
| + | |extended-mem|Contains the single extended memory node (64 cores, 1024 GiB memory + 2.73 TiB NVMe swap)|r2e00| | ||
| + | |gpu-t4|Contains all 9 NVIDIA Tesla T4 GPU nodes (64 cores, 512 GiB memory, 1 T4 GPU per node)|r1t00 - r1t07, r2t08| | ||
| + | |gpu-v100|Contains all 3 NVIDIA Tesla V100 GPU nodes (48 cores, 768 GiB memory, 4 V100 GPUs per node)|r2v00 - r2v02| | ||
| + | |gpu-mi50|Contains the single AMD Radeon Instinct MI50 GPU node (64 cores, 512 GiB memory, 1 MI50 GPU)|r2m00| | ||
| + | |gpu-mi100|Contains the single AMD Radeon Instinct MI100 GPU node (64 cores, 512 GiB memory, 1 MI100 GPU)|r2m01| | ||
| + | |idle|Contains all nodes in the cluster; jobs in this partition can be preempted but are not charged against your allocation| | ||
| + | |||
| + | ===== Requirements for all partitions ===== | ||
| + | |||
| + | All partitions on DARWIN have two requirements for submitting jobs: | ||
| + | - You must set an allocation workgroup prior to submitting a job by using the **workgroup** command (e.g., '' | ||
| + | - You must explicitly request a single partition in your job submission using '' | ||
| + | |||
| + | ===== Defaults and limits for all partitions ===== | ||
| + | |||
| + | All partitions on DARWIN except '' | ||
| + | * Default run time of 30 minutes | ||
| + | * Default resources of 1 node, 1 CPU, and 1 GiB memory | ||
| + | * Default **no** preemption | ||
| + | |||
| + | All partitions on DARWIN except '' | ||
| + | * Maximum run time of 7 days | ||
| + | * Maximum of 400 jobs per user per partition | ||
| + | |||
| + | The '' | ||
| + | * **Preemption is enabled for all jobs** | ||
| + | * Maximum of 320 jobs per user | ||
| + | * Maximum of 640 CPUs per user (across all jobs in the partition) | ||
| + | |||
| + | ==== Maximum Requestable Memory ==== | ||
| + | |||
| + | Each type of node (and thus, partition) has a limited amount of memory available for jobs. A small amount of memory is reserved for the node's operating system and Slurm, so the maximum a job can request is slightly less than the nominal size listed in the table above. | ||
| + | |||
| + | ^Partition Name^Maximum (by node)^Maximum (by core)^ | ||
| + | |standard|''< | ||
| + | |large-mem|''< | ||
| + | |xlarge-mem|''< | ||
| + | |extended-mem|''< | ||
| + | |gpu-t4|''< | ||
| + | |gpu-v100|''< | ||
| + | |gpu-mi50|''< | ||
| + | |gpu-mi100|''< | ||
| + | |||
| + | Please see details for [[abstract: | ||
| + | ===== The extended-mem partition ===== | ||
| + | |||
| + | Because access to the swap cannot be limited via Slurm, the '' | ||
| + | |||
| + | ===== The GPU partitions ===== | ||
| + | |||
| + | Jobs that will run in one of the GPU partitions must request GPU resources using ONE of the following flags: | ||
| + | |||
| + | ^Flag^Description^ | ||
| + | |'' | ||
| + | |'' | ||
| + | |'' | ||
| + | |'' | ||
| + | |||
| + | If you do not specify one of these flags, your job will not be permitted to run in the GPU partitions. | ||
| + | |||
| + | <note warning> | ||
| + | |||
| + | ===== The idle partition ===== | ||
| + | |||
| + | The '' | ||
| + | |||
| + | <note warning> | ||
| + | |||
| + | Jobs in the '' | ||
| + | |||
| + | Jobs that execute in the '' | ||
| + | |||
| + | ==== Requesting a specific resource type in the idle partition ==== | ||
| + | |||
| + | Since the '' | ||
| + | |||
| + | ^Type^Description^ | ||
| + | |'' | ||
| + | |'' | ||
| + | |'' | ||
| + | |||
| + | To request a specific GPU type while using the '' | ||
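
The partition table and the defaults and limits listed earlier on this page can be captured in a small lookup for sanity-checking a request before it is submitted. The following is a minimal sketch only: `PARTITIONS`, `check_request`, and its parameter names are hypothetical and are not part of Slurm or any DARWIN tooling. It uses the nominal per-node values from the partition table (the true maximum requestable memory is somewhat lower) and leaves out the idle partition, whose limits differ.

```python
from datetime import timedelta

# Nominal values copied from the partition table on this page:
# partition name -> (node count, cores per node, memory per node in GiB).
# The idle partition is deliberately omitted (assumption: handled separately).
PARTITIONS = {
    "standard": (48, 64, 512),
    "large-mem": (32, 64, 1024),
    "xlarge-mem": (11, 64, 2048),
    "extended-mem": (1, 64, 1024),
    "gpu-t4": (9, 64, 512),
    "gpu-v100": (3, 48, 768),
    "gpu-mi50": (1, 64, 512),
    "gpu-mi100": (1, 64, 512),
}

DEFAULT_RUN_TIME = timedelta(minutes=30)  # default run time on non-idle partitions
MAX_RUN_TIME = timedelta(days=7)          # maximum run time on non-idle partitions


def check_request(partition, nodes=1, cpus_per_node=1, mem_gib_per_node=1,
                  run_time=DEFAULT_RUN_TIME):
    """Return a list of problems; an empty list means the request fits the
    nominal limits. Defaults mirror the page: 1 node, 1 CPU, 1 GiB, 30 minutes."""
    if partition not in PARTITIONS:
        return [f"unknown partition: {partition}"]

    max_nodes, max_cores, nominal_mem = PARTITIONS[partition]
    problems = []
    if nodes > max_nodes:
        problems.append(f"{nodes} nodes requested, partition has {max_nodes}")
    if cpus_per_node > max_cores:
        problems.append(f"{cpus_per_node} CPUs requested, nodes have {max_cores} cores")
    # Some memory is reserved for the OS and Slurm, so a request at or above
    # the nominal size can never be satisfied.
    if mem_gib_per_node >= nominal_mem:
        problems.append(f"{mem_gib_per_node} GiB requested, nominal size is {nominal_mem} GiB")
    if run_time > MAX_RUN_TIME:
        problems.append("run time exceeds the 7 day maximum")
    return problems


if __name__ == "__main__":
    # A V100 node has 48 cores, so asking for 64 CPUs on one node is flagged.
    for problem in check_request("gpu-v100", cpus_per_node=64):
        print(problem)
```

A check like this only catches requests that exceed the nominal hardware; the scheduler remains the authority on what is actually accepted.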