Differences
This shows you the differences between two versions of the page.
abstract:darwin:runjobs:queues [2023-07-10 08:38] – frey | abstract:darwin:runjobs:queues [2025-04-01 12:08] (current) – [Maximum Requestable Memory] bdeng | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== The job queues (partitions) on DARWIN ====== | ||
+ | |||
+ | The DARWIN cluster provides several partitions (queues) that you can specify when running jobs. These partitions correspond to the various node types available in the cluster: | ||
+ | |||
+ | ^Partition Name^Description^Node Names^ | ||
+ | |standard|Contains all 48 standard memory nodes (64 cores, 512 GiB memory per node)|r1n00 - r1n47| | ||
+ | |large-mem|Contains all 32 large memory nodes (64 cores, 1024 GiB memory per node)|r2l00 - r2l31| | ||
+ | |xlarge-mem|Contains all 11 extra-large memory nodes (64 cores, 2048 GiB memory per node)|r2x00 - r2x10| | ||
+ | |extended-mem|Contains the single extended memory node (64 cores, 1024 GiB memory + 2.73 TiB NVMe swap)|r2e00| | ||
+ | |gpu-t4|Contains all 9 NVIDIA Tesla T4 GPU nodes (64 cores, 512 GiB memory, 1 T4 GPU per node)|r1t00 - r1t07, r2t08| | ||
+ | |gpu-v100|Contains all 3 NVIDIA Tesla V100 GPU nodes (48 cores, 768 GiB memory, 4 V100 GPUs per node)|r2v00 - r2v02| | ||
+ | |gpu-mi50|Contains the single AMD Radeon Instinct MI50 GPU node (64 cores, 512 GiB memory, 1 MI50 GPU)|r2m00| | ||
+ | |gpu-mi100|Contains the single AMD Radeon Instinct MI100 GPU node (64 cores, 512 GiB memory, 1 MI100 GPU)|r2m01| | ||
+ | |idle|Contains all nodes in the cluster; jobs on this partition can be preempted but are not charged against your allocation| | | ||
+ | |||
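The node-name ranges in the table above can be expanded into individual host names, which is a quick way to confirm that a range matches the stated node count. The following is a minimal sketch, not part of any DARWIN tooling: the helper names `split_name` and `expand_node_range` are hypothetical, and the parser assumes every name follows the `<prefix><zero-padded number>` pattern seen in the table (for example `r1n00`).

```python
import re

# Assumed naming pattern: a prefix ending in a letter, then a zero-padded index.
NODE_RE = re.compile(r"([a-z0-9]*[a-z])(\d+)")


def split_name(name):
    """Split a node name such as 'r1n07' into ('r1n', '07')."""
    match = NODE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized node name: {name!r}")
    return match.group(1), match.group(2)


def expand_node_range(spec):
    """Expand a table entry such as 'r1t00 - r1t07, r2t08' into node names."""
    names = []
    for part in spec.split(","):
        part = part.strip()
        if " - " in part:
            first, last = part.split(" - ")
            prefix, low = split_name(first.strip())
            last_prefix, high = split_name(last.strip())
            if prefix != last_prefix:
                raise ValueError(f"range spans different prefixes: {part!r}")
            width = len(low)  # keep the zero padding used in the table
            names.extend(
                f"{prefix}{index:0{width}d}"
                for index in range(int(low), int(high) + 1)
            )
        elif part:
            split_name(part)  # validate a single node name
            names.append(part)
    return names


if __name__ == "__main__":
    # Ranges copied from the partition table above.
    for partition, spec in [
        ("standard", "r1n00 - r1n47"),
        ("xlarge-mem", "r2x00 - r2x10"),
        ("gpu-t4", "r1t00 - r1t07, r2t08"),
    ]:
        print(partition, len(expand_node_range(spec)))
```

Comparing the printed counts with the "Contains all N ... nodes" text in each row makes an inconsistent range easy to spot.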
+ | ===== Requirements for all partitions ===== | ||
+ | |||
+ | All partitions on DARWIN have two requirements for submitting jobs: | ||
+ | - You must set an allocation workgroup prior to submitting a job by using the **workgroup** command (e.g., '' | ||
+ | - You must explicitly request a single partition in your job submission using '' | ||
+ | |||
+ | ===== Defaults and limits for all partitions ===== | ||
+ | |||
+ | All partitions on DARWIN except '' | ||
+ | * Default run time of 30 minutes | ||
+ | * Default resources of 1 node, 1 CPU, and 1 GiB memory | ||
+ | * Default **no** preemption | ||
+ | |||
+ | All partitions on DARWIN except '' | ||
+ | * Maximum run time of 7 days | ||
+ | * Maximum of 400 jobs per user per partition | ||
+ | |||
+ | The '' | ||
+ | * **Preemption is enabled for all jobs** | ||
+ | * Maximum of 320 jobs per user | ||
+ | * Maximum of 640 CPUs per user (across all jobs in the partition) | ||
+ | |||
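The limits listed above can be expressed as a small pre-submission check. This is only an illustrative sketch: the function `check_request` and its parameters are hypothetical, it assumes the partition excepted from the general defaults and limits is ''idle'', and it checks only the limits stated in this section. It does not reproduce how Slurm itself enforces them.

```python
from datetime import timedelta

# Values taken from the lists above.
DEFAULT_RUN_TIME = timedelta(minutes=30)
MAX_RUN_TIME = timedelta(days=7)
MAX_JOBS_PER_USER_PER_PARTITION = 400
IDLE_MAX_JOBS_PER_USER = 320
IDLE_MAX_CPUS_PER_USER = 640


def check_request(partition, run_time=None, cpus=1, user_jobs=0, user_cpus=0):
    """Return a list of limits that one additional job would exceed.

    user_jobs and user_cpus describe what the user already has in the
    partition; both are hypothetical inputs supplied by the caller.
    """
    problems = []
    if run_time is None:
        run_time = DEFAULT_RUN_TIME  # default applied when no time is requested

    if partition == "idle":  # assumption: idle is the excepted partition
        if user_jobs + 1 > IDLE_MAX_JOBS_PER_USER:
            problems.append("more than 320 jobs in the idle partition")
        if user_cpus + cpus > IDLE_MAX_CPUS_PER_USER:
            problems.append("more than 640 CPUs in the idle partition")
    else:
        if run_time > MAX_RUN_TIME:
            problems.append("run time longer than 7 days")
        if user_jobs + 1 > MAX_JOBS_PER_USER_PER_PARTITION:
            problems.append("more than 400 jobs in this partition")
    return problems


if __name__ == "__main__":
    print(check_request("standard", run_time=timedelta(days=8)))
    print(check_request("idle", cpus=64, user_cpus=600))
```

An empty list means the request stays within the limits described in this section; any other partition-specific restrictions are outside the scope of this sketch.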
+ | ==== Maximum Requestable Memory ==== | ||
+ | |||
+ | Each type of node (and thus, partition) has a limited amount of memory available for jobs. Part of the nominal memory listed in the table above is reserved for the node's operating system and Slurm, so the maximum a job can request is slightly less than the nominal size. | ||
+ | |||
+ | ^Partition Name^Maximum (by node)^Maximum (by core)^ | ||
+ | |standard|''< | ||
+ | |large-mem|''< | ||
+ | |xlarge-mem|''< | ||
+ | |extended-mem|''< | ||
+ | |gpu-t4|''< | ||
+ | |gpu-v100|''< | ||
+ | |gpu-mi50|''< | ||
+ | |gpu-mi100|''< | ||
+ | |||
+ | Please see details for [[abstract: | ||
+ | ===== The extended-mem partition ===== | ||
+ | |||
+ | Because access to the swap cannot be limited via Slurm, the '' | ||
+ | |||
+ | ===== The GPU partitions ===== | ||
+ | |||
+ | Jobs that will run in one of the GPU partitions must request GPU resources using ONE of the following flags: | ||
+ | |||
+ | ^Flag^Description^ | ||
+ | |'' | ||
+ | |'' | ||
+ | |'' | ||
+ | |'' | ||
+ | |||
+ | If you do not specify one of these flags, your job will not be permitted to run in the GPU partitions. | ||
+ | |||
+ | <note warning> | ||
+ | |||
+ | ===== The idle partition ===== | ||
+ | |||
+ | The '' | ||
+ | |||
+ | <note warning> | ||
+ | |||
+ | Jobs in the '' | ||
+ | |||
+ | Jobs that execute in the '' | ||
+ | |||
+ | ==== Requesting a specific resource type in the idle partition ==== | ||
+ | |||
+ | Since the '' | ||
+ | |||
+ | ^Type^Description^ | ||
+ | |'' | ||
+ | |'' | ||
+ | |'' | ||
+ | |||
+ | To request a specific GPU type while using the '' |