====== Running applications ======

//This section uses the wiki's [[00_conventions|documentation conventions]].//

===== Runtime environment =====

Generally, your runtime environment (path, environment variables, etc.) should be the same as your compile-time environment. Usually, the best way to achieve this is to put the relevant VALET commands in shell scripts. You can reuse common sets of commands by storing them in a shell script file that can be //sourced// from within other shell script files.
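
For example, a shared setup file (the filename ''valet_env.sh'' and the package names are hypothetical) could hold the VALET commands and be sourced from any job script:

<code - valet_env.sh>
# Shared environment setup, sourced from other scripts with:
#   source ./valet_env.sh
# Use vpkg_versions to see the packages actually installed.
vpkg_require gcc
vpkg_require openmpi
</code>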

<note important>
If you are writing an executable script that does not have the **-l** option on the **bash** command, and you want to include VALET commands in your script, then you should include the line:
<code bash>
source /etc/profile.d/valet.sh
</code>
You do not need this command when you
  - type commands, or source the command file, at the command line
  - include lines in the file to be submitted with **qsub**
</note>
===== Job scheduling system =====

A job scheduling system is used to manage and control the computing resources for all jobs submitted to a cluster. This includes load balancing, limiting resources, reconciling requests for memory and processor cores with availability of those resources, suspending and restarting jobs, and managing jobs with different priorities.

Each investing-entity's jobs are scheduled on the compute nodes it purchased by way of its //owner queue//.

The standby queues are available for projects requiring more slots than purchased, or to take advantage of idle nodes when a job would have to wait in the owner queue.

A spillover queue may be available for the case where a job is submitted to the owner queue, and there are standby jobs consuming needed slots. Instead of waiting, the jobs will be sent to the spillover queue to start on a similar idle node.

A spare queue may be on a cluster to make spare nodes available to users, by special request.

Each cluster is configured with a particular job scheduling system. General documentation applies to all clusters; details for each cluster are linked below.

===== The job queues =====

Each investing-entity on a cluster has an //owner queue// that exclusively uses the investing-entity's compute nodes.

There are also node-wise queues, such as the //standby// and //spillover// queues, that can use nodes across the cluster.

When submitting a batch job to Grid Engine, you specify the resources you need or want for your job. **//You don't typically specify the name of the queue//**. Instead, you include a set of directives that specify your job's characteristics. Grid Engine then chooses the most appropriate queue that meets those needs.

The queue to which a job is assigned depends primarily on these factors:

  * Whether the job is serial or parallel
  * Which parallel environment (e.g., mpi, threads) is needed
  * Which or how much of a resource is needed (e.g., max clock time, memory requirements)
  * Resources your job will consume (e.g., an entire node, max memory usage)
  * Whether the job is non-interactive or interactive

For each investing-entity, the queues include the following; a sketch of a job script using such directives appears after the table.

^ Queue name ^ Description ^
^ ''<<//investing_entity//>>.q'' | The owner queue, which runs jobs on the investing-entity's own compute nodes |
^ ''standby.q'' | Standby queue for jobs that can run on idle nodes anywhere on the cluster |
^ ''standby-4h.q'' | Standby queue for shorter jobs (four-hour limit) |
^ ''spillover.q'' | Spillover queue used when standby jobs are consuming owner-queue slots |
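
For example, a job script might carry directives like the following (a sketch only; the job name and resource values are arbitrary), and Grid Engine picks a queue that satisfies them:

<code bash>
#!/bin/bash
#$ -N mytest            # job name (hypothetical)
#$ -pe threads 4        # parallel environment and slot count
#$ -l h_cpu=0:30:00     # hard CPU-time limit
# No queue is named above; Grid Engine selects one that
# satisfies these directives.
echo "Running on $HOSTNAME with $NSLOTS slots"
</code>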

**Details by cluster**

  * Mills
  * Farber

===== Scheduling Jobs =====

In order to schedule any job (interactively or batch) on a cluster, you must first set your //workgroup// to define your cluster group, or investing-entity, whose compute nodes you will use.

==== Interactive jobs (qlogin) ====

All interactive jobs should be scheduled to run on the compute nodes, not the login/head node.

An interactive session (job) can often be made non-interactive (batch job) by putting the input in a file, using redirection symbols, and writing a job script file that contains the command line. Then the non-interactive (batch) job can be submitted to run on the cluster with **qsub**.

== Starting an interactive session ==

Remember you must specify your //workgroup// before starting an interactive session on a cluster.

Type
<code bash>
workgroup -g <<investing_entity>>
</code>

Type
<code bash>
qlogin
</code>

to reserve one scheduling slot and start an interactive shell on one of your workgroup //compute nodes//.

Type
<code bash>
qlogin -pe threads 12
</code>

to reserve 12 scheduling slots and start an interactive shell on one of your workgroup //compute nodes//.

Type
<code bash>
exit
</code>

to terminate the interactive shell and release the scheduling slot(s).

== Acceptable nodes for interactive sessions ==

Use the login (head) node for interactive program development including Fortran, C, and C++ program compilation. Use Grid Engine (**qlogin**) to start interactive shells on your workgroup //compute nodes//.
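
A typical interactive session might look like this (using ''it_css'' as the example workgroup, as elsewhere on this page; the compiled program name is hypothetical):

<code bash>
workgroup -g it_css    # set the workgroup first
qlogin -pe threads 4   # reserve 4 slots on a compute node
vpkg_require gcc       # set up the environment on the compute node
./mytest               # run your program interactively
exit                   # release the slots when done
</code>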

==== Batch jobs (qsub) ====

Grid Engine provides the **qsub** command for scheduling batch jobs:

^ command ^ Action ^
| ''qsub <<command_line_options>> <<job_script>>'' | Submit the job script to Grid Engine |

For example,

   qsub myproject.qs

or to submit a standby job that waits for idle nodes (up to 240 slots for 8 hours),

   qsub -l standby=1 myproject.qs

or to submit a standby job that waits for idle 48-core nodes (if you are using a cluster with 48-core nodes like Mills),

   qsub -l standby=1 -q standby.q@@48core myproject.qs

or to submit a standby job that waits for idle 24-core nodes (it would not be assigned to any 48-core nodes, which is important for consistency of core assignment),

   qsub -l standby=1 -q standby.q@@24core myproject.qs

or to submit to the four-hour standby queue (up to 816 slots spanning all nodes),

   qsub -l standby=1,h_rt=4:00:00 myproject.qs

or to submit to the four-hour standby queue spanning just the 24-core nodes,

   qsub -l standby=1,h_rt=4:00:00 -q standby-4h.q@@24core myproject.qs

The file ''myproject.qs'' contains the shell script commands to run, along with **qsub** directives, as described below.

<note tip>
We strongly recommend that you use a script file that you pattern after the prototypes in **/opt/shared/templates**, adding the commands needed to run your program.

Reusable job scripts help you maintain a consistent batch environment across runs. The optional **.qs** filename suffix signifies a **q**ueue-**s**ubmission script file.
</note>
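
As a minimal sketch (not one of the provided templates; the package and program names are hypothetical), a ''myproject.qs'' file might look like:

<code - myproject.qs>
#$ -N myproject
#$ -l h_cpu=1:00:00
#
# Set up the environment (hypothetical package name):
vpkg_require myproject-deps
#
# Run the program (hypothetical executable):
./myproject
</code>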

=== Grid Engine environment variables ===

In every batch session, Grid Engine sets environment variables that are useful within job scripts. Here are some common examples. The rest appear in the ENVIRONMENTAL VARIABLES section of the **qsub** man page.

^ Environment variable ^ Contains ^
| **HOSTNAME** | Name of the execution (compute) node |
| **JOB_ID** | Batch job id assigned by Grid Engine |
| **JOB_NAME** | Name you assigned to the batch job (See the ''-N'' option under //Command options for qsub//) |
| **NSLOTS** | Number of //slots// (processor cores) assigned to the job |
| **SGE_TASK_ID** | Task id of an array job sub-task (See //Array jobs// below) |
| **TMPDIR** | Name of a directory on the (compute) node's scratch filesystem |

When Grid Engine assigns one of your job's tasks to a particular node, it creates a temporary work directory on that node's 1-2 TB local scratch disk. And when the task assigned to that node is finished, Grid Engine removes the directory and its contents. The form of the directory name is

  **/scratch/[$JOB_ID].[$SGE_TASK_ID].<<queue_name>>**

For example, after ''qlogin'' type
<code bash>
echo $TMPDIR
</code>
to see the name of the node scratch directory for this interactive job.
<code>
/scratch/<<job_id>>.1.<<queue_name>>
</code>

See each cluster's storage documentation for more information about the node scratch filesystem.
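
Since ''TMPDIR'' is removed when the task ends, a job script should copy results back before finishing. Here is a sketch of staging files through the node scratch directory (the input, output, and program names are hypothetical; ''SGE_O_WORKDIR'' is another Grid Engine variable holding the directory you submitted from):

<code bash>
# Stage input to fast node-local scratch, run there, copy results back.
cp input.dat "$TMPDIR"
cd "$TMPDIR"
"$SGE_O_WORKDIR/myprogram" input.dat > output.dat
cp output.dat "$SGE_O_WORKDIR"    # TMPDIR vanishes when the task ends
</code>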

Grid Engine uses these environment variables' values when naming the job's output files:

^ File name pattern ^ Description ^
| [$JOB_NAME].o[$JOB_ID] | Default **output** filename |
| [$JOB_NAME].e[$JOB_ID] | **error** filename (when not joined to output) |
| [$JOB_NAME].po[$JOB_ID] | Parallel job **output** filename (Empty for most queues) |
| [$JOB_NAME].pe[$JOB_ID] | Parallel job **error** filename (Usually empty) |

=== Command options for qsub ===

The most commonly used **qsub** options fall into two categories: //general// and //resource-management//.

The table below lists some commonly used **qsub** options; see the **qsub** man page for the complete list.

^ Option / Argument ^ Function ^
| ''-N <<job_name>>'' | Names the job; the name appears in **qstat** listings and in the default output filenames |
| ''-m eas'' | Sends email when the job **e**nds, **a**borts, or is **s**uspended |
| ''-M <<email_address>>'' | Address to which status email is sent |
| ''-hold_jid <<job_name(s)>>'' | Holds the job until the named job(s) have finished (See //Chaining jobs// below) |
| ''-t <<task_id_range>>'' | Submits an array job (See //Array jobs// below) |
^ Special notes for IT clusters: ^^
| ''-l standby=1'' | Submits the job to the standby queues |
| ''-q standby.q@@<<node_type>>'' | Restricts a standby job to a particular node type, e.g., ''@@24core'' |
| ''-l h_rt=4:00:00'' | Limits the job to four hours, allowing it to use the four-hour standby queue |
^ The resource-management options for ''qsub'' are ''-l'' and ''-pe'': ^^
| ''-l <<resource>>=<<value>>'' | Requests a resource, e.g., ''-l h_cpu=1:30:00'' sets a hard CPU-time limit of 1.5 hours |
| ''-pe <<parallel_environment>> <<Nslots>>'' | Requests a parallel environment and a number of scheduling slots, e.g., ''-pe threads 12'' |

For example, putting the lines
<code>
#$ -l h_cpu=1:30:00
#$ -pe threads 12
</code>
in the job script tells Grid Engine to set a hard limit of 1.5 hours on the CPU time resource for the job, and to assign 12 processors to your job.

Grid Engine tries to satisfy all of the resource-management options you specify in a job script or as **qsub** command-line options. If there is a queue already defined that accepts jobs having that particular combination of requests, Grid Engine assigns your job to that queue.
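
The same requests can be given on the command line instead of in the script; for example, this submission is equivalent to the two directives above:

<code bash>
qsub -l h_cpu=1:30:00 -pe threads 12 myproject.qs
</code>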

==== Array jobs ====

An //array job// runs the same job script multiple times as a set of numbered //tasks//; Grid Engine schedules each task much like a separate job.

<note tip>
The ''SGE_TASK_ID'' environment variable is set to the task's id, so the commands in your script can use it to select per-task input files or parameters.

For example, the ''SGE_TASK_ID'' value could be used to choose a numbered input file, or as an index into a list of parameters.
</note>

The general form of the **qsub** option is:

   -t //min_value//-//max_value//://step_size//

with a default step_size of 1. For example, to run the tasks numbered 2, 4, 6, ..., 5000, the option would be:

   -t 2-5000:2

Additional simple how-to examples for array jobs are available online.
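
As a sketch (the program and file names are hypothetical), an array job script selecting its input by task id might look like:

<code - myarray.qs>
#$ -N myarray
#$ -t 2-5000:2
#
# Each task receives its own SGE_TASK_ID, used here to pick a
# numbered input file and name the matching output file.
./myprogram "input.${SGE_TASK_ID}.dat" > "output.${SGE_TASK_ID}.dat"
</code>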

==== Chaining jobs ====

If you have multiple jobs that must run in sequence, so that one job starts only after another has finished, then you can use chaining. When you chain jobs, remember to check the status of the earlier job to determine if it successfully completed. This will prevent the system from flooding the scheduler with failed jobs. Here is a simple chaining example with three job scripts, ''doThing1.qs'', ''doThing2.qs'' and ''doThing3.qs''.

<code - doThing1.qs>
#$ -N doThing1
#
# If you want an email message to be sent to you when your job ultimately
# finishes, edit the -M line to have your email address and change the
# next two lines to start with #$ instead of just #
# -m eas
# -M my_address@mail.server.com
#
# Setup the environment; add vpkg_require commands after this
# line:

# Now append all of your shell commands necessary to run your program
# after this line:
./dotask1
</code>

<code - doThing2.qs>
#$ -N doThing2
#$ -hold_jid doThing1
#
# If you want an email message to be sent to you when your job ultimately
# finishes, edit the -M line to have your email address and change the
# next two lines to start with #$ instead of just #
# -m eas
# -M my_address@mail.server.com
#
# Setup the environment; add vpkg_require commands after this
# line:

# Now append all of your shell commands necessary to run your program
# after this line:

# Here is where you should add a test to make sure
# that dotask1 successfully completed before running
# ./dotask2
# You might check if a specific file(s) exists that you would
# expect after a successful dotask1 run, something like this:
#   if [ -e dotask1.log ]
#   then ./dotask2
#   fi
# If dotask1.log does not exist it will do nothing.
# If you don't need a test, then you would run the task.
./dotask2
</code>

<code - doThing3.qs>
#$ -N doThing3
#$ -hold_jid doThing2
#
# If you want an email message to be sent to you when your job ultimately
# finishes, edit the -M line to have your email address and change the
# next two lines to start with #$ instead of just #
# -m eas
# -M my_address@mail.server.com
#
# Setup the environment; add vpkg_require commands after this
# line:

# Now append all of your shell commands necessary to run your program
# after this line:
# Here is where you should add a test to make sure
# that dotask2 successfully completed before running
# ./dotask3
# You might check if a specific file(s) exists that you would
# expect after a successful dotask2 run, something like this:
#   if [ -e dotask2.log ]
#   then ./dotask3
#   fi
# If dotask2.log does not exist it will do nothing.
# If you don't need a test, then just run the task.
./dotask3
</code>

Now submit all three job scripts. In this example, we are using account ''traine'' in workgroup ''it_css'':

<code>
[(it_css:traine)]$ qsub doThing1.qs
[(it_css:traine)]$ qsub doThing2.qs
[(it_css:traine)]$ qsub doThing3.qs
</code>

The basic flow is that ''doThing2'' will wait until ''doThing1'' finishes, and ''doThing3'' will wait until ''doThing2'' finishes.

You might also want to have ''doThing3'' wait on both ''doThing1'' and ''doThing2'' to finish, while allowing ''doThing1'' and ''doThing2'' to run at the same time. To do this, remove the hold from ''doThing2.qs'' and have ''doThing3.qs'' hold on both jobs:

<code - doThing2.qs>
#$ -N doThing2
#
# If you want an email message to be sent to you when your job ultimately
# finishes, edit the -M line to have your email address and change the
# next two lines to start with #$ instead of just #
# -m eas
# -M my_address@mail.server.com
#
# Setup the environment; add vpkg_require commands after this
# line:

# Now append all of your shell commands necessary to run your program
# after this line:
./dotask2
</code>

<code - doThing3.qs>
#$ -N doThing3
#$ -hold_jid doThing1,doThing2
#
# If you want an email message to be sent to you when your job ultimately
# finishes, edit the -M line to have your email address and change the
# next two lines to start with #$ instead of just #
# -m eas
# -M my_address@mail.server.com
#
# Setup the environment; add vpkg_require commands after this
# line:

# Now append all of your shell commands necessary to run your program
# after this line:
# Here is where you should add a test to make sure
# that dotask1 and dotask2 successfully completed before running
# ./dotask3
# You might check if a specific file(s) exists that you would
# expect after a successful dotask1 and dotask2 run, something like this:
#   if [ -e dotask1.log -a -e dotask2.log ]
#   then ./dotask3
#   fi
# If either file does not exist it will do nothing.
# If you don't need a test, then just run the task.
./dotask3
</code>

Now submit all three jobs again. However, this time ''doThing1'' and ''doThing2'' will both be eligible to run immediately, and ''doThing3'' will wait for both of them to finish before running.
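
For the log-file tests above to work, each task must create its log file only when it succeeds. A sketch of how ''dotask1'' might be invoked to do this (the command and filename are hypothetical):

<code bash>
# Inside doThing1.qs, instead of running ./dotask1 directly:
if ./dotask1; then
    touch dotask1.log    # created only if dotask1 exits with status 0
fi
</code>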

==== Resource-management options ====

Any large cluster will have many nodes with perhaps differing resources, e.g., cores, memory, disk space and accelerators. The resources you can request come in three categories (an example request follows the list):

  - Fixed by the node's configuration - slots and installed memory
  - Set by load sensors - CPU load averages, current memory usage
  - Managed by the job scheduler's internal bookkeeping to ensure availability - available memory and floating software licenses
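
For example, a request for a bookkeeping-managed resource might look like the following (the resource name ''m_mem_free'' is an assumption; check your cluster's documentation for the complexes actually defined):

<code bash>
qsub -l m_mem_free=2G myproject.qs    # reserve 2 GB of free memory per slot
</code>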

**Details by cluster**

  * Mills
  * Farber

===== Managing Jobs =====
==== Checking job status ====

Use the **qstat** command to check the status of queued jobs. Use the options below to control what is reported:

^ Option ^ Result ^
| ''-j <<job_id>>'' | Displays detailed information about the specified job |
| ''-u <<username>>'' | Displays the jobs belonging to the specified user |
| ''-r'' | Displays the resource requirements of each job |
| ''-t'' | Displays information about the sub-tasks of array jobs |
| ''-f'' | Displays a full listing organized by queue instance |

For example, to list the information for job 62900, type
<code>
qstat -j 62900
</code>

To list a table of jobs assigned to user //traine// that displays the resource requirements for each job, type
<code>
qstat -u traine -r
</code>

With no options, **qstat** defaults to ''qstat -u $USER'', listing just your own jobs. The **qstat** command uses a //Reduced Format// with the following columns.

^ Column header ^ Description ^
| ''job-ID'' | Job id assigned by Grid Engine |
| ''prior'' | Priority used to order the pending jobs |
| ''name'' | Name assigned to the job with the ''-N'' option |
| ''user'' | Owner of the job |
| ''state'' | Job state, e.g., ''qw'' (queued and waiting), ''r'' (running), ''Eqw'' (error) |
| ''submit/start at'' | Time the job was submitted or started |
| ''queue'' | Queue instance the job is running in (blank while the job is pending) |

=== A more concise listing ===

The IT-supplied **qjobs** command provides a more convenient listing of job status.

^ Command ^ Description ^
| ''qjobs'' | Lists the status of your jobs in a concise table |

In all cases the JobID, Owner, State and Name are listed in a table.

=== Job status is qw ===

When your job status is ''qw'' (queued and waiting), it is waiting for the resources it requested to become available. For example,

<code bash>
[(it_css:traine)]$ qstat -u traine
job-ID  prior   name       user   state submit/start at     queue   slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 99154 0.50661 openmpi-pg traine  qw    <<submit_time>>             144
</code>

Sometimes your job is stuck and remains in the ''qw'' state because what it requested can never be satisfied. Use ''qalter -w v <<job_id>>'' to ask Grid Engine why the job is still pending. For example,

<code bash>
[(it_css:traine)]$ qalter -w v 99154
Job 99154 has no permission for cluster queue "it_css-dev.q"
Job 99154 has no permission for cluster queue "it_css-qrsh.q"
Job 99154 has no permission for cluster queue "it_css.q+"
Job 99154 has no permission for cluster queue "spillover.q"
Job 99154 has no permission for cluster queue "standby-4h.q"
Job 99154 has no permission for cluster queue "standby.q"
Job 99154 Jobs cannot run because only 72 of 144 requested slots are available
Job 99154 Jobs can not run in PE "openmpi" because it only offers 72 slots
verification: no suitable queues
</code>

In this example, we asked for 144 slots, but only 72 slots are available for workgroup ''it_css''. Use ''qstat -g c'' to list the cluster queues:

<code bash>
[(it_css:traine)]$ qstat -g c
CLUSTER QUEUE
it_css-dev.q
it_css-qrsh.q
it_css.q
it_css.q+
standby-4h.q
standby.q
</code>

Use **qalter** to change the attributes of the pending job, such as reducing the number of slots requested to fit within the workgroup ''it_css'' limit of 72:

<code bash>
[(it_css:traine)]$ qalter -pe openmpi 72 99154
modified parallel environment of job 99154
modified slot range of job 99154
[(it_css:traine)]$ qstat -u traine
job-ID  prior   name       user   state submit/start at     queue   slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 99154 0.50661 openmpi-pg traine  qw    <<submit_time>>             72
</code>

Another way to get this job running would be to change its resource request so that it can run in the standby queue:
<code bash>
[(it_css:traine)]$ qalter -l standby=1 99154
modified hard resource list of job 99154
[(it_css:traine)]$ qstat -u traine
job-ID  prior   name       user   state submit/start at     queue   slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 99154 0.50661 openmpi-pg traine  qw    <<submit_time>>             144
</code>

<note important>''qalter'' can only modify a job while it is still pending; once the job starts running, its attributes can no longer be changed this way.</note>

=== Job status is Eqw ===

When your job status is ''Eqw'', an error occurred when Grid Engine tried to start the job, and it remains queued and waiting. For example,

<code bash>
[(it_css:traine)]$ qstat -u traine
job-ID  prior   name       user   state submit/start at     queue   slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
686924 0.50661 openmpi-pg traine  Eqw   <<submit_time>>             <<slots>>
</code>

If the state shows ''Eqw'', use ''qstat -j <<job_id>> | grep error'' to display the reason for the error. For example,

<code bash>
[traine@mills ~]$ qstat -j 686924 | grep error
error reason    1:      can't chdir to directory: No such file or directory
</code>

An error such as "can't chdir to directory" or "can't open output file" indicates that some directory or file (respectively) cannot be found. Verify that the file or directory in question exists, i.e., you haven't mistyped a name or deleted it after submitting the job.

If you understand the reason and can get it fixed, use ''qmod -cj <<job_id>>'' to clear the error state on the job, for example:

<code bash>
[traine@mills ~]$ qmod -cj 686924
</code>

and it should eventually run.

==== Checking queue status ====

The **qstat** command can also be used to get the status of all queues on the system.

^ Option ^ Result ^
| ''-f'' | Displays a full listing with one line per queue instance |
| ''-qs <<states>>'' | Limits the listing to queue instances in the given state(s), e.g., ''d'' (disabled) |
| ''-explain <<state>>'' | Displays the reason for a queue's alarm or error state (state is one of ''a'', ''c'', ''A'', ''E'') |

With the ''-f'' option, **qstat** adds columns for each queue instance, including:

^ Column header ^ Description ^
| ''queuename'' | Name of the queue instance, in the form <<queue>>@<<node>> |
| ''resv/used/tot.'' | Reserved, used, and total scheduling slots |
| ''states'' | Queue state, e.g., ''d'' (disabled) |

Examples:

List all queues that are unavailable because they are disabled or the slotwise preemption limits have been reached:
<code bash>
qstat -f -qs dP
</code>

List the queues associated with the investing entity //it_css//:
<code bash>
qstat -f | egrep 'it_css'
</code>

==== Checking overall queue and node information ====

You can determine overall queue and node information using the **qhost** and **qconf** commands:

^ Command ^ Illustrative example ^
| ''qhost'' | Display the name, architecture, load, and memory of all execution hosts (nodes) |
| ''qhost -j'' | Also display the jobs running on each node |
| ''qhost -q'' | Also display the queue instances on each node |
| ''qconf -sql'' | List the names of all cluster queues |
| ''qconf -sq <<queue_name>>'' | Show the configuration of the named queue |
| ''qconf -shgrpl'' | List all host groups, e.g., ''@24core'' |
| ''qconf -shgrp <<host_group>>'' | List the nodes in the named host group |

==== Checking overall usage of resource quotas ====

Resource quotas are used to help control the standby and spillover queues. Use the **qquota** command to see how much of each quota is in use:

^ Command ^ Illustrative example ^
| ''qquota -u <<username>>'' | Display the resource quotas currently in effect for the named user |
| ''qquota -u \*'' | Display the resource quotas currently in effect for all users |

The example below gives a snapshot of the standby slots being used by ''traine'', and of the slots used by the workgroup ''it_css'':

<code>
$ qquota -u traine | grep standby
standby_limits/<<rule>>       slots=<<used>>/240
standby_cumulative/<<rule>>   slots=<<used>>/<<limit>>
$ qquota -u \* | grep it_css
per_workgroup/<<rule>>        slots=<<used>>/72
</code>

==== Deleting a job ====

Use the **qdel** command to delete a job from the queue.

For example, to delete job 28000:
<code bash>
qdel 28000
</code>
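
**qdel** also accepts a user name; for example, to delete all of your own jobs at once:

<code bash>
qdel -u traine
</code>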

<note important>
If you have a job that remains in a delete state, even after you try to delete it with the **qdel** command, then try a force deletion with
<code bash>
qdel -f 28000
</code>
This will simply forget about the job without attempting any cleanup on the node(s) being used.
</note>