Differences
This shows you the differences between two versions of the page.
software:matlab:farber [2019-08-29 14:34] – [Matlab on Farber] anita | software:matlab:farber [2021-04-27 16:21] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== Matlab on Farber ====== | ||
+ | |||
+ | For use on Farber, MATLAB projects should be developed using a Desktop installation of MATLAB and then copied to Farber | ||
+ | to be run in batch. | ||
+ | |||
+ | Details on how to run these two scripts in batch are given with the resulting output files. | ||
+ | section with UNIX commands you can use to watch your jobs and gather [[# | ||
+ | It is important to know how much memory will be needed and how many cores will be used to set your resource requirements. If you do not ask for enough memory, your job will fail. If you do not ask for enough cores, the job will take longer. | ||
+ | |||
+ | Even though it is easier to develop on a desktop, MATLAB can be run interactively on Farber. | ||
+ | Two interactive jobs are demonstrated. | ||
+ | second example shows an interactive session, which starts a MATLAB pool of multiple workers to execute the function as a parallel toolbox loop, **'' | ||
+ | |||
+ | You can run [[#desktop |MATLAB as a desktop (GUI)]] application on Farber, but this is not recommended because the graphics are slow to display, especially over a slower network connection. | ||
+ | |||
+ | Many MATLAB research projects fall in the "high throughput computing" | ||
+ | Thus we have a | ||
+ | final example that gives the recommended workflow to scale your job to multiple nodes. Compile the MATLAB code with the single-thread option and deploy the job as a Grid Engine array job. | ||
+ | |||
+ | <note important> | ||
+ | The MATLAB distributed computing server (MDCS) is not installed on Farber. | ||
+ | an array job of compiled MATLAB is recommended for large jobs. | ||
+ | </ | ||
+ | |||
+ | ===== Matlab License Information for Grid Engine ===== | ||
+ | |||
+ | Matlab licenses are pushed into consumable (global, per-job) integer complexes in Grid Engine and can be checked using | ||
+ | |||
+ | < | ||
+ | qhost -h global -F | ||
+ | </ | ||
+ | |||
+ | to list the number of unused license seats for each product. | ||
+ | |||
+ | Below is an example representing a snapshot of unused licensed seats for Matlab products on the cluster. | ||
+ | < | ||
+ | [traine@mills ~]$ qhost -h global -F | ||
+ | HOSTNAME | ||
+ | ------------------------------------------------------------------------------- | ||
+ | global | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | gc: | ||
+ | </ | ||
+ | |||
+ | Matlab jobs can be submitted to require a certain number of license seats to be available before a job will run. If there are inter-license dependencies for toolboxes, then you should specify all the licenses including Matlab and/or Simulink. | ||
+ | |||
+ | For example, if a Matlab job requires the Financial toolbox, then you will also need to specify all the inter-related toolbox licenses required by the Financial toolbox, such as the Statistics and Optimization toolboxes, as well as Matlab itself. See [[http:// | ||
+ | ]] for complete details. | ||
+ | |||
+ | < | ||
+ | qsub -l MLM.MATLAB=1, | ||
+ | </ | ||
+ | |||
+ | Naturally, this isn't a to-the-moment mapping because the license server is not being queried constantly. | ||
+ | |||
+ | This will be most helpful when submitting many Matlab jobs that require a toolbox with a low seat count. They will wait for a toolbox seat to become available rather than trying to run and having many get the " | ||
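The license request described above is a comma-separated list of ''complex=count'' pairs. As a minimal sketch, assuming one seat per product, a small helper can assemble that list. The function name ''license_request'' is hypothetical, and the two complex names are the ones that appear elsewhere on this page; check ''qhost -h global -F'' for the names your job actually needs.

```shell
#!/bin/bash
# Hypothetical helper: build a Grid Engine "-l" resource string that requests
# one seat of each license complex named on the command line.
license_request() {
    local out="" name
    for name in "$@"; do
        # Append ",name=1", omitting the comma before the first entry.
        out="${out:+$out,}${name}=1"
    done
    printf '%s\n' "$out"
}

# Print (rather than run) the submit command so the sketch is safe anywhere.
echo "qsub -l $(license_request MLM.MATLAB MLM.Distrib_Computing_Toolbox) batch.qs"
```

Printing the command first makes it easy to confirm the resource string before submitting many jobs.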
+ | |||
+ | ===== Matlab function ===== | ||
+ | |||
+ | We will use this sample function on the Farber cluster in multiple demonstrations. | ||
+ | <file matlab maxEig.m> | ||
+ | function maxe = maxEig(sd, | ||
+ | % maxEig | ||
+ | % Input parameters | ||
+ | % sd - seed for random generator | ||
+ | % dim - size of the square matrix | ||
+ | % | ||
+ | % maxe - maximum real eigenvalue | ||
+ | if (isdeployed) | ||
+ | sd = str2num(sd) | ||
+ | dim = str2num(dim) | ||
+ | end | ||
+ | |||
+ | rng(sd); | ||
+ | ev = eig( randn(dim) ); | ||
+ | maxe = max( ev(imag(ev)==0) ) | ||
+ | end | ||
+ | </ | ||
+ | |||
+ | |||
+ | The page will use this MATLAB function to illustrate using Matlab in batch and interactively. | ||
+ | |||
+ | Finally it will be compiled and deployed using the Matlab Compiler Runtime (MCR) environment. | ||
+ | |||
+ | <note important> | ||
+ | We want to select only the real eigenvalues to compute the maximum. | ||
+ | </ | ||
+ | |||
+ | <note tip>The last line of this function does not have a semicolon. | ||
+ | <code matlab> | ||
+ | maxe = max(ev(imag(ev)==0)); | ||
+ | fprintf(' | ||
+ | </ | ||
+ | </ | ||
+ | |||
+ | |||
+ | ==== Matlab script ==== | ||
+ | First, write a Matlab script file. It should have a comment on the first line describing the purpose of the script and have the '' | ||
+ | |||
+ | <file matlab script.m> | ||
+ | % script to run maxEig function 200 times and print average. | ||
+ | |||
+ | count = 200; | ||
+ | dim = 5001; | ||
+ | sumMaxe = 0; | ||
+ | tic; | ||
+ | for i=1:count; | ||
+ | sumMaxe = sumMaxe + maxEig(i, | ||
+ | end; | ||
+ | toc | ||
+ | avgMaxEig = sumMaxe/ | ||
+ | |||
+ | quit | ||
+ | </ | ||
+ | |||
+ | This is a detailed script example, which calls the maxEig function. | ||
+ | |||
+ | <note tip> | ||
+ | Several MATLAB commands could be added to the beginning of this script to set the maximum number of computational threads to the number of slots assigned to your job. If the scheduler is using CGROUPS to limit your job core count, then these commands are not necessary. | ||
+ | < | ||
+ | [compThreads, | ||
+ | if count == 1 | ||
+ | warning(' | ||
+ | autoCompThreads = maxNumCompThreads(compThreads); | ||
+ | disp(sprintf(' | ||
+ | end | ||
+ | </ | ||
+ | See [[maxNumCompThreadsGridEngine|Setting maximum number of computational threads]]</ | ||
+ | |||
+ | <note tip> | ||
+ | This script ends in a **__quit__** command (equivalent to MATLAB **__exit__**). | ||
+ | terminates MATLAB when done. If you run this from the bash command line with the '' | ||
+ | |||
+ | Without the **__quit__** you will come back to the MATLAB prompt on completion for an interactive job. If this is the last line of a batch queue script, then the only difference will be the MATLAB prompt ''>>'' | ||
+ | </ | ||
+ | |||
+ | ===== Copy the project folder ===== | ||
+ | |||
+ | Copy the project folder to a directory on the cluster. | ||
+ | Use any [[: | ||
+ | |||
+ | ====== Batch job ====== | ||
+ | |||
+ | You should have a copy of your MATLAB [[#project directory]] on the cluster. | ||
+ | |||
+ | <note important> | ||
+ | |||
+ | MATLAB has a new version twice a year. It is important to keep the version you use on your desktop the same as the | ||
+ | one on the cluster. | ||
+ | < | ||
+ | vpkg_versions matlab | ||
+ | </ | ||
+ | will show you the versions available on a cluster. | ||
+ | </ | ||
+ | |||
+ | <note tip> | ||
+ | |||
+ | It is frequently advisable to keep your MATLAB project clean from non-MATLAB files such as the queue | ||
+ | script file and the script output file. But you may combine them, and even use the MATLAB editor to | ||
+ | create the script file and look at the output file. | ||
+ | If you create the file on a PC, take care to not transfer the files as binary. See Transfer Files for the appropriate cluster. | ||
+ | |||
+ | When you have one combined directory, do not put the '' | ||
+ | to the project directory using '' | ||
+ | </ | ||
+ | |||
+ | |||
+ | ===== Create a job script file ===== | ||
+ | You should create a job script file to submit a batch job. Start by modifying a job template file (''/ | ||
+ | In your copy change the commented '' | ||
+ | require MATLAB, and then add your shell commands to the end of the file. Your copy may contain the lines: | ||
+ | < | ||
+ | # Add vpkg_require commands after this line: | ||
+ | vpkg_require matlab | ||
+ | |||
+ | # Now append all of your shell commands necessary to run your program | ||
+ | # after this line: | ||
+ | cd project_directory | ||
+ | matlab -nodisplay -singleCompThread -r main_script | ||
+ | </ | ||
+ | The '' | ||
+ | one line **'' | ||
+ | |||
+ | ===== Submit batch job ===== | ||
+ | Your shell must be in a [[abstract: | ||
+ | to submit any jobs. | ||
+ | Use the '' | ||
+ | and note the ''<< | ||
+ | submit the job with: | ||
+ | < | ||
+ | qsub matlab_first.qs | ||
+ | </ | ||
+ | |||
+ | <note important> | ||
+ | |||
+ | This is the message you get if you are not in workgroup. | ||
+ | </ | ||
+ | |||
+ | <note warning> | ||
+ | |||
+ | It is true that a queue script file is (usually) a bash script, but it must be executed with the '' | ||
+ | </ | ||
+ | |||
+ | ===== Wait for job to complete ===== | ||
+ | You can [[abstract: | ||
+ | For example, to list the information for job ''<< | ||
+ | < | ||
+ | qstat -j << | ||
+ | </ | ||
+ | |||
+ | For long running jobs, you could change your queue script to notify you via an e-mail message when the job is | ||
+ | complete. | ||
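As a minimal sketch of the e-mail notification mentioned above, these are the two Grid Engine directives used in the ''batch.qs'' example later on this page. The address is a placeholder. In Grid Engine, the letters after ''-m'' select the events that trigger mail: ''e'' for job end, ''a'' for abort, and ''s'' for suspend.

```shell
# Send mail when the job ends (e), is aborted (a), or is suspended (s).
#$ -m eas
# Placeholder address: replace with your own.
#$ -M your_address@example.com
```

Place these lines with the other ''#$'' directives near the top of the queue script.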
+ | ===== Post process job ===== | ||
+ | All MATLAB output data files will be in the project directory, but the MATLAB standard output will be in | ||
+ | the current directory, from which you submitted the job. Look for a file ending in your assigned JOBID. | ||
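A short sketch for locating that output file, assuming the naming seen in the examples on this page (''script.m.o2362'' for a single job, ''matlab-mcr.qs.o627074.1'' for an array task). The function name ''job_output_files'' is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list Grid Engine output files in a directory whose
# names end in .o<JOBID> (single job) or .o<JOBID>.<TASK> (array task).
job_output_files() {
    local dir="$1" jobid="$2"
    find "$dir" -maxdepth 1 -type f \
        \( -name "*.o${jobid}" -o -name "*.o${jobid}.*" \) | sort
}
```

For example, ''job_output_files . 2362'' run from the submit directory would list ''./script.m.o2362'' if that job has written its output.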
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | ====== Interactive job ====== | ||
+ | |||
+ | Here are specific details for running MATLAB as an interactive job on a compute node. You should have a copy of your [[#MATLAB project directory]] on the cluster, which will be referred to as '' | ||
+ | |||
+ | ===== Command-line ===== | ||
+ | |||
+ | You should work on a compute node when in command-line MATLAB. | ||
+ | Your shell must be in a [[abstract: | ||
+ | to submit a single threaded interactive job using '' | ||
+ | |||
+ | < | ||
+ | qlogin | ||
+ | vpkg_require matlab | ||
+ | cd project_directory | ||
+ | matlab -nodesktop -singleCompThread | ||
+ | </ | ||
+ | |||
+ | This will start an interactive command-line session in your terminal window. | ||
+ | |||
+ | ===== Desktop ===== | ||
+ | |||
+ | You should be on a compute node before you start MATLAB. | ||
+ | To start a MATLAB desktop (GUI mode) on a cluster, you must be running an X11 server and you must have | ||
+ | [[abstract: | ||
+ | X11 tunneling]]. | ||
+ | |||
+ | Your shell must be in a [[abstract: | ||
+ | to submit a job using '' | ||
+ | |||
+ | < | ||
+ | qlogin -l exclusive=1 | ||
+ | vpkg_require matlab | ||
+ | cd project_directory | ||
+ | matlab | ||
+ | </ | ||
+ | |||
+ | This will start an interactive DESKTOP session on your X11 screen. | ||
+ | |||
+ | |||
+ | See [[software: | ||
+ | |||
+ | ====== Compiling with MATLAB ====== | ||
+ | |||
+ | We show the three most common ways to work with compilers when using MATLAB. | ||
+ | |||
+ | - Compiling your matlab code to run in the MCR (Matlab Compiler Runtime) | ||
+ | - Compiling your C or Fortran program to call MATLAB engine. | ||
+ | - Compiling your own function in C or Fortran to be used in a MATLAB session. | ||
+ | |||
+ | < | ||
+ | < | ||
+ | Warning: You are using gcc version ' | ||
+ | with MEX is ' | ||
+ | http:// | ||
+ | </ | ||
+ | But the compilation completes successfully. | ||
+ | </ | ||
+ | |||
+ | ===== Compiling your MATLAB ===== | ||
+ | |||
+ | There is an example MCR project in the ''/ | ||
+ | |||
+ | ==== Copy dev-projects template ==== | ||
+ | |||
+ | On the head node | ||
+ | < | ||
+ | cp -r / | ||
+ | cd MCR | ||
+ | </ | ||
+ | |||
+ | ==== Compile with make ==== | ||
+ | |||
+ | Now compile on the compute node by using | ||
+ | |||
+ | < | ||
+ | qlogin | ||
+ | make | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | </ | ||
+ | |||
+ | Resulting output from the make command: | ||
+ | < | ||
+ | Adding package `mcr/ | ||
+ | make[1]: Entering directory `/ | ||
+ | mcc -o maxEig -R " | ||
+ | Compiler version: 5.2 (R2014b) | ||
+ | Dependency analysis by REQUIREMENTS. | ||
+ | Parsing file "/ | ||
+ | (Referenced from: " | ||
+ | Deleting 0 temporary MEX authorization files. | ||
+ | Generating file "/ | ||
+ | Generating file " | ||
+ | make[1]: Leaving directory `/ | ||
+ | </ | ||
+ | Take note of the package added, and the files that are generated. | ||
+ | You must add the package in your batch script or to test interactively. | ||
+ | |||
+ | ==== test interactively ==== | ||
+ | |||
+ | To test interactively on the same compute node: | ||
+ | < | ||
+ | vpkg_require mcr/ | ||
+ | time ./maxEig 20.8 | ||
+ | </ | ||
+ | <note tip>This example is designed as a test for batch computing, and takes about 15 minutes to complete. If you | ||
+ | change the MATLAB statement dim=10000 to dim=1000, and recompile, it will take about 10 seconds</ | ||
+ | |||
+ | ==== back to the head node ==== | ||
+ | When done, exit the compute node. | ||
+ | < | ||
+ | exit | ||
+ | </ | ||
+ | |||
+ | ==== Copy array job example ==== | ||
+ | |||
+ | < | ||
+ | cp / | ||
+ | vi matlab-mcr.qs | ||
+ | diff / | ||
+ | </ | ||
+ | The '' | ||
+ | < | ||
+ | 46c46 | ||
+ | < # -l m_mem_free=5G | ||
+ | --- | ||
+ | > #$ -l m_mem_free=3G | ||
+ | 51c51 | ||
+ | < # -t 1-4 | ||
+ | --- | ||
+ | > #$ -t 1-100 | ||
+ | 63c63,64 | ||
+ | < vpkg_require mcr/ | ||
+ | --- | ||
+ | > vpkg_require mcr/ | ||
+ | > let lambda=" | ||
+ | 79c80 | ||
+ | < MCR_EXECUTABLE_FLAGS=(" | ||
+ | --- | ||
+ | > MCR_EXECUTABLE_FLAGS=(" | ||
+ | </ | ||
+ | To submit a standby array job that has 100 tasks: | ||
+ | < | ||
+ | qsub -l standby=1 matlab-mcr.qs | ||
+ | </ | ||
+ | |||
+ | Example | ||
+ | < | ||
+ | [(it_css: | ||
+ | Your job-array 627074.1-100: | ||
+ | [(it_css: | ||
+ | Mon Apr 11 14:56:26 EDT 2016 | ||
+ | [(it_css: | ||
+ | Mon Apr 11 15:17:33 EDT 2016 | ||
+ | [(it_css: | ||
+ | 100 | ||
+ | </ | ||
+ | There are 100 output files with the names matlab-mcr.qs.o627074.1 to matlab-mcr.qs.o627074.100 | ||
+ | For example file 50: | ||
+ | < | ||
+ | [CGROUPS] UD Grid Engine cgroup setup commencing | ||
+ | [CGROUPS] Setting 3221225472 bytes (vmem none bytes) on n106 (master) | ||
+ | [CGROUPS] | ||
+ | [CGROUPS] done. | ||
+ | |||
+ | Adding package `mcr/ | ||
+ | GridEngine parameters: | ||
+ | MCR_ROOT = / | ||
+ | MCR executable = / | ||
+ | flags = 49 | ||
+ | MCR_CACHE_ROOT = / | ||
+ | -- begin maxEig run -- | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | -- end maxEig run -- | ||
+ | </ | ||
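Counting the output files confirms how many tasks wrote output, but not which ones are missing. The following is a minimal sketch that lists the task IDs with no output file, assuming the ''<script>.o<JOBID>.<TASK>'' naming shown above. The function name ''missing_tasks'' is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print every task ID in 1..N that has no output file
# named <prefix>.<task>, where <prefix> is e.g. matlab-mcr.qs.o627074.
missing_tasks() {
    local prefix="$1" ntasks="$2" t
    for ((t = 1; t <= ntasks; t++)); do
        [ -e "${prefix}.${t}" ] || echo "$t"
    done
}
```

For the run above, ''missing_tasks matlab-mcr.qs.o627074 100'' would print nothing if all 100 tasks produced a file.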
+ | |||
+ | |||
+ | [[more examples]] | ||
+ | |||
+ | |||
+ | ===== Compiling your code to use MATLAB engine ===== | ||
+ | |||
+ | There is a simple example function '' | ||
+ | |||
+ | On the head node and in a workgroup shell: | ||
+ | |||
+ | < | ||
+ | vpkg_require matlab/ | ||
+ | cp $MATLABROOT/ | ||
+ | export LD_LIBRARY_PATH=$MATLABROOT/ | ||
+ | mex -client engine fengdemo.F | ||
+ | </ | ||
+ | |||
+ | To start MATLAB on a compute node to test this new program: | ||
+ | |||
+ | < | ||
+ | qlogin | ||
+ | vpkg_require matlab/ | ||
+ | export LD_LIBRARY_PATH=$MATLABROOT/ | ||
+ | ./fengdemo | ||
+ | exit | ||
+ | </ | ||
+ | |||
+ | Step one of the fengdemo should give the plot: | ||
+ | |||
+ | {{: | ||
+ | |||
+ | Step two should give the table: | ||
+ | |||
+ | < | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | </ | ||
+ | |||
+ | ===== Compiling your own MATLAB function ===== | ||
+ | |||
+ | There is a simple example function '' | ||
+ | |||
+ | On the head node and in a workgroup shell: | ||
+ | |||
+ | < | ||
+ | vpkg_require matlab/ | ||
+ | cp $MATLABROOT/ | ||
+ | mex timestwo.c | ||
+ | </ | ||
+ | |||
+ | To start MATLAB on a compute node to test this new function: | ||
+ | |||
+ | < | ||
+ | qlogin | ||
+ | vpkg_require matlab/ | ||
+ | matlab -nodesktop | ||
+ | timestwo(4) | ||
+ | quit | ||
+ | exit | ||
+ | </ | ||
+ | |||
+ | You should get the answer | ||
+ | |||
+ | < | ||
+ | >> timestwo(4) | ||
+ | |||
+ | ans = | ||
+ | |||
+ | 8 | ||
+ | |||
+ | >> | ||
+ | </ | ||
+ | |||
+ | ====== Batch job serial example ====== | ||
+ | |||
+ | Second, write a shell script file to set the Matlab environment and start Matlab running your script file. The following script file will set the Matlab environment and run the command in the [[# | ||
+ | |||
+ | <file bash batch.qs> | ||
+ | #$ -N script.m | ||
+ | #$ -m eas | ||
+ | #$ -M traine@gmail.com | ||
+ | #$ -l exclusive=1 | ||
+ | |||
+ | vpkg_require matlab/ | ||
+ | matlab -nodisplay -nojvm -r script | ||
+ | </ | ||
+ | |||
+ | The '' | ||
+ | The '' | ||
+ | |||
+ | <note important> | ||
+ | The '' | ||
+ | < | ||
+ | #$ -pe threads 5 | ||
+ | #$ -l m_mem_free=1G | ||
+ | </ | ||
+ | If everyone in your group carefully sets these values, multiple jobs can run concurrently on the node. | ||
+ | |||
+ | See [[maxNumCompThreadsGridEngine|Setting maximum number of computational threads]] | ||
+ | |||
+ | </ | ||
+ | |||
+ | <note tip> | ||
+ | The command '' | ||
+ | the compound command | ||
+ | < | ||
+ | "try; script; catch err; disp(getReport(err,' | ||
+ | </ | ||
+ | The purpose of the **'' | ||
+ | </ | ||
+ | |||
+ | <note tip> | ||
+ | |||
+ | * Do not include the '' | ||
+ | * Do set paper dimensions and print each figure to a file. | ||
+ | |||
+ | The text output will be included in the standard Grid Engine output file, but not any graphics. | ||
+ | |||
+ | We suggest setting the current figure' | ||
+ | |||
+ | <code matlab> | ||
+ | set(gcf,' | ||
+ | print(' | ||
+ | </ | ||
+ | |||
+ | will set the current figure to be 4 x 3 inches with no margins, and then print the figure as a 400x300 resolution '' | ||
+ | </ | ||
+ | |||
+ | |||
+ | ==== Submit job ==== | ||
+ | Third, from the directory with both '' | ||
+ | |||
+ | < | ||
+ | qsub batch.qs | ||
+ | </ | ||
+ | |||
+ | <note warning> | ||
+ | |||
+ | In this example you will only need a license for the base Matlab, and the parallel toolbox needs one license. | ||
+ | |||
+ | **Toolbox dependencies** | ||
+ | |||
+ | You should include toolbox dependencies in your batch script too to help avoid a failure, which will occur if the job starts with no [[matlab# | ||
+ | |||
+ | For example, the Bioinformatics toolbox only has one seat, and in addition it requires the Statistics and Machine Learning toolbox, as well as the core MATLAB. | ||
+ | < | ||
+ | #$ -l MLM.MATLAB=1, | ||
+ | </ | ||
+ | |||
+ | to your job script. | ||
+ | </ | ||
+ | |||
+ | |||
+ | ==== Wait for completion ==== | ||
+ | Finally, wait for the mail notification, | ||
+ | |||
+ | After waiting for about 2 1/2 hours, a message was received with the subject line "Grid Engine Job Scheduler": | ||
+ | |||
+ | < | ||
+ | Job 2362 (script.m) Complete | ||
+ | User = traine | ||
+ | Queue = it_css.q@n038 | ||
+ | Host = n038.farber.hpc.udel.edu | ||
+ | Start Time = 10/21/2014 14: | ||
+ | End Time = 10/21/2014 17: | ||
+ | User Time = 12:41:56 | ||
+ | System Time = 00:11:31 | ||
+ | Wallclock Time = 02:23:42 | ||
+ | CPU = 12:53:27 | ||
+ | Max vmem = 3.924G | ||
+ | Exit Status | ||
+ | </ | ||
+ | |||
+ | ==== Gather results ==== | ||
+ | The results for Job 2362 are in the file | ||
+ | <file text script.m.o2362 > | ||
+ | [CGROUPS] No / | ||
+ | [CGROUPS] UD Grid Engine cgroup setup commencing | ||
+ | [CGROUPS] Setting none bytes (vmem none bytes) on n038 (master) | ||
+ | [CGROUPS] | ||
+ | [CGROUPS] done. | ||
+ | |||
+ | Adding package `matlab/ | ||
+ | |||
+ | < M A T L A B (R) > | ||
+ | Copyright 1984-2014 The MathWorks, Inc. | ||
+ | | ||
+ | | ||
+ | |||
+ | |||
+ | To get started, type one of these: helpwin, helpdesk, or demo. | ||
+ | For product information, | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | Elapsed time is 8618.393954 seconds. | ||
+ | |||
+ | avgMaxEig = | ||
+ | |||
+ | | ||
+ | </ | ||
+ | |||
+ | ==== Timings and core count ==== | ||
+ | |||
+ | Consider a batch job run with the two Grid Engine options: | ||
+ | < | ||
+ | -pe threads 5 | ||
+ | -l m_mem_free=1G | ||
+ | </ | ||
+ | |||
+ | The '' | ||
+ | < | ||
+ | $ ssh $n ps -eo pid, | ||
+ | PID RUSER %CPU %MEM THCNT STIME TIME COMMAND | ||
+ | 29207 traine | ||
+ | </ | ||
+ | This '' | ||
+ | |||
+ | Given the reported PID, 29207, you can drill down and see which of the 10 threads are consuming CPU time: | ||
+ | < | ||
+ | $ ssh $n ps -eLf | egrep ' | ||
+ | UID PID PPID | ||
+ | traine | ||
+ | traine | ||
+ | traine | ||
+ | traine | ||
+ | traine | ||
+ | </ | ||
+ | |||
+ | While the batch job was running on node '' | ||
+ | every second | ||
+ | |||
+ | <code text> | ||
+ | $ ssh $n top -H -b -n 1 | egrep ' | ||
+ | PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND | ||
+ | 29257 traine | ||
+ | 29266 traine | ||
+ | 29264 traine | ||
+ | 29265 traine | ||
+ | 29267 traine | ||
+ | 29263 traine | ||
+ | </ | ||
+ | |||
+ | using the PID of | ||
+ | < | ||
+ | $ ssh $n mpstat -P ALL 1 2 | ||
+ | Linux 2.6.32-431.23.3.el6.x86_64 (n038) 04/28/2015 _x86_64_ (20 CPU) | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | Average: | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | qhost -h $n | ||
+ | HOSTNAME | ||
+ | ---------------------------------------------------------------------------------------------- | ||
+ | global | ||
+ | n038 lx-amd64 | ||
+ | </ | ||
+ | |||
+ | After the job is done you can use '' | ||
+ | < | ||
+ | $ qacct -h n038 -j 64501 | egrep ' | ||
+ | ru_wallclock 9088.920 | ||
+ | ru_maxrss | ||
+ | cpu 18986.828 | ||
+ | maxvmem | ||
+ | </ | ||
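One way to read these numbers is to divide the total CPU seconds by the wall-clock seconds, which estimates how many cores were busy on average. The sketch below does that arithmetic with the ''cpu'' and ''ru_wallclock'' values printed above; the function name ''avg_cores'' is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: average number of busy cores, computed as
# total CPU seconds divided by wall-clock seconds.
avg_cores() {
    awk -v cpu="$1" -v wall="$2" 'BEGIN { printf "%.2f\n", cpu / wall }'
}

# cpu and ru_wallclock values taken from the qacct output above.
avg_cores 18986.828 9088.920   # prints 2.09
```

If this job is the one that requested ''-pe threads 5'', a result near 2 suggests that, on average, fewer than half of the requested slots were doing work.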
+ | |||
+ | |||
+ | ====== Batch parallel example ====== | ||
+ | |||
+ | The Matlab parallel toolbox uses the JVM to manage the workers and communicate while you are running. | ||
+ | need to set up the Matlab pools in your '' | ||
+ | |||
+ | ==== Matlab parallel script ==== | ||
+ | Here is the slightly modified MATLAB script. | ||
+ | |||
+ | Add two '' | ||
+ | <file text pscript.m> | ||
+ | % script to run maxEig function 200 times | ||
+ | mypool=parpool(20); | ||
+ | |||
+ | count = 200; | ||
+ | dim = 5001; | ||
+ | sumMaxe = 0; | ||
+ | tic; | ||
+ | parfor i=1:count; | ||
+ | sumMaxe = sumMaxe + maxEig(i, | ||
+ | end; | ||
+ | toc | ||
+ | avgMaxEig = sumMaxe/ | ||
+ | |||
+ | delete(mypool); | ||
+ | exit | ||
+ | </ | ||
+ | |||
+ | ==== Grid engine parallel script ==== | ||
+ | Take out '' | ||
+ | <file bash pbatch.qs> | ||
+ | #$ -N pscript | ||
+ | #$ -m eas | ||
+ | #$ -M traine@gmail.com | ||
+ | #$ -l m_mem_free=3.1G | ||
+ | #$ -l MLM.Distrib_Computing_Toolbox=1 | ||
+ | #$ -pe threads 20 | ||
+ | |||
+ | vpkg_require matlab/ | ||
+ | matlab -nodisplay -r pscript | ||
+ | </ | ||
+ | |||
+ | ==== Timing results ==== | ||
+ | Reported usage for same job run using the parallel toolbox. | ||
+ | < | ||
+ | Job 618746 (pscript) Complete | ||
+ | User = traine | ||
+ | Queue = spillover.q@n010 | ||
+ | Host = n010.farber.hpc.udel.edu | ||
+ | Start Time = 03/31/2016 11: | ||
+ | End Time = 03/31/2016 11: | ||
+ | User Time = 06:02:34 | ||
+ | System Time = 00:01:00 | ||
+ | Wallclock Time = 00:19:35 | ||
+ | CPU = 06:03:35 | ||
+ | Max vmem = 80.513G | ||
+ | Exit Status | ||
+ | </ | ||
+ | |||
+ | Compare script vs pscript | ||
+ | |||
+ | ^ Job ^ Wallclock Time ^ CPU ^ Max vmem ^ | ||
+ | | script | 02:23:42 | 12:53:27 | 3.924G | | ||
+ | | pscript | 00:19:35 | 06:03:35 | 80.513G | | ||
+ | |||
+ | The job **script** used more CPU resources with the multiple computational threads, while **pscript** used more memory resources with 20 single-threaded workers. | ||
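To put a number on the comparison, the wall-clock times in the table can be converted to seconds and divided. This is a minimal sketch; the function names ''hms_to_seconds'' and ''speedup'' are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: convert HH:MM:SS to seconds. The 10# prefix forces
# base 10 so that fields such as "08" are not read as invalid octal.
hms_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Hypothetical helper: ratio of two HH:MM:SS durations, to one decimal place.
speedup() {
    awk -v a="$(hms_to_seconds "$1")" -v b="$(hms_to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

# Wall-clock time of script versus pscript, from the table above.
speedup 02:23:42 00:19:35   # prints 7.3
```

The parallel run finished about 7.3 times sooner. The serial job already used multiple computational threads, so this is not a comparison against a single core.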
+ | |||
+ | |||
+ | |||
+ | ====== Interactive example ====== | ||
+ | |||
+ | The basic steps to running a [[: | ||
+ | |||
+ | This demo starts in your MATLAB directory and with an active workgroup. | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | ==== Scheduling exclusive interactive job ==== | ||
+ | |||
+ | < | ||
+ | $ qlogin -l exclusive=1 | ||
+ | Your job 2493 (" | ||
+ | waiting for interactive job to be scheduled ... | ||
+ | Your interactive job 2493 has been successfully scheduled. | ||
+ | Establishing / | ||
+ | </ | ||
+ | |||
+ | ==== Starting a command mode matlab session ==== | ||
+ | |||
+ | < | ||
+ | $ vpkg_require matlab/ | ||
+ | Adding package `matlab/ | ||
+ | </ | ||
+ | < | ||
+ | $ matlab -nodesktop -nosplash | ||
+ | MATLAB is selecting SOFTWARE OPENGL rendering. | ||
+ | |||
+ | < M A T L A B (R) > | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | |||
+ | To get started, type one of these: helpwin, helpdesk, or demo. | ||
+ | For product information, | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | ==== Using help as the first command ==== | ||
+ | |||
+ | < | ||
+ | >> help maxEig | ||
+ | | ||
+ | Input parameters | ||
+ | sd - seed for uniform random generator | ||
+ | dim - size of the square matrix (should be odd) | ||
+ | Output value | ||
+ | maxe - maximum real eigvalue | ||
+ | | ||
+ | </ | ||
+ | |||
+ | ==== Calling function once ==== | ||
+ | |||
+ | Use the tic and toc commands to report the elapsed time to generate the random matrix, find all eigenvalues and report the maximum real eigenvalue. | ||
+ | |||
+ | < | ||
+ | >> tic; maxEig(1, | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | Elapsed time is 54.781289 seconds. | ||
+ | </ | ||
+ | |||
+ | ==== Finishing up ==== | ||
+ | |||
+ | < | ||
+ | >> exit | ||
+ | $ exit | ||
+ | Connection to n036 closed. | ||
+ | / | ||
+ | </ | ||
+ | ===== Interactive parallel toolbox example ===== | ||
+ | |||
+ | When you plan to use the parallel toolbox, you should logon exclusively to a compute node with the command: | ||
+ | |||
+ | qlogin -l exclusive=1 | ||
+ | | ||
+ | This will effectively reserve the entire node for your MATLAB workers. | ||
+ | |||
+ | Here we start 20 workers with the parpool function, and then use parfor to send a different seed to each worker. | ||
+ | |||
+ | <note important> | ||
+ | |||
+ | It took about 100 seconds for all 20 workers to produce one result. Since they are working in parallel the elapsed time to complete 200 results is about | ||
+ | |||
+ | < | ||
+ | >> parpool(20); | ||
+ | Starting parallel pool (parpool) using the ' | ||
+ | >> tic; parfor sd = 1:200; maxEig(sd, | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | </ | ||
+ | |||
+ | |||
+ | skipped lines | ||
+ | < | ||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | Elapsed time is 1087.729851 seconds. | ||
+ | |||
+ | |||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | ====== MCR array job example ====== | ||
+ | |||
+ | Most Matlab functions can be compiled using the Matlab Compiler (MCC) and then deployed to run on the compute nodes in the MATLAB Compiler Runtime (MCR). | ||
+ | |||
+ | There are two ways to run compiled MATLAB jobs in a shared environment, | ||
+ | - Compile to produce an executable that uses a single computational thread - MATLAB option ' | ||
+ | - Submit the job to use the nodes exclusively - Grid engine option '' | ||
+ | |||
+ | You can run more jobs on each node when they are compiled to use just one core (Single Comp Thread). | ||
+ | you higher throughput for an array job, but not higher performance. | ||
+ | |||
+ | |||
+ | ==== Example compiler commands ==== | ||
+ | |||
+ | The [[# | ||
+ | < | ||
+ | if (isdeployed) | ||
+ | sd = str2num(sd) | ||
+ | dim = str2num(dim) | ||
+ | end | ||
+ | </ | ||
+ | All arguments of the function are taken as tokens on the shell command used to execute the script, | ||
+ | and they are all strings. | ||
+ | that the rest of the script will behave the same when deployed or executed directly in Matlab. | ||
+ | |||
+ | You can convert this function into a single computational thread executable by using the Matlab compiler '' | ||
+ | < | ||
+ | prog=maxEig | ||
+ | opt=' | ||
+ | version=' | ||
+ | |||
+ | vpkg_require matlab/ | ||
+ | mcc -R " | ||
+ | |||
+ | [ -d $WORKDIR/ | ||
+ | </ | ||
+ | |||
+ | <note tip> | ||
+ | |||
+ | <note tip> | ||
+ | options you want to use at run time. The '' | ||
+ | using MCR. The '' | ||
+ | |||
+ | <note warning> | ||
+ | < | ||
+ | [ -d $WORKDIR/ | ||
+ | </ | ||
+ | </ | ||
+ | |||
+ | ==== Compiling commands ==== | ||
+ | |||
+ | < | ||
+ | [(it_css: | ||
+ | Your job 619145 (" | ||
+ | waiting for interactive job to be scheduled ... | ||
+ | Your interactive job 619145 has been successfully scheduled. | ||
+ | Establishing / | ||
+ | [(it_css: | ||
+ | Adding package `matlab/ | ||
+ | Compiler version: 6.2 (R2016a) | ||
+ | Dependency analysis by REQUIREMENTS. | ||
+ | Parsing file "/ | ||
+ | (Referenced from: " | ||
+ | Deleting 0 temporary MEX authorization files. | ||
+ | Generating file "/ | ||
+ | Generating file " | ||
+ | [(it_css: | ||
+ | compile.sh | ||
+ | maxEig | ||
+ | maxEig.m | ||
+ | [(it_css: | ||
+ | exit | ||
+ | Connection to n039 closed. | ||
+ | / | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | ==== Example queue script file ==== | ||
+ | |||
+ | The '' | ||
+ | / | ||
+ | or modify this simple example: | ||
+ | < | ||
+ | #$ -N maxEig | ||
+ | #$ -t 1-200 | ||
+ | #$ -l m_mem_free=3.1G | ||
+ | # | ||
+ | # Parameter sweep array job to run the maxEig compiled MATLAB function with | ||
+ | # lambda = 1, 2, ..., 200 | ||
+ | # | ||
+ | date " | ||
+ | echo "Host $HOSTNAME" | ||
+ | |||
+ | vpkg_require mcr/ | ||
+ | export MCR_CACHE_ROOT=" | ||
+ | |||
+ | let seed=$SGE_TASK_ID | ||
+ | let dim=5001 | ||
+ | |||
+ | ./maxEig $seed $dim | ||
+ | |||
+ | date " | ||
+ | </ | ||
+ | |||
+ | The two '' | ||
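The queue script above takes the seed from ''SGE_TASK_ID'', which Grid Engine sets for each array task. As a small sketch for checking that mapping before submitting, the hypothetical helper ''task_args'' below prints the argument pair for one task and falls back to task 1 when the variable is unset, as it is outside Grid Engine.

```shell
#!/bin/bash
# Hypothetical helper: print the "seed dim" argument pair for one array task.
# The seed is the task ID; dim defaults to 5001 as in the example script.
task_args() {
    local task_id="${1:?task id required}"
    local dim="${2:-5001}"
    printf '%s %s\n' "$task_id" "$dim"
}

# Dry run: outside Grid Engine SGE_TASK_ID is unset, so default to task 1.
echo "would run: ./maxEig $(task_args "${SGE_TASK_ID:-1}")"
```

Running this on a login node prints the command line a task would use without starting the compiled program.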
+ | |||
+ | ==== Compiled Matlab in owner queues ==== | ||
+ | |||
+ | To test the example compiled Matlab job on the '' | ||
+ | then submitted with qsub. The job number assigned was 3731. After a few minutes 200 files were created in the current directory. | ||
+ | maxEig.o3731.1 | ||
+ | They each had the output of one task. For example for taskid 125: | ||
+ | < | ||
+ | [CGROUPS] UD Grid Engine cgroup setup commencing | ||
+ | [CGROUPS] Setting 5368709120 bytes (vmem 5368709120 bytes) on n036 (master) | ||
+ | [CGROUPS] | ||
+ | [CGROUPS] done. | ||
+ | |||
+ | Start 1414171807 | ||
+ | Host n036 | ||
+ | Adding package `mcr/ | ||
+ | |||
+ | sd = | ||
+ | 125 | ||
+ | |||
+ | dim = | ||
+ | 5001 | ||
+ | | ||
+ | maxe = | ||
+ | | ||
+ | |||
+ | Finish 1414171902 | ||
+ | </ | ||
+ | Now we gather all the information from these files and write a data file with three columns: | ||
+ | < | ||
+ | sd dim maxe | ||
+ | 1 5001 70.0220 | ||
+ | 2 5001 71.7546 | ||
+ | 3 5001 70.8331 | ||
+ | 4 5001 70.5714 | ||
+ | |||
+ | .... | ||
+ | |||
+ | 199 5001 70.7535 | ||
+ | 200 5001 67.4221 | ||
+ | </ | ||
+ | and prints the average | ||
+ | avgMaxEig = 69.5131125 | ||
+ | These are the same results we got from both the MATLAB loop and the parallel toolbox, but they were computed | ||
+ | in just over 3 1/2 minutes. | ||
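The gathering step is not shown on this page. The following is a minimal sketch of one way to do it, assuming each task file contains the ''sd ='', ''dim ='' and ''maxe ='' blocks in the layout shown for task 125, with each value on the next non-empty line. The function name ''gather_maxe'' is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read Grid Engine task output files, print one
# "sd dim maxe" row per file, then print the average of the maxe column.
gather_maxe() {
    awk '
        BEGIN { print "sd dim maxe" }
        FNR == 1 { key = "" }                       # reset state for each file
        /^[[:space:]]*(sd|dim|maxe) =[[:space:]]*$/ { key = $1; next }
        key != "" && NF > 0 {                       # first non-empty line after "name ="
            val[key] = $1
            if (key == "maxe") {
                print val["sd"], val["dim"], val["maxe"]
                sum += $1; n++
            }
            key = ""
        }
        END { if (n > 0) printf "avgMaxEig = %.4f\n", sum / n }
    ' "$@"
}
```

For the run above this could be called as ''gather_maxe maxEig.o3731.*''. Rows come out in the shell's glob order, which is lexical rather than numeric, so sort the data rows by the first column if task order matters.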
+ | |||
+ | === SGE array job started Fri 24 Oct 2014 01:28:37 PM EDT === | ||
+ | |||
+ | Used a total of 18977 CPU seconds over 219 seconds of elapsed time | ||
+ | on 10 nodes | ||
+ | ^ Node ^^ Real Clock Time ^^^ | ||
+ | ^ Name ^ | ||
+ | | n036| 24| | ||
+ | | n038| 24| | ||
+ | | n040| 24| | ||
+ | | n084| 24| | ||
+ | | n085| 19| | ||
+ | | n086| 19| | ||
+ | | n089| | ||
+ | | n090| 24| | ||
+ | | n092| 19| | ||
+ | |test-gpu| | ||
+ | |||
+ | Using gnuplot we get a time chart of usage on the 10 nodes and total CPU usage. | ||
+ | |||
+ | |||
+ | {{: | ||
+ | |||
+ | ==== Compiled Matlab in standby queue ==== | ||
+ | Command to submit 200 jobs to the standby queue (must complete in 8 hours.) | ||
+ | < | ||
+ | |||
+ | === SGE array job started Wed 22 Oct 2014 01:16:58 PM EDT === | ||
+ | |||
+ | Used a total of 19856 CPU seconds over 127 seconds of elapsed time on 17 nodes | ||
+ | ^ Node ^^ Real Clock Time ^^^ | ||
+ | ^ Name ^ | ||
+ | | n000| 12| | ||
+ | | n003| 12| | ||
+ | | n004| 12| | ||
+ | | n019| 12| | ||
+ | | n022| 12| | ||
+ | | n031| 12| | ||
+ | | n032| 12| | ||
+ | | n036| 12| | ||
+ | | n045| 12| | ||
+ | | n051| | ||
+ | | n064| 12| | ||
+ | | n073| 12| | ||
+ | | n074| 12| | ||
+ | | n077| 12| | ||
+ | | n080| 12| | ||
+ | | n083| 12| | ||
+ | | n088| 12| | ||
+ | |||
+ | {{: | ||
+ | |||