software:r:farber [2017-10-23 18:05] – created sraskar | software:r:farber [2021-03-17 14:44] (current) – [matmul.qs file] anita | ||
---|---|---|---|
+ | ===== R on Farber ===== | ||
+ | |||
+ | ==== Learning R ==== | ||
+ | |||
+ | === SWIRL === | ||
+ | In addition to other resources, SWIRL is installed on the Farber cluster and is available as an interactive learning guide | ||
+ | inside R: | ||
+ | |||
+ | <code> | ||
+ | $ vpkg_require r/3 r-cran | ||
+ | $ R -q --no-save | ||
+ | > library(swirl) | ||
+ | > swirl() | ||
+ | </code> | ||
+ | |||
+ | |||
+ | |||
+ | ==== R libraries and extensions ==== | ||
+ | |||
+ | === Installed library bundles === | ||
+ | The cluster also has the majority of [[http:// | ||
+ | and [[http:// | ||
+ | installed. | ||
+ | respective catalogs. | ||
+ | packages based on dependencies. | ||
+ | these bundles provide access to over 6,600 R modules, pre-compiled and ready | ||
+ | for use. | ||
+ | |||
+ | ^r-cran | ||
+ | ^r-bioconductor |The full suite of [[http:// | ||
+ | ^r-fftw | ||
+ | ^r-gsl | ||
+ | ^r-gdal | ||
+ | ^r-jags | ||
+ | ^r-mpi | ||
+ | ^r-netcdf | ||
+ | ^r-all | ||
+ | ^r-cuda | ||
+ | |||
+ | === Searching for modules === | ||
+ | The HPC team provides a tool, which is loaded along with R, to help cluster | ||
+ | users locate modules in these various bundles. | ||
+ | It is called '' | ||
+ | interpreted as extended regular expressions to the UNIX | ||
+ | [[http:// | ||
+ | enabled by default). | ||
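As a rough illustration of how an extended regular expression selects names, the sketch below runs `grep -E` over a made-up list of module names. The list, the pattern, and the `-i` (case-insensitive) flag are assumptions for illustration only; they are not taken from the search tool itself.

```shell
# Made-up list of module names, one per line (illustration only).
names='copula
CopulaRegression
VineCopula
gsl
copBasic'

# -E selects extended regular expressions, so "|" means alternation.
# -i (ignore case) is an assumption for this sketch.
matches=$(printf '%s\n' "$names" | grep -E -i 'copula$|^cop')

echo "$matches"
```

The pattern keeps every name that ends in "copula" or starts with "cop", regardless of case, and drops `gsl`.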
+ | |||
+ | If, for example, you are looking for packages to help with a copula regression, | ||
+ | you may search for them as such: | ||
+ | |||
+ | <code> | ||
+ | $ r-search copula | ||
+ | R Location | ||
+ | ------- | ||
+ | R/ | ||
+ | R/ | ||
+ | R/ | ||
+ | R/ | ||
+ | R/ | ||
+ | R/ | ||
+ | R/ | ||
+ | $ | ||
+ | </code> | ||
+ | |||
+ | Now, it is clear that two bundles for version 3.1.1 of R contain modules | ||
+ | which may be of help. If you require the " | ||
+ | may use VALET to load it into your environment via the " | ||
+ | bundle. | ||
+ | |||
+ | === Loading library bundles for use === | ||
+ | <code> | ||
+ | $ vpkg_require r-gsl/ | ||
+ | Adding dependency `r-cran/ | ||
+ | Adding dependency `gsl/1.16` to your environment | ||
+ | Adding dependency `glpk/4.55` to your environment | ||
+ | Adding dependency `mpfr/ | ||
+ | Adding package `r-gsl/ | ||
+ | $ | ||
+ | </code> | ||
+ | |||
+ | The library can now be used in R as normal. | ||
+ | |||
+ | <code> | ||
+ | $ R --no-save -q | ||
+ | > library(CopulaRegression) | ||
+ | Loading required package: MASS | ||
+ | Loading required package: VineCopula | ||
+ | > | ||
+ | </code> | ||
+ | |||
+ | === Learning about modules === | ||
+ | IT provides a small script called '' | ||
+ | documentation of R modules. | ||
+ | a module to decide if it requires more research. | ||
+ | must be installed, and the module bundle must be loaded with '' | ||
+ | For example: | ||
+ | |||
+ | <code> | ||
+ | $ vpkg_require r/3.1.1 r-cran/ | ||
+ | $ r-info car | ||
+ | car-package | ||
+ | |||
+ | Companion to Applied Regression | ||
+ | |||
+ | Description: | ||
+ | |||
+ | This package accompanies Fox, J. and Weisberg, S., _An R Companion | ||
+ | to Applied Regression_, | ||
+ | |||
+ | Details: | ||
+ | ... | ||
+ | | ||
+ | $ | ||
+ | </code> | ||
+ | |||
+ | ==== personal/ | ||
+ | You can create your own library of R modules which contains different | ||
+ | versions than those provided through VALET, or modules not available via VALET. | ||
+ | |||
+ | R looks in an environment variable called ' | ||
+ | locations to search for modules. | ||
+ | in the list, this will allow your library to override any conflicts which | ||
+ | may be installed on the system. | ||
+ | modules into the first entry in this list by default. | ||
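The idea of placing a personal library first in the search list can be sketched in the shell as follows. The directory name `r-libs` and the fallback to the current directory when `WORKDIR` is unset are hypothetical choices for illustration; use whatever location you created for your own library.

```shell
# Hypothetical personal library location (an assumption, not a site default).
personal_lib="${WORKDIR:-$PWD}/r-libs"
mkdir -p "$personal_lib"

# Prepend the personal library so R searches it before the system bundles.
# ${R_LIBS:+:$R_LIBS} appends the old value only if R_LIBS was already set,
# which avoids a stray trailing colon.
export R_LIBS="${personal_lib}${R_LIBS:+:$R_LIBS}"

echo "$R_LIBS"
```

Because the personal directory is the first entry, it is also where newly installed modules would land by default.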
+ | |||
+ | === Simple example === | ||
+ | Once this is done, you can install modules using '' | ||
+ | is an example: | ||
+ | |||
+ | <code> | ||
+ | $ vpkg_require r r-cran | ||
+ | Adding package `r/3.1.1` to your environment | ||
+ | Adding package `r-cran/ | ||
+ | $ mkdir -p $WORKDIR/ | ||
+ | $ echo $R_LIBS | ||
+ | / | ||
+ | $ R_LIBS=" | ||
+ | $ R -q --no-save | ||
+ | > .libPaths() | ||
+ | [1] "/ | ||
+ | [2] "/ | ||
+ | [3] "/ | ||
+ | > chooseCRANmirror(all) | ||
+ | CRAN mirror | ||
+ | |||
+ | 1: 0-Cloud | ||
+ | 3: Argentina (Mendoza) | ||
+ | 5: Australia (Melbourne) | ||
+ | 7: Belgium | ||
+ | 9: Brazil (PR) 10: Brazil (RJ) | ||
+ | 11: Brazil (SP 1) 12: Brazil (SP 2) | ||
+ | 13: Canada (BC) 14: Canada (NS) | ||
+ | 15: Canada (ON) 16: Canada (QC 1) | ||
+ | 17: Canada (QC 2) 18: Chile | ||
+ | 19: China (Beijing 1) 20: China (Beijing 2) | ||
+ | 21: China (Hefei) | ||
+ | 23: Colombia (Bogota) | ||
+ | 25: Czech Republic | ||
+ | 27: Ecuador | ||
+ | 29: France (Lyon 1) 30: France (Lyon 2) | ||
+ | 31: France (Montpellier) | ||
+ | 33: France (Paris 2) 34: France (Strasbourg) | ||
+ | 35: Germany (Berlin) | ||
+ | 37: Germany (Goettingen) | ||
+ | 39: Hungary | ||
+ | 41: India 42: Indonesia (Jakarta) | ||
+ | 43: Indonesia (Jember) | ||
+ | 45: Ireland | ||
+ | 47: Italy (Padua) | ||
+ | 49: Japan (Hyogo) | ||
+ | 51: Japan (Tsukuba) | ||
+ | 53: Korea (Seoul 2) 54: Lebanon | ||
+ | 55: Mexico (Mexico City) 56: Mexico (Texcoco) | ||
+ | 57: Netherlands (Amsterdam) | ||
+ | 59: New Zealand | ||
+ | 61: Philippines | ||
+ | 63: Portugal | ||
+ | 65: Singapore | ||
+ | 67: South Africa (Cape Town) 68: South Africa (Johannesburg) | ||
+ | 69: Spain (A Coru? | ||
+ | 71: Sweden | ||
+ | 73: Taiwan (Chungli) | ||
+ | 75: Taiwan (Taipei) | ||
+ | 77: Turkey | ||
+ | 79: UK (Cambridge) | ||
+ | 81: UK (London) | ||
+ | 83: USA (CA 1) 84: USA (CA 2) | ||
+ | 85: USA (IA) 86: USA (IN) | ||
+ | 87: USA (KS) 88: USA (MD) | ||
+ | 89: USA (MI) 90: USA (MO) | ||
+ | 91: USA (OH) 92: USA (OR) | ||
+ | 93: USA (PA 1) 94: USA (PA 2) | ||
+ | 95: USA (TN) 96: USA (TX 1) | ||
+ | 97: USA (WA 1) 98: USA (WA 2) | ||
+ | 99: Venezuela | ||
+ | |||
+ | |||
+ | Selection: 88 | ||
+ | > install.packages(" | ||
+ | Installing package into '/ | ||
+ | (as ' | ||
+ | trying URL ' | ||
+ | Content type ' | ||
+ | opened URL | ||
+ | ================================================== | ||
+ | downloaded 23 Kb | ||
+ | |||
+ | * installing *source* package ' | ||
+ | ** package ' | ||
+ | ** libs | ||
+ | gfortran | ||
+ | gfortran | ||
+ | gfortran | ||
+ | gfortran | ||
+ | gfortran | ||
+ | gcc -std=gnu99 -I/ | ||
+ | gfortran | ||
+ | gfortran | ||
+ | gfortran | ||
+ | gfortran | ||
+ | gfortran | ||
+ | gfortran | ||
+ | gcc -std=gnu99 -shared -L/ | ||
+ | installing to / | ||
+ | ** R | ||
+ | ** inst | ||
+ | ** byte-compile and prepare package for lazy loading | ||
+ | ** help | ||
+ | *** installing help indices | ||
+ | ** building package indices | ||
+ | ** testing if installed package can be loaded | ||
+ | * DONE (KernSmooth) | ||
+ | |||
+ | The downloaded source packages are in | ||
+ | '/ | ||
+ | > library(KernSmooth) | ||
+ | KernSmooth 2.23 loaded | ||
+ | Copyright M. P. Wand 1997-2009 | ||
+ | > | ||
+ | </code> | ||
+ | |||
+ | Notice that the output of '' | ||
+ | directory first? | ||
+ | |||
+ | |||
+ | === Using IT's udbuild environment === | ||
+ | IT developed a formalization for installing modules called [[/ | ||
+ | which can simplify the installation of modules. | ||
+ | script which can be used to install a personal R library. | ||
+ | |||
+ | <file sh udbuild-testing-cuda> | ||
+ | #!/bin/bash -l | ||
+ | |||
+ | PKGNAME=testing | ||
+ | VERSION=default | ||
+ | |||
+ | UDBUILD_HOME=$WORKDIR/ | ||
+ | PKG_LIST=' | ||
+ | WideLM rpud permGPU magma gputools cudaBayesregData cudaBayesreg | ||
+ | CARramps | ||
+ | ' | ||
+ | |||
+ | vpkg_devrequire udbuild r/3.1.1 r-cran/ | ||
+ | init_udbuildenv r-addon cuda/6.5 | ||
+ | |||
+ | #Sometimes R doesn' | ||
+ | CPATH=$CUDA_PREFIX/ | ||
+ | LIBRARY_PATH=$CUDA_PREFIX/ | ||
+ | |||
+ | # | ||
+ | CRAN_MIRROR=' | ||
+ | |||
+ | quote() { printf '" | ||
+ | |||
+ | R -q --no-save <<EOT | ||
+ | .libPaths() | ||
+ | options(repos=structure(c(CRAN=" | ||
+ | for ( pkg in c( `quote $PKG_LIST` ) ) { | ||
+ | print(pkg) | ||
+ | install.packages(pkg, | ||
+ | } | ||
+ | |||
+ | warnings() | ||
+ | EOT | ||
+ | </file> | ||
+ | |||
+ | This script will attempt to build the CUDA-capable R modules using | ||
+ | CUDA version 6.5 into '' | ||
+ | |||
+ | ====== R script in batch ====== | ||
+ | |||
+ | ==== matmul.R script ==== | ||
+ | |||
+ | Consider this simple R script, which computes a small 3x3 matrix product: | ||
+ | |||
+ | <file R matmul.R> | ||
+ | # Calculate and print small matrix AA' | ||
+ | a <- matrix(1: | ||
+ | a%*%t(a) | ||
+ | </file> | ||
+ | |||
+ | Let's test this R script using '' | ||
+ | |||
+ | <code bash> | ||
+ | workgroup -g it_css | ||
+ | qlogin | ||
+ | vpkg_require r/3 | ||
+ | Rscript matmul.R | ||
+ | </code> | ||
+ | |||
+ | The output to the screen: | ||
+ | |||
+ | <code> | ||
+ | [,1] [,2] [,3] | ||
+ | [1,] 166 188 210 | ||
+ | [2,] 188 214 240 | ||
+ | [3,] 210 240 270 | ||
+ | </code> | ||
+ | |||
+ | To return to the head node, type | ||
+ | <code bash> | ||
+ | exit | ||
+ | </code> | ||
+ | |||
+ | ==== matmul.qs file ==== | ||
+ | |||
+ | Running an R script in batch instead of on the command line takes nearly the same steps. | ||
+ | Consider the queue submission script file: | ||
+ | |||
+ | <file bash matmul.qs> | ||
+ | #$ -N matmultiply | ||
+ | |||
+ | # Add vpkg_require commands after this line: | ||
+ | vpkg_require r/3 | ||
+ | |||
+ | # Syntax: Rscript [options] filename.R [arguments] | ||
+ | Rscript matmul.R | ||
+ | </file> | ||
+ | |||
+ | Now, to run the R script, simply submit the job from the head node with the | ||
+ | '' | ||
+ | |||
+ | <code> | ||
+ | qsub matmul.qs | ||
+ | </code> | ||
+ | |||
+ | You should see a notification that your job was submitted. | ||
+ | |||
+ | <code bash> | ||
+ | Your job 2283886 (" | ||
+ | </code> | ||
+ | |||
+ | After the code completes, the output of the script will appear in the file | ||
+ | '' | ||
+ | |||
+ | <code> | ||
+ | more matmultiply.o2283886 | ||
+ | </code> | ||
+ | |||
+ | to display the contents of the output file on the screen. | ||
+ | |||
+ | <code> | ||
+ | Adding dependency `x11/ | ||
+ | Adding package `r/3.0.2` to your environment | ||
+ | [,1] [,2] [,3] | ||
+ | [1,] 166 188 210 | ||
+ | [2,] 188 214 240 | ||
+ | [3,] 210 240 270 | ||
+ | </code> | ||
+ | |||
+ | ====== Using R script in batch array job ====== | ||
+ | ===== sweep.R file ===== | ||
+ | |||
+ | Consider this simple script, which prints a fraction computed from its argument list: | ||
+ | |||
+ | <file R sweep.R> | ||
+ | args <- commandArgs(trailingOnly = TRUE) | ||
+ | # print fraction from argument list | ||
+ | as.numeric(args[1])/ | ||
+ | </file> | ||
+ | |||
+ | This is an R script which can be run from the command line on a compute node with the commands: | ||
+ | |||
+ | <code bash> | ||
+ | qlogin | ||
+ | vpkg_require r/3 | ||
+ | Rscript sweep.R 5 200 | ||
+ | </code> | ||
+ | |||
+ | The output to the screen: | ||
+ | <code> | ||
+ | [1] 0.025 | ||
+ | </code> | ||
+ | |||
+ | ===== sweep.qs file ===== | ||
+ | |||
+ | Consider the queue script file | ||
+ | |||
+ | <file bash sweep.qs> | ||
+ | #$ -N sweep | ||
+ | #$ -t 1-200 | ||
+ | ## | ||
+ | ## Parameter sweep array job to run the sweep.R | ||
+ | ## lambda = 0, 1, 2, ..., 199 | ||
+ | ## | ||
+ | |||
+ | # Add vpkg_require commands after this line: | ||
+ | vpkg_require r/3 | ||
+ | |||
+ | date " | ||
+ | echo "Host $HOSTNAME" | ||
+ | |||
+ | let lambda=" | ||
+ | let taskCount=200 | ||
+ | |||
+ | # Syntax: Rscript [options] filename.R [arguments] | ||
+ | Rscript --vanilla sweep.R $lambda $taskCount | ||
+ | |||
+ | date " | ||
+ | </file> | ||
+ | |||
+ | The '' | ||
+ | There will be 200 array jobs all running the same script with different parameters (arguments). | ||
+ | is used to prevent the multiple jobs from using the same disk space. | ||
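The header comments in `sweep.qs` describe 200 tasks (`-t 1-200`) with lambda running from 0 to 199. One way to obtain that mapping is to subtract one from the task id that Grid Engine places in `SGE_TASK_ID`. This is a sketch under that assumption and may differ from the script's own line. The default of 1 exists only so the snippet can also be tried outside the scheduler.

```shell
# SGE_TASK_ID is set by Grid Engine for each array task (1..200 here).
# Default to 1 so the sketch can be tried outside a job (assumption).
task_id="${SGE_TASK_ID:-1}"

# Map the one-based task id to a zero-based parameter.
lambda=$(( task_id - 1 ))
taskCount=200

echo "task $task_id -> lambda $lambda of $taskCount"
```

With this mapping, task 6 would pass `5 200` to the R script, which matches the `0.025` shown in the command-line example.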
+ | |||
+ | To run this in batch, you must submit the job from the head node with the | ||
+ | '' | ||
+ | |||
+ | <code> | ||
+ | qsub sweep.qs | ||
+ | </code> | ||
+ | |||
+ | After the code completes, the output of the script will appear in the files | ||
+ | '' | ||
+ | |||
+ | <code> | ||
+ | Adding dependency `x11/ | ||
+ | Adding package `r/3.0.2` to your environment | ||
+ | [1] 0.025 | ||
+ | </code> | ||
+ | <note tip> | ||
+ | You will want to do more than just print out one fraction in your script. | ||
+ | a one-dimensional parameter sweep, to construct unique input and output file names for each task, | ||
+ | or as a seed for the R Random Number Generator (RNG).</note> | ||
+ | |||
+ | ==== Writing files from an array job ==== | ||
+ | |||
+ | You are running many jobs in the same directory. | ||
+ | separate files with "dot taskid" | ||
+ | |||
+ | <note important> | ||
+ | You need to make sure no two of your jobs will write to the same file. Look at your R script to see if you | ||
+ | are writing files. | ||
+ | If you are using these R functions, then use a unique file name constructed from the task id. | ||
+ | </note> | ||
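A small sketch of building per-task file names in the queue script. The `input.` and `output.` prefixes and the `.csv` suffix are hypothetical; the point is only that the task id from `SGE_TASK_ID` appears in every name, so no two tasks share a file.

```shell
# Default to 1 so the sketch can be tried outside a job (assumption).
task_id="${SGE_TASK_ID:-1}"

# Hypothetical naming scheme: one input and one output file per task.
infile="input.${task_id}.csv"
outfile="output.${task_id}.csv"

echo "task $task_id reads $infile and writes $outfile"

# The names could then be passed to an R script as extra arguments, e.g.
#   Rscript --vanilla myscript.R "$infile" "$outfile"
# (myscript.R is a placeholder name, not a file from this page.)
```

Inside R, the two names would arrive through `commandArgs(trailingOnly = TRUE)`, in the same way `sweep.R` reads its arguments.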
+ | |||
+ | ==== vanilla option ==== | ||
+ | |||
+ | The command-line option '' | ||
+ | be reading or writing to the same files. | ||
+ | in the init-file '' | ||
+ | them in your environ file '' | ||