===== R on Farber =====

==== Learning R ====

=== SWIRL ===
In addition to other resources, SWIRL is installed on the Farber cluster and is available as an interactive learning guide
inside R:

<code>
$ vpkg_require r/3 r-cran
$ R -q --no-save
> library(swirl)
> swirl()
</code>

==== R libraries and extensions ====

=== Installed library bundles ===
The cluster also has the majority of [[http://
and [[http://
installed.
respective catalogs.
packages based on dependencies.
these bundles provide access to over 6,600 R modules, pre-compiled and ready
for use.

^r-cran
^r-bioconductor |The full suite of [[http://
^r-fftw
^r-gsl
^r-gdal
^r-jags
^r-mpi
^r-netcdf
^r-all
^r-cuda

=== Searching for modules ===
The HPC team provides a tool, which is loaded along with R, to help cluster
users locate modules in these various bundles;
it is called ''
interpreted as extended regular expressions to the UNIX
[[http://
enabled by default).

If, for example, you are looking for packages to help with a copula regression,
you can search for them as follows:

<code>
$ r-search copula
R Location
-------
R/
R/
R/
R/
R/
R/
R/
$
</code>

Now it is clear that two bundles for version 3.1.1 of R contain modules
which may be of help. If you require the "
may use VALET to load it into your environment via the "
bundle.
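
The idea behind a search like this can be sketched in plain bash. The function below is a hypothetical stand-in, not the actual ''r-search'' code. It assumes each library tree is a directory of installed packages, each containing a ''DESCRIPTION'' file, and it matches the package directory names against an extended regular expression with ''grep -Ei''. Case-insensitive matching is a choice made for this sketch.

<code bash>
#!/bin/bash
# Hypothetical sketch (not the real r-search): list installed R packages whose
# directory names match an extended regular expression.
#   $1 - extended regular expression, e.g. "copula"
#   $2 - colon-separated list of library trees, e.g. "$R_LIBS"
find_r_pkgs() {
  local pattern=$1 libs=$2 dir path pkg
  local IFS=:
  for dir in $libs; do
    [ -d "$dir" ] || continue
    for path in "$dir"/*/; do
      # Installed packages carry a DESCRIPTION file; skip anything else
      [ -f "${path}DESCRIPTION" ] || continue
      pkg=${path%/}
      pkg=${pkg##*/}
      # -E: extended regex, -i: ignore case, -q: only report match/no match
      if printf '%s\n' "$pkg" | grep -Eiq -- "$pattern"; then
        printf '%s\t%s\n' "$pkg" "$dir"
      fi
    done
  done
}
</code>

For example, after loading a bundle you could run ''find_r_pkgs copula "$R_LIBS"'' to list each matching package name next to the library tree it was found in.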

=== Loading library bundles for use ===
<code>
$ vpkg_require r-gsl/
Adding dependency `r-cran/
Adding dependency `gsl/1.16` to your environment
Adding dependency `glpk/4.55` to your environment
Adding dependency `mpfr/
Adding package `r-gsl/
$
</code>

You can now use the library in R as usual:

<code>
$ R --no-save -q
> library(CopulaRegression)
Loading required package: MASS
Loading required package: VineCopula
>
</code>

=== Learning about modules ===
IT provides a small script called ''
documentation of R modules.
a module to decide if it requires more research.
must be installed, and the module bundle must be loaded with ''
For example:

<code>
$ vpkg_require r/3.1.1 r-cran/
$ r-info car
car-package

Companion to Applied Regression

Description:

This package accompanies Fox, J. and Weisberg, S., _An R Companion
to Applied Regression_,

Details:
...

$
</code>
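
''r-info'' displays the full package documentation. If a one-line summary is enough, note that every installed R package has a ''DESCRIPTION'' file with a ''Title:'' field. The function below is a hypothetical helper, not one of the IT tools: it walks a colon-separated library list such as ''$R_LIBS'' in order and prints the title from the first copy of the package it finds.

<code bash>
#!/bin/bash
# Hypothetical helper: print the Title: field of an installed R package.
#   $1 - package name, e.g. "car"
#   $2 - colon-separated list of library trees, e.g. "$R_LIBS"
# Returns 1 if the package is not found in any listed tree.
r_pkg_title() {
  local pkg=$1 libs=$2 dir desc
  local IFS=:
  for dir in $libs; do
    desc="$dir/$pkg/DESCRIPTION"
    if [ -f "$desc" ]; then
      # Only the first line of the field is printed; DESCRIPTION fields
      # may continue on indented lines.
      sed -n 's/^Title:[[:space:]]*//p' "$desc"
      return 0
    fi
  done
  return 1
}
</code>

Searching the trees in order mirrors how the first matching library wins, so ''r_pkg_title car "$R_LIBS"'' reports the copy R would most likely load.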

==== personal/
You can create your own library of R modules containing versions different
from those provided through VALET, or modules not available via VALET.

R looks in an environment variable called '
locations to search for modules.
in the list, this allows your library to take precedence over any conflicting versions
installed on the system.
modules into the first entry in this list by default.
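
Putting your own directory at the front of ''R_LIBS'' can be sketched as a small bash function. R only adds library directories that exist when it starts, so the sketch creates the directory first. The function name is hypothetical.

<code bash>
#!/bin/bash
# Hypothetical helper: put a personal library directory first in R_LIBS.
#   $1 - directory to prepend (created if missing)
prepend_r_libs() {
  local dir=$1
  # R skips R_LIBS entries that do not exist at startup
  mkdir -p -- "$dir" || return 1
  # ${R_LIBS:+:$R_LIBS} adds ":<old value>" only when R_LIBS is non-empty,
  # so no stray leading or trailing colon is produced
  export R_LIBS="${dir}${R_LIBS:+:$R_LIBS}"
}
</code>

For example, after ''prepend_r_libs "$HOME/Rlibs"'' (a hypothetical path), ''.libPaths()'' inside R should list that directory first, and ''install.packages()'' will install there by default.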

=== Simple example ===
Once this is done, you can install modules using ''
is an example:

<code>
$ vpkg_require r r-cran
Adding package `r/3.1.1` to your environment
Adding package `r-cran/
$ mkdir -p $WORKDIR/
$ echo $R_LIBS
/
$ R_LIBS="
$ R -q --no-save
> .libPaths()
[1] "/
[2] "/
[3] "/
> chooseCRANmirror(all)
CRAN mirror

1: 0-Cloud
3: Argentina (Mendoza)
5: Australia (Melbourne)
7: Belgium
9: Brazil (PR) 10: Brazil (RJ)
11: Brazil (SP 1) 12: Brazil (SP 2)
13: Canada (BC) 14: Canada (NS)
15: Canada (ON) 16: Canada (QC 1)
17: Canada (QC 2) 18: Chile
19: China (Beijing 1) 20: China (Beijing 2)
21: China (Hefei)
23: Colombia (Bogota)
25: Czech Republic
27: Ecuador
29: France (Lyon 1) 30: France (Lyon 2)
31: France (Montpellier)
33: France (Paris 2) 34: France (Strasbourg)
35: Germany (Berlin)
37: Germany (Goettingen)
39: Hungary
41: India 42: Indonesia (Jakarta)
43: Indonesia (Jember)
45: Ireland
47: Italy (Padua)
49: Japan (Hyogo)
51: Japan (Tsukuba)
53: Korea (Seoul 2) 54: Lebanon
55: Mexico (Mexico City) 56: Mexico (Texcoco)
57: Netherlands (Amsterdam)
59: New Zealand
61: Philippines
63: Portugal
65: Singapore
67: South Africa (Cape Town) 68: South Africa (Johannesburg)
69: Spain (A Coruña
71: Sweden
73: Taiwan (Chungli)
75: Taiwan (Taipei)
77: Turkey
79: UK (Cambridge)
81: UK (London)
83: USA (CA 1) 84: USA (CA 2)
85: USA (IA) 86: USA (IN)
87: USA (KS) 88: USA (MD)
89: USA (MI) 90: USA (MO)
91: USA (OH) 92: USA (OR)
93: USA (PA 1) 94: USA (PA 2)
95: USA (TN) 96: USA (TX 1)
97: USA (WA 1) 98: USA (WA 2)
99: Venezuela


Selection: 88
> install.packages("
Installing package into '/
(as '
trying URL '
Content type '
opened URL
==================================================
downloaded 23 Kb

* installing *source* package '
** package '
** libs
gfortran
gfortran
gfortran
gfortran
gfortran
gcc -std=gnu99 -I/
gfortran
gfortran
gfortran
gfortran
gfortran
gfortran
gcc -std=gnu99 -shared -L/
installing to /
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (KernSmooth)

The downloaded source packages are in
'/
> library(KernSmooth)
KernSmooth 2.23 loaded
Copyright M. P. Wand 1997-2009
>
</code>

Notice that the output of ''
directory first?

=== Using IT's udbuild environment ===
IT developed a formalization for installing modules called [[/
which can simplify the installation of modules.
script which can be used to install a personal R library.

<file sh udbuild-testing-cuda>
#!/bin/bash -l

PKGNAME=testing
VERSION=default

UDBUILD_HOME=$WORKDIR/
PKG_LIST='
WideLM rpud permGPU magma gputools cudaBayesregData cudaBayesreg
CARramps
'

vpkg_devrequire udbuild r/3.1.1 r-cran/
init_udbuildenv r-addon cuda/6.5

#Sometimes R doesn'
CPATH=$CUDA_PREFIX/
LIBRARY_PATH=$CUDA_PREFIX/

#
CRAN_MIRROR='

quote() { printf '"

R -q --no-save <<EOT
.libPaths()
options(repos=structure(c(CRAN="
for ( pkg in c( `quote $PKG_LIST` ) ) {
  print(pkg)
  install.packages(pkg,
}

warnings()
EOT
</file>

This script attempts to build the CUDA-capable R modules against
CUDA 6.5 into ''
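
The ''quote'' helper in the script turns the whitespace-separated ''PKG_LIST'' into the double-quoted, comma-separated items of an R ''c( )'' vector. The function below is an independent sketch of that idea under a hypothetical name, not the script's own ''quote'' function.

<code bash>
#!/bin/bash
# Hypothetical sketch: join words into a double-quoted, comma-separated list
# suitable for an R vector literal, e.g. a b -> "a","b"
r_quote_list() {
  local out="" word
  for word in "$@"; do
    # ${out:+,} emits a comma only after the first item
    out+="${out:+,}\"$word\""
  done
  printf '%s\n' "$out"
}

PKG_LIST='
WideLM rpud permGPU
'
# Unquoted $PKG_LIST is split on spaces and newlines into separate words
echo "for ( pkg in c( $(r_quote_list $PKG_LIST) ) ) print(pkg)"
</code>

The last line prints ''for ( pkg in c( "WideLM","rpud","permGPU" ) ) print(pkg)'', which is the kind of R code the here-document feeds to ''R''.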

====== R script in batch ======

==== matmul.R script ====

Consider this simple R script, which computes and prints the small 3x3 matrix AA':

<file R matmul.R>
# Calculate and print small matrix AA'
a <- matrix(1:
a%*%t(a)
</file>

Let's test this R script using ''

<code bash>
workgroup -g it_css
qlogin
vpkg_require r/3
Rscript matmul.R
</code>

The output on the screen:

<code>
     [,1] [,2] [,3]
[1,]  166  188  210
[2,]  188  214  240
[3,]  210  240  270
</code>
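
To see where these numbers come from: entry (i, j) of AA' is the dot product of rows i and j of ''a''. The printed values are consistent with ''a'' being a 3x4 matrix holding 1 to 12 filled column by column, so that row i is (i, i+3, i+6, i+9). That layout is an assumption here. The bash arithmetic below recomputes the product under it.

<code bash>
#!/bin/bash
# Recompute AA' assuming a[i,k] = i + 3k (i = 1..3, k = 0..3),
# i.e. 1:12 filled column-wise into 3 rows (an assumption).
aat() {
  local i j k sum row
  for ((i = 1; i <= 3; i++)); do
    row=""
    for ((j = 1; j <= 3; j++)); do
      sum=0
      # Dot product of row i and row j of a
      for ((k = 0; k < 4; k++)); do
        sum=$(( sum + (i + 3*k) * (j + 3*k) ))
      done
      row+="${row:+ }$sum"
    done
    echo "$row"
  done
}

aat
</code>

It prints the same three rows as R (''166 188 210'', ''188 214 240'', ''210 240 270''), and the result is symmetric, as AA' always is.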

To return to the head node, type:
<code bash>
exit
</code>

==== matmul.qs file ====

Running an R script in batch, instead of on the command line, takes nearly the same steps.
Consider the queue submission script file:

<file bash matmul.qs>
#$ -N matmultiply

# Add vpkg_require commands after this line:
vpkg_require r/3

# Syntax: Rscript [options] filename.R [arguments]
Rscript matmul.R
</file>

Now, to run the R script, simply submit the job from the head node with the
''

<code>
qsub matmul.qs
</code>

You should see a notification that your job was submitted.

<code bash>
Your job 2283886 ("
</code>

After the job completes, the output of the script will appear in the file
''

<code>
more matmultiply.o2283886
</code>

to display the contents of the output file on the screen.

<code>
Adding dependency `x11/
Adding package `r/3.0.2` to your environment
     [,1] [,2] [,3]
[1,]  166  188  210
[2,]  188  214  240
[3,]  210  240  270
</code>
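
The output file is named after the job name and the job ID (''matmultiply.o2283886'' above), so you can build that name from the message ''qsub'' prints. The function below is a hypothetical helper. It assumes the single-job message format shown above (''Your job <id> ...'') and simply extracts the number.

<code bash>
#!/bin/bash
# Hypothetical helper: derive the <jobname>.o<jobid> output file name from
# the "Your job <id> ..." message printed by qsub for a single job.
#   $1 - job name (the value given with #$ -N)
#   $2 - text printed by qsub
job_output_file() {
  local name=$1 msg=$2 id
  id=$(printf '%s\n' "$msg" | sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p')
  [ -n "$id" ] || return 1
  printf '%s.o%s\n' "$name" "$id"
}

# Example usage on the head node (not run here):
#   msg=$(qsub matmul.qs)
#   more "$(job_output_file matmultiply "$msg")"
</code>

The file only appears once the job has started writing to it, so check ''qstat'' first if ''more'' reports that it does not exist.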

====== Using R script in batch array job ======
===== sweep.R file =====

Consider this simple script, which prints a fraction computed from its argument list:

<file R sweep.R>
args <- commandArgs(trailingOnly = TRUE)
# print fraction from argument list
as.numeric(args[1])/
</file>

This R script can be run from the command line on a compute node with the commands

<code bash>
qlogin
vpkg_require r/3
Rscript sweep.R 5 200
</code>

The output on the screen:
<code>
[1] 0.025
</code>

===== sweep.qs file =====

Consider the queue script file:

<file bash sweep.qs>
#$ -N sweep
#$ -t 1-200
##
## Parameter sweep array job to run the sweep.R
## lambda = 0,1,2,...,199
##

# Add vpkg_require commands after this line:
vpkg_require r/3

date "
echo "Host $HOSTNAME"

let lambda="
let taskCount=200

# Syntax: Rscript [options] filename.R [arguments]
Rscript --vanilla sweep.R $lambda $taskCount

date "
</file>

The ''
There will be 200 array jobs, all running the same script with different parameters (arguments).
is used to prevent the multiple jobs from using the same disk space.
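
Before submitting 200 tasks, it can help to check the task-to-parameter mapping locally. Grid Engine numbers the tasks 1 to 200 (from ''-t 1-200'') in ''SGE_TASK_ID'', while the comment in the script gives lambda = 0, 1, ..., 199, so the sketch below assumes lambda is the task ID minus one. It only prints the command each task would run. The function name is hypothetical.

<code bash>
#!/bin/bash
# Hypothetical dry run of the sweep array job: print, without running,
# the Rscript command each task would execute.
# Assumes lambda = SGE_TASK_ID - 1, matching "lambda = 0,1,2,...,199".
#   $1 - first task ID, $2 - last task ID, $3 - task count passed to sweep.R
dry_run_sweep() {
  local first=$1 last=$2 count=$3 task
  for ((task = first; task <= last; task++)); do
    printf 'task %d: Rscript --vanilla sweep.R %d %d\n' \
      "$task" "$((task - 1))" "$count"
  done
}

# Show the first three of the 200 tasks
dry_run_sweep 1 3 200
</code>

This prints ''task 1: Rscript --vanilla sweep.R 0 200'' through ''task 3: Rscript --vanilla sweep.R 2 200''. Running it over the full range is a cheap way to confirm that the first and last tasks get the parameters you expect.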

To run this in batch, submit the job from the head node with the
''

<code>
qsub sweep.qs
</code>

After the job completes, the output of the script will appear in the files
''

<code>
Adding dependency `x11/
Adding package `r/3.0.2` to your environment
[1] 0.025
</code>
<note tip>
You will want to do more than just print out one fraction in your script.
a one-dimensional parameter sweep, to construct unique input and output file names for each task,
or as a seed for the R Random Number Generator (RNG).</note>
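
For example, a per-task RNG seed can be derived from the task ID in the job script and passed to R as an extra argument. The sketch below assumes a hypothetical base seed and a modified ''sweep.R'' that reads a third argument and calls ''set.seed()'' with it; the ''sweep.R'' shown above uses only two arguments.

<code bash>
#!/bin/bash
# Hypothetical sketch: give each array task its own reproducible seed.
BASE_SEED=12345            # hypothetical base value; pick your own
task_seed() {
  # Distinct task IDs give distinct seeds; rerunning a task reuses its seed
  echo $(( BASE_SEED + $1 ))
}

# In the job script, SGE_TASK_ID is set by Grid Engine; default to 1 here
# so the sketch also runs outside a job.
seed=$(task_seed "${SGE_TASK_ID:-1}")
echo "would run: Rscript --vanilla sweep.R <lambda> <taskCount> $seed"
# A modified sweep.R could then call: set.seed(as.integer(args[3]))
</code>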

==== Writing files from an array job ====

You are running many jobs in the same directory.
separate files with "dot taskid"

<note important>
You need to make sure no two of your jobs write to the same file. Look at your R script to see if you
are writing files.
If you are using these R functions, then use a unique file name constructed from the task id.
</note>
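
One way to follow the "dot taskid" convention is to insert the task ID before the file extension. The helper below is a hypothetical sketch. When ''SGE_TASK_ID'' is unset or not a number (outside an array job Grid Engine may set it to ''undefined''), it falls back to ''0'', which is an assumption you may want to change.

<code bash>
#!/bin/bash
# Hypothetical helper: make a per-task file name by inserting the task ID
# before the extension, e.g. results.csv -> results.7.csv
#   $1 - base file name
#   $2 - task ID (defaults to $SGE_TASK_ID, or 0 if unset or not a number)
task_file() {
  local name=$1 id=${2:-${SGE_TASK_ID:-0}}
  local base=${name##*/}
  # Outside an array job the task ID may be missing or non-numeric
  [[ $id =~ ^[0-9]+$ ]] || id=0
  # Only look for an extension in the last path component
  if [[ $base == *.* ]]; then
    printf '%s.%s.%s\n' "${name%.*}" "$id" "${name##*.}"
  else
    printf '%s.%s\n' "$name" "$id"
  fi
}
</code>

Inside the R script itself, ''Sys.getenv("SGE_TASK_ID")'' returns the same task ID as a string, so the unique name can also be built there.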

==== vanilla option ====

The command-line option ''
be reading or writing to the same files.
in the init-file ''
them in your environ file ''
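
Because ''--vanilla'' also tells R to skip the user environ file, any setting you keep there (such as ''R_LIBS'') has to be set in the job script instead. The function below is a hypothetical, simplified reader that copies one plain ''KEY=value'' line from an environ-style file. It does not handle ''${VAR}'' expansion or the other syntax R itself supports, so treat it as a sketch.

<code bash>
#!/bin/bash
# Hypothetical, simplified reader: print the value of KEY from an
# environ-style file of plain KEY=value lines (if the key appears more
# than once, the last line is used). Does not expand ${VAR} references.
#   $1 - key, e.g. R_LIBS      $2 - file, e.g. ~/.Renviron
renviron_get() {
  local key=$1 file=$2 line
  line=$(grep -E "^${key}=" "$file" 2>/dev/null | tail -n 1)
  [ -n "$line" ] || return 1
  line=${line#*=}
  # Strip one pair of surrounding double quotes, if present
  if [[ $line == \"*\" ]]; then
    line=${line#\"}
    line=${line%\"}
  fi
  printf '%s\n' "$line"
}

# In a job script, before running Rscript --vanilla (not run here):
#   val=$(renviron_get R_LIBS ~/.Renviron) && export R_LIBS="$val"
</code>

Setting the variable explicitly in the job script also makes each task's environment visible in one place, which helps when many array tasks run side by side.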