software:r:farber [2018-04-26 13:23] – [personal/program specific R libraries and extensions] sraskar

software:r:farber [2021-03-17 14:44] (current) – [matmul.qs file] anita

===== R on Farber =====

==== Learning R ====

=== SWIRL ===
In addition to other resources, SWIRL is installed on the Farber cluster and is available as an interactive learning guide
inside R:

<code>
$ vpkg_require r/3 r-cran
$ R -q --no-save
> library(swirl)
> swirl()
</code>

==== R libraries and extensions ====

=== Installed library bundles ===
The cluster also has the majority of [[http://cran.us.r-project.org/|CRAN]]
and [[http://www.bioconductor.org/|Bioconductor]] R libraries already
installed.  These are installed as point-in-time snapshots of their
respective catalogs, and are broken down into different VALET
packages based on their dependencies.  The current bundles are listed below.  Together
these bundles provide access to over 6,600 R modules, pre-compiled and ready
for use.

^r-cran         |All CRAN modules which compile and install cleanly without any additional dependencies.  N.B. all of the library bundles below require this CRAN bundle as a base.|
^r-bioconductor |The full suite of [[http://www.bioconductor.org/|Bioconductor]] modules. |
^r-fftw         |CRAN modules which need FFTW. |
^r-gsl          |CRAN modules which need GSL (GNU Scientific Library), GLPK (GNU Linear Programming Kit), or MPFR (GNU MPFR Library). |
^r-gdal         |CRAN modules which need GDAL (Geospatial Data Abstraction Library) and GEOS (Geometry Engine, Open Source). |
^r-jags         |CRAN modules which need JAGS (Just Another Gibbs Sampler) and the r-gsl bundle mentioned above. |
^r-mpi          |CRAN modules which need the OpenMPI libraries for parallel computing. |
^r-netcdf       |CRAN modules which need the NetCDF, HDF4, HDF5, and UDUNITS libraries. |
^r-all          |Loads all of the bundles above, and also includes any CRAN module with multiple dependencies from the list above. |
^r-cuda         |Not currently available. |

=== Searching for modules ===
The HPC team provides a tool, loaded along with R, to help cluster
users locate modules in these various bundles.
It is called ''r-search'', and its arguments are
interpreted as extended regular expressions by the UNIX
[[http://linux.die.net/man/1/egrep|egrep]] command (with case-insensitive matching
enabled by default).

If, for example, you are looking for packages to help with a copula regression,
you can search for them like this:

<code>
$ r-search copula
R          Location   ValetPackage              Library
-------    --------   ---------------           -----------
R/3.1.1    add-ons    r-cran/20140905         : acopula
R/3.1.1    add-ons    r-cran/20140905         : pencopula
R/3.1.1    add-ons    r-gsl/20140912          : CopulaRegression
R/3.1.1    add-ons    r-gsl/20140912          : VineCopula
R/3.1.1    add-ons    r-gsl/20140912          : copula
R/3.1.1    add-ons    r-gsl/20140912          : copulaedas
R/3.1.1    add-ons    r-gsl/20140912          : nacopula
$
</code>

Now it is clear that two bundles for version 3.1.1 of R contain modules
which may be of help.  If you require the "CopulaRegression" module, you
can use VALET to load it into your environment via the "r-gsl/20140912"
bundle.

=== Loading library bundles for use ===
<code>
$ vpkg_require r-gsl/20140912
Adding dependency `r-cran/20140905` to your environment
Adding dependency `gsl/1.16` to your environment
Adding dependency `glpk/4.55` to your environment
Adding dependency `mpfr/3.1.2` to your environment
Adding package `r-gsl/20140912` to your environment
$
</code>

The library can now be used in R as normal.

<code>
$ R --no-save -q
> library(CopulaRegression)
Loading required package: MASS
Loading required package: VineCopula
>
</code>

=== Learning about modules ===
IT provides a small script called ''r-info'' which displays the internal
documentation of R modules.  This is helpful for getting basic information on
a module to decide whether it needs more research.  To use this tool, the library
must be installed, and the module bundle must be loaded with ''vpkg_require''.
For example:

<code>
$ vpkg_require r/3.1.1 r-cran/3.1.1
$ r-info car
car-package                package:car                 R Documentation

Companion to Applied Regression

Description:

     This package accompanies Fox, J. and Weisberg, S., _An R Companion
     to Applied Regression_, Second Edition, Sage, 2011.

Details:
...
     Maintainer: John Fox <jfox@mcmaster.ca>
$
</code>

==== personal/program specific R libraries and extensions ====
You can create your own library of R modules containing different
versions from those provided through VALET, or modules not available via VALET.

R reads an environment variable called ''R_LIBS'' to obtain a list of
locations to search for modules.  Make sure your library is the first entry
in the list: this allows your library to override any conflicting modules
installed on the system.  This is also important because, by default, R installs
modules into the first entry in this list.
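
The ordering rule above can be wrapped in a small shell function.  This is only an illustrative sketch: ''prepend_r_lib'' is a hypothetical helper name, not a tool installed on the cluster.  It creates the directory if needed and puts it at the front of ''R_LIBS'' unless it is already the first entry.

<code bash>
#!/bin/bash
# Hypothetical helper (not installed on the cluster): make a personal
# library directory the first entry of R_LIBS, creating it if needed.
prepend_r_lib() {
    local dir="$1"
    mkdir -p "$dir" || return 1
    # ${R_LIBS%%:*} is the first colon-separated entry of R_LIBS
    if [ "${R_LIBS%%:*}" != "$dir" ]; then
        export R_LIBS="${dir}${R_LIBS:+:${R_LIBS}}"
    fi
}

# Example use after "vpkg_require r r-cran":
#   prepend_r_lib "$WORKDIR/sw/r/add-ons/r3.1.1/testing/default"
#   R -q --no-save
</code>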

=== Simple example ===
Once your library is the first entry in ''R_LIBS'', you can install modules into it with
''install.packages''.  Here is an example:

<code>
$ vpkg_require r r-cran
Adding package `r/3.1.1` to your environment
Adding package `r-cran/20140905` to your environment
$ mkdir -p $WORKDIR/sw/r/add-ons/r3.1.1/testing/default
$ echo $R_LIBS
/opt/shared/r/add-ons/r3.1.1/cran/20140905
$ R_LIBS="$WORKDIR/sw/r/add-ons/r3.1.1/testing/default:$R_LIBS"
$ R -q --no-save
> .libPaths()
[1] "/home/work/it_nss/sw/r/add-ons/r3.1.1/testing/default"
[2] "/home/software/r/add-ons/r3.1.1/cran/20140905"
[3] "/home/software/r/3.1.1/lib64/R/library"
> chooseCRANmirror(all)
CRAN mirror

  1: 0-Cloud                        2: Argentina (La Plata)
  3: Argentina (Mendoza)            4: Australia (Canberra)
  5: Australia (Melbourne)          6: Austria
  7: Belgium                        8: Brazil (BA)
  9: Brazil (PR)                   10: Brazil (RJ)
 11: Brazil (SP 1)                 12: Brazil (SP 2)
 13: Canada (BC)                   14: Canada (NS)
 15: Canada (ON)                   16: Canada (QC 1)
 17: Canada (QC 2)                 18: Chile
 19: China (Beijing 1)             20: China (Beijing 2)
 21: China (Hefei)                 22: China (Xiamen)
 23: Colombia (Bogota)             24: Colombia (Cali)
 25: Czech Republic                26: Denmark
 27: Ecuador                       28: Estonia
 29: France (Lyon 1)               30: France (Lyon 2)
 31: France (Montpellier)          32: France (Paris 1)
 33: France (Paris 2)              34: France (Strasbourg)
 35: Germany (Berlin)              36: Germany (Bonn)
 37: Germany (Goettingen)          38: Greece
 39: Hungary                       40: Iceland
 41: India                         42: Indonesia (Jakarta)
 43: Indonesia (Jember)            44: Iran
 45: Ireland                       46: Italy (Milano)
 47: Italy (Padua)                 48: Italy (Palermo)
 49: Japan (Hyogo)                 50: Japan (Tokyo)
 51: Japan (Tsukuba)               52: Korea (Seoul 1)
 53: Korea (Seoul 2)               54: Lebanon
 55: Mexico (Mexico City)          56: Mexico (Texcoco)
 57: Netherlands (Amsterdam)       58: Netherlands (Utrecht)
 59: New Zealand                   60: Norway
 61: Philippines                   62: Poland
 63: Portugal                      64: Russia
 65: Singapore                     66: Slovakia
 67: South Africa (Cape Town)      68: South Africa (Johannesburg)
 69: Spain (A Coruña)              70: Spain (Madrid)
 71: Sweden                        72: Switzerland
 73: Taiwan (Chungli)              74: Taiwan (Taichung)
 75: Taiwan (Taipei)               76: Thailand
 77: Turkey                        78: UK (Bristol)
 79: UK (Cambridge)                80: UK (London)
 81: UK (London)                   82: UK (St Andrews)
 83: USA (CA 1)                    84: USA (CA 2)
 85: USA (IA)                      86: USA (IN)
 87: USA (KS)                      88: USA (MD)
 89: USA (MI)                      90: USA (MO)
 91: USA (OH)                      92: USA (OR)
 93: USA (PA 1)                    94: USA (PA 2)
 95: USA (TN)                      96: USA (TX 1)
 97: USA (WA 1)                    98: USA (WA 2)
 99: Venezuela                    100: Vietnam


Selection: 88
> install.packages("KernSmooth", dependencies=TRUE)
Installing package into '/home/work/it_nss/sw/r/add-ons/r3.1.1/testing/default'
(as 'lib' is unspecified)
trying URL 'http://watson.nci.nih.gov/cran_mirror/src/contrib/KernSmooth_2.23-13.tar.gz'
Content type 'application/octet-stream' length 24471 bytes (23 Kb)
opened URL
==================================================
downloaded 23 Kb

* installing *source* package 'KernSmooth' ...
** package 'KernSmooth' successfully unpacked and MD5 sums checked
** libs
gfortran   -fpic  -g -O2  -c blkest.f -o blkest.o
gfortran   -fpic  -g -O2  -c cp.f -o cp.o
gfortran   -fpic  -g -O2  -c dgedi.f -o dgedi.o
gfortran   -fpic  -g -O2  -c dgefa.f -o dgefa.o
gfortran   -fpic  -g -O2  -c dgesl.f -o dgesl.o
gcc -std=gnu99 -I/opt/shared/r/3.1.1/lib64/R/include -DNDEBUG  -I/usr/local/include    -fpic  -g -O2  -c init.c -o init.o
gfortran   -fpic  -g -O2  -c linbin.f -o linbin.o
gfortran   -fpic  -g -O2  -c linbin2D.f -o linbin2D.o
gfortran   -fpic  -g -O2  -c locpoly.f -o locpoly.o
gfortran   -fpic  -g -O2  -c rlbin.f -o rlbin.o
gfortran   -fpic  -g -O2  -c sdiag.f -o sdiag.o
gfortran   -fpic  -g -O2  -c sstdiag.f -o sstdiag.o
gcc -std=gnu99 -shared -L/usr/local/lib64 -o KernSmooth.so blkest.o cp.o dgedi.o dgefa.o dgesl.o init.o linbin.o linbin2D.o locpoly.o rlbin.o sdiag.o sstdiag.o -L/opt/shared/r/3.1.1/lib64/R/lib -lRblas -lgfortran -lm -lgfortran -lm -L/opt/shared/r/3.1.1/lib64/R/lib -lR
installing to /home/work/it_nss/sw/r/add-ons/r3.1.1/testing/default/KernSmooth/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (KernSmooth)

The downloaded source packages are in
        '/tmp/RtmpylqWXj/downloaded_packages'
> library(KernSmooth)
KernSmooth 2.23 loaded
Copyright M. P. Wand 1997-2009
>
</code>

Notice that the output of ''.libPaths()'' lists the personal library
directory first, which is why ''install.packages'' installed KernSmooth there.
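
The session above picks a mirror interactively.  In a script you may prefer to give the mirror to ''install.packages'' through its ''repos'' argument, which skips the mirror menu.  This is a minimal sketch, assuming ''R_LIBS'' already lists your personal library first; ''install_r_packages'' is a hypothetical function name, and the mirror URL is simply the one used in the udbuild example on this page.

<code bash>
#!/bin/bash
# Mirror taken from the udbuild example on this page; any CRAN mirror works.
CRAN_MIRROR='http://lib.stat.cmu.edu/R/CRAN/'

# Hypothetical helper: install each named CRAN package without the
# interactive chooseCRANmirror() menu. Packages go into the first entry
# of .libPaths(), i.e. your personal library if it leads R_LIBS.
install_r_packages() {
    local pkg
    for pkg in "$@"; do
        # Stops only if Rscript itself exits with an error; a failed
        # package build is normally reported by R as a warning.
        Rscript -e "install.packages('${pkg}', repos = '${CRAN_MIRROR}', dependencies = TRUE)" || return 1
    done
}

# Example:
#   install_r_packages KernSmooth
</code>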

=== Using IT's udbuild environment ===
IT developed a formalized process for installing modules called [[/abstract/farber/install_software|udbuild]],
which can simplify the installation of modules.  Here is an example ''udbuild''
script which can be used to install a personal R library.

<file sh udbuild-testing-cuda>
#!/bin/bash -l

PKGNAME=testing
VERSION=default

UDBUILD_HOME=$WORKDIR/sw
PKG_LIST='
 WideLM rpud permGPU magma gputools cudaBayesregData cudaBayesreg
 CARramps
'

vpkg_devrequire udbuild r/3.1.1 r-cran/20140905
init_udbuildenv r-addon cuda/6.5

#Sometimes R doesn't properly use CPPFLAGS which is set by VALET, fix that here:
CPATH=$CUDA_PREFIX/include:$CPATH
LIBRARY_PATH=$CUDA_PREFIX/lib64:$CUDA_PREFIX/lib64/stubs:$LIBRARY_PATH

#CRAN_MIRROR='http://cran.cs.wwu.edu/'
CRAN_MIRROR='http://lib.stat.cmu.edu/R/CRAN/'

# Print the arguments as a comma-separated list of double-quoted names
# (e.g. "WideLM", "rpud") for use inside an R c() call.
quote() { printf '"%s", ' "$@" | sed 's/, $/\n/'; }

# Install every package in PKG_LIST from CRAN_MIRROR into the default
# library (the first entry of .libPaths()), then show any build warnings.
R -q --no-save <<EOT
 .libPaths()
 options(repos=structure(c(CRAN="$CRAN_MIRROR")))
 for ( pkg in c( `quote $PKG_LIST` ) ) {
 print(pkg)
 install.packages(pkg, dependencies=TRUE)
 }

 warnings()
EOT
</file>

This script will attempt to build the CUDA-capable R modules against
CUDA 6.5 and install them into ''$WORKDIR/sw/r/add-ons/r3.1.1/testing/default-cuda-6.5''.
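
The ''quote'' helper in the script above turns ''PKG_LIST'' into the body of an R ''c()'' call.  Its ''sed'' replacement uses ''\n'', which is a GNU sed feature.  If you want to check the expansion on its own, a pure-bash equivalent might look like the sketch below (an alternative for testing, not something udbuild requires).

<code bash>
#!/bin/bash
# Pure-bash variant of the udbuild script's quote(): print the arguments
# as a comma-separated list of double-quoted names.
quote() {
    local out
    out=$(printf '"%s", ' "$@")
    printf '%s\n' "${out%, }"   # drop the trailing ", "
}

PKG_LIST='
 WideLM rpud permGPU
 CARramps
'
# Unquoted $PKG_LIST is split on whitespace (including newlines),
# so each package name becomes one argument.
echo "c( $(quote $PKG_LIST) )"
# prints: c( "WideLM", "rpud", "permGPU", "CARramps" )
</code>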

====== R script in batch ======

==== matmul.R script ====

Consider this simple R script, which multiplies a small 3x4 matrix by its transpose to produce a 3x3 matrix:

<file R matmul.R>
# Calculate and print the 3x3 product AA' of the 3x4 matrix A
a <- matrix(1:12, 3, 4)
a %*% t(a)
</file>

Let's test this R script using ''Rscript'' from the command line on a compute node.  Don't forget to set your [[general/userguide/04_compute_environ?&#using-workgroup-and-directories|workgroup]] to select your cluster group or //investing-entity// compute nodes before you use ''qlogin'' to get onto a compute node.  For example,

<code bash>
workgroup -g it_css
qlogin
vpkg_require r/3
Rscript matmul.R
</code>

The output to the screen:

<code>
     [,1] [,2] [,3]
[1,]  166  188  210
[2,]  188  214  240
[3,]  210  240  270
</code>

To return to the head node, type
<code bash>
exit
</code>

==== matmul.qs file ====

Running an R script in batch instead of on the command line takes nearly the same steps.
Consider the queue submission script file:

<file bash matmul.qs>
#$ -N matmultiply

# Add vpkg_require commands after this line:
vpkg_require r/3

# Syntax: Rscript [options] filename.R [arguments]
Rscript matmul.R
</file>

Now, to run the R script, simply submit the job from the head node with the
''qsub'' command.

<code>
qsub matmul.qs
</code>

You should see a notification that your job was submitted, something like this:

<code>
Your job 2283886 ("matmultiply") has been submitted
</code>

After the job completes, the output of the script will appear in the file
''matmultiply.o2283886'': ''-N matmultiply'' in ''matmul.qs'' sets the name of the job, which appears in the notification above as ''("matmultiply")'', and ''2283886'' is the job ID assigned to it.  Type

<code>
more matmultiply.o2283886
</code>

to display the contents of the output file on the screen.  For example,

<code>
Adding dependency `x11/RHEL6.1` to your environment
Adding package `r/3.0.2` to your environment
     [,1] [,2] [,3]
[1,]  166  188  210
[2,]  188  214  240
[3,]  210  240  270
</code>
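
If you submit jobs from a script, you can derive the output file name from the line ''qsub'' prints.  This is a sketch under the assumption that the notification has exactly the single-job format shown above; ''output_file_from_qsub'' is a hypothetical helper, and any other format makes it return a non-zero status.

<code bash>
#!/bin/bash
# Hypothetical helper: turn a qsub notification such as
#   Your job 2283886 ("matmultiply") has been submitted
# into the name of the job's output file, <job name>.o<job ID>.
output_file_from_qsub() {
    local re='^Your job ([0-9]+) \("([^"]+)"\)'
    if [[ $1 =~ $re ]]; then
        printf '%s.o%s\n' "${BASH_REMATCH[2]}" "${BASH_REMATCH[1]}"
    else
        return 1   # not a single-job notification
    fi
}

# Example:
#   outfile=$(output_file_from_qsub "$(qsub matmul.qs)")
#   more "$outfile"    # once the job has finished
</code>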

====== Using R script in batch array job ======
===== sweep.R file =====

Consider this simple script, which prints a fraction computed from its two command-line arguments:

<file R sweep.R>
args <- commandArgs(trailingOnly = TRUE)
# print fraction from argument list
as.numeric(args[1])/as.numeric(args[2])
</file>

This R script can be run from the command line on a compute node with the commands

<code bash>
qlogin
vpkg_require r/3
Rscript sweep.R 5 200
</code>

The output to the screen:
<code>
[1] 0.025
</code>

===== sweep.qs file =====

Consider the queue script file:

<file bash sweep.qs>
#$ -N sweep
#$ -t 1-200
##
## Parameter sweep array job to run sweep.R with
##    lambda = 0, 1, 2, ..., 199
##

# Add vpkg_require commands after this line:
vpkg_require r/3

date "+Start %s"
echo "Host $HOSTNAME"

# Task IDs run from 1 to 200; shift them to lambda = 0 to 199
lambda=$((SGE_TASK_ID - 1))
taskCount=200

# Syntax: Rscript [options] filename.R [arguments]
Rscript --vanilla sweep.R $lambda $taskCount

date "+Finish %s"
</file>

The ''date'' and ''echo Host'' lines are just a way of keeping track of when and where each task runs.
There will be 200 array tasks, all running the same script with different parameters (arguments).  The ''--vanilla'' option
keeps the tasks from reading or writing the same R startup and workspace files.

To run this in batch, submit the job from the head node with the
''qsub'' command.

<code>
qsub sweep.qs
</code>

After the tasks complete, the output of the script will appear in the files
''sweep.o535064.1'' to ''sweep.o535064.200''.  The number 535064 is the job ID assigned to your job when it was submitted, and 1 to 200 are the task IDs (corresponding to ''-t 1-200'').  For example, the file for task 6 (lambda = 5) contains the following, in addition to the Start, Host and Finish lines written by the script:

<code>
Adding dependency `x11/RHEL6.1` to your environment
Adding package `r/3.0.2` to your environment
[1] 0.025
</code>
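
With 200 output files it helps to gather the results in one place.  A minimal sketch, assuming the file naming shown above and that the result is the line R prints starting with ''[1]''; ''collect_sweep'' is a hypothetical helper.  Looping over task numbers keeps the results in numeric task order, whereas a shell glob such as ''sweep.o535064.*'' sorts ''.10'' before ''.2''.

<code bash>
#!/bin/bash
# Hypothetical helper: print "task<TAB>result" for each task of an array
# job, in numeric task order. The result is taken to be the first line
# starting with "[1]" (R's printed value), so the Start/Host/Finish lines
# written by sweep.qs are skipped.
# Arguments: job name, job ID, number of tasks.
collect_sweep() {
    local name="$1" jobid="$2" ntasks="$3" t f
    for (( t = 1; t <= ntasks; t++ )); do
        f="${name}.o${jobid}.${t}"
        if [ -f "$f" ]; then
            printf '%s\t%s\n' "$t" "$(grep -m 1 '^\[1\]' "$f")"
        else
            printf '%s\tMISSING\n' "$t"
        fi
    done
}

# Example, run in the directory where the job was submitted:
#   collect_sweep sweep 535064 200 > sweep-results.txt
</code>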
<note tip>
You will want to do more than just print out one fraction in your script.  The integer parameter can be used for
a one-dimensional parameter sweep, to construct unique input and output file names for each task,
or as a seed for the R random number generator (RNG).</note>
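
For a sweep over values that are not simply derived from the task number, one common approach is to list one parameter value per line in a text file and have task N read line N.  A sketch under that assumption; ''params.txt'', ''myscript.R'' and ''param_for_task'' are hypothetical names.

<code bash>
#!/bin/bash
# Hypothetical helper: print line number $2 of file $1, so that
# array task N reads the N-th parameter value.
param_for_task() {
    local file="$1" task="$2"
    sed -n "${task}p" "$file"
}

# In an array job script submitted with "-t 1-200" and a
# 200-line params.txt:
#   value=$(param_for_task params.txt "$SGE_TASK_ID")
#   Rscript --vanilla myscript.R "$value"
</code>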

==== Writing files from an array job ====

All the tasks run in the same directory.  Grid Engine handles standard output by writing each task's output to a
separate file, with "dot task ID" appended to the job ID.  You need to take care of any other file output in your R script.

<note important>
Make sure no two of your tasks write to the same file.  Check whether your R script
writes files: look for the ''**sink**'' command or graphics device functions such as ''**pdf**'' or ''**png**''.
If you use these R functions, give each task a unique file name constructed from the task ID.
</note>
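
One way to get a unique name is to build it in the job script from the ''JOB_ID'' and ''SGE_TASK_ID'' variables that Grid Engine sets, and pass it to R as an extra argument.  A sketch; ''task_file'' is a hypothetical helper, and ''sweep.R'' as shown above would need an extra line such as ''pdf(args[3])'' to use the third argument.

<code bash>
#!/bin/bash
# Hypothetical helper: print a per-task file name of the form
# <prefix>-<job ID>-<task ID>.<extension>, using the JOB_ID and
# SGE_TASK_ID variables set by Grid Engine (defaults 0 and 1 outside a job).
task_file() {
    local prefix="$1" ext="$2"
    printf '%s-%s-%s.%s\n' "$prefix" "${JOB_ID:-0}" "${SGE_TASK_ID:-1}" "$ext"
}

# In sweep.qs, the name could be passed as a third argument:
#   Rscript --vanilla sweep.R $lambda $taskCount "$(task_file sweep pdf)"
</code>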

==== vanilla option ====

The command-line option ''--vanilla'' combines ''--no-save'', ''--no-restore'', ''--no-site-file'', ''--no-init-file'' and ''--no-environ''.
This way the tasks will not read or write the same startup and workspace files.  If you need initialization commands, put them in your R script instead of
in the init file ''.Rprofile''.  If you need environment variables, export them in your bash script instead of assigning
them in your environ file ''.Renviron''.
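
For example, instead of setting ''R_LIBS'' in ''.Renviron'', a job script can export it before calling ''Rscript --vanilla''.  A sketch, assuming the personal library path from the simple example earlier on this page (adjust it to your own); ''MY_R_LIB'' is a hypothetical variable name, and the ''${WORKDIR:-$HOME}'' fallback is only there so the snippet also runs where ''WORKDIR'' is not set.

<code bash>
#!/bin/bash
# Place this after "vpkg_require r/3" in the job script.
# --vanilla implies --no-environ, so ~/.Renviron is not read; set the
# variables here instead. Exported variables are inherited by Rscript.
# MY_R_LIB is a hypothetical name for your personal library path.
MY_R_LIB="${WORKDIR:-$HOME}/sw/r/add-ons/r3.1.1/testing/default"
export R_LIBS="${MY_R_LIB}${R_LIBS:+:${R_LIBS}}"

# Then, later in the job script:
#   Rscript --vanilla sweep.R $lambda $taskCount
</code>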