software:tensorflow:darwin [2024-05-20 11:14] – created frey | software:tensorflow:darwin [2024-05-20 11:51] (current) – frey | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== TensorFlow on DARWIN ====== | ||
+ | |||
+ | TensorFlow is a combination of Python-scripted software and compiled libraries and tools. | ||
+ | |||
+ | On DARWIN, only container images are provided to users. | ||
+ | |||
+ | ===== Container images ===== | ||
+ | |||
+ | IT RCI maintains TensorFlow Singularity containers for all users of DARWIN: | ||
+ | |||
+ | <code bash> | ||
+ | $ vpkg_versions tensorflow | ||
+ | |||
+ | Available versions in package (* = default version): | ||
+ | |||
+ | [/ | ||
+ | tensorflow | ||
+ | 2.3: | ||
+ | * 2.8: | ||
+ | 2.9: | ||
+ | 2.14.0 | ||
+ | 2.15:rocm TF 2.15 with ROCM 6.1 AMD GPU support | ||
+ | 2.16.1 | ||
+ | </ | ||
+ | |||
+ | You write your Python code either somewhere in your home directory ($HOME) or somewhere under your workgroup directory ($WORKDIR). | ||
+ | |||
+ | Assuming you will use your personal workgroup storage directory ('' | ||
+ | |||
+ | <code bash> | ||
+ | $ mkdir -p ${WORKDIR_USER}/ | ||
+ | $ cd ${WORKDIR_USER}/ | ||
+ | </ | ||
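The directory setup above can be wrapped in a small helper. This is a minimal sketch rather than part of the DARWIN tooling: the function name ''setup_project_dir'' and the directory name ''tf-demo'' are hypothetical, and falling back to ''$HOME'' when ''$WORKDIR_USER'' is unset is an assumption.

```shell
# Hypothetical helper: create a project directory under personal workgroup
# storage when WORKDIR_USER is set, otherwise under the home directory.
# Prints the full path of the directory on success.
setup_project_dir() {
    base_dir="${WORKDIR_USER:-$HOME}"
    project_dir="${base_dir}/$1"
    mkdir -p "${project_dir}" && printf '%s\n' "${project_dir}"
}

# Example use (the directory name is illustrative only):
#   cd "$(setup_project_dir tf-demo)"
```

Printing the path instead of changing directory inside the function keeps the helper usable from both interactive shells and job scripts.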
+ | |||
+ | For example, say your TensorFlow Python script is called '' | ||
+ | |||
+ | <code bash> | ||
+ | $ cp / | ||
+ | </ | ||
+ | |||
+ | The job script template has extensive documentation that should assist you in customizing it for the job. Last but not least, specify the version of TensorFlow you want via VALET, then change the last line to match your Python script name; for this example it would be '' | ||
+ | |||
+ | |||
+ | <code bash> | ||
+ | : | ||
+ | |||
+ | # | ||
+ | # Add a TensorFlow container to the environment: | ||
+ | # | ||
+ | vpkg_require tensorflow/ | ||
+ | |||
+ | # | ||
+ | # Execute our TensorFlow Python script: | ||
+ | # | ||
+ | python3 tf-script.py | ||
+ | </ | ||
+ | |||
+ | Finally, submit the job using the '' | ||
+ | |||
+ | <code bash> | ||
+ | $ sbatch tensorflow.qs | ||
+ | </ | ||
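On success, ''sbatch'' prints a confirmation of the form ''Submitted batch job <id>''. The sketch below shows one way to capture that ID for later ''squeue'' calls. The helper name ''job_id_from_sbatch_output'' is hypothetical and is not provided on DARWIN.

```shell
# Hypothetical helper: pull the job ID out of sbatch's confirmation message,
# which has the form "Submitted batch job <id>". Prints nothing if the
# message does not match (for example, when sbatch reported an error).
job_id_from_sbatch_output() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Example use on the cluster:
#   job_id="$(job_id_from_sbatch_output "$(sbatch tensorflow.qs)")"
#   squeue -j "${job_id}"
```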
+ | |||
+ | ==== Coprocessor usage ==== | ||
+ | |||
+ | The DARWIN cluster includes nodes with NVIDIA (CUDA-based) GPGPUs and AMD (ROCm-based) GPUs. TensorFlow images with support for these coprocessors are available. | ||
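To confirm that a job actually sees a coprocessor, a short check can be run before the real workload. This is a sketch under assumptions: the function name ''list_tf_gpus'' is hypothetical, and it presumes that ''python3'' resolves to an interpreter with TensorFlow 2.x available (for example, after the job script's ''vpkg_require'' line) on a node with a GPU allocated.

```shell
# Hypothetical helper: report the GPU devices visible to TensorFlow.
# Assumes python3 can import a TensorFlow 2.x installation.
list_tf_gpus() {
    python3 - <<'EOF'
import tensorflow as tf

# list_physical_devices returns one entry per device TensorFlow can use.
gpus = tf.config.list_physical_devices("GPU")
print(len(gpus), "GPU(s) visible to TensorFlow")
for gpu in gpus:
    print(" ", gpu.name)
EOF
}

# Example use inside a job script, before the main Python script:
#   list_tf_gpus
```

A count of zero on a GPU node usually suggests that the image in use lacks support for that coprocessor or that no GPU was requested for the job.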
+ | |||
+ | ===== Virtual environments ===== | ||
+ | |||
+ | As of 2024, Anaconda is the suggested tool for creating TensorFlow virtual environments. | ||
+ | |||
+ | Start by adding the Anaconda distribution base to the environment (here '' | ||
+ | |||
+ | <code bash> | ||
+ | [(my_workgroup: | ||
+ | Adding package `anaconda/ | ||
+ | [(my_workgroup: | ||
+ | </ | ||
+ | |||
+ | The '' | ||
+ | |||
+ | <code bash> | ||
+ | [(my_workgroup: | ||
+ | Loading channels: done | ||
+ | # Name | ||
+ | tensorflow | ||
+ | tensorflow | ||
+ | : | ||
+ | tensorflow | ||
+ | : | ||
+ | tensorflow | ||
+ | : | ||
+ | tensorflow | ||
+ | </ | ||
+ | |||
+ | Note that the build tag provides the distinction between variants built on top of specific devices or libraries. | ||
+ | |||
+ | All versions of the TensorFlow virtualenv will be stored in the common base directory, '' | ||
+ | |||
+ | <code bash> | ||
+ | [(my_workgroup: | ||
+ | 2.12.0-mkl | ||
+ | </ | ||
+ | |||
+ | The virtualenv is created using the '' | ||
+ | |||
+ | <code bash> | ||
+ | [(my_workgroup: | ||
+ | : | ||
+ | Preparing transaction: | ||
+ | Verifying transaction: | ||
+ | Executing transaction: | ||
+ | # | ||
+ | # To activate this environment, | ||
+ | # | ||
+ | # $ conda activate / | ||
+ | # | ||
+ | # To deactivate an active environment, | ||
+ | # | ||
+ | # $ conda deactivate | ||
+ | |||
+ | </ | ||
+ | |||
+ | ==== VALET package definition ==== | ||
+ | |||
+ | Assuming the workgroup does //not// already have a TensorFlow VALET package definition, the following YAML config can be modified (e.g. alter the '' | ||
+ | |||
+ | <code yaml> | ||
+ | tensorflow: | ||
+ | prefix: / | ||
+ | description: | ||
+ | url: " | ||
+ | | ||
+ | flags: | ||
+ | - no-standard-paths | ||
+ | |||
+ | versions: | ||
+ | " | ||
+ | description: | ||
+ | dependencies: | ||
+ | - anaconda/ | ||
+ | actions: | ||
+ | - action: source | ||
+ | script: | ||
+ | sh: anaconda-activate-2024.sh | ||
+ | success: 0 | ||
+ | </ | ||
+ | |||
+ | If the '' | ||
+ | |||
+ | < | ||
+ | : | ||
+ | " | ||
+ | description: | ||
+ | dependencies: | ||
+ | - anaconda/ | ||
+ | actions: | ||
+ | - action: source | ||
+ | script: | ||
+ | sh: anaconda-activate-2024.sh | ||
+ | success: 0 | ||
+ | | ||
+ | " | ||
+ | description: | ||
+ | : | ||
+ | </ | ||
+ | |||
+ | With a properly-constructed package definition file, you can now check for your versions of TensorFlow: | ||
+ | |||
+ | <code bash> | ||
+ | [(it_nss: | ||
+ | |||
+ | Available versions in package (* = default version): | ||
+ | | ||
+ | [/ | ||
+ | tensorflow | ||
+ | * 2.12.0: | ||
+ | : | ||
+ | </ | ||
+ | |||
+ | ==== Job scripts ==== | ||
+ | |||
+ | Any job script designed to run Python scripts in this virtualenv should include something like the following toward its end: | ||
+ | |||
+ | < | ||
+ | : | ||
+ | |||
+ | # | ||
+ | # Set up the TensorFlow virtualenv: | ||
+ | # | ||
+ | vpkg_require tensorflow/ | ||
+ | |||
+ | # | ||
+ | # Run a Python script in that virtualenv: | ||
+ | # | ||
+ | python3 my_tf_work.py | ||
+ | rc=$? | ||
+ | |||
+ | # | ||
+ | # Do cleanup work, etc.... | ||
+ | # | ||
+ | |||
+ | # | ||
+ | # Exit with whatever exit code our Python script handed back: | ||
+ | # | ||
+ | exit $rc | ||
+ | </ | ||
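The run, clean up, then exit pattern above can also be packaged as a function so the exit code is preserved even when cleanup is involved. This is a minimal sketch, not part of the job script templates: the function name ''run_with_cleanup'' and the ''SCRATCH_DIR'' variable are hypothetical, and using a ''mktemp'' directory as scratch space is an assumption.

```shell
# Hypothetical wrapper: run a command with a temporary scratch directory,
# remove the directory afterwards, and hand back the command's exit code.
run_with_cleanup() {
    SCRATCH_DIR="$(mktemp -d)" || return 1
    export SCRATCH_DIR          # visible to the command being run

    rc=0
    "$@" || rc=$?               # remember the exit code without aborting

    rm -rf "${SCRATCH_DIR}"     # cleanup work
    return "${rc}"
}

# Example use at the end of a job script:
#   run_with_cleanup python3 my_tf_work.py
#   exit $?
```

Recording the exit code with ''|| rc=$?'' keeps the cleanup step running even if the script is executed with ''set -e''.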