technical:recipes:mpi4py-in-virtualenv

technical:recipes:mpi4py-in-virtualenv [2025-01-31 15:14] – [VALET Package Definition] anita
technical:recipes:mpi4py-in-virtualenv [2025-01-31 15:29] (current) – [VALET Package Definition] anita
 +====== Python Virtual Environments with mpi4py ======
 +
 +Most conda channels include copies of the mpi4py module to satisfy dependencies of MPI-parallelized packages.  But the mpi4py Python code must be built on top of a native MPI library (like MPICH, Open MPI, Intel MPI).  As a result, the conda packages always include a bundled binary MPI library that was built to generic specifications, often without support for InfiniBand communications or Slurm/Grid Engine integration.  For proper functioning, it is recommended that mpi4py always be built on top of one of the MPI libraries IT-RCI provides on a cluster.
 +
 +===== MPI and Conda Variants =====
 +
 +In this example we will build the virtual environment on Farber using the ''openmpi/4.0.5'' version of Open MPI and Anaconda for the virtual environment:
 +
 +<code bash>
 +$ vpkg_require openmpi/4.0.5 anaconda/5.2.0:python3
 +Adding dependency `ucx/1.9.0` to your environment
 +Adding package `openmpi/4.0.5` to your environment
 +Adding package `anaconda/5.2.0:python3` to your environment
 +</code>
 +
 +<WRAP center round info 60%>
 +Due to recent announcements regarding Anaconda, and Intel dropping their distribution channel, any documentation referring to Intel's channel will need to be updated.
 +
 +Please use the ''conda-forge'' channel for installations.
 +</WRAP>
 +
 +===== Create a Directory Hierarchy =====
 +
 +We will be creating a Python virtual environment containing the NumPy and SciPy libraries, into which mpi4py will be added.  In case we need to create additional similar environments in the future, we will set up a directory hierarchy that allows multiple versions to coexist:
 +
 +<code bash>
 +$ mkdir -p ${HOME}/conda-envs/my-sci-app/20201102
 +</code>
 +
 +Two things to note:
 +  * As written the directory hierarchy is created in the user's home directory; ''${HOME}'' could be replaced by ''${WORKDIR}/users/myname'', for example, to create it elsewhere.
 +  * The current date is used as a version identifier; using the format ''YYYYMMDD'' promotes simple sorting of the versions from oldest to newest.
 +This directory structure lends itself to straightforward management of ''my-sci-app'' using VALET.
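
The dated directory name can also be generated rather than typed by hand. The following is a minimal sketch, assuming a Bash-compatible shell; ''env_prefix'' and ''today_version'' are hypothetical helper names introduced here for illustration and are not part of VALET or conda.

```shell
# Hypothetical helpers (not part of VALET or conda).

# Print the versioned prefix path <base>/<app>/<version>.
env_prefix() {
    local base="$1" app="$2" version="$3"
    printf '%s/%s/%s\n' "${base}" "${app}" "${version}"
}

# Print today's date in YYYYMMDD form for use as a version identifier.
today_version() {
    date +%Y%m%d
}

# Usage: create the directory for an environment built today.
#   mkdir -p "$(env_prefix "${HOME}/conda-envs" my-sci-app "$(today_version)")"
```

Replacing ''${HOME}/conda-envs'' in the usage line with another base directory places the hierarchy elsewhere, as noted in the list above.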
 +
 +===== Farber =====
 +
 +==== Create the Virtual Environment ====
 +
 +The virtual environment is first populated with all packages that **do not** require mpi4py.  Any packages requiring mpi4py must be installed //after// we build and install our local copy of mpi4py in the virtual environment.  In this example, neither NumPy nor SciPy requires mpi4py.
 +
 +<WRAP center round important 60%>
 +The ''--channel defaults'' and ''--override-channels'' options ensure that only the default Anaconda channels are consulted -- otherwise the command could still pick packages from the Intel channel, for example, which would still have the binary compatibility issues!
 +</WRAP>
 +
 +<code bash>
 +$ conda create --prefix ${HOME}/conda-envs/my-sci-app/20201102 --channel defaults --override-channels 'python>=3.7' numpy scipy
 +Solving environment: done
 +    :
 +Proceed ([y]/n)? y
 +    :
 +Preparing transaction: done
 +Verifying transaction: done
 +Executing transaction: done
 +#
 +# To activate this environment, use:
 +# > source activate /home/1001/conda-envs/my-sci-app/20201102
 +#
 +# To deactivate an active environment, use:
 +# > source deactivate
 +#
 +</code>
 +
 +Before building and installing mpi4py the environment needs to be activated:
 +
 +<code bash>
 +$ source activate /home/1001/conda-envs/my-sci-app/20201102
 +(/home/1001/conda-envs/my-sci-app/20201102)$ 
 +</code>
 +
 +==== Building mpi4py ====
 +
 +With the new virtual environment activated, we can now build mpi4py against the local Open MPI library we added to the shell environment.
 +
 +<code bash>
 +(/home/1001/conda-envs/my-sci-app/20201102)$ pip install --no-binary :all: --compile mpi4py
 +Collecting mpi4py
 +  Using cached mpi4py-3.0.3.tar.gz (1.4 MB)
 +Skipping wheel build for mpi4py, due to binaries being disabled for it.
 +Installing collected packages: mpi4py
 +    Running setup.py install for mpi4py ... done
 +Successfully installed mpi4py-3.0.3
 +</code>
 +
 +The ''--no-binary :all:'' flag tells pip not to use pre-built binary packages (wheels), forcing mpi4py to be built from source against the MPI library in the shell environment.  The ''--compile'' flag byte-compiles the Python source files in the mpi4py package at install time (versus allowing them to be compiled and cached on first import).  The environment now includes support for mpi4py linked against the ''openmpi/4.0.5'' library on Farber:
 +
 +<code bash>
 +(/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
 +mpi4py      3.0.3
 +</code>
 +
 +Additional packages that require mpi4py can now be installed into the environment.
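
Beyond ''pip list'', it can be useful to confirm which MPI library the freshly built module actually reports. The sketch below assumes the virtual environment is activated; ''check_mpi4py'' is a hypothetical helper name, while ''mpi4py.get_config()'' and ''MPI.Get_library_version()'' are functions provided by mpi4py itself.

```shell
# check_mpi4py: hypothetical helper (not part of VALET, conda, or mpi4py).
# Prints the mpi4py version, its build configuration, and the first line of
# the MPI library version string; exits non-zero if mpi4py cannot be imported.
check_mpi4py() {
    python3 - <<'EOF'
import sys
try:
    import mpi4py
    from mpi4py import MPI
except ImportError as err:
    sys.exit(f"mpi4py is not usable in this environment: {err}")
print("mpi4py version:", mpi4py.__version__)
print("build config:", mpi4py.get_config())
print("MPI library:", MPI.Get_library_version().splitlines()[0])
EOF
}

# Usage (inside the activated environment):
#   check_mpi4py
```

If the reported MPI library is not the Open MPI version added with ''vpkg_require'', the module was probably not built from source as intended.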
 +
 +==== VALET Package Definition ====
 +
 +The new virtual environment can easily be added to your login shell and job runtime environments using VALET.  First, ensure you have your personal VALET package definition directory present:
 +
 +<code bash>
 +$ mkdir -p ${HOME}/.valet
 +$ echo ${HOME}/conda-envs/my-sci-app
 +/home/1001/conda-envs/my-sci-app
 +</code>
 +
 +Take note of the path echoed, then create a new file named ''${HOME}/.valet/my-sci-app.vpkg_json'' and add the following text to it:
 +
 +<code json>
 +{   "my-sci-app": {
 +        "prefix": "/home/1001/conda-envs/my-sci-app",
 +        "description": "Some scientific app project in Python",
 +        "standard-paths": false,
 +        "actions": [
 +            { "action": "source", "order": "failure-first", "success": 0,
 +              "script": { "sh": "anaconda-activate.sh" }
 +            }
 +        ],
 +        "versions": {
 +            "20201102": {
 +                "description": "environment built Nov 2, 2020",
 +                "dependencies": [ "openmpi/4.0.5", "anaconda/5.2.0:python3" ]
 +            }
 +        }
 +    }
 +}
 +</code>
 +
 +Please note:
 +  - The ''prefix'' path will be different for you
 +  - We do not need to tell VALET the full path to each version; the version identifier **is** the name of the subdirectory of ''prefix'' containing that version
 +  - If you choose a different version of Open MPI or Anaconda, alter the ''dependencies'' list accordingly
 +  - New versions of this project are appended to the ''versions'' dictionary:<code json>
 +        "versions": {
 +            "20201102": {
 +                "description": "environment built Nov 2, 2020",
 +                "dependencies": [ "openmpi/4.0.5", "anaconda/5.2.0:python3" ]
 +            },
 +            "20201114": {
 +                "description": "environment built Nov 14, 2020",
 +                "dependencies": [ "openmpi/3.1.6", "anaconda/5.2.0:python3" ]
 +            }
 +        }
 +</code>
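
When appending versions by hand, a stray or missing comma is an easy mistake, and JSON does not permit trailing commas. As a hedged sketch, the file can be checked for well-formed JSON with Python's standard ''json.tool'' module before VALET reads it; ''check_vpkg_json'' is a hypothetical helper name, and the check only covers JSON syntax, not VALET's own schema.

```shell
# check_vpkg_json: hypothetical helper (not part of VALET).
# Succeeds only if the given file parses as JSON; it does not validate
# that the keys are ones VALET understands.
check_vpkg_json() {
    python3 -m json.tool "$1" > /dev/null
}

# Usage:
#   check_vpkg_json "${HOME}/.valet/my-sci-app.vpkg_json" && vpkg_versions my-sci-app
```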
 +
 +=== Using the Virtual Environment ===
 +
 +The versions of the virtual environment declared in the VALET package are listed using the ''vpkg_versions'' command:
 +
 +<code bash>
 +$ vpkg_versions my-sci-app
 +Available versions in package (* = default version):
 +
 +[/home/1001/.valet/my-sci-app.vpkg_json]
 +my-sci-app  Some scientific app project in Python
 +* 20201102  environment built Nov 2, 2020
 +</code>
 +
 +Activating the virtual environment is accomplished using the ''vpkg_require'' command (in your login shell or inside job scripts):
 +
 +<code bash>
 +$ vpkg_require my-sci-app/20201102
 +Adding dependency `ucx/1.9.0` to your environment
 +Adding dependency `openmpi/4.0.5` to your environment
 +Adding dependency `anaconda/5.2.0:python3` to your environment
 +Adding package `my-sci-app/20201102` to your environment
 +(/home/1001/conda-envs/my-sci-app/20201102)$ which python3
 +~/conda-envs/my-sci-app/20201102/bin/python3
 +(/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
 +mpi4py      3.0.3
 +$ which mpirun
 +/opt/shared/openmpi/4.0.5/bin/mpirun
 +</code>
 +
 +===== Caviness =====
 +
 +The steps for completing this work on Caviness are similar to those presented for Farber.  First [[technical:recipes:mpi4py-in-virtualenv#create-a-directory-hierarchy|create a directory hierarchy]] as described above, then add newer versions of Open MPI and Anaconda to the shell environment:
 +
 +<code bash>
 +$ vpkg_require openmpi/4.1.4:gcc-12.1.0 anaconda/2024.02
 +Adding dependency `libfabric/1.13.2` to your environment
 +Adding dependency `binutils/2.35` to your environment
 +Adding dependency `gcc/12.1.0` to your environment
 +Adding package `openmpi/4.1.4:gcc-12.1.0` to your environment
 +Adding package `anaconda/2024.02` to your environment
 +</code>
 +
 +==== Create the Virtual Environment ====
 +
 +The virtual environment is first populated with all packages that **do not** require mpi4py.  Any packages requiring mpi4py must be installed //after// we build and install our local copy of mpi4py in the virtual environment.  In this example, neither NumPy nor SciPy requires mpi4py.
 +
 +<code bash>
 +$ conda create --prefix ${HOME}/conda-envs/my-sci-app/20201102 --channel defaults --override-channels 'python>=3.7' numpy scipy
 +Collecting package metadata (current_repodata.json): done
 +Solving environment: done
 +    :
 +Proceed ([y]/n)? y
 +    :
 +#
 +# To activate this environment, use
 +#
 +#     $ conda activate /home/1001/conda-envs/my-sci-app/20201102
 +#
 +# To deactivate an active environment, use
 +#
 +#     $ conda deactivate
 +
 +</code>
 +
 +Before building and installing mpi4py the environment needs to be activated:
 +
 +<code bash>
 +$ conda activate /home/1001/conda-envs/my-sci-app/20201102
 +(/home/1001/conda-envs/my-sci-app/20201102)$ 
 +</code>
 +
 +==== Building mpi4py ====
 +
 +With the new virtual environment activated, we can now build mpi4py against the local Open MPI library we added to the shell environment.  Because Anaconda places its own copy of ''ld'' in the virtual environment and the build would use it in lieu of the system ''ld'', first remove all permissions from that copy so the compile works properly.
 +
 +<code bash>
 +(/home/1001/conda-envs/my-sci-app/20201102)$ chmod 000 /home/1001/conda-envs/my-sci-app/20201102/compiler_compat/ld
 +(/home/1001/conda-envs/my-sci-app/20201102)$ pip install --no-binary :all: --compile mpi4py
 +Collecting mpi4py
 +  Using cached mpi4py-4.0.1.tar.gz (466 kB)
 +Skipping wheel build for mpi4py, due to binaries being disabled for it.
 +Installing collected packages: mpi4py
 +    Running setup.py install for mpi4py ... done
 +Successfully installed mpi4py-4.0.1
 +</code>
 +
 +The ''--no-binary :all:'' flag tells pip not to use pre-built binary packages (wheels), forcing mpi4py to be built from source against the MPI library in the shell environment.  The ''--compile'' flag byte-compiles the Python source files in the mpi4py package at install time (versus allowing them to be compiled and cached on first import).  The environment now includes support for mpi4py linked against the ''openmpi/4.1.4:gcc-12.1.0'' library on Caviness:
 +
 +<code bash>
 +(/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
 +mpi4py      4.0.1
 +</code>
 +
 +Additional packages that require mpi4py can now be installed into the environment.
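
The ''chmod 000'' step in this section leaves the environment's bundled ''ld'' disabled. If you would rather disable it only for the duration of a build, something along these lines may work. This is a sketch under stated assumptions: ''with_ld_disabled'' is a hypothetical helper, restoring the original mode afterwards is a choice made here rather than part of the recipe, and ''stat -L -c %a'' relies on GNU ''stat'' as found on Linux.

```shell
# with_ld_disabled: hypothetical helper (not part of conda or VALET).
# Removes all permissions from <prefix>/compiler_compat/ld, runs the given
# command, then restores the original mode and returns the command's status.
with_ld_disabled() {
    local prefix="$1"; shift
    local ld="${prefix}/compiler_compat/ld"
    local mode status=0
    # -L follows a symlink so the mode read matches the file chmod will change.
    mode="$(stat -L -c %a "${ld}")" || return 1
    chmod 000 "${ld}"
    "$@" || status=$?
    chmod "${mode}" "${ld}"
    return "${status}"
}

# Usage (inside the activated environment):
#   with_ld_disabled /home/1001/conda-envs/my-sci-app/20201102 \
#       pip install --no-binary :all: --compile mpi4py
```

Any package installed later that also needs to compile against the system ''ld'' would have to be wrapped in the same way.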
 +
 +==== VALET Package Definition ====
 +
 +The new virtual environment can easily be added to your login shell and job runtime environments using VALET.  First, ensure you have your personal VALET package definition directory present:
 +
 +<code bash>
 +$ mkdir -p ${HOME}/.valet
 +$ echo ${HOME}/conda-envs/my-sci-app
 +/home/1001/conda-envs/my-sci-app
 +</code>
 +
 +Take note of the path echoed, then create a new file named ''${HOME}/.valet/my-sci-app.vpkg_yaml'' and add the following text to it:
 +
 +<code yaml>
 +my-sci-app:
 +    prefix: /home/1001/conda-envs/my-sci-app
 +    description: Some scientific app project in Python
 +    flags:
 +        - no-standard-paths
 +    actions:
 +        - action: source
 +          script:
 +              sh: anaconda-activate.sh
 +          order: failure-first
 +          success: 0
 +    versions:
 +          "20201102":
 +              description: environment built Nov 2, 2020
 +              dependencies:
 +                  - openmpi/4.1.4:gcc-12.1.0
 +                  - anaconda/2024.02
 +</code>
 +
 +=== Using the Virtual Environment ===
 +
 +The versions of the virtual environment declared in the VALET package are listed using the ''vpkg_versions'' command:
 +
 +<code bash>
 +$ vpkg_versions my-sci-app
 +
 +Available versions in package (* = default version):
 +
 +[/home/1001/.valet/my-sci-app.vpkg_yaml]
 +my-sci-app  Some scientific app project in Python
 +* 20201102  environment built Nov 2, 2020
 +</code>
 +
 +Activating the virtual environment is accomplished using the ''vpkg_require'' command (in your login shell or inside job scripts):
 +
 +<code bash>
 +$ vpkg_require my-sci-app/20201102
 +Adding dependency `libfabric/1.13.2` to your environment
 +Adding dependency `binutils/2.35` to your environment
 +Adding dependency `gcc/12.1.0` to your environment
 +Adding dependency `openmpi/4.1.4:gcc-12.1.0` to your environment
 +Adding dependency `anaconda/2024.02` to your environment
 +Adding package `my-sci-app/20201102` to your environment
 +(/home/1001/conda-envs/my-sci-app/20201102)$ which python3
 +~/conda-envs/my-sci-app/20201102/bin/python3
 +(/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
 +mpi4py      4.0.1
 +$ which mpirun
 +/opt/shared/openmpi/4.1.4-gcc-12.1.0/bin/mpirun
 +</code>
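
Since ''vpkg_require'' also works inside job scripts, a batch script can load the environment the same way. The sketch below writes a minimal Slurm script to a file. It is illustrative only: ''write_job_script'', the job name, the task count, and the Python script name are all hypothetical, and it assumes the cluster's Open MPI picks up the Slurm allocation when ''mpirun'' is called without a process count. Any job script templates your cluster provides should take precedence over this.

```shell
# write_job_script: hypothetical helper that writes a minimal Slurm batch
# script. Arguments: output file, VALET package id, task count, Python script.
write_job_script() {
    local out="$1" vpkg="$2" ntasks="$3" pyscript="$4"
    cat > "${out}" <<EOF
#!/bin/bash -l
#SBATCH --job-name=mpi4py-test
#SBATCH --ntasks=${ntasks}

# Load the virtual environment and its Open MPI dependency via VALET.
vpkg_require ${vpkg}

# Launch the Python script under MPI (assumes Open MPI detects the allocation).
mpirun python3 ${pyscript}
EOF
}

# Usage:
#   write_job_script my-sci-app.qs my-sci-app/20201102 4 hello.py
#   sbatch my-sci-app.qs
```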
 +
 +===== DARWIN =====
 +
 +The steps for completing this work on DARWIN are similar to those presented for Caviness.  First [[technical:recipes:mpi4py-in-virtualenv#create-a-directory-hierarchy|create a directory hierarchy]], here using ''20250121'' as the version identifier, then add Open MPI and Anaconda to the shell environment:
 +
 +<code bash>
 +$ vpkg_require openmpi/4.1.5:gcc-12.2 anaconda/2024.02
 +Adding dependency `gcc/12.2.0` to your environment
 +Adding dependency `ucx/1.13.1` to your environment
 +Adding package `openmpi/4.1.5:gcc-12.2` to your environment
 +Adding package `anaconda/2024.02:python3` to your environment
 +</code>
 +
 +==== Create the Virtual Environment ====
 +
 +The virtual environment is first populated with all packages that **do not** require mpi4py.  Any packages requiring mpi4py must be installed //after// we build and install our local copy of mpi4py in the virtual environment.  In this example, neither NumPy nor SciPy requires mpi4py.
 +
 +<code bash>
 +$ conda create --prefix ${HOME}/conda-envs/my-sci-app/20250121 --channel defaults --override-channels 'python>=3.7' numpy scipy
 +Collecting package metadata (current_repodata.json): done
 +Solving environment: done
 +    :
 +Proceed ([y]/n)? y
 +    :
 +#
 +# To activate this environment, use
 +#
 +#     $ conda activate /home/1006/conda-envs/my-sci-app/20250121
 +#
 +# To deactivate an active environment, use
 +#
 +#     $ conda deactivate
 +
 +</code>
 +
 +Before building and installing mpi4py the environment needs to be activated:
 +
 +<code bash>
 +$ conda activate /home/1006/conda-envs/my-sci-app/20250121
 +(/home/1006/conda-envs/my-sci-app/20250121)$
 +</code>
 +
 +==== Building mpi4py ====
 +
 +With the new virtual environment activated, we can now build mpi4py against the local Open MPI library we added to the shell environment.  As on Caviness, Anaconda places its own copy of ''ld'' in the virtual environment, so all permissions are removed from it first to make the build use the system ''ld''.
 +
 +<code bash>
 +(/home/1006/conda-envs/my-sci-app/20250121)$ chmod 000 /home/1006/conda-envs/my-sci-app/20250121/compiler_compat/ld
 +(/home/1006/conda-envs/my-sci-app/20250121)$ pip install --no-binary :all: --compile mpi4py
 +Collecting mpi4py
 +  Downloading mpi4py-4.0.1.tar.gz (466 kB)
 +  Installing build dependencies ... done
 +  Getting requirements to build wheel ... done
 +  Installing backend dependencies ... done
 +  Preparing metadata (pyproject.toml) ... done
 +Building wheels for collected packages: mpi4py
 +  Building wheel for mpi4py (pyproject.toml) ... done
 +  Created wheel for mpi4py: filename=mpi4py-4.0.1-cp313-cp313-linux_x86_64.whl size=997834 sha256=b09b4fe26c8aa940bdcbdb512960fb73edb9ed9ed698b9455db3e1f3d5b078a5
 +  Stored in directory: /home/1006/.cache/pip/wheels/27/79/62/f500b54e8b8ce5f5e54e7b84e8695938988ca274117d39983b
 +Successfully built mpi4py
 +Installing collected packages: mpi4py
 +Successfully installed mpi4py-4.0.1
 +</code>
 +
 +The ''--no-binary :all:'' flag tells pip not to use pre-built binary packages (wheels), forcing mpi4py to be built from source against the MPI library in the shell environment.  The ''--compile'' flag byte-compiles the Python source files in the mpi4py package at install time (versus allowing them to be compiled and cached on first import).  The environment now includes support for mpi4py linked against the ''openmpi/4.1.5:gcc-12.2'' library on DARWIN:
 +
 +<code bash>
 +(/home/1006/conda-envs/my-sci-app/20250121)$ pip list | grep mpi4py
 +mpi4py      4.0.1
 +</code>
 +
 +Additional packages that require mpi4py can now be installed into the environment.
 +
 +==== VALET Package Definition ====
 +
 +The new virtual environment can easily be added to your login shell and job runtime environments using VALET.  First, ensure you have your personal VALET package definition directory present:
 +
 +<code bash>
 +$ mkdir -p ${HOME}/.valet
 +$ echo ${HOME}/conda-envs/my-sci-app
 +/home/1006/conda-envs/my-sci-app
 +</code>
 +
 +Take note of the path echoed, then create a new file named ''${HOME}/.valet/my-sci-app.vpkg_yaml'' and add the following text to it:
 +
 +<code yaml>
 +my-sci-app:
 +    prefix: /home/1006/conda-envs/my-sci-app
 +    description: Some scientific app project in Python
 +    flags:
 +        - no-standard-paths
 +    actions:
 +        - action: source
 +          script:
 +              sh: anaconda-activate.sh
 +          order: failure-first
 +          success: 0
 +    versions:
 +          "20250121":
 +              description: environment built Jan 21, 2025
 +              dependencies:
 +                  - openmpi/4.1.5:gcc-12.2
 +                  - anaconda/2024.02
 +</code>
 +
 +=== Using the Virtual Environment ===
 +
 +The versions of the virtual environment declared in the VALET package are listed using the ''vpkg_versions'' command:
 +
 +<code bash>
 +$ vpkg_versions my-sci-app
 +
 +Available versions in package (* = default version):
 +
 +[/home/1006/.valet/my-sci-app.vpkg_yaml]
 +my-sci-app  Some scientific app project in Python
 +* 20250121  environment built Jan 21, 2025
 +</code>
 +
 +Activating the virtual environment is accomplished using the ''vpkg_require'' command (in your login shell or inside job scripts):
 +
 +<code bash>
 +$ vpkg_require my-sci-app/20250121
 +Adding dependency `gcc/12.2.0` to your environment
 +Adding dependency `ucx/1.13.1` to your environment
 +Adding dependency `openmpi/4.1.5:gcc-12.2` to your environment
 +Adding dependency `anaconda/2024.02:python3` to your environment
 +Adding package `my-sci-app/20250121` to your environment
 +(/home/1006/conda-envs/my-sci-app/20250121)$ which python3
 +~/conda-envs/my-sci-app/20250121/bin/python3
 +(/home/1006/conda-envs/my-sci-app/20250121)$ which mpirun
 +/opt/shared/openmpi/4.1.5-gcc-12.2/bin/mpirun
 +</code>
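
Once the environment loads through VALET, a short multi-process run is a quick way to confirm that ''mpirun'' and mpi4py work together. The following is a sketch only: ''mpi_smoke_test'' is a hypothetical helper, and it should be run inside a job or wherever short MPI tests are permitted on the cluster.

```shell
# mpi_smoke_test: hypothetical helper (not part of VALET or mpi4py).
# Starts the requested number of processes (default 2); each prints its
# rank and the total number of ranks in MPI.COMM_WORLD.
mpi_smoke_test() {
    local nprocs="${1:-2}"
    mpirun -np "${nprocs}" python3 -c \
        'from mpi4py import MPI; c = MPI.COMM_WORLD; print(f"rank {c.Get_rank()} of {c.Get_size()}")'
}

# Usage (after vpkg_require my-sci-app/20250121):
#   mpi_smoke_test 4
```

If every process reports itself as rank 0 of 1, the Python module and the ''mpirun'' launcher are most likely not using the same MPI library.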