Department of Physics and Astronomy

The Forbes Group

Conda Pip and All That

In this post I describe my strategy for managing source code projects and dependencies with python, Conda, Pip, etc. I also show how to maintain your own "PyPI" for source code packages.


Current Strategy

Our current strategy is to manage conda environments on a per-project basis using Anaconda Project. Carefully structured, the associated anaconda-project.yaml file can act as a replacement for environment.yml files. (Just make sure you use dependencies: rather than packages: in anaconda-project.yaml. This will allow conda env to also use the file.) This can take more disk space than the monolithic work environments we used to use, but we mitigate this with some of the following principles:

  • Don't include build tools such as Poetry, Black, Mercurial etc. in the environment as this can cause dependency issues and bloat. Instead, install these with e.g. PipX. I do the following:

    pipx install poetry
    pipx install git+
    pipx install black
    pipx install nox
    pipx install poetry2conda
    pipx install conda-lock
    pipx install condax
    pipx install rst-to-myst
    pipx install twine
    pipx install mercurial
    pipx install mmf_setup
    pipx inject nox nox-poetry poetry 
    pipx inject mercurial hg-git hg-evolve
  • Don't include the conda-forge channel by default. Specify it only when needed, like conda-forge::myst-nb. This generally keeps the search/build times small, but can have issues with complicated dependencies that require pinning other requirements as coming from conda-forge. It is fine to fall back and include the conda-forge channel, but keep in mind that creating the environment may take much longer. (Mamba can help in these cases.)

  • Consider using Conda proper to manage only those packages that are complicated, difficult to build, or have binary dependencies (things like numpy, scipy, matplotlib, numba etc.), and then use Pip or Poetry to manage the rest. This works out of the box with Pip, but needs a little bit of help with Poetry. We use the additional functionality of adding commands in anaconda-project.yaml to do any additional provisioning (such as installing IPython kernels for use in Jupyter).

  • To start a new project, take a look at using our cookiecutters. These provide templates for new projects with a working skeleton anaconda-project.yaml file with support for documentation hosted on Read the Docs (e.g. Physics 555: Quantum Technologies and Computation), a Makefile for initializing environments on CoCalc, automatic testing, CI, and many other features.

If you need to install new packages, here are some options:

  1. If the packages you need are pip-installable, then you can install them locally in PYTHONUSERBASE. This defaults to ~/.local/:

    conda activate work
    python -m pip install --user ...

    For example, with Python 3.8 on linux, this will install the packages in ~/.local/lib/python3.8/ with binaries in ~/.local/bin/ etc. This is convenient, but does not completely isolate your environment – any version of python3.8 will have access to these. (A potential issue is that the installed packages might depend on libraries in the underlying conda environment, which may then be unavailable in other environments.)

    A safer approach is to use a virtual environment (venv) with access to the system site packages – i.e. those in the conda environment:

    cd <your project>
    conda activate work
    python3 -m venv --system-site-packages .venv

    This will create a local virtual environment in .venv in which any new packages will be installed, but which will still have access to the packages in the work conda environment.


    • It can be a little confusing having both conda and venvs active. You should be careful to activate the venv only after any conda environment, or without any conda environments active. If you activate a conda environment after activating the venv, things can get quite messed up.
    • I don't have a good solution yet for making a completely reproducible environment file: conda list will not find packages installed in the venv. You can list these with pip list or pip freeze, but these will not include those in the conda environment. This will be addressed below when we discuss poetry.
  2. Unfortunately, this will not work with packages that need conda to install. If you have such additional dependencies, then your best option is to create a clone:

    conda create -n mywork --clone work
    conda activate mywork
    conda install ...
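
As a rough sanity check for the activation-order caveat mentioned above for venvs, here is a minimal sketch that reports which environment comes first on a PATH-style string. The function name env_order and the sample paths are hypothetical. It only assumes that each activate script prepends its bin directory to PATH, and that the two prefixes are available (typically as VIRTUAL_ENV and CONDA_PREFIX).

```shell
# Report which environment's bin directory comes first on a PATH-style string.
#   $1: PATH-style string, $2: venv prefix, $3: conda prefix
# The body runs in a subshell "( ... )" so that the IFS and globbing
# changes do not leak into the calling shell.
env_order() (
    search_path=$1 venv=$2 conda=$3
    IFS=':'
    set -f                      # do not glob-expand PATH entries
    for entry in $search_path; do
        if [ -n "$venv" ] && [ "$entry" = "$venv/bin" ]; then
            echo "venv-first"; exit 0
        elif [ -n "$conda" ] && [ "$entry" = "$conda/bin" ]; then
            echo "conda-first"; exit 0
        fi
    done
    echo "neither"
)

# Hypothetical example: the venv was activated after the conda environment.
env_order "/proj/.venv/bin:/opt/conda/envs/work/bin:/usr/bin" \
          "/proj/.venv" "/opt/conda/envs/work"
```

In a live shell one would call env_order "$PATH" "${VIRTUAL_ENV:-}" "${CONDA_PREFIX:-}". Getting conda-first while a venv is supposedly active suggests that the conda environment was activated after the venv.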


There are a few issues with our suggested approach of using read-only conda environments:

  • conda run cannot be used when the environment is read-only: Conda issue #10690. However, many programs can simply be run by symlinking the appropriate executable somewhere like ~/.local/bin/ on your path.
  • Cloning environments can take quite a bit of disk space. In principle, hardlinks should be used, but not everything can be linked, so using a single environment would take less space.

(To suggest additional packages, please create a topic and merge request in the forbes-group/configurations repository as discussed below.)


  • Common environments are available for quickly getting down to work.
  • Environments stay clean, promoting reproducibility.


There are two main domains of python packaging:

  1. Users who need to install packages.
  2. Developers who need to test and distribute packages.

In both cases, one would like to be able to work in reproducible computing environments, possibly working with a variety of different python versions. In any case, do not use or update the version of python installed for use by the system: this is asking for problems.

Instead, use one of the following strategies:

  • Conda: Provides an easy way of installing and activating independent python environments. In addition to python, conda is a full binary package manager able to install many tools and libraries beyond the python ecosystem. The main disadvantage of conda is that it can be very slow to install a new environment. To some extent, mamba mitigates this.

    Pros: Cons:

  • pyenv: Pros:



As a general strategy, I am now trying to host everything on Anaconda Cloud so that all my packages and environments can be installed with Conda.

Within the Python community, the Python Packaging Authority (PyPA) provides the recommendations of the current working group, and we try to follow their guidelines as laid out in the Python Packaging User Guide.

My general strategy is to maintain Conda environments for each project and each main application, and to rely on these to ensure my projects work and can be easily distributed. In addition, I maintain a common work environment that has everything and the kitchen sink which I can "just use" to get work done. This is hosted on Anaconda Cloud so you can install it and use it with:

conda install anaconda-client
conda env create mforbes/work    # Don't run in a directory with an environment.yml file
conda activate work

Note: do not run this in a folder with an environment.yml file. See Issue #549.

This notebook describes some details about this strategy which consists of the following concepts:

  • Jupyter: This section discusses the jupyter environment where I install Jupyter and related tools.
    • I use a script to launch the Jupyter notebook server and/or Jupyter Lab so that this environment is first activated.
    • I use the nb_conda extension so that one can access the conda environments from Jupyter. (This requires ensuring that the ipykernel package is installed in each environment so that they can be found.)
    • I use the Jupytext extension so that you can store notebooks as python files for better version control. This requires a small manual change in your notebooks to enable.
  • Conda Environments: I try to package everything into Conda environments. There are several use-cases here:

    • Global environments such as work2 and work3 which have everything including the kitchen sink for python 2 and python 3 respectively. When I just need to get something done, I activate these and work. Don't rely on these for testing though: instead use minimal and dedicated environments for your code.
    • Dedicated and minimal project environments should be used for each project that contain the minimal number of packages required. Your tests should be run in these environments (plus any testing tools) so that you can be sure no undocumented dependencies creep in when you add new features. I usually keep two copies:

      • environment.yml: This contains the packages I need (leaves), usually without version specifications unless there is a known conflict. It can be kept clean as described in the section Visualizing Dependencies. To create or update the environment:

        conda env update -f environment.yml
      • environment.frozen.yml: This contains the explicit versions I am testing against to ensure reproducible computations. It can be formed by:

        conda activate <environment name>
        conda env export > environment.frozen.yml
  • Anaconda Cloud: I try to use my mforbes channel on Anaconda Cloud to host useful environments and packages. Ideally, all packages should be installable with only Conda. Some information about how to build and maintain Conda packages and environments is discussed in section Building a Conda Package.



  • Use conda to manage environments. (Conda can be slow: for faster installs you can try mamba, but it is still experimental.)
  • For each project, create an environment which will contain all the packages like NumPy, SciPy etc. needed for that project. (Also specify python 2 or python 3). Store these requirements in a project-specific environment.yml file.
  • Create an environment for that project that includes the ipykernel package:

    conda env create --file environment.yml

    (This assumes that environment.yml specifies the name of the environment, otherwise you need to pass conda the -n <env> flag).

  • Install Jupyter (see instructions below) in its own environment and include the nb_conda package. By including the ipykernel package in each environment and the nb_conda package in the jupyter environment, Jupyter will be able to find your environments and use them as kernels.
  • Launch Jupyter with a custom kernel for your project using the following function which will activate the jupyter environment first:

    # ~/.bashrc or similar
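    function j {
        if [ -f './' ]; then
            CONFIG_FLAG="--config=./";
        else
            if [ -f "$(hg root)/" ]; then
                CONFIG_FLAG="--config=$(hg root)/";
            else
                CONFIG_FLAG="";
            fi
        fi
        echo "conda activate jupyter";
        conda activate jupyter;
        echo "jupyter notebook ${CONFIG_FLAG} $*";
        # Pass the config flag only if one was found, and keep arguments separate.
        jupyter notebook ${CONFIG_FLAG:+"${CONFIG_FLAG}"} "$@";
        conda deactivate
    }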
  • For development and testing, create a set of minimal environments with the versions you want to test against:

    envdir="$(conda info | grep 'envs directories' | awk 'NF>1{print $NF}')"
    for p in 2.7 3.5 3.6 3.7 3.8 3.9; do
      env="py${p}"    # Environments are named py2.7, py3.5, etc.
      sudo chown mforbes "${envdir}/${env}"
      conda env remove -y -n ${env} 2> /dev/null
      rm -rf "${envdir}/${env}"
      conda create -f -y -c defaults --override-channels \
            --no-default-packages -n "${env}" python=${p}
      sudo chown admin "${envdir}/${env}"
      ln -s "${envdir}/${env}/bin/python${p}" ~/.local/bin/
    done
    conda clean --all -y


The following recommendations are based on using pip to manage dependencies. We are in the process of migrating to a more conda-centric version.

  • Provide a setup.py file for your source repositories that contains all of the dependency information in the install_requires argument to setup().
  • Structure your repository with tags and/or named branches so that you can hg update 0.9 etc. to update to the various versions.
  • Host an index.html file somewhere that points to the Download links of the various projects.
  • Use pip install --extra-index-url <index.html> to allow pip to resolve dependencies against your source code.
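
A minimal sketch of generating such an index page from a list of "name URL" pairs read on standard input. The helper name make_index, the package name, and the example.com URL are placeholders, and the exact link format that pip expects may differ from what is shown here.

```shell
# Emit a bare-bones HTML page with one download link per input line.
# Each input line has the form: <name> <url>
make_index() {
    echo '<html><body>'
    while read -r name url || [ -n "$name" ]; do
        [ -n "$name" ] || continue      # skip blank lines
        printf '<a href="%s">%s</a><br/>\n' "$url" "$name"
    done
    echo '</body></html>'
}

# Hypothetical example: one tagged release of one project.
make_index <<'EOF'
mypackage-0.9 https://example.com/downloads/mypackage-0.9.tar.gz
EOF
```

Redirecting the output to index.html and uploading that file somewhere public gives a page that can be kept up to date by rerunning the script whenever a new version is tagged.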

Poetry (Trial)

I am giving Poetry a try. This is a tool for replacing setup.py and providing virtual environments for development and testing.


conda deactivate    # Optional: deactivate conda environments
curl -sSL | python
mkdir -p ~/.local/share/bash-completion/completions
poetry completions bash > ~/.local/share/bash-completion/completions/poetry.bash

This requires a modern version of bash. See issue #3418. I use

port install bash-completion
# The redirection must run as root, so use tee rather than "sudo echo ... >>".
echo /opt/local/bin/bash | sudo tee -a /etc/shells
chsh -s /opt/local/bin/bash
cat >> ~/.bashrc <<EOF
if [ -f /opt/local/etc/profile.d/ ]; then
  . /opt/local/etc/profile.d/
fi
EOF

To uninstall:

POETRY_UNINSTALL=1 bash -c 'curl -sSL | python'


Follow the instructions to set up your project using a pyproject.toml file instead of setup.py. If you still want to be able to use source-installs (pip install -e), then make a stub setup.py file:

import setuptools
setuptools.setup()


When you want to develop the project, you should first set the python environment. If you follow my advice above, you will have various python versions available on your PATH such as python3.6, python3.7 etc. Get poetry to create a virtual environment with the version of your choice as follows:

$ poetry env use python2.7 
Creating virtualenv wdata-6Y3wHwFr-py2.7 in ...Caches/pypoetry/virtualenvs
Using virtualenv: .../Caches/pypoetry/virtualenvs/wdata-6Y3wHwFr-py2.7
$ poetry env use python3.9
Creating virtualenv wdata-6Y3wHwFr-py3.9 in .../Caches/pypoetry/virtualenvs
Using virtualenv: .../Caches/pypoetry/virtualenvs/wdata-6Y3wHwFr-py3.9

Now you can do things like install the project in this environment, or run tests etc. For example, testing the package against Python 3.9 would be accomplished by:

poetry env use python3.9
poetry install
poetry run pytest


The following works with nox:

import nox
from nox.sessions import Session

@nox.session(python=["3.6", "3.7", "3.8", "3.9"])
def test(session: Session) -> None:
    """Run the test suite."""
    session.run("poetry", "env", "use",
                session.virtualenv.interpreter, external=True)
    session.run("poetry", "install", external=True)
    session.run("poetry", "run", "pytest", external=True)

There may be better ways of getting this to work.


  • poetry config --list
  • poetry shell: Start a shell
  • poetry config repositories.testpypi
  • rm -rf "$(poetry config virtualenvs.path)": Remove virtual envs.
  • rm -rf "$(poetry config cache-dir)": Remove entire cache of virtual envs etc.



Conda and Anaconda provide quite complete python distributions for scientific computing. Unless I am specifically developing packages for general use, I allow myself to rely on the default Anaconda distribution. This includes the Conda package manager, which provides an alternative to virtualenv for managing environments.

There is a nice discussion of the state of conda here:

Conda has a few important benefits:

  • Many binary packages are supported, including some that are not python (and hence cannot be managed by pip).
  • Conda includes builds of NumPy, SciPy, and Matplotlib that can be difficult to install from source.
  • Conda keeps track of your history (try conda list --revisions) which can help if you need to reproduce some history.
  • For educational use (or paid commercial use), one can install the Accelerate package.
  • Conda is aware of packages installed with pip so these can still be used.
  • Conda provides environments, replacing the need for virtualenv.


Working with packages:

  • conda update --all: Update all packages in an environment.
  • conda clean --all: Remove all downloaded packages and unused files. Can free significant amounts of disk space.

Working with environments:

  • conda activate <env>/conda deactivate: Activate or deactivate an environment.
  • conda env list: Show installed environments (and their location).
  • conda env update [ -n name ] ( -f environment.yml | channel/spec ) [ --prune ]: Update packages in an environment from a file or anaconda channel, optionally removing unnecessary files (but see this issue.)

Clone an environment:

  • Using conda-pack which must be installed (conda install -c conda-forge conda-pack):
    env=work            # Specify the name here
    conda pack -n "${env}" -o "${env}.tgz"
    tar -zxvf "$env.tgz" -C new_env_folder
    conda-unpack -p new_env_folder
    mamba install --force-reinstall olefile defusedxml

Package Availability

One issue when dealing with conda is that not all packages are available. There are several ways to deal with this:

  1. Search for the package on another channel. The following channels seem to be reliable:

    • conda-forge: This is a community-driven project that provides an infrastructure for building and maintaining packages cleanly on Windows, Linux, and OS X. It is described in the post "Community powered conda packaging: conda-forge".
    • pipy

    To add a new channel:

    conda config --append channels conda-forge

    Then you can look with

    $ conda config --show
    add_anaconda_token: True
    add_pip_as_python_dependency: True
    allow_softlinks: True
    always_copy: False
    always_yes: False
    auto_update_conda: True
    binstar_upload: None
    changeps1: True
    channel_priority: True
    channels:
      - defaults
      - conda-forge
    create_default_packages: []
    debug: False
    disallow: []
    json: False
    offline: False
    proxy_servers: {}
    quiet: False
    shortcuts: True
    show_channel_urls: True
    ssl_verify: True
    track_features: []
    update_dependencies: True
    use_pip: True
    verbosity: 0
  2. Build your own conda package and upload it to Anaconda Cloud. It seems that contributing to conda-forge is the preferred way to do this but I have not yet tried.

  3. Use pip. The disadvantage here is that packages are then managed by both conda and pip which can lead to some confusion, but this does work reasonably well and is how I used to do things for many years. The advantage is that you can specify your dependencies simply with the standard file. I am not sure how to do this with conda yet.


Getting Started


My strategy is to install a minimal conda installation with python 2.7 and then add various environments as needed. Each project should have its own conda environment so that one can reproducibly manage the dependencies of that project. This ties together as follows:

  • The conda base environment and some custom environments (jupyter might be a good candidate) can be maintained at a system level by an admin. On my computers, these reside in:

    /data/apps/conda/   # System (mini)conda directory
    |-- envs/           # System-defined environments 
    |   |-- jupyter/    # Jupyter notebooks etc. with nb_conda extension
    |   `-- ...
    |-- bin/            # Executables for `base` environment
    `-- ...
  • The jupyter environment contains the nb_conda extension which allows one to choose kernels based on known conda environments. With this, you can run jupyter from the jupyter environment and use a specialized kernel for whatever computation you need. (For example, jupyter should be run with Python 3, but you may have kernels that are still Python 2. This approach means you do not need to include jupyter in your project environment.)

  • On a system where users cannot write to /data/apps/conda, environments created by the user will be automatically placed in:

    ~/.conda/envs/

    Thus, users can immediately create their own environments as needed.

  • If a user needs to create an environment in ~/.conda/envs/ that shadows an environment in /data/apps/conda/envs, then it seems this can be done by first making the appropriate directory, then using conda to install whatever is needed:

    $ conda create -n jupyter
    CondaValueError: prefix already exists: /data/apps/conda/envs/jupyter
    $ mkdir -p ~/.conda/envs/jupyter
    $ conda install -n jupyter ...

    This is probably a bit of a hack, but seems to work.


I first install Miniconda with Python 2.7.

  • I choose to use python 2.7 here because it allows me to install Mercurial which will likely not support Python 3 for some time.
  • I add the miniconda bin directory to the end of my path so that the hg command will fall through if I have activated a python 3 environment.
  • Other than mercurial and related tools, I keep the default environment clean.

Update Conda

After installing miniconda, I do an update:

One interesting feature is that conda keeps track of your revisions. You can roll back to an earlier revision with conda install --revision=0 (the original miniconda installation, for example), and you can list the revisions with conda list --revisions.

To ensure reproducible computing, I highly recommend working from a clean environment for your projects so that you can enter a clean environment with

conda activate _conda_env

To do this, maintain an environment.yml file such as this:

# environment.yml
name: _conda_env
channels:
  - defaults
  - conda-forge
  - ioam
dependencies:
  - ipykernel        # Use with nb_conda_kernels so jupyter sees this kernel
  - numpy
  - scipy>=0.17.1
  - matplotlib>=1.5
  - uncertainties
  - ipywidgets
  - holoviews
  - pip:
    - mmf_setup>=0.1.11

Then create a new environment as follows:

conda env create -p _conda_env -f environment.yml

Periodically you should check if anything is outdated:

conda search -p _conda_env --outdated

and possibly update everything

conda update -p _conda_env --all

If you then make a _conda_env environment in the project and choose this as the kernel for your notebooks, you can then run in an isolated environment. Don't forget to exclude _conda_env from selective sync with Dropbox etc. as it can get big! For this reason, it can be better to install this globally.


For my work environment, I like to have everything in the anaconda meta-package, but this precludes me pinning python to a preferred version. For example - the following gives an error:

# environment.tst.yml
name: tst
channels:
  - defaults
  - conda-forge
dependencies:
  - python == 3.8.6   # Try >= 3.8.5
  - anaconda  # Gets most things: we usually want to depend on this.

Conda fails to build this, while Mamba builds a version that breaks on OS X:

mamba env create -f environment.tst.yml
conda activate tst && python -c "import scipy"

Being less restrictive works.

Frozen Environments

I manually create environment files like the one above which specify the leaf nodes I need. To prune these, see the section Visualizing Dependencies:Conda

I keep a few clean environments for use when testing. I make these as another user (admin) so that I can't accidentally modify the environments.

. deactivate 2> /dev/null
conda update conda
envdir="$(conda info | grep 'envs directories' | awk 'NF>1{print $NF}')"

for p in 2.7 3.5 3.6 3.7 3.8 3.9; do
  env="py${p}"    # Environments are named py2.7, py3.5, etc.
  echo conda env remove -y -n ${env} 2> /dev/null
  echo conda create -y -c defaults --override-channels --no-default-packages -n "${env}" python=${p}
  echo sudo chown admin "${envdir}/${env}"
  echo ln -s "${envdir}/${env}/bin/python${p}" ~/.local/bin/
done
conda clean --all -y

# This takes about 600MB

for p in 2.7 3.5 3.6 3.7 3.8 3.9; do
  env="py${p}"
  conda env remove -y -n ${env} 2> /dev/null
  conda create -y -n ${env} python=${p} anaconda
done

# This takes more than 8GB... I gave up.

If I want to test against a clean environment, I make a clone:

conda create -m -p _envs/py2.6 --clone py2.6
. activate _envs/py2.6
. deactivate
rm -rf _envs
src_prefix: '/data/apps/anaconda/1.3.1/envs/py2.6'
dst_prefix: '/Users/mforbes/current/blog/Nikola/mmfblog/posts/_envs/py2.6'
Packages: 8
Files: 0
Fetching package metadata: ..........
Linking packages ...
[      COMPLETE      ]|##################################################| 100%
# To activate this environment, use:
# $ source activate /Users/mforbes/current/blog/Nikola/mmfblog/posts/_envs/py2.6
# To deactivate this environment, use:
# $ source deactivate
discarding /data/apps/anaconda/1.3.1/bin from PATH
prepending /Users/mforbes/current/blog/Nikola/mmfblog/posts/_envs/py2.6/bin to PATH
discarding /Users/mforbes/current/blog/Nikola/mmfblog/posts/_envs/py2.6/bin from PATH

Copying Environments

Conda environments are not relocatable. Even though everything lives in a single folder, moving that is asking for trouble. Here are some strategies:


BROKEN: When I tried this, SciPy failed:

In [1]: import scipy
ImportError                               Traceback (most recent call last)
/data/apps/conda/envs/work_/lib/python3.8/site-packages/numpy/core/ in <module>
     21 try:
---> 22     from . import multiarray
     23 except ImportError as exc:

/data/apps/conda/envs/work_/lib/python3.8/site-packages/numpy/core/ in <module>
---> 12 from . import overrides
     13 from . import _multiarray_umath

/data/apps/conda/envs/work_/lib/python3.8/site-packages/numpy/core/ in <module>
----> 7 from numpy.core._multiarray_umath import (
      8     add_docstring, implement_array_function, _get_implementing_args)

ImportError: dlopen(/data/apps/conda/envs/work_/lib/python3.8/site-packages/numpy/core/, 2): Library not loaded: @rpath/libopenblas.dylib
  Referenced from: /data/apps/conda/envs/work_/lib/python3.8/site-packages/numpy/core/
  Reason: image not found

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
<ipython-input-1-4363d2be0702> in <module>
----> 1 import scipy

/data/apps/conda/envs/work_/lib/python3.8/site-packages/scipy/ in <module>
     59 __all__ = ['test']
---> 61 from numpy import show_config as show_numpy_config
     62 if show_numpy_config is None:
     63     raise ImportError(

/data/apps/conda/envs/work_/lib/python3.8/site-packages/numpy/ in <module>
    138     from . import _distributor_init
--> 140     from . import core
    141     from .core import *
    142     from . import compat

/data/apps/conda/envs/work_/lib/python3.8/site-packages/numpy/core/ in <module>
     46 """ % (sys.version_info[0], sys.version_info[1], sys.executable,
     47         __version__, exc)
---> 48     raise ImportError(msg)
     49 finally:
     50     for envkey in env_added:



Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was

We have compiled some common reasons and troubleshooting tips at:

Please note and check the following:

  * The Python version is: Python3.8 from "/data/apps/conda/envs/work_/bin/python"
  * The NumPy version is: "1.19.2"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: dlopen(/data/apps/conda/envs/work_/lib/python3.8/site-packages/numpy/core/, 2): Library not loaded: @rpath/libopenblas.dylib
  Referenced from: /data/apps/conda/envs/work_/lib/python3.8/site-packages/numpy/core/
  Reason: image not found

conda-pack allows you to copy the environment. Install it with conda install -c conda-forge conda-pack.

env=work                     # Specify the name here
new_penv=new_env_folder  # It will go here
conda pack -n "${env}" -o "${env}.tgz"
mkdir "${new_penv}"
tar -zxvf "$env.tgz" -C "${new_penv}"
conda activate "${new_penv}"


  • Packages installed with pip -e in source mode cannot be copied. Install these properly or remove with pip uninstall.
  • Fails on Mac OS X with and some other packages:

    Files managed by conda were found to have been deleted/overwritten in the following packages:

    • olefile='0.46'
    • defusedxml='0.6.0'

    This is usually due to pip uninstalling or clobbering conda managed files, resulting in an inconsistent environment. Please check your environment for conda/pip conflicts using conda list, and fix the environment by ensuring only one version of each package is installed (conda preferred).

    In principle, this should be fixed by forcing these to be reinstalled:

    mamba install -n "${env}" --force-reinstall olefile defusedxml

    but this does not fix For this, uninstall, pack, move, then reinstall:

    #mamba uninstall --force -n "${env}"
    #Fails: see
    conda uninstall --force -n "${env}"
    mamba install -p "${new_penv}"
conda list --explicit > spec_file.txt


I install Jupyter into its own environment and then use the following alias to launch it. This allows me to launch it in any environment without having to install it. To use those environments as their own kernel, make sure you include the ipykernel package in the environment and then the nb_conda package in the jupyter environment.

conda env create --file environment.jupyter.yml

Sometimes I have difficulty getting extensions working. The following may help:

jupyter serverextension enable --sys-prefix jupyter_nbextensions_configurator nb_conda
jupyter nbextension enable --sys-prefix toc2/main init_cell/main
jupyter serverextension disable --sys-prefix nbpresent
jupyter nbextension disable --sys-prefix nbpresent/js/nbpresent.min

# environment.jupyter.yml
name: jupyter
description: |
  Environment for running Jupyter, Jupyter notebooks, Jupyter Lab
  etc. This environment includes the following packages:

  * `nb_conda`: This enables Jupyter to find local conda-installed
    environments.  To use this, make sure you install the `ipykernel`
    package in your conda environments.
  * `jupytext`: Allows you to store Jupyter notebooks as text files
    (.py) for better version control.  To enable this for a notebook,
    you can run:

        jupytext --set-formats ipynb,py --sync notebook.ipynb

  * `rise`: Allows you to use your notebooks for presentations.
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3
  - nb_conda
  - jupyter_contrib_nbextensions
  - jupyter_nbextensions_configurator
  - ipyparallel
  - nbbrowserpdf
  - nbdime
  - nbstripout
  - jupyterlab
  - nbdime
  - rise
  - jupytext

  # VisPy
  #- ipywidgets
  #- vispy
  #- npm
  #- pip
  #- pip:
  #  -

  # matplotlib
  #- ipympl

  # jupyter nbextension enable --py widgetsnbextension
  # jupyter nbextension enable --py vispy

# ~/.environment_site which is sourced by ~/.bashrc

# New way of activating conda: Use your install location
. /data/apps/anaconda/etc/profile.d/
conda activate          # Optional - activates the base environment.

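function j {
    if [ -f './' ]; then
        CONFIG_FLAG="--config=./";
    else
        if [ -f "$(hg root)/" ]; then
            CONFIG_FLAG="--config=$(hg root)/";
        else
            CONFIG_FLAG="";
        fi
    fi
    echo "conda activate jupyter";
    conda activate jupyter;
    echo "jupyter notebook ${CONFIG_FLAG} $*";
    #unset BASH_ENV module ml;
    # Pass the config flag only if one was found, and keep arguments separate.
    jupyter notebook ${CONFIG_FLAG:+"${CONFIG_FLAG}"} "$@";
    conda deactivate
}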


To see where Jupyter configuration files go, run

. activate jupyter
jupyter --paths

I add the following kernel so I can seamlessly match with CoCalc.

# ~/Library/Jupyter/kernels/python2-ubuntu/kernel.json
 "display_name": "Python 2 (Ubuntu, plain)",
 "argv": [
 "language": "python"

Anaconda Cloud

Once you have useful environments, you can push them to Anaconda Cloud so that you or others can use them:

anaconda login    # If you have not logged in.
anaconda upload

Once this is done, you can install the environment with one of:

conda env create mforbes/work   
mamba env create mforbes/work   # Faster, but experimental


The Conda package cache can help speed the creation of new environments, but can take up quite a bit of disk space. One can reclaim this with conda clean:

!conda clean -p --dry-run
!conda clean -p -y
Dry run: exiting

From the docs: "pyenv lets you easily switch between multiple versions of Python. It's simple, unobtrusive, and follows the UNIX tradition of single-purpose tools that do one thing well."


  • pyenv install --list: See all available versions.

Basic Packages and Configuration

Here is a list of basic packages that I always use. I install these in two custom environments work2 and work3. I also make a symlink called work which points to the appropriate environment in which I am currently working.

First, however, I install the following in the default conda environment:

Now I add the default path to both the start and end of PATH with different names in my .bashrc file:

export PATH="/data/apps/anaconda/bin:$PATH:/data/apps/anaconda/bin/."

The reason for the latter addition is:

  1. It will be a fallback when activating other environments so I can still use mercurial.
  2. By giving it a different name it will not be removed when . deactivate is called.
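
To see why the different spelling matters, here is a toy model in Python. It assumes that deactivation removes only those PATH entries that exactly match the activated directory; that is an assumption about the deactivate script, not something checked here, and the helper name remove_entry is hypothetical.

```python
def remove_entry(path, entry):
    """Drop every PATH component that is exactly equal to `entry`."""
    return ":".join(p for p in path.split(":") if p != entry)


base = "/data/apps/anaconda/bin"
# Same directory twice, but the trailing copy is spelled with "/." appended.
path = f"{base}:/usr/bin:{base}/."
after = remove_entry(path, base)
print(after)
```

Under that assumption the leading entry disappears while the trailing `/.` spelling survives as the fallback.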

Here is the process for creating my new work environments including some subtle issues:

  • As per issue 280, anaconda's packages for nose and flake8 are broken and do not register the distutils entry point, thus I remove these.
  • Many of these tools are for high performance computation.

Activate and Deactivate Scripts

When sourcing the activate and deactivate scripts, files "$CONDA_ROOT/etc/conda/activate.d/*.sh" and "$CONDA_ROOT/etc/conda/deactivate.d/*.sh" will be sourced. This allows one to perform additional configurations such as setting IPYTHONDIR. To find CONDA_ROOT programmatically use:

In [1]:
CONDA_ROOT="$(conda info -e | grep '*' | cut -d '*' -f2 | sed 's/^ *//g')"
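
From inside Python the same path is usually available without parsing the output of conda info. This is a minimal sketch, assuming the environment was activated with conda activate (which sets the CONDA_PREFIX environment variable); the helper name active_env_prefix is hypothetical.

```python
import os
import sys


def active_env_prefix(environ=None):
    """Return the path of the active conda environment.

    Prefers CONDA_PREFIX (set by `conda activate`) and falls back to
    sys.prefix, the prefix of the running interpreter.
    """
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_PREFIX") or sys.prefix


print(active_env_prefix())
```

The fallback to sys.prefix only helps when the interpreter itself lives in the environment of interest.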

For example, the following two scripts set and reset (once) IPYTHONDIR:

  • /data/apps/anaconda/envs/work/etc/conda/activate.d/
if [ ! -z \${IPYTHONDIR+x} ]; then
export IPYTHONDIR="/data/apps/anaconda/envs/work/.ipython"
  • /data/apps/anaconda/envs/work/etc/conda/deactivate.d/
if [ -z \${OLD_IPYTHONDIR+x} ]; then

I make these files with a script like the following which I can run in the appropriate environment to set the .ipython directory in the current folder as the IPYTHONDIR for the environment. Note: It does some checks to make sure that an environment is active.


CONDA_ROOT="$(conda info --root)"
CONDA_ENV="$(conda info -e | grep '*' | cut -d '*' -f2 | sed 's/^ *//g')"
if [ "$CONDA_ROOT" == "$CONDA_ENV" ]; then
  echo "Not in a conda environment.  Activate the environment first. I.e.:"
  echo "    . activate work"
  exit 1
else
  echo "Creating activate and deactivate scripts:"
  echo "   $CONDA_ENV/etc/conda/activate.d/"
  echo "   $CONDA_ENV/etc/conda/deactivate.d/"
fi

mkdir -p "$CONDA_ENV/etc/conda/activate.d"
mkdir -p "$CONDA_ENV/etc/conda/deactivate.d"
IPYTHONDIR="$(cd .ipython && pwd)"
cat <<EOF > "$CONDA_ENV/etc/conda/activate.d/"
# This file sets the IPYTHONDIR environmental variable to use the
# settings in $IPYTHONDIR
# It is called when you activate the $CONDA_ENV environment with
#   . activate $CONDA_ENV

if [ ! -z \${IPYTHONDIR+x} ]; then
cat <<EOF > "$CONDA_ENV/etc/conda/deactivate.d/"
# This file resets the IPYTHONDIR environmental to the previous value.
# It is called when you deactivate the $CONDA_ENV environment with
#   . deactivate

if [ -z \${OLD_IPYTHONDIR+x} ]; then
   echo Unsetting IPYTHONDIR

. "$CONDA_ENV/etc/conda/activate.d/"


If you are developing software, the conda-devenv package might help. It allows you to maintain several dependent environment files for example. This might be useful when you want to have separate environment files for testing python 2 and python 3. Useful features include (from the docs):

  • Jinja 2 support: gives more flexibility to the environment definition, for example making it simple to conditionally add dependencies based on platform.
  • include other environment.devenv.yml files: this allows you to easily work in several dependent projects at the same time, managing a single conda environment with your dependencies.
  • Environment variables: you can define an environment: section with environment variables that should be defined when the environment is activated.

Packages and Dependencies

Making sure that you have all of the required packages installed can be a bit of a pain. I have toyed with three different types of strategies outlined below. I currently use method 1. when actively developing, then switch to a version that supports both 2. and 3.

  1. Include the source code for projects you need as weak dependencies managed with the myrepos (mr) tool:

    Basically, I include a .mrconfig file which instructs mr how to check out the required projects:

    checkout = hg clone '' 'pymmf'

    When I run mr checkout, this project will be cloned to _ext/pymmf and I can then symlink the appropriate files to a top-level module ln -s _ext/pymmf/mmf mmf. If you use pymmf in many projects, you can replace _ext/pymmf with a symlink to an appropriate common location (but keep in mind that each project will then be using the same revision).

    A small set of additional commands can provide mr freeze, which tracks the specific versions. There are some bugs with this, but you can see an example in my mmfhg project, in particular in the mrconfig file.

    This approach is useful if you need to actively develop both the top level project and the subproject, but requires manual intervention to update etc. and is not suitable for general usage since collaborators will need to add myrepos to their toolchain.

  2. Write a setup.py file and use this or pip to manage the dependencies.

    This is a better choice for general distribution and collaboration, but still has a few warts. An advantage of this over the next option is that it will then work with PyPI.

  3. Write a meta.yaml file and use conda to manage dependencies.

    This is a good choice for general distribution if you want your package to be installable with conda from Anaconda Cloud.

Setuptools and Pip

The Python Packaging User Guide is supposed to provide the definitive guide for managing python packages. Its current recommendation for installing packages is to use pip, so we would like to maintain that strategy. The basic idea here is to include a setup.py file in your top level directory that specifies your dependencies (an example will be provided below). This is designed for and works well with packages published on PyPI but starts failing when you need to depend on projects that are not published (i.e. those hosted on github or bitbucket).

Setup(.py) and Requirements(.txt)

Some of the issues are discussed in this rant but are centered around the issue of how to specify the location of code not available on PyPI. Both pip and setuptools support installation from version controlled sources. The recommended approach for this (see Setup vs. Requirements) is to put these dependencies in a requirements.txt file, but there are problems with this approach:

  1. Users must run pip install -r requirements.txt explicitly rather than just pip install -e . or python setup.py develop.

    This is probably not such a big deal as long as the install instructions are clear.

  2. If you require a subproject that also has a requirements.txt, there is no easy way of recursively parsing the subproject's requirements. (The requirements specified in the subproject's setup.py will get processed.)

    The only solution I know to this second issue is to make sure your requirements.txt file is very complete, specifying all possible requirements recursively.

  3. Suppose that you use branches or tags in your code to mark versions. Now suppose that project A depends on another project B, and that B depends on a specific version of C==0.9. You dutifully include this in the setup.py files:

    # A's setup.py
    install_requires = ["B"]
    # B's setup.py
    install_requires = ["C==0.9"]

    However, at this point, neither pip nor setuptools can install A since neither B nor C is available on PyPI. The recommended solution is to specify all of the dependencies in A's requirements.txt file:

    -e hg+ssh://
    -e hg+ssh://
    -e .

    Installing A with pip install -r requirements.txt will now install B and C, but the requirements are now broken. The package C for example will be installed at the latest version, even if this breaks the C==0.9 requirement by B. This appears to be broken even if we provide the explicit versions in the requirements.txt file:

    -e hg+ssh://
    -e hg+ssh://
    -e .

    Thus we are left in the awkward position of having to make sure in project A that we pin the exact requirements we need in subproject C even though we do not (directly) depend on it.

The problem described above is consistent with the following roles of setup.py and requirements.txt:

  • The install_requires argument to setup() in setup.py specifies what is required, and version information should be used to specify what does not work (i.e. avoid this version).
  • The requirements.txt specifies exactly one situation that does work.

In case of conflict, the second point here seems to override the first one (C-0.8 above gets installed even though the setup for B specifies C==0.9).
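
The conflict can be made concrete with a toy check. This is only a sketch under stated assumptions: it understands nothing but exact == pins, the function name check_pins is hypothetical, and the version numbers for A and B are made up for illustration. It does not reproduce how pip resolves anything; it just reports where a pinned set disagrees with a declared install_requires.

```python
def check_pins(install_requires, pinned):
    """Return (package, requirement, pinned_version) for each violated `==` pin.

    install_requires: {package: ["dep==version" or "dep", ...]}
    pinned: {package: version}, as fixed by a requirements file.
    """
    problems = []
    for package, requirements in install_requires.items():
        for requirement in requirements:
            name, sep, version = requirement.partition("==")
            # Only exact pins are checked; bare names accept any version.
            if sep and pinned.get(name) != version:
                problems.append((package, requirement, pinned.get(name)))
    return problems


install_requires = {"A": ["B"], "B": ["C==0.9"], "C": []}
pinned = {"A": "1.0", "B": "1.0", "C": "0.8"}  # hypothetical versions
print(check_pins(install_requires, pinned))
```

Running a check like this before freezing a requirements.txt at least flags the case where B asks for C==0.9 but C-0.8 is what gets pinned.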

Thus, requirements.txt is completely useless for specifying dependencies in a general way. The only solution here appears to be to carefully construct a requirements.txt file whenever a project or its dependencies change. The following tools may help with this, but I am not happy with this solution:

A Better Solution: Host your own Index

Can we get the benefits of the usual dependency resolution while using version controlled software? To do this, it seems that you need to host your packages through an index via one of the following approaches:

  • Upload your package to PyPI: This works, but requires a lot of discipline and maintenance to make sure your package is complete, tested, working, and properly described whenever you need to freeze it. The whole release process can be a pain, and submitting a bunch of random tools to PyPI is considered bad practice. (For one thing, there is a global namespace, so you might steal a useful name for a bunch of specialized tools where another package might have been able to better use it for the community.)
  • Host your own Package Index.

The latter approach seems at first to require running a web-server somewhere which is off-putting, but actually supports two quite reasonable solutions. The first is to package tarballs of your projects into a single directory with canonical names like C-0.8.tar.gz and C-0.9.tar.gz, and then point to this with the pip --find-links <dir> option. (The tarballs can be produced with python setup.py sdist and will get deposited in the dist directory which you could symlink to the <dir> specified above.)

The second solution is to provide a proper package index somewhere. This allows you to refer to additional locations, and in particular, allows you to refer to something like

<a href="">persist-0.9</a>.

This uses the packaging feature at bitbucket that will checkout version 0.9 from the repository and deliver it as a tarball. The resulting index file can be hosted somewhere globally and referred to with pip install --extra-index-url <url>.

Package Index API

The Package Index "API" is basically a set of nested directories of the form packagename/version with index.html files that will describe how to download the package with links like those shown above.

This works, but is a PITA since it seems to require:

  1. The whole nested directory structure (so you can't just manage a single file).
  2. That these index files be served from a proper web server that will dish them up when given a trailing slash https://.../packagename/versions/. I still don't know why they can't just also look for the index files. I have asked on the mailing list.
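
Generating the directory structure by hand is tedious, so here is a minimal sketch of a generator. It follows my reading of the simple repository layout in PEP 503 (a root index.html linking to one directory per normalized project name, each with its own index.html of file links), so it does not reproduce the per-version nesting described above. The function names and the example.invalid URL are hypothetical placeholders.

```python
import html
import re
import tempfile
from pathlib import Path


def normalize(name):
    """Normalize a project name as PEP 503 prescribes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def write_index(root, packages):
    """Write <root>/index.html and <root>/<project>/index.html pages.

    packages: {project: {filename: download_url}}
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    project_links = []
    for project, files in sorted(packages.items()):
        norm = normalize(project)
        project_dir = root / norm
        project_dir.mkdir(exist_ok=True)
        # One anchor per downloadable file; the link text is the filename.
        file_links = "\n".join(
            f'<a href="{html.escape(url)}">{html.escape(filename)}</a><br/>'
            for filename, url in sorted(files.items())
        )
        (project_dir / "index.html").write_text(
            f"<!DOCTYPE html>\n<html><body>\n{file_links}\n</body></html>\n"
        )
        project_links.append(f'<a href="{norm}/">{html.escape(project)}</a><br/>')
    (root / "index.html").write_text(
        "<!DOCTYPE html>\n<html><body>\n"
        + "\n".join(project_links)
        + "\n</body></html>\n"
    )


# Hypothetical URL: replace with wherever your tarballs actually live.
packages = {
    "persist": {
        "persist-0.9.tar.gz": "https://example.invalid/persist-0.9.tar.gz",
    },
}
index_root = Path(tempfile.mkdtemp()) / "simple"
write_index(index_root, packages)
print(sorted(p.relative_to(index_root).as_posix() for p in index_root.rglob("*.html")))
```

The generated tree still has to be served by something that maps a trailing slash to index.html, which is exactly the second complaint above.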

Visualizing Dependencies


Try the pipdeptree package. Unfortunately, this lists all dependencies in a conda environment.

pip install pipdeptree


Here is a strategy based on Eric Dill's GIST. We use it to find all of the roots in an environment (packages upon which nothing depends).

In [1]:
pipdeptree | perl -nle'print if m{^\w+}'
Warning!!! Possibly conflicting dependencies found:
* vpython==7.6.1
 - jupyter [required: Any, installed: ?]
* QDarkStyle==2.8.1
 - helpdev [required: >=0.6.10, installed: ?]
* mamba==0.4.4
 - pybind11 [required: >=2.2, installed: ?]
* itkwidgets==0.32.0
 - itk-meshtopolydata [required: >=0.6.2, installed: ?]
* applaunchservices==0.2.1
 - pyobjc [required: Any, installed: ?]
In [9]:
from IPython.display import display, clear_output
import json, glob, sys
import os
from os.path import join, basename
# install this with "conda install -c conda-forge python-graphviz pygraphviz"
import graphviz as gv
import pygraphviz as pgv
import networkx as nx

# path to your conda environment
env_dir = os.path.dirname(sys.prefix)
env_name = os.path.basename(sys.prefix)
env_name = "cugpe"
path = os.path.join(env_dir, env_name)
path = os.path.dirname(env_dir)

path = "/data/apps/conda/envs/cugpe"
#path = "/data/apps/conda/envs/work"
#path = "/data/apps/conda/envs/blog"
#path = "/data/apps/conda/envs/hv_ok"
#path = "/data/apps/conda/envs/_gpe3"

dg = gv.Digraph(name=os.path.basename(path))
pdg = pgv.AGraph(strict=False, directed=True)
ng = nx.DiGraph()

for json_file in glob.glob(join(path, 'conda-meta', '*.json')):
    print('reading', json_file)
    with open(json_file) as f:
        j = json.load(f)
    name = j['name']
    label = "\n".join([j['name'], j['version']])
    dg.node(name, label=label)
    for dep in j.get('depends', []):
        # Each entry looks like "name [version-spec [build]]".
        _dep = dep.split(' ', 1)
        dep_name = _dep.pop(0)
        # Reset per edge so a label never leaks from the previous dependency.
        attrs = dict(label=_dep[0]) if _dep else dict()
        dg.edge(name, dep_name, **attrs)
        pdg.add_edge(name, dep_name, **attrs)
        ng.add_edge(name, dep_name, **attrs)
In [11]:
roots = sorted([(1+len(nx.descendants(ng, _n)), _n)
                for _n in ng.nodes() if ng.in_degree(_n) == 0])
[(17, ''),
 (20, 'argcomplete'),
 (30, 'conda-verify'),
 (30, 'mmf_setup'),
 (36, 'conda-tree'),
 (37, 'conda-devenv'),
 (41, 'twine'),
 (43, 'anaconda-client'),
 (47, 'mamba'),
 (56, 'conda-build')]
In [6]:
roots = sorted([(1+len(nx.descendants(ng, _n)), _n)
                for _n in ng.nodes() if ng.in_degree(_n) == 0])
[(17, ''),
 (18, 'backports.functools_lru_cache'),
 (19, 'backports.tempfile'),
 (20, 'argcomplete'),
 (24, 'conda-verify'),
 (30, 'mmf_setup'),
 (36, 'conda-tree'),
 (37, 'conda-devenv'),
 (43, 'anaconda-client'),
 (45, 'twine'),
 (50, 'mamba'),
 (52, 'python-graphviz'),
 (56, 'conda-build')]
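
The root-finding step does not really need networkx. Here is a standard-library sketch of the same idea on made-up dependency data: a root is a package that nothing else depends on, and each root is reported with one plus the number of packages reachable from it. The function names and the toy graph are hypothetical.

```python
def descendants(graph, node):
    """All packages reachable from `node` by following dependency edges."""
    seen = set()
    stack = [node]
    while stack:
        for dep in graph.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def roots(graph):
    """Sorted (1 + number of descendants, name) for packages nothing depends on."""
    depended_on = {dep for deps in graph.values() for dep in deps}
    return sorted(
        (1 + len(descendants(graph, name)), name)
        for name in graph
        if name not in depended_on
    )


# Made-up dependency data, in the form {package: [dependencies]}.
graph = {
    "twine": ["requests", "python"],
    "requests": ["urllib3", "python"],
    "urllib3": ["python"],
    "mamba": ["python"],
    "python": [],
}
print(roots(graph))
```

Unlike the networkx version above, this one also reports packages that have no dependencies at all, since every key of the dictionary counts as a node.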


Conda Performance

Conda can be very slow to resolve dependencies. An experimental project mamba attempts to speed this up.

Redundancy between Conda and Pip

The present workflow duplicates information in setup.py for pip and PyPI, and meta.yaml for Conda. This is being discussed as part of the following Conda issue:

Disk Usage

Multiple Conda environments can use a lot of disk space. Some of this can be recovered by cleaning unused packages:

conda clean --all

however, packages that are currently being used can take a lot of space. One way to check is to run ncdu in the pkgs directory:

$ ncdu /data/apps/conda/pkgs
--- /data/apps/conda/pkgs --------------------------------
  581.6 MiB [##########] /itk-5.1.1-py38h32f6830_3                                                                          
  540.2 MiB [######### ] /mkl-2020.2-260
  314.0 MiB [#####     ] /qt-5.12.5-h514805e_3
  313.5 MiB [#####     ] /qt-5.12.5-h9272185_4
  164.5 MiB [##        ] /vtk-8.2.0-py38h3f69d5f_218
  118.6 MiB [##        ] /pandoc-2.10.1-haf1e3a3_0
  115.2 MiB [#         ] /pandoc-2.10-0
   88.6 MiB [#         ] /pandoc-2.11-h0dc7051_0
   84.3 MiB [#         ] /pandoc-
   84.1 MiB [#         ] /pandoc-2.11-h22f3db7_0

Here we see that there are two versions of qt being used, and multiple versions of pandoc. (This was after a bit of cleaning - previously I had several versions of itk and mkl as well.) To find out which environments are using which package, we can run the following:

function get_ver() {
  envs=$(conda env list --json | jq -r '.envs[]')
  for p in $envs; do
    echo "========================>" "$p"
    conda list -p "$p" | grep "\b$1\b"
  done
}
get_ver qt
========================> /data/apps/conda
========================> /data/apps/conda/envs/alcc
========================> /data/apps/conda/envs/app_gitannex
========================> /data/apps/conda/envs/blog
========================> /data/apps/conda/envs/hg
========================> /data/apps/conda/envs/jupyter
qt                        5.12.5               h9272185_4    conda-forge
========================> /data/apps/conda/envs/leo
========================> /data/apps/conda/envs/super_hydro
========================> /data/apps/conda/envs/work
qt                        5.12.5               h514805e_3    conda-forge
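
The same question can be answered from Python by reading the conda-meta/*.json records that conda keeps in each environment. This is a sketch rather than a tested replacement for the shell function: the name builds_in_use is hypothetical, and it relies only on the name, version, and build fields of those records.

```python
import json
from collections import defaultdict
from pathlib import Path


def builds_in_use(env_prefixes, package):
    """Map "version build" -> list of environment prefixes using it."""
    usage = defaultdict(list)
    for prefix in env_prefixes:
        for record_file in sorted(Path(prefix, "conda-meta").glob("*.json")):
            with open(record_file) as f:
                record = json.load(f)
            if record.get("name") == package:
                key = f"{record.get('version')} {record.get('build')}"
                usage[key].append(str(prefix))
    return dict(usage)


# Example call (paths as used in this post; adjust to your install):
# builds_in_use(["/data/apps/conda/envs/work",
#                "/data/apps/conda/envs/jupyter"], "qt")
```

Any key that maps to only one environment is a candidate for the kind of alignment described next.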

I can then upgrade the work and jupyter environments to make sure they use the same package, thereby saving space:

mamba update -n work qt
mamba update -n jupyter qt


  • This might not always work if there are conflicts.
  • This can break libraries with numpy and scipy so test.
  • To pin a specific version, you might need to install rather than update:

    mamba install -n work qt=5.12.5=h9272185_4

After running conda clean --all again we have:

$ conda clean --all
$ ncdu /data/apps/conda/pkgs
  581.6 MiB [##########] /itk-5.1.1-py38h32f6830_3
  540.2 MiB [######### ] /mkl-2020.2-260
  319.6 MiB [#####     ] /qt-5.12.9-h717870c_0
  164.5 MiB [##        ] /vtk-8.2.0-py38h3f69d5f_218

Here is my typical usage on Mac OS X after such cleaning:

14.6 GiB /data/apps/conda/
   12.5 GiB [##########] /envs
        10.0 GiB [##########] /work
         2.3 GiB [##        ] /jupyter
         1.2 GiB [#         ] /alcc
         1.1 GiB [#         ] /super_hydro
       590.2 MiB [          ] /leo
       396.9 MiB [          ] /blog
       125.9 MiB [          ] /hg
       115.9 MiB [          ] /app_gitannex
    4.4 GiB [###       ] /pkgs
  548.1 MiB [          ] /conda-bld
  410.7 MiB [          ] /lib
   54.3 MiB [          ] /share
   24.8 MiB [          ] /include
   19.7 MiB [          ] /bin
    7.3 MiB [          ] /conda-meta
    4.3 MiB [          ] /

Building a Conda Package

For a simple example of a Conda package, see the following:

This is a simple python project I forked consisting of a single python file. To turn it into a Conda package, I added the following meta.yaml file, built, and uploaded it as described in Anaconda Cloud: Building Packages:

  name: conda-tree
  version: "0.0.1"

  git_rev: master

    - python

    - networkx
    - conda

  noarch: python
  script:    # See
    - mkdir -p "$PREFIX/bin"
    - cp "$PREFIX/bin/"

  license: MIT
  license_file: LICENSE

This follows a discussion about using conda-build to build a package without setup.py (since I wanted to minimally complicate the original package). Once the meta.yaml file is committed, I can build the package and upload it to my Conda channel as follows:

conda install conda-build anaconda-client  # If needed: I have these in (base)
anaconda login                             # If needed: I stay logged in
conda build .

# The name needed below can be found with
conda build . --output

anaconda upload /data/apps/anaconda/conda-bld/noarch/conda-tree-0.0.1-0.tar.bz2

One gotcha: the conda build . command above will attempt to download the source from the appropriate tag/branch on GitHub (since that is what we specified as a source). Thus, you must make sure to tag and push those tags:

git tag -a v0.0.1
git push origin v0.0.1    # Assumes the remote is named origin

When a Dependency is not Available to Conda

When you are making conda packages, all elements need to be conda-installable. Here is how to push a pip-installable package to your conda channel:

mkdir tmp; cd tmp
conda activate base
conda skeleton pypi husl
# Edit husl/meta.yaml if needed
conda build husl/meta.yaml
anaconda upload --all /data/apps/conda/conda-bld/osx-64/husl-4.0.3-py37_0.tar.bz2
conda skeleton pypi python-hglib
# Edit python-hglib/meta.yaml if needed
conda build python-hglib/meta.yaml
anaconda upload --all /data/apps/conda/conda-bld/osx-64/python-hglib-2.6.1-py37_0.tar.bz2



By including a section like the following, you can run tests when you issue the conda build command. Some important notes though:

  • Conda now moves the working directory, so you need to specifically include the files you will test with the source_files: section.
  • If you build from your source directory, be sure to clean out __pycache__ directories (see Gotchas below).

test:
  source_files:
    - persist
  imports:
    - persist
  requires:
    - coverage
    - h5py
    - pytest >=2.8.1
    - pytest-cov >=2.2.0
    - pytest-flake8
    - scipy
  commands:
    - py.test

This will do several things:

  1. It will test that import persist works.
  2. It will create a test environment with the specified requirements.
  3. It will run py.test.

If you do this locally (i.e. src: .) then be sure to remove all __pycache__, *.pyc, and *.pyo files:

find . -name "__pycache__" -type d -prune -exec rm -rf {} +
find . -name "*.pyc" -delete
find . -name "*.pyo" -delete
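
If you would rather do the cleanup from Python (for example from a build helper script), the following sketch does the same job with pathlib. The function name clean_bytecode is hypothetical; it deletes files, so point it at the right directory.

```python
import shutil
from pathlib import Path


def clean_bytecode(root="."):
    """Remove __pycache__ directories and stray *.pyc / *.pyo files under root."""
    root = Path(root)
    removed = []
    # Materialize the matches first so nothing is deleted while iterating.
    for cache_dir in list(root.rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed.append(cache_dir)
    for pattern in ("*.pyc", "*.pyo"):
        for compiled in list(root.rglob(pattern)):
            compiled.unlink()
            removed.append(compiled)
    return removed


# Usage, from the top of the source tree:
# clean_bytecode(".")
```

The return value lists what was removed, which is handy for a quick sanity check before running conda build.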


  • Make sure you have the appropriate channels in your ~/.condarc file. (There is currently no way to specify channels for conda build.)

    channels:
      - defaults
      - conda-forge
      - mforbes
      - file:///data/apps/conda/conda-bld

  • The last entry is the local directory where packages are built by default. I include it because sometimes things are not immediately available on my mforbes channel, even after I upload them. This local channel can be made available by running:

    conda index /data/apps/conda/conda-bld
  • Don't run conda build from a conda environment. Make sure you are in the base environment.


Current Working Environments

I keep track of my working environments through a set of environment.*.yml files which I host here:

Complete Install (21 July 2018)

Here is an example of a complete install on Mac OS X.

In [ ]:
%%file environment.base.yml
conda create -n work python=2
conda install -n work pandoc anaconda argcomplete  beautiful-soup  colorcet  dill  distribute mercurial pep8\
                      pyaudio  pycrypto  pympler  pytest-runner sphinx_rtd_theme  ujson cvxopt futures-compat\
                      lancet mkl-service  pycurl bottleneck uncertainties mklfft mock scikit-learn twine weave\
                      snakeviz sympy pillow xarray pytest-cov pylint plotly vispy urllib3 numpydoc  pytest-flake8\
                      line_profiler pygraphviz seaborn ipyparallel nbsphinx  nbtutor paramnb nbdime jupyter\


Old Notes Etc.


  • Install miniconda with perhaps a few tools like mercurial, but use conda create -n <env> to create working environments to help maintain isolation for everything else.
  • Create the environments from an environment.yml file for reproducibility.

    • Make sure that pip is configured to not install packages in the user directory so that they instead go to the conda environments:

      pip config set install.user false

This setup requires activating the appropriate conda environment before working. There are three options for making commands available to all environments:

  • Add the base miniconda bin/ at the end of your path in a special way so that you can fall back to these base commands. I do this for mercurial which I install in my base environment.
  • Explicitly use pip install --user which will put files in ~/.local/bin and ~/.local/lib so that any current python environment can access them. I do this with Nikola for blogging.
  • Create symlinks to conda-installed executables in ~/.local/bin or aliases to the appropriate environment. I use this, for example, with Jupyter.