In this post I describe my strategy for managing source code projects and dependencies with Python, Conda, [Pip][], etc. I also show how to maintain your own "PyPI" for source code packages.
Overview¶
Current Strategy¶
:::{margin}
Just make sure you use `dependencies:` rather than `packages:` in `anaconda-project.yaml`. This will allow `conda env` to also use the file.
:::
Our current strategy is to manage conda environments on a per-project basis using Anaconda Project. Carefully structured, the associated `anaconda-project.yaml` file can act as a replacement for `environment.yml` files (a minimal sketch appears after the list below). This can take more disk space than the monolithic work environments we used to use, but we mitigate this with some of the following principles:
- Don't include build tools such as [Poetry][], [Black][], [Mercurial][] etc. in the environment as this can cause dependency issues and bloat. Instead, install these with e.g. [PipX][]. I do the following:

  ```bash
  pipx install poetry
  pipx install git+https://github.com/cookiecutter/cookiecutter.git@2.0.2#cookiecutter
  pipx install black
  pipx install nox
  pipx install poetry2conda
  pipx install conda-lock
  pipx install condax
  pipx install rst-to-myst
  pipx install twine
  pipx install mercurial
  pipx install mmf_setup
  pipx inject nox nox-poetry poetry
  pipx inject mercurial hg-git hg-evolve
  ```
- Don't include the conda-forge channel by default. Specify it only when needed, like `conda-forge::myst-nb`. This generally keeps the search/build times small, but can have issues with complicated dependencies that require pinning other requirements as coming from conda-forge. It is fine to fall back and include the conda-forge channel, but keep in mind that creating the environment may take much longer. (Mamba can help in these cases.)
- Consider using [Conda][] proper to manage only those packages that are complicated, difficult to build, or have binary dependencies (things like `numpy`, `scipy`, `matplotlib`, `numba`, etc.), and then use Pip or [Poetry][] to manage the rest. This works out of the box with Pip, but needs a little bit of help with [Poetry][]. We use the additional functionality of adding commands in `anaconda-project.yaml` to do any additional provisioning (such as installing IPython kernels for use in Jupyter).
- To start a new project, take a look at using our cookiecutters. These provide templates for new projects with a working skeleton `anaconda-project.yaml` file with support for documentation hosted on Read the Docs (e.g. Physics 555: Quantum Technologies and Computation), a Makefile for initializing environments on CoCalc, automatic testing, CI, and many other features.
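For reference, here is a minimal sketch of what such an `anaconda-project.yaml` might look like. The project name, package list, and `init` command are placeholders, not part of any particular project:

```bash
# Minimal sketch of an anaconda-project.yaml (placeholder names and versions).
# Using `dependencies:` instead of `packages:` lets `conda env` read the same
# file (see the margin note above).
cat > anaconda-project.yaml <<'EOF'
name: myproject
channels:
  - defaults
dependencies:
  - python=3.9
  - numpy
  - scipy
  - conda-forge::myst-nb    # pull a single package from conda-forge
commands:
  init:                     # extra provisioning, e.g. register a Jupyter kernel
    unix: python -m ipykernel install --user --name myproject
EOF

anaconda-project prepare    # or: conda env update -f anaconda-project.yaml
```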
If you need to install new packages, here are some options:
- If the packages you need are pip-installable, then you can install them locally in `PYTHONUSERBASE`, which defaults to `~/.local/`:

      conda activate work
      python -m pip install --user ...

  For example, with Python 3.8 on Linux, this will install the packages in `~/.local/lib/python3.8/` with binaries in `~/.local/bin/`, etc. This is convenient, but does not completely isolate your environment – any version of `python3.8` will have access to these. (A potential issue is that the installed packages might depend on libraries in the underlying conda environment, which may then be unavailable in other environments.)

  A safer approach is to use a virtual environment (venv) with access to the system site packages – i.e. those in the conda environment:

      cd <your project>
      conda activate work
      python3 -m venv --system-site-packages .venv
  This will create a local virtual environment in `.venv` in which any new packages will be installed, but which will still have access to the packages in the `work` conda environment.

  Limitations:

  - It can be a little confusing having both conda and venvs active. You should be careful to activate the venv only after any conda environment, or without any conda environments active. If you activate a conda environment after activating the venv, things can get quite messed up.
  - I don't have a good solution yet for making a completely reproducible environment file: `conda list` will not find packages installed in the venv. You can list these with `pip list` or `pip freeze`, but these will not include those in the conda environment. This will be addressed below when we discuss `poetry` (a partial workaround is also sketched after these options).
- Unfortunately, this will not work with packages that need conda to install. If you have such additional dependencies, then your best option is to create a clone:

      conda create -n mywork --clone work
      conda activate mywork
      conda install ...
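A rough workaround for the reproducibility limitation mentioned above is to export the two layers separately. This is only a sketch and assumes the venv lives in `.venv` as created earlier:

```bash
# Capture the conda layer and the venv-only layer separately.
conda env export -n work > environment.conda.yml
. .venv/bin/activate
pip freeze --local > requirements.venv.txt   # omit packages visible from the conda env
```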
Limitations¶
There are a few issues with our suggested approach of using read-only conda environments:
- `conda run` cannot be used when the environment is read-only: Conda issue #10690. However, many programs can simply be run by symlinking the appropriate executable somewhere like `~/.local/bin/` on your path (see the sketch below).
- Cloning environments can take quite a bit of memory. In principle, hardlinks should be used, but not everything can be linked... using a single environment would take less memory.
(To suggest additional packages, please create a topic and merge request in the forbes-group/configurations
repository as discussed below.)
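For example, a sketch of the symlink trick mentioned above (the environment name and paths are illustrative):

```bash
# Expose an executable from the read-only `work` environment without conda run.
envdir="$(conda info --base)/envs"
mkdir -p ~/.local/bin
ln -s "${envdir}/work/bin/jupyter" ~/.local/bin/jupyter
```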
Pros:
- Common environments are available for quickly getting down to work.
- Environments stay clean, promoting reproducibility.
Limitations:
- `conda run` cannot be used: Conda issue #10690.
There are two main domains of python packaging:
- Users who need to install packages.
- Developers who need to test and distribute packages.
In both cases, one would like to be able to work in reproducible computing environments, possibly working with a variety of different python versions. In any case, do not use or update the version of python installed for use by the system: this is asking for problems.
Instead, use one of the following strategies:
- Conda: Provides an easy way of installing and activating independent python environments. In addition to python, `conda` is a full binary package manager able to install many tools and libraries beyond the python ecosystem. The main disadvantage of `conda` is that it can be very slow to install a new environment. To some extent, `mamba` addresses this.

  Pros:

  Cons:
- Slow. See the following issues:
- Global settings in e.g. `~/.condarc` affect behavior with no easy way of isolating them. For example, `channel_priority` is discussed in the following issues:
in the following issues:- conda/3150: "Channels not respected when channel_priority is false"
- conda/3377: "channel_priority broken when doing update --all"
- conda/9257: "conda envs: Reproducibility issues"
- conda/9642: "Allow Channel Overrides with Strict Channel Priority"
- conda/9961: Set channel priority in environment.yml
- conda-build/4096: "presence of channel_alias in .condarc reorders channel priority during build"
- constructor/302: and PR 374.
-
pyenv: Pros:
Cons:
- Poor documentation. Some of the introductory articles help, like the Introduction to pyenv.
References¶
As a general strategy, I am now trying to host everything on Anaconda Cloud so that all my packages and environments can be installed with Conda.
Within the Python community, the Python Packaging Authority (PyPA) provides the recommendations of the current working group, and we try to follow their guidelines as laid out in the Python Packaging User Guide.
My general strategy is to maintain Conda environments for each project and each main application, and to rely on these to ensure my projects work and can be easily distributed. In addition, I maintain a common work
environment that has everything and the kitchen sink which I can "just use" to get work done. This is hosted on Anaconda Cloud so you can install it and use it with:
conda install anaconda-client
conda env create mforbes/work # Don't run in a directory with an environment.yml file
conda activate work
Note: do not run this in a folder with an environment.yml
file. See Issue #549.
This notebook describes some details about this strategy which consists of the following concepts:
- Jupyter: This section discusses the `jupyter` environment where I install Jupyter and related tools.
  - I use a script to launch the Jupyter notebook server and/or Jupyter Lab so that this environment is first activated.
  - I use the `nb_conda` extension so that one can access the conda environments from Jupyter. (This requires ensuring that the `ipykernel` package is installed in each environment so that they can be found.)
  - I use the Jupytext extension so that you can store notebooks as python files for better version control. This requires a small manual change in your notebooks to enable.
- Conda Environments: I try to package everything into Conda environments. There are several use-cases here:
  - Global environments such as `work2` and `work3` which have everything including the kitchen sink for python 2 and python 3 respectively. When I just need to get something done, I activate these and work. Don't rely on these for testing though: instead use minimal and dedicated environments for your code.
  - Dedicated and minimal project environments should be used for each project, containing the minimal number of packages required. Your tests should be run in these environments (plus any testing tools) so that you can be sure no undocumented dependencies creep in when you add new features. I usually keep two copies:
    - `environment.yml`: This contains the packages I need (leaves), usually without version specifications unless there is a known conflict. It can be kept clean as described in the section Visualizing Dependencies. To create or update the environment:

          conda env update -f environment.yml

    - `environment.frozen.yml`: This contains the explicit versions I am testing against to ensure reproducible computations. It can be formed by:

          conda activate <environment name>
          conda env export > environment.frozen.yml
- Anaconda Cloud: I try to use my `mforbes` channel on Anaconda Cloud to host useful environments and packages. Ideally, all packages should be installable with only Conda. Some information about how to build and maintain Conda packages and environments is discussed in section Building a Conda Package.
TL;DR¶
Conda¶
- Use conda to manage environments. (Conda can be slow: for faster installs you can try mamba, but it is still experimental.)
- For each project, create an environment which will contain all the packages like NumPy, SciPy etc. needed for that project. (Also specify python 2 or python 3.) Store these requirements in a project-specific `environment.yml` file.
- Create an environment for that project that includes the `ipykernel` package:

      conda env create --file environment.yml

  (This assumes that `environment.yml` specifies the name of the environment, otherwise you need to pass conda the `-n <env>` flag.)
- Install Jupyter (see instructions below) in its own environment and include the `nb_conda` package. By including the `ipykernel` package in each environment and the `nb_conda` package in the `jupyter` environment, Jupyter will be able to find your environments and use them as kernels.
- Launch Jupyter with a custom kernel for your project using the following function which will activate the `jupyter` environment first:

      # ~/.bashrc or similar
      function j {
          if [ -f './jupyter_notebook_config.py' ]; then
              CONFIG_FLAG="--config=./jupyter_notebook_config.py";
          else
              if [ -f "$(hg root)/jupyter_notebook_config.py" ]; then
                  CONFIG_FLAG="--config=$(hg root)/jupyter_notebook_config.py";
              else
                  CONFIG_FLAG="";
              fi;
          fi;
          echo "conda activate jupyter";
          conda activate jupyter;
          echo "jupyter notebook ${CONFIG_FLAG} $*";
          jupyter notebook "${CONFIG_FLAG}" "$*";
          conda deactivate
      }
- For development and testing, create a set of minimal environments with the versions you want to test against:

      envdir="$(conda info | grep 'envs directories' | awk 'NF>1{print $NF}')"
      for p in 2.7 3.5 3.6 3.7 3.8 3.9; do
          env=py${p}
          sudo chown mforbes "${envdir}/${env}"
          conda env remove -y -n ${env} 2> /dev/null
          rm -rf "${envdir}/${env}"
          conda create -f -y -c defaults --override-channels \
                --no-default-packages -n "${env}" python=${p}
          sudo chown admin "${envdir}/${env}"
          ln -s "${envdir}/${env}/bin/python${p}" ~/.local/bin/
      done
      conda clean --all -y
Pip¶
The following recommendations are based on using pip to manage dependencies. We are in the process of migrating to a more conda-centric version.
- Provide a `setup.py` file for your source repositories that contains all of the dependency information in the `install_requires` argument to `setup()`.
- Structure your repository with tags and/or named branches so that you can `hg update 0.9` etc. to update to the various versions.
- Host an `index.html` file somewhere that points to the Download links of the various projects.
- Use `pip install --extra-index-url <index.html>` to allow pip to resolve dependencies against your source code.
Poetry (Trial)¶
I am giving `poetry` a try. This is a package that replaces `setup.py` and provides virtual environments for development and testing.
Installation¶
conda deactivate # Optional: deactivate conda environments
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
mkdir -p ~/.local/share/bash-completion/completions
poetry completions bash > ~/.local/share/bash-completion/completions/poetry.bash
This requires a modern version of `bash`. See issue #3418. I use:

port install bash-completion
echo /opt/local/bin/bash | sudo tee -a /etc/shells
chsh -s /opt/local/bin/bash
cat >> ~/.bashrc <<EOF
if [ -f /opt/local/etc/profile.d/bash_completion.sh ]; then
  . /opt/local/etc/profile.d/bash_completion.sh
fi
EOF
To uninstall:
POETRY_UNINSTALL=1 bash -c 'curl -sSL https://raw.githubusercontent.com/sdispater/poetry/master/get-poetry.py | python'
Usage¶
Follow the instructions to set up your project using a `pyproject.toml` file instead of `setup.py`. If you still want to be able to use source-installs (`pip install -e`), then make a stub `setup.py` file:
# setup.py
import setuptools
setuptools.setup()
When you want to develop the project, you should first set the python environment. If you follow my advice above, you will have various python versions available on your `PATH` such as `python3.6`, `python3.7`, etc. Get poetry to create a virtual environment with your chosen version as follows:
$ poetry env use python2.7
Creating virtualenv wdata-6Y3wHwFr-py2.7 in ...Caches/pypoetry/virtualenvs
Using virtualenv: .../Caches/pypoetry/virtualenvs/wdata-6Y3wHwFr-py2.7
$ poetry env use python3.9
Creating virtualenv wdata-6Y3wHwFr-py3.9 in .../Caches/pypoetry/virtualenvs
Using virtualenv: .../Caches/pypoetry/virtualenvs/wdata-6Y3wHwFr-py3.9
...
Now you can do things like install the project in this environment, or run tests etc. For example, testing the package against Python 3.9 would be accomplished by:
poetry env use python3.9
poetry install
poetry run pytest
Nox¶
The following works with nox
:
# noxfile.py
import nox
from nox.sessions import Session


@nox.session(python=["3.6", "3.7", "3.8", "3.9"])
def test(session: Session) -> None:
    """Run the test suite."""
    session.run("poetry", "env", "use", session.virtualenv.interpreter, external=True)
    session.run("poetry", "install", external=True)
    session.run("poetry", "run", "pytest", external=True)
There may be better ways of getting this to work.
Cheatsheet¶
- `poetry config --list`: List the configuration.
- `poetry shell`: Start a shell.
- `poetry config repositories.testpypi https://test.pypi.org/legacy/`: Add the TestPyPI repository.
- `rm -rf "$(poetry config virtualenvs.path)"`: Remove virtual envs.
- `rm -rf "$(poetry config cache-dir)"`: Remove entire cache of virtual envs etc.
Issues¶
- No support for `scripts` yet:
  - https://github.com/python-poetry/poetry/issues/241
  - https://github.com/python-poetry/poetry-core/pull/40

  The workaround is to use entry-points (see the sketch after this list).
- Assumes that an active conda environment is a valid virtual environment... not very good for isolation. See the discussion in Issue #522.
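A sketch of the entry-point workaround mentioned above; the package and function names are hypothetical placeholders:

```bash
# Declare a console script via an entry point instead of a `scripts` section.
cat >> pyproject.toml <<'EOF'

[tool.poetry.scripts]
mytool = "mypackage.cli:main"
EOF
poetry install          # regenerates the console script in the virtualenv
poetry run mytool --help
```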
Conda¶
Conda and Anaconda provide quite complete python distributions for scientific computing. Unless I am specifically developing packages for general use, I allow myself to rely on the default Anaconda distribution. This includes the Conda package manager, which provides an alternative to virtualenv for managing environments.
There is a nice discussion of the state of conda here:
Conda has a few important benefits:
- Many binary packages are supported, including some that are not python (and hence cannot be managed by pip).
- Conda includes builds of NumPy, SciPy, and Matplotlib that can be difficult to install from source.
- Conda keeps track of your history (try `conda list --revisions`) which can help if you need to reproduce some history.
- For educational use (or paid commercial use), one can install the Accelerate package.
- Conda is aware of packages installed with `pip` so these can still be used.
- Conda provides environments, replacing the need for `virtualenv`.
Cheatsheet¶
Working with packages:
- `conda update --all`: Update all packages in an environment.
- `conda clean --all`: Remove all downloaded packages and unused files. Can free significant amounts of disk space.
Working with environments:
- `conda activate <env>` / `conda deactivate`: Activate or deactivate an environment.
- `conda env list`: Show installed environments (and their location).
- `conda env update [ -n name ] ( -f environment.yml | channel/spec ) [ --prune ]`: Update packages in an environment from a file or anaconda channel, optionally removing unnecessary files (but see this issue).
Clone an environment:
- Using `conda-pack`, which must be installed (`conda install -c conda-forge conda-pack`):

      env=work    # Specify the name here
      conda pack -n "${env}" -o "${env}.tgz"
      tar -zxvf "$env.tgz" -C new_env_folder
      conda-unpack -p new_env_folder
      mamba install --force-reinstall olefile defusedxml python.app
Package Availability¶
One issue when dealing with conda is that not all packages are available. There are several ways to deal with this:
- Search for the package on another channel. The following channels seem to be reliable:
  - [conda-forge][]: This is a community driven project that provides an infrastructure for building and maintaining packages cleanly on Windows, Linux, and OS X. It is described in the post Community powered conda packaging: conda-forge.
  - pipy

  To add a new channel:

      conda config --append channels conda-forge

  Then you can look with:

      $ conda config --show
      add_anaconda_token: True
      add_pip_as_python_dependency: True
      allow_softlinks: True
      always_copy: False
      always_yes: False
      auto_update_conda: True
      binstar_upload: None
      changeps1: True
      channel_alias: https://conda.anaconda.org/
      channel_priority: True
      channels:
        - defaults
        - conda-forge
      client_cert:
      client_cert_key:
      create_default_packages: []
      debug: False
      default_channels:
        - https://repo.continuum.io/pkgs/free
        - https://repo.continuum.io/pkgs/pro
      disallow: []
      json: False
      offline: False
      proxy_servers: {}
      quiet: False
      shortcuts: True
      show_channel_urls: True
      ssl_verify: True
      track_features: []
      update_dependencies: True
      use_pip: True
      verbosity: 0
- Build your own conda package and upload it to Anaconda Cloud. It seems that contributing to [conda-forge][] is the preferred way to do this but I have not yet tried.
- Use `pip`. The disadvantage here is that packages are then managed by both `conda` and `pip`, which can lead to some confusion, but this does work reasonably well and is how I used to do things for many years. The advantage is that you can specify your dependencies simply with the standard `setup.py` file. I am not sure how to do this with `conda` yet.
Getting Started¶
My strategy is to install a minimal conda installation with Python 2.7 and then add various environments as needed. Each project should have its own conda environment so that one can reproducibly manage the dependencies of that project. This ties together as follows:
- The conda `base` environment and some custom environments (`jupyter` might be a good candidate) can be maintained at a system level by an admin. On my computers, these reside in:

      /data/apps/conda/    # System (mini)conda directory
      |-- envs/            # System-defined environments
      |   |-- jupyter/     # Jupyter notebooks etc. with nb_conda extension
      |   `-- ...
      |-- bin/             # Executables for `base` environment
      `-- ...
- The `jupyter` environment contains the `nb_conda` extension which allows one to choose kernels based on known conda environments. With this, you can run jupyter from the `jupyter` environment and use a specialized kernel for whatever computation you need. (For example, `jupyter` should be run with Python 3, but you may have kernels that are still Python 2. This approach means you do not need to include jupyter in your project environment.)
- On a system where users cannot write to `/data/apps/conda`, environments created by the user will be automatically placed in `~/.conda/envs/`. Thus, users can immediately create their own environments as needed.
- If a user needs to create an environment in `~/.conda/envs/` that shadows an environment in `/data/apps/conda/envs`, then it seems this can be done by first making the appropriate directory, then using conda to install whatever is needed:

      $ conda create -n jupyter
      CondaValueError: prefix already exists: /data/apps/conda/envs/jupyter
      $ mkdir -p ~/.conda/envs/jupyter
      $ conda install -n jupyter ...
This is probably a bit of a hack, but seems to work.
Miniconda¶
I first install Miniconda with Python 2.7.
- I choose to use python 2.7 here because it allows me to install Mercurial which will likely not support Python 3 for some time.
- I add the miniconda `bin` directory to the end of my path so that the `hg` command will fall through if I have activated a python 3 environment.
- Other than mercurial and related tools, I keep the default environment clean.
Update Conda¶
After installing miniconda, I do an update:
%%bash
. deactivate 2> /dev/null # Deactivate any environments
#conda install -y --revision=0 # Revert to original revision. This is failing! Removing conda
conda update -y conda # Update conda
conda update -y --all # Update all other packages
conda list
anaconda-client 1.6.0 py27_0 defaults
argcomplete 1.0.0 py27_1 defaults
beautifulsoup4 4.5.3 py27_0 defaults
cffi 1.9.1 py27_0 defaults
chardet 2.3.0 py27_0 defaults
clyent 1.2.2 py27_0 defaults
conda 4.3.6 py27_0 defaults
conda-build 2.1.1 py27_0 defaults
conda-env 2.6.0 0 defaults
conda-verify 2.0.0 py27_0 defaults
contextlib2 0.5.4 py27_0 defaults
cryptography 1.7.1 py27_0 defaults
dulwich 0.16.3 <pip>
enum34 1.1.6 py27_0 defaults
filelock 2.0.7 py27_0 defaults
futures 3.0.5 py27_0 defaults
hg-git 0.8.5 <pip>
idna 2.2 py27_0 defaults
ipaddress 1.0.18 py27_0 defaults
jinja2 2.9.4 py27_0 defaults
markupsafe 0.23 py27_2 defaults
mercurial 3.9.2 py27_0 defaults
openssl 1.0.2j 0 defaults
pip 9.0.1 py27_1 defaults
pkginfo 1.4.1 py27_0 defaults
pyasn1 0.1.9 py27_0 defaults
pycosat 0.6.1 py27_1 defaults
pycparser 2.17 py27_0 defaults
pycrypto 2.6.1 py27_4 defaults
pyopenssl 16.2.0 py27_0 defaults
python 2.7.13 0 defaults
python-dateutil 2.6.0 py27_0 defaults
python-hglib 2.2 <pip>
pytz 2016.10 py27_0 defaults
pyyaml 3.12 py27_0 defaults
readline 6.2 2 defaults
requests 2.12.4 py27_0 defaults
ruamel_yaml 0.11.14 py27_1 defaults
setuptools 27.2.0 py27_0 defaults
six 1.10.0 py27_0 defaults
sqlite 3.13.0 0 defaults
tk 8.5.18 0 defaults
wheel 0.29.0 py27_0 defaults
yaml 0.1.6 0 defaults
zlib 1.2.8 3 defaults
Historical note: When I originally wrote this article, I obtained the following outputs at various times:¶
# packages in environment at /data/apps/anaconda:
#
cffi 1.9.1 py27_0 defaults
conda 4.3.7 py27_0 defaults
conda-env 2.6.0 0 defaults
cryptography 1.7.1 py27_0 defaults
enum34 1.1.6 py27_0 defaults
idna 2.2 py27_0 defaults
ipaddress 1.0.18 py27_0 defaults
openssl 1.0.2j 0 defaults
pip 9.0.1 py27_1 defaults
pyasn1 0.1.9 py27_0 defaults
pycosat 0.6.1 py27_1 defaults
pycparser 2.17 py27_0 defaults
pycrypto 2.6.1 py27_4 defaults
pyopenssl 16.2.0 py27_0 defaults
python 2.7.13 0 defaults
readline 6.2 2 defaults
requests 2.12.4 py27_0 defaults
ruamel_yaml 0.11.14 py27_1 defaults
setuptools 27.2.0 py27_0 defaults
six 1.10.0 py27_0 defaults
sqlite 3.13.0 0 defaults
tk 8.5.18 0 defaults
wheel 0.29.0 py27_0 defaults
yaml 0.1.6 0 defaults
zlib 1.2.8 3 defaults
# packages in environment at /data/apps/anaconda:
#
conda 4.2.7 py27_0 defaults
conda-env 2.6.0 0 defaults
enum34 1.1.6 py27_0 defaults
openssl 1.0.2i 0 defaults
pip 8.1.2 py27_0 defaults
pycosat 0.6.1 py27_1 defaults
python 2.7.12 1 defaults
pyyaml 3.12 py27_0 defaults
readline 6.2 2 defaults
requests 2.11.1 py27_0 defaults
ruamel_yaml 0.11.14 py27_0 defaults
setuptools 27.2.0 py27_0 defaults
sqlite 3.13.0 0 defaults
tk 8.5.18 0 defaults
wheel 0.29.0 py27_0 defaults
yaml 0.1.6 0 defaults
zlib 1.2.8 3 defaults
# packages in environment at /data/apps/anaconda:
#
conda 3.12.0 py27_0
conda-env 2.1.4 py27_0
openssl 1.0.1k 1
pip 6.1.1 py27_0
pycosat 0.6.1 py27_0
python 2.7.9 1
pyyaml 3.11 py27_0
readline 6.2 2
requests 2.7.0 py27_0
setuptools 15.2 py27_0
sqlite 3.8.4.1 1
tk 8.5.18 0
yaml 0.1.4 1
zlib 1.2.8 0
One interesting feature shown here is that conda keeps track of your revisions. You can revert to a revision with `conda install --revision=0` (the original miniconda installation, for example) and you can list them with:
%%bash
. deactivate 2> /dev/null
echo $PATH
conda list --revisions
Command Completion¶
Before doing any more work, add command completion so that you can explore options:
%%bash
. deactivate 2> /dev/null
conda install -y argcomplete
eval "$(register-python-argcomplete conda)"
Configure Conda¶
Conda allows you to configure it with the conda config
command. Here I add the [conda-forge][] channel.
%%bash
. deactivate 2> /dev/null
conda config --append channels conda-forge
%%bash
. deactivate 2> /dev/null
conda config --show
Environments¶
To ensure reproducible computing, I highly recommend working from a clean environment for your projects so that you can enter a clean environment with
conda activate _conda_env
To do this, maintain an environment.yml
file such as this:
# environment.yml
name: _conda_env
channels:
- defaults
- conda-forge
- ioam
dependencies:
- ipykernel # Use with nb_conda_kernels so jupyter sees this kernel
- numpy
- scipy>=0.17.1
- matplotlib>=1.5
- uncertainties
- ipywidgets
- holoviews
- pip:
- mmf_setup>=0.1.11
Then create a new environment as follows:
conda env create -p _conda_env -f environment.yml
Periodically you should check if anything is outdated:
conda search -p _conda_env --outdated
and possibly update everything
conda update -p _conda_env --all
If you then make a `_conda_env` environment in the project and choose this as the kernel for your notebooks, you can then run in an isolated environment. Don't forget to exclude `_conda_env` from selective sync with Dropbox etc. as it can get big! For this reason, it can be better to install this globally.
Anaconda¶
For my `work` environment, I like to have everything in the `anaconda` meta-package, but this precludes pinning python to a preferred version. For example, the following gives an error:
# environment.tst.yml
name: tst
channels:
- defaults
- conda-forge
dependencies:
- python == 3.8.6 # Try >= 3.8.5
- anaconda # Gets most things: we usually want to depend on this.
Conda fails to build this, while Mamba builds a version that breaks on OS X:
mamba env create -f environment.tst.yml
conda activate -n tst && python -c "import scipy"
Being less restrictive works.
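For example, a sketch of a less restrictive pin (the exact lower bound is illustrative):

```bash
# Relax the exact python pin so the anaconda meta-package can resolve.
cat > environment.tst.yml <<'EOF'
name: tst
channels:
  - defaults
  - conda-forge
dependencies:
  - python>=3.8    # instead of python == 3.8.6
  - anaconda
EOF
conda env create -f environment.tst.yml
```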
Frozen Environments¶
I manually create environment files like the one above which specify the leaf nodes I need. To prune these, see the section Visualizing Dependencies: Conda.
I keep a few clean environments for use when testing. I make these as another user (admin
) so that I can't accidentally modify the environments.
%%bash
. deactivate 2> /dev/null
conda update conda
envdir="$(conda info | grep 'envs directories' | awk 'NF>1{print $NF}')"
for p in 2.7 3.5 3.6 3.7 3.8 3.9; do
env=py${p}
echo conda env remove -y -n ${env} 2> /dev/null
echo conda create -y -c defaults --override-channels --no-default-packages -n "${env}" python=${p}
echo sudo chown admin "${envdir}/${env}"
echo ln -s "${envdir}/${env}/bin/python${p}" ~/.local/bin/
done
conda clean --all -y
# This takes about 600MB
for p in 2.7 3.5 3.6 3.7 3.8 3.9; do
env=py${p}a
conda env remove -y -n ${env} 2> /dev/null
conda create -y -n ${env} python=${p} anaconda
done
# This takes more than 8GB... I gave up.
If I want to test against a clean environment, I make a clone:
%%bash
conda create -m -p _envs/py2.6 --clone py2.6
. activate _envs/py2.6
. deactivate
rm -rf _envs
Copying Environments¶
Conda environments are not relocatable. Even though everything lives in a single folder, moving that is asking for trouble. Here are some strategies:
conda-pack
¶
BROKEN: When I tried this, SciPy failed:
In [1]: import scipy
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
/data/apps/conda/envs/work_/lib/python3.8/site-packages/numpy/core/__init__.py in <module>
21 try:
---> 22 from . import multiarray
23 except ImportError as exc:
/data/apps/conda/envs/work_/lib/python3.8/site-packages/numpy/core/multiarray.py in <module>
11
---> 12 from . import overrides
13 from . import _multiarray_umath
/data/apps/conda/envs/work_/lib/python3.8/site-packages/numpy/core/overrides.py in <module>
6
----> 7 from numpy.core._multiarray_umath import (
8 add_docstring, implement_array_function, _get_implementing_args)
ImportError: dlopen(/data/apps/conda/envs/work_/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-darwin.so, 2): Library not loaded: @rpath/libopenblas.dylib
Referenced from: /data/apps/conda/envs/work_/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-darwin.so
Reason: image not found
During handling of the above exception, another exception occurred:
ImportError Traceback (most recent call last)
<ipython-input-1-4363d2be0702> in <module>
----> 1 import scipy
/data/apps/conda/envs/work_/lib/python3.8/site-packages/scipy/__init__.py in <module>
59 __all__ = ['test']
60
---> 61 from numpy import show_config as show_numpy_config
62 if show_numpy_config is None:
63 raise ImportError(
/data/apps/conda/envs/work_/lib/python3.8/site-packages/numpy/__init__.py in <module>
138 from . import _distributor_init
139
--> 140 from . import core
141 from .core import *
142 from . import compat
/data/apps/conda/envs/work_/lib/python3.8/site-packages/numpy/core/__init__.py in <module>
46 """ % (sys.version_info[0], sys.version_info[1], sys.executable,
47 __version__, exc)
---> 48 raise ImportError(msg)
49 finally:
50 for envkey in env_added:
ImportError:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.
We have compiled some common reasons and troubleshooting tips at:
https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
* The Python version is: Python3.8 from "/data/apps/conda/envs/work_/bin/python"
* The NumPy version is: "1.19.2"
and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.
Original error was: dlopen(/data/apps/conda/envs/work_/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-darwin.so, 2): Library not loaded: @rpath/libopenblas.dylib
Referenced from: /data/apps/conda/envs/work_/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-darwin.so
Reason: image not found
Allows you to copy the environment. Install with conda install -c conda-forge conda-pack
.
env=work # Specify the name here
new_penv=new_env_folder # It will go here
conda pack -n "${env}" -o "${env}.tgz"
mkdir "${new_penv}"
tar -zxvf "$env.tgz" -C "${new_penv}"
conda activate "${new_penv}"
conda-unpack
Caveats:
- Packages installed with `pip -e` in source mode cannot be copied. Install these properly or remove with `pip uninstall`.
- Fails on Mac OS X with `python.app` and some other packages:

  Files managed by conda were found to have been deleted/overwritten in the following packages:
- olefile='0.46'
- defusedxml='0.6.0'
- python.app='1.3'
  This is usually due to `pip` uninstalling or clobbering conda managed files, resulting in an inconsistent environment. Please check your environment for conda/pip conflicts using `conda list`, and fix the environment by ensuring only one version of each package is installed (conda preferred).

  In principle, this should be fixed by forcing these to be reinstalled:

      mamba install -n "${env}" --force-reinstall olefile defusedxml python.app

  but this does not fix `python.app`. For this, uninstall, pack, move, then reinstall:

      #mamba uninstall --force -n "${env}" python.app  # Fails: see https://github.com/mamba-org/mamba/issues/412
      conda uninstall --force -n "${env}" python.app
      ...
      mamba install -p "${new_penv}" python.app
Spec-file¶
conda list --explicit > spec_file.txt
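The resulting spec file can then be used to recreate an identical environment on the same platform (the target name here is a placeholder):

```bash
conda create --name work-copy --file spec_file.txt
```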
Jupyter¶
I install Jupyter into its own environment and then use the following alias to launch it. This allows me to launch it in any environment without having to install it. To use those environments as their own kernels, make sure you include the `ipykernel` package in the environment and the `nb_conda` package in the `jupyter` environment.
conda env create --file environment.jupyter.yml
Sometimes I have difficulty getting extensions working. The following may help:
jupyter serverextension enable --sys-prefix jupyter_nbextensions_configurator nb_conda
jupyter nbextension enable --sys-prefix toc2/main init_cell/main
jupyter serverextension disable --sys-prefix nbpresent
jupyter nbextension disable --sys-prefix nbpresent/js/nbpresent.min
# environment.jupyter.yml
name: jupyter
description: |
Environment for running Jupyter, Jupyter notebooks, Jupyter Lab
etc. This environment includes the following packages:
* `nb_conda`: This enables Jupyter to use local conda-installed
environments. To use this, make sure you install the `ipykernel`
package in your conda environments.
* `jupytext`: Allows you to store Jupyter notebooks as text files
(.py) for better version control. To enable this for a notebook,
you can run:
jupytext --set-formats ipynb,py --sync notebook.ipynb
* `rise`: Allows you to use your notebooks for presentations.
channels:
- defaults
- conda-forge
dependencies:
- python=3
- nb_conda
- jupyter_contrib_nbextensions
- jupyter_nbextensions_configurator
- ipyparallel
- nbbrowserpdf
- nbdime
- nbstripout
- jupyterlab
- nbdime
- rise
- jupytext
# VisPy
#- ipywidgets
#- vispy
#- npm
#- pip
#- pip:
# - git+git@github.com:vispy/vispy.git#egg=vispy
# matplotlib
#- ipympl
# jupyter nbextension enable --py widgetsnbextension
# jupyter nbextension enable --py vispy
# ~/.environment_site which is sourced by ~/.bashrc
...
# New way of activating conda: Use your install location
. /data/apps/anaconda/etc/profile.d/conda.sh
conda activate # Optional - activates the base environment.
function j {
if [ -f './jupyter_notebook_config.py' ]; then
CONFIG_FLAG="--config=./jupyter_notebook_config.py";
else
if [ -f "$(hg root)/jupyter_notebook_config.py" ]; then
CONFIG_FLAG="--config=$(hg root)/jupyter_notebook_config.py";
else
CONFIG_FLAG="";
fi;
fi;
echo "conda activate jupyter";
conda activate jupyter;
echo "jupyter notebook ${CONFIG_FLAG} $*";
#unset BASH_ENV module ml;
jupyter notebook "${CONFIG_FLAG}" "$*";
conda deactivate
}
Configuration¶
To see where Jupyter configuration files go, run
%%bash
. activate jupyter
jupyter --paths
I add the following kernel so I can seamlessly match with [CoCalc][].
# ~/Library/Jupyter/kernels/python2-ubuntu/kernel.json
{
"display_name": "Python 2 (Ubuntu, plain)",
"argv": [
"/data/apps/anaconda/envs/work/bin/python",
"-m",
"ipykernel",
"-f",
"{connection_file}"
],
"language": "python"
}
Anaconda Cloud¶
Once you have useful environments, you can push them to Anaconda Cloud so that you or others can use them:
anaconda login # If you have not logged in.
anaconda upload environment.work.yml
Once this is done, you can install the environment with one of:
conda env create mforbes/work
mamba env create mforbes/work # Faster, but experimental
Clean¶
The Conda package cache can help speed the creation of new environments, but can take up quite a bit of disk space. One can reclaim this with conda clean
:
!conda clean -p --dry-run
!conda clean -p -y
!conda clean -t --dry-run
Basic Packages and Configuration¶
Here is a list of basic packages that I always use. I install these in two custom environments, `work2` and `work3`. I also make a symlink called `work` which points to the appropriate environment in which I am currently working.
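A minimal sketch of how such a symlink might be set up (the envs directory is taken from `conda info --base` and may differ on your system):

```bash
# Point the `work` environment name at the python 3 environment.
envdir="$(conda info --base)/envs"
ln -sfn "${envdir}/work3" "${envdir}/work"
conda activate work
```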
First, however, I install the following in the default conda environment:
%%bash
. deactivate 2> /dev/null
#conda install -y --revision=0 # Revert to original revision: Removes conda!
conda update -y conda # Update conda
conda config --append channels conda-forge # Added conda-forge channel
# Now install all additional packages
conda install -y argcomplete mercurial conda-build anaconda-client
conda update -y --all # Update all other packages
conda list
#pip install -U hg-git python-hglib
Now I add the default path to both the start and end of `$PATH` with different names in my `.bashrc` file:
export PATH="/data/apps/anaconda/bin:$PATH:/data/apps/anaconda/bin/."
The reason for the latter addition is:
- It will be a fallback when activating other environments so I can still use mercurial.
- By giving it a different name it will not be removed when
. deactivate
is called.
%%bash
. deactivate 2> /dev/null
conda create -n work2 --clone py2a
conda install -y accelerate coverage zope.interface -n work2
conda install -y line_profiler -n work2
conda update -y --all -n work2
conda remove -y nose pyflake8 -n work2
. activate work2
pip install -U nose pyflake8 memory_profiler
pip install -U nikola webassets # For blogging
pip install -U pyfftw # Needs the FFTW (from source, install all precisions)
pip install -U pygsl # Needs the GSL (port install gsl)
pip install -U uncertainties # uncertainties package I work with for error analysis
pip install -U rope jedi yapf # Used by elpy mode in emacs.
Activate and Deactivate Scripts¶
When sourcing the activate and deactivate scripts, files "$CONDA_ROOT/etc/conda/activate.d/*.sh"
and "$CONDA_ROOT/etc/conda/deactivate.d/*.sh"
will be sourced. This allows one to perform additional configurations such as setting IPYTHONDIR
. To find CONDA_ROOT
programmatically use:
%%bash
CONDA_ROOT="$(conda info -e | grep '*' | cut -d '*' -f2 | sed 's/^ *//g')"
echo CONDA_ROOT=$CONDA_ROOT
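When an environment is active, conda also exports its prefix directly, so the grep can usually be avoided:

```bash
# Prefix of the currently active environment (set by `conda activate`).
CONDA_ROOT="${CONDA_PREFIX}"
echo "CONDA_ROOT=${CONDA_ROOT}"
```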
For example, the following two scripts to set and reset (once) IPYTHONDIR
:
/data/apps/anaconda/envs/work/etc/conda/activate.d/set_ipythondir.sh
if [ ! -z \${IPYTHONDIR+x} ]; then
export OLD_IPYTHONDIR="\$IPYTHONDIR"
fi
export IPYTHONDIR="/data/apps/anaconda/envs/work/.ipython"
/data/apps/anaconda/envs/work/etc/conda/deactivate.d/unset_ipythondir.sh
if [ -z \${OLD_IPYTHONDIR+x} ]; then
unset IPYTHONDIR
else
export IPYTHONDIR="\$OLD_IPYTHONDIR"
unset OLD_IPYTHONDIR
fi
I make these files with a script like the following which I can run in the appropriate environment to set the .ipython
directory in the current folder as the IPYTHONDIR
for the environment. Note: It does some checks to make sure that an environment is active.
#!/bin/bash
CONDA_ROOT="$(conda info --root)"
CONDA_ENV="$(conda info -e | grep '*' | cut -d '*' -f2 | sed 's/^ *//g')"
if [ "$CONDA_ROOT" == "$CONDA_ENV" ]; then
echo "Not in a conda environment. Activate the environment first. I.e.:"
echo
echo " . activate work"
echo
exit 1
else
echo "Creating activate and deactivate scripts:"
echo
echo " $CONDA_ENV/etc/conda/activate.d/set_ipythondir.sh"
echo " $CONDA_ENV/etc/conda/deactivate.d/unset_ipythondir.sh"
echo
fi
mkdir -p "$CONDA_ENV/etc/conda/activate.d"
mkdir -p "$CONDA_ENV/etc/conda/deactivate.d"
IPYTHONDIR="$(cd .ipython && pwd)"
cat <<EOF > "$CONDA_ENV/etc/conda/activate.d/set_ipythondir.sh"
# This file sets the IPYTHONDIR environmental variable to use the
# settings in $IPYTHONDIR
# It is called when you activate the $CONDA_ENV environment with
#
# . activate $CONDA_ENV
if [ ! -z \${IPYTHONDIR+x} ]; then
export OLD_IPYTHONDIR="\$IPYTHONDIR"
fi
echo Setting IPYTHONDIR="$IPYTHONDIR"
export IPYTHONDIR="$IPYTHONDIR"
EOF
cat <<EOF > "$CONDA_ENV/etc/conda/deactivate.d/unset_ipythondir.sh"
# This file resets the IPYTHONDIR environmental to the previous value.
# It is called when you deactivate the $CONDA_ENV environment with
#
# . deactivate
if [ -z \${OLD_IPYTHONDIR+x} ]; then
echo Unsetting IPYTHONDIR
unset IPYTHONDIR
else
echo Resetting IPYTHONDIR="\$OLD_IPYTHONDIR"
export IPYTHONDIR="\$OLD_IPYTHONDIR"
unset OLD_IPYTHONDIR
fi
EOF
. "$CONDA_ENV/etc/conda/activate.d/set_ipythondir.sh"
Development¶
If you are developing software, the `conda-devenv` package might help. It allows you to maintain several dependent environment files, for example. This might be useful when you want to have separate environment files for testing python 2 and python 3. Useful features include (from the docs):
- Jinja 2 support: gives more flexibility to the environment definition, for example making it simple to conditionally add dependencies based on platform.
- include other environment.devenv.yml files: this allows you to easily work in several dependent projects at the same time, managing a single conda environment with your dependencies.
- Environment variables: you can define an `environment:` section with environment variables that should be defined when the environment is activated. (A rough sketch follows this list.)
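A rough sketch of what an `environment.devenv.yml` using these features might look like. The `includes:` and `environment:` keys come from the feature list above, but the Jinja `root` variable and the exact layout are assumptions, so check the conda-devenv documentation:

```bash
# Hypothetical environment.devenv.yml exercising includes and environment:
cat > environment.devenv.yml <<'EOF'
name: myproject
includes:
  - {{ root }}/../common/environment.devenv.yml
dependencies:
  - numpy
  - pytest
environment:
  PYTHONPATH: {{ root }}/src
EOF
conda devenv    # renders the file and creates/updates the environment
```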
Packages and Dependencies¶
Making sure that you have all of the required packages installed can be a bit of a pain. I have toyed with three different types of strategies outlined below. I currently use method 1. when actively developing, then switch to a version that supports both 2. and 3.
- Include the source code for projects you need as weak dependencies managed with the myrepos (`mr`) tool.

  Basically, I include a `.mrconfig` file which instructs `mr` how to check out the required projects:

      [_ext/pymmf]
      checkout = hg clone 'https://bitbucket.org/mforbes/pymmf' 'pymmf'

  When I run `mr checkout`, this project will be cloned to `_ext/pymmf` and I can then symlink the appropriate files to a top-level module with `ln -s _ext/pymmf/mmf mmf`. If you use `pymmf` in many projects, you can replace `_ext/pymmf` with a symlink to an appropriate common location (but keep in mind that each project will then be using the same revision).

  A small set of additional commands can give `mr freeze`, which tracks the specific versions. There are some bugs with this, but you can see an example in my mmfhg project, in particular in the `mrconfig` file.

  This approach is useful if you need to actively develop both the top-level project and the subproject, but it requires manual intervention to update etc. and is not suitable for general usage since collaborators will need to add myrepos to their toolchain.
- Write a `setup.py` file and use this or `pip` to manage the dependencies.

  This is a better choice for general distribution and collaboration, but still has a few warts. An advantage of this over the next option is that it will then work with PyPI.
- Write a `meta.yaml` file and use `conda` to manage dependencies.

  This is a good choice for general distribution if you want your package to be installable with `conda` from Anaconda Cloud.
Setuptools and Pip¶
The Python Packaging User Guide is supposed to provide the definitive guide for managing python packages. Its current recommendation for installing packages is to use `pip`, so we would like to maintain that strategy. The basic idea here is to include a `setup.py` file in your top-level directory that specifies your dependencies (an example is provided below). This is designed for, and works well with, packages published on PyPI, but starts failing when you need to depend on projects that are not published (i.e. those hosted on github or bitbucket).
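For reference, a minimal sketch of such a `setup.py`; the package name and pins are placeholders:

```bash
cat > setup.py <<'EOF'
from setuptools import setup, find_packages

setup(
    name="mypackage",        # placeholder project name
    version="0.1",
    packages=find_packages(),
    install_requires=[
        "numpy",
        "scipy>=0.17.1",
    ],
)
EOF
pip install -e .    # pip resolves install_requires for an editable install
```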
Setup(.py) and Requirements(.txt)¶
Some of the issues are discussed in this rant but are centered around the issue of how to specify the location of code not available on PyPI. Both `pip` and `setuptools` support installation from version-controlled sources. The recommended approach for this (see Setup vs. Requirements) is to put these dependencies in a `requirements.txt` file, but there are problems with this approach:

- Users must run `pip install -r requirements.txt` explicitly rather than just `pip install -e .` or `python setup.py develop`.

  This is probably not such a big deal as long as the install instructions are clear.

- If you require a subproject that also has a `requirements.txt`, there is no easy way of recursively parsing the subproject's requirements. (The requirements specified in the subproject's `setup.py` will get processed.)

  The only solution I know to this second issue is to make sure your `requirements.txt` file is very complete, specifying all possible requirements recursively.

- Suppose that you use branches or tags in your code to mark versions. Now suppose that one project `A` depends on another project `B`, and that `B` depends on a specific version `C==0.9`. You dutifully include this in the `setup.py` files:

      # A's setup.py
      install_requires = ["B"]

      # B's setup.py
      install_requires = ["C==0.9"]

  However, at this point, neither `pip` nor `setuptools` can install `A` since neither `B` nor `C` are installed on PyPI. The recommended solution is to specify all of the dependencies in `A`'s `requirements.txt` file:

      -e hg+ssh://hg@bitbucket.org/mforbes/B#egg=B
      -e hg+ssh://hg@bitbucket.org/mforbes/C#egg=C
      -e .

  Installing `A` with `pip install -r requirements.txt` will now install `B` and `C`, but the requirements are now broken. The package `C`, for example, will be installed at the latest version, even if this breaks the `C==0.9` requirement by `B`. This appears to be broken even if we provide the explicit versions in the `requirements.txt` file:

      -e hg+ssh://hg@bitbucket.org/mforbes/B#egg=B
      -e hg+ssh://hg@bitbucket.org/mforbes/C@0.8#egg=C-0.8
      -e .

  Thus we are left in the awkward position of having to make sure in project `A` that we pin the exact requirements we need in subproject `C` even though we do not (directly) depend on it.
The problem described above is consistent with the following roles of `setup.py` and `requirements.txt`:

- The `install_requires` argument to `setup()` in `setup.py` specifies what is required, and version information should be used to specify what does not work (i.e. avoid this version).
- The `requirements.txt` specifies exactly one situation that does work.

In case of conflict, the second point here seems to override the first one (`C-0.8` above gets installed even though the setup for `B` specifies `C==0.9`).

Thus, `requirements.txt` is completely useless for specifying dependencies in a general way. The only solution here appears to be to carefully construct a `requirements.txt` file whenever a project or its dependencies change. The following tools may help with this, but I am not happy with this solution:

- `pip freeze > requirements.txt`
- http://furius.ca/snakefood/
A Better Solution: Host your own Index¶
Can we get the benefits of the usual dependency resolution while using version controlled software? To do this, it seems that you need to host your packages through an index via one of the following approaches:
- Upload your package to PyPI: This works, but requires a lot of discipline and maintenance to make sure your package is complete, tested, working, and properly described whenever you need to freeze it. The whole release process can be a pain, and submitting a bunch of random tools to PyPI is considered bad practice. (For one thing, there is a global namespace, so you might steal a useful name for a bunch of specialized tools where another package might have been able to better use it for the community.)
- Host your own Package Index.
The latter approach seems at first to require running a web-server somewhere, which is off-putting, but it actually supports two quite reasonable solutions. The first is to package tarballs of your projects into a single directory with canonical names like `C-0.8.tar.gz` and `C-0.9.tar.gz`, and then point to this with the pip `--find-links <dir>` option. (The tarballs can be produced with `python setup.py sdist` and will get deposited in the `dist` directory, which you could symlink to the `<dir>` specified above.)
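A sketch of this first solution with the hypothetical package `C` used above:

```bash
# Build the sdist into a shared directory of canonically named tarballs,
# then install against that directory instead of PyPI.
(cd C && python setup.py sdist --dist-dir ~/pkgs)
pip install --find-links ~/pkgs "C==0.9"
```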
The second solution is to provide a proper package index somewhere. This allows you to refer to additional locations, and in particular, allows you to refer to something like
<a href="https://bitbucket.org/mforbes/persist/get/0.9.tar.bz2#egg=persist-0.9">persist-0.9</a>.
This uses the packaging feature at bitbucket that will check out version 0.9 from the repository and deliver it as a tarball. The resulting index file can be hosted somewhere globally and referred to with `pip install --extra-index-url <url>`.
Package Index API¶
The Package Index "API" is basically a set of nested directories of the form packagename/version
with index.html
files that will describe how to download the package with links like those shown above.
This works, but is a PITA since it seems to require:
- The whole nested directory structure (so you can't just manage a single file).
- Requires that these index files be served from a proper web server that will dish them up when given a trailing slash, e.g. `https://.../packagename/versions/`. I still don't know why they can't just also look for the index files. I have asked on the mailing list.
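A sketch of the minimal layout, served locally for testing (the `persist` link is the same one shown above; the port and paths are placeholders):

```bash
# One directory per package, each with an index.html of download links.
mkdir -p index/persist
cat > index/persist/index.html <<'EOF'
<a href="https://bitbucket.org/mforbes/persist/get/0.9.tar.bz2#egg=persist-0.9">persist-0.9</a>
EOF
python -m http.server --directory index 8000 &   # simple server for testing
pip install --extra-index-url http://localhost:8000/ persist
```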
Visualizing Dependencies¶
Pip¶
Try the pipdeptree package. Unfortunately, this lists all dependencies in a conda environment.
pip install pipdeptree
Conda¶
Here is a strategy based on Eric Dill's render_env.py GIST. We use it to find all of the roots in an environment (packages upon which nothing depends).
%%bash
pipdeptree | perl -nle'print if m{^\w+}'
from IPython.display import display, clear_output
import json, glob, sys
import os
from os.path import join, basename
# install this with "conda install -c conda-forge python-graphviz pygraphviz"
import graphviz as gv
import pygraphviz as pgv
import networkx as nx
# path to your conda environment
env_dir = os.path.dirname(sys.prefix)
env_name = os.path.basename(sys.prefix)
env_name = "cugpe"
path = os.path.join(env_dir, env_name)
path = os.path.dirname(env_dir)
path = "/data/apps/conda/envs/cugpe"
#path = "/data/apps/conda/envs/work"
#path = "/data/apps/conda/envs/blog"
#path = "/data/apps/conda/envs/hv_ok"
#path = "/data/apps/conda/envs/_gpe3"
dg = gv.Digraph(name=os.path.basename(path))
pdg = pgv.AGraph(strict=False, directed=True)
ng = nx.DiGraph()
for json_file in glob.glob(join(path, 'conda-meta', '*.json')):
print('reading', json_file)
j = json.load(open(json_file))
name = j['name']
label = "\n".join([j['name'], j['version']])
attrs = dict()
dg.node(name, label=label)
pdg.add_node(name)
ng.add_node(name)
for dep in j.get('depends', []):
_dep = dep.split(' ', 1)
dep_name = _dep.pop(0)
if _dep:
attrs = dict(label=_dep[0])
dg.edge(name, dep_name, **attrs)
pdg.add_edge(name, dep_name, **attrs)
ng.add_edge(name, dep_name, **attrs)
clear_output()
roots = sorted([(1+len(nx.descendants(ng, _n)), _n)
for _n in ng.nodes() if ng.in_degree(_n) == 0])
roots
dg.render(view=True)
Issues¶
Conda Performance¶
Conda can be very slow to resolve dependencies. An experimental project mamba attempts to speed this up.
Redundancy between Conda and Pip¶
The present workflow duplicates information in setup.py
for pip
and PyPI, and meta.yaml
for Conda. This is being discussed as part of the following Conda issue:
Disk Usage¶
Multiple Conda environments can use a lot of disk space. Some of this can be recovered by cleaning unused packages:
conda clean --all
however, packages that are currently being used can take a lot of space. One way to check is to run `ncdu` in the `pkgs` directory:

$ ncdu /data/apps/conda/pkgs
--- /data/apps/conda/pkgs --------------------------------
581.6 MiB [##########] /itk-5.1.1-py38h32f6830_3
540.2 MiB [######### ] /mkl-2020.2-260
314.0 MiB [##### ] /qt-5.12.5-h514805e_3
313.5 MiB [##### ] /qt-5.12.5-h9272185_4
164.5 MiB [## ] /vtk-8.2.0-py38h3f69d5f_218
118.6 MiB [## ] /pandoc-2.10.1-haf1e3a3_0
115.2 MiB [# ] /pandoc-2.10-0
88.6 MiB [# ] /pandoc-2.11-h0dc7051_0
84.3 MiB [# ] /pandoc-2.11.0.4-h22f3db7_0
84.1 MiB [# ] /pandoc-2.11-h22f3db7_0
...
Here we see that there are two versions of qt
being used, and multiple versions of pandoc
. (This was after a bit of cleaning - previously I had several versions of itk
and mkl
as well.) To find out which environments are using which package, we can run the following:
function get_ver() {
envs=$(conda env list --json | jq -r '.envs[]')
for p in $envs; do
echo "========================>" $p
conda list -p $p | grep "\b$1\b"
done
}
get_ver qt
========================> /data/apps/conda
========================> /data/apps/conda/envs/alcc
========================> /data/apps/conda/envs/app_gitannex
========================> /data/apps/conda/envs/blog
========================> /data/apps/conda/envs/hg
========================> /data/apps/conda/envs/jupyter
qt 5.12.5 h9272185_4 conda-forge
========================> /data/apps/conda/envs/leo
========================> /data/apps/conda/envs/super_hydro
========================> /data/apps/conda/envs/work
qt 5.12.5 h514805e_3 conda-forge
I can then upgrade the `work` and `jupyter` environments to make sure they use the same package, thereby saving space:
mamba update -n work qt
mamba update -n jupyter qt
Caveats:
- This might not always work if there are conflicts.
- This can break libraries such as `numpy` and `scipy`, so test.
To pin a specific version, you might need to install rather than update:
mamba install -n work qt=5.12.5=h9272185_4
After running conda clean --all
again we have:
$ conda clean --all
$ ncdu /data/apps/conda/pkgs
581.6 MiB [##########] /itk-5.1.1-py38h32f6830_3
540.2 MiB [######### ] /mkl-2020.2-260
319.6 MiB [##### ] /qt-5.12.9-h717870c_0
164.5 MiB [## ] /vtk-8.2.0-py38h3f69d5f_218
...
Here is my typical usage on Mac OS X after such cleaning:
14.6 GiB /data/apps/conda/
12.5 GiB [##########] /envs
10.0 GiB [##########] /work
2.3 GiB [## ] /jupyter
1.2 GiB [# ] /alcc
1.1 GiB [# ] /super_hydro
590.2 MiB [ ] /leo
396.9 MiB [ ] /blog
125.9 MiB [ ] /hg
115.9 MiB [ ] /app_gitannex
...
4.4 GiB [### ] /pkgs
548.1 MiB [ ] /conda-bld
410.7 MiB [ ] /lib
54.3 MiB [ ] /share
24.8 MiB [ ] /include
19.7 MiB [ ] /bin
7.3 MiB [ ] /conda-meta
4.3 MiB [ ] /python.app
...
Building a Conda Package¶
For a simple example of a Conda package, see the following:
This is a simple python project I forked, consisting of a single python file. To turn it into a Conda package, I added the following `meta.yaml` file, built it, and uploaded it as described in Anaconda Cloud: Building Packages:
package:
name: conda-tree
version: "0.0.1"
source:
git_rev: master
git_url: https://github.com/mforbes/conda-tree.git
requirements:
host:
- python
run:
- networkx
- conda
build:
noarch: python
script: # See https://github.com/conda/conda-build/issues/3166
- mkdir -p "$PREFIX/bin"
- cp conda-tree.py "$PREFIX/bin/"
about:
home: https://github.com/rvalieris/conda-tree
license: MIT
license_file: LICENSE
This follows a discussion about using conda-build to build a package without setup.py (since I wanted to minimally complicate the original package). Once this is built and committed, I can build it and upload it to [my Conda channel](https://anaconda.org/mforbes) as follows:
conda install conda-build anaconda-client # If needed: I have these in (base)
anaconda login # If needed: I stay logged in
conda build .
# The name needed below can be found with
conda build . --output
anaconda upload /data/apps/anaconda/conda-bld/noarch/conda-tree-0.0.1-0.tar.bz2
One gotcha: the conda build .
command above will attempt to download the source from the appropriate tag/branch on Github (since that is what we specified as a source). Thus, you must make sure to tag and push those tags:
git tag -a v0.0.1
git push origin v0.0.1
When a Dependency is not Available to Conda¶
When you are making conda packages, all elements need to be conda-installable. Here is how to push a pip-installable package to your conda channel:
mkdir tmp; cd tmp
conda activate base
conda skeleton pypi husl
# Edit husl/meta.yaml if needed
conda build husl/meta.yaml
anaconda upload --all /data/apps/conda/conda-bld/osx-64/husl-4.0.3-py37_0.tar.bz2
conda skeleton pypi python-hglib
# Edit python-hglib/meta.yaml if needed
conda build python-hglib/meta.yaml
anaconda upload --all /data/apps/conda/conda-bld/osx-64/python-hglib-2.6.1-py37_0.tar.bz2
References¶
Testing¶
By including a section like the following, you can run tests when you issue the conda build
command. Some important notes though:
- Conda now moves the working directory, so you need to specifically include the files you will test with the `source_files:` section.
- If you build from your source directory, be sure to clean out
__pycache__
directories (see Gotchas below).
test:
source_files:
- persist
imports:
- persist
requires:
- coverage
- h5py
- pytest >=2.8.1
- pytest-cov >=2.2.0
- pytest-flake8
- scipy
commands:
- py.test
This will do several things:
- It will test that
import persist
works. - It will create a test environment with the specified requirements.
- It will run
py.test
.
If you do this locally (i.e. src: .
) then be sure to remove all __pycache__
, *.pyc
, and *.pyo
files:
find . -name "__pycache__" -exec rm -rf {} \;
find . -name "*.pyc" -delete
find . -name "*.pyo" -delete
Gotchas¶
- Make sure you have the appropriate channels in your `~/.condarc` file. (There is currently no way to specify channels for `conda build`.)

      channels:
        - defaults
        - conda-forge
        - mforbes
        - file://data/apps/conda/conda-bld
- The last entry is because sometimes things are not immediately available on my `mforbes` channel, even though I upload something to anaconda.org. This is where they are built by default. This local channel can be made available by running:

      conda index /data/apps/conda/conda-bld
-
Don't run
conda build
from a conda environment. Make sure you are in the base environment. -
If you do this locally (i.e.
src: .
) then be sure to remove all__pycache__
,*.pyc
, and*.pyo
files:find . -name "__pycache__" -exec rm -rf {} \; find . -name "*.pyc" -delete find . -name "*.pyo" -delete
Current Working Environments¶
I keep track of my working environments through a set of environment.*.yml
files which I host here:
Complete Install (21 July 2018)¶
Here is an example of a complete install on Mac OS X.
%%file environment.base.yml
channe
conda create -n work python=2
conda install -n work pandoc anaconda argcomplete beautiful-soup colorcet dill distribute mercurial pep8\
pyaudio pycrypto pympler pytest-runner sphinx_rtd_theme ujson cvxopt futures-compat\
lancet mkl-service pycurl bottleneck uncertainties mklfft mock scikit-learn twine weave\
snakeviz sympy pillow xarray pytest-cov pylint plotly vispy urllib3 numpydoc pytest-flake8\
line_profiler pygraphviz seaborn ipyparallel nbsphinx nbtutor paramnb nbdime jupyter\
jupyter_contrib_nbextensions
References¶
- Python Packaging User Guide: This is the authoritative guide about packaging with "Last Reviewed" dates so you can be sure you are getting the latest information.
- https://caremad.io/2013/07/setup-vs-requirement/
- http://www.dabapps.com/blog/introduction-to-pip-and-virtualenv-python/
- http://tech.marksblogg.com/better-python-package-management.html: A nice discussion of pip usage.
- https://www.colorado.edu/earthlab/2019/01/03/publishing-your-python-code-pip-and-conda-tips-and-best-practices: Some recent best practice notes.
Old Notes Etc.¶
TL;DR¶
- Install miniconda with perhaps a few tools like mercurial, but use
conda create -n <env>
to create working environments to help maintain isolation for everything else. -
Create the environments from an
environment.yml
file for reproducibility.-
Make sure that
pip
is configured to not install packages in the user directory so that they instead go to the conda environments:pip config set install.user false
-
This setup requires activating the appropriate conda environment before working. There are three options for making commands available to all environments:
- Added the base miniconda
bin/
at the end of your path in a special way so that you can fall back to these base commands. I do this for mercurial, which I install in my base environment.
pip install --user
which will put files in~/.locals/bin
and~/.locals/lib
so that any current python environment can access them. I do this with Nikola for blogging. - Create symlinks to conda-installed executables in
~/.locals/bin
or aliases to the appropriate environment. I use this, for example, with Jupyter.