Python's package management is a mess. I'm involved in a few open source projects and I often help users address their environment and installation issues. A large number of these issues essentially come down to incorrectly or accidentally mixing multiple Python environments together. This post lists a few common pitfalls and misconceptions around them.
## Multiple Python environments

People often, unfortunately, have multiple Python binaries and multiple installations of Python packages on one machine, e.g.:

- The system Python and its libraries: `/usr/bin/python`, `/usr/lib/python3.7/*`.
- `setup.py` or the system's `pip install` can install new packages to different locations: `/usr/local/lib/python3.7/*`, `$HOME/.local/lib/python3.7/*`.
- `pip install` etc., under a virtualenv, can install to a location under the virtualenv: `$HOME/my_venv/bin/python`, `$HOME/my_venv/lib/python3.7/*`.
- Anaconda comes with its own Python and libraries: `$HOME/anaconda3/bin/python`, `$HOME/anaconda3/lib/python3.7/*`.
All of the above methods of installing a library are very common. As a result, many Python developers' machines have multiple environments, and a ton of problems can arise from this.
## Verify which library you're actually using

For the reasons above, you can end up with multiple installations of the same package on your system. This often causes very confusing issues when you think you're using one installation but are actually using a different one, e.g. you upgrade a library yet the old version still gets imported.
When such issues appear, remember to verify what the library you're using is, and where it lives. When in doubt, try the following methods:

- Use `import lib; print(lib.__version__)` to check the version of the library you're using. However, not all packages have the `__version__` attribute; it could also be `VERSION`, etc.
- Use `import lib; print(lib.__file__)` to check the location of the library you're using. This method should work for most packages.
- Use `strace -fe file python -c 'import lib; do_something_with_lib()'` to see every file used by the command. This tells you everything needed to figure out whether you have a multiple-installation issue.
I have a command-line alias to help me check libraries.
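A minimal sketch of such an alias (the name `pycheck` and its exact form are illustrative):

```bash
# Print the version and the file location of a package, e.g. `pycheck numpy`.
# Falls back gracefully for packages without a __version__ attribute.
alias pycheck="python -c 'import importlib, sys; m = importlib.import_module(sys.argv[1]); print(getattr(m, \"__version__\", \"<no __version__>\")); print(m.__file__)'"
```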
## Don't use `pip list` or `conda list` to check package version

The version you see in these two commands may not match what you're actually using, because there can be multiple versions of the same library on the system, installed by `pip`, `conda`, or other methods. Neither `pip` nor `conda` is aware of all of them.

To tell precisely the version of a library you're using, follow the suggestions above.
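For example (a sketch; `numpy` stands in for any package you suspect):

```bash
# What pip's metadata claims:
pip show numpy | grep Version
# What you actually get when you import -- if the two disagree, you likely
# have multiple installations:
python -c 'import numpy; print(numpy.__version__, numpy.__file__)'
```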
## Don't use `setup.py install` to install packages

Usually, a package installed this way is not managed by any system: no command can tell you it is installed, and no command can uninstall it for you. A `pip uninstall` for such a package may complain that it "cannot determine which files belong to it", or it may just do nothing. You often need to manually remove files to really uninstall it.

The result is that when you need to install a different version of it some day, using other methods (e.g. `pip` or `conda`), the installation either fails, or succeeds but leaves you with multiple installations on the system.
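To hunt down such an orphaned installation, the same `__file__` trick from above works (`mylib` is a placeholder name):

```bash
# Find where the orphaned package actually lives; then remove the package
# directory and its *.egg-info metadata by hand.
python -c 'import mylib; print(mylib.__file__)'
```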
## Use `python -m pip` over `pip`

There can be multiple `python` binaries on your system (e.g. from the system, a venv, or anaconda). However, `pip` is just a Python script: depending on how its shebang line is written, some versions of `pip` pick the `python` executable from your `$PATH`, while others have a hard-coded absolute path to the `python` executable they will use.
As a result, when you run `pip install` directly, it's not immediately clear which Python it will use, let alone where the library will be installed.

On a system with more than one `python`, always use `python -m pip` or `/some/python -m pip` instead of the bare `pip` command.
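A quick way to see the ambiguity (a sketch; `some-package` is a placeholder):

```bash
# Which interpreter will `pip` run? Its shebang line tells you:
head -1 "$(which pip)"
# That may or may not match the python on your PATH:
which python
# Unambiguous alternatives: route pip through an explicit interpreter.
python -m pip install some-package
/usr/bin/python3 -m pip install some-package
```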
## Run `pip uninstall` multiple times

If you want to uninstall something, uninstall it repeatedly until the command converges: pip can install one package multiple times in different locations (e.g. one inside a virtualenv/conda environment plus one in `$HOME/.local`).
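For example (`some-package` is a placeholder):

```bash
# The first pass may remove the copy inside the active virtualenv...
python -m pip uninstall -y some-package
# ...while a second pass can catch another copy, e.g. in $HOME/.local:
python -m pip uninstall -y some-package
```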
## Use `python -c 'import lib'` to confirm uninstallation

Not everything can be uninstalled by a simple `pip uninstall` or `conda uninstall`. Examples include:

- a copy installed by a different `pip`;
- a copy installed by `setup.py install`;
- a copy exposed through `PYTHONPATH`.

Moreover, `import lib` may be provided by multiple alternative packages. For example, the `tf-nightly` and `tensorflow` packages both provide `import tensorflow`, and it's easy to forget that you've installed both. As a result, always run `import lib` to confirm after you uninstall something. If you're surprised by a successful import, use the methods in this article to tell where the surviving copy is.
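For example, after uninstalling TensorFlow, a check could look like this (a sketch):

```bash
# If the import still succeeds, print where the surviving copy lives:
python -c 'import tensorflow; print(tensorflow.__file__)' &&
  echo "still importable -- another installation survives"
```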
## Don't declare large, complicated dependencies in `setup.py` / `requirements.txt`

Large, complicated dependencies such as OpenCV, PyTorch, and TensorFlow can often be installed in many different ways, and only some of them are valid for a given environment. Such dependencies should NOT be declared in `setup.py` / `requirements.txt` for automatic installation. To avoid invalid or duplicate installations, the choice of how to install these dependencies should be left to users.
Unfortunately, 10k+ projects declare `opencv-python` as a dependency. As a result, their users will automatically install and use the desktop version, `opencv-python`, instead of:

- `opencv-contrib-python`, with more features;
- `opencv-python-headless`, with fewer features and fewer compatibility issues.

In fact, `opencv-python` has given suggestions on how to select the right package. "Automatic" selection is simply wrong. Similarly, a project that declares a dependency on PyTorch may automatically install one with a mismatched CUDA version.
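For instance, rather than declaring OpenCV in `setup.py`, a project can document the choice and let users run one of the following (a sketch of such instructions):

```bash
# Pick exactly one OpenCV variant for your environment:
pip install opencv-python-headless   # servers/CI: no GUI dependencies
pip install opencv-python            # desktop: GUI support
pip install opencv-contrib-python    # desktop + extra contrib modules
```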
## Don't use a library from its source directory

You can sometimes have a Python library installed already while also having its raw source code somewhere on your system. This is another potential case of multiple installation.
If you execute `import libA` in the source directory, Python may find a local directory called `libA` which contains the source code, and use this source code, rather than the `libA` that's actually installed in a different location.
In addition to the usual confusion that multiple installations cause, this situation often leads to errors, because the raw source code is often an invalid installation by itself. In many libraries, the raw source code is different from what actually gets installed by `pip install`; the most common example is that compiled extensions do not exist in the source code. As a result, using a Python library from its source directory often leads to errors.
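A quick check for this situation (a sketch; the checkout path is hypothetical):

```bash
# From inside a source checkout, see whether the import resolves to the
# current directory instead of an installed copy in site-packages:
cd /path/to/numpy-source-checkout
python -c 'import numpy; print(numpy.__file__)'
```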
The issue is so common that some libraries (e.g. numpy and tensorflow) try to detect it and educate users about it.
The situations where it is OK to use a source directory include an editable installation via `pip install --editable .`.

## Don't use `sudo`

Never use `sudo pip install` or `sudo python setup.py`, unless it's a disposable system (e.g. a docker container) that you don't intend to keep for long, because:
- `pip install --user` can install libraries without root permission (to `$HOME/.local` on Linux). This option is sometimes the default in recent versions of `pip`.
- If stronger isolation is needed, you can use a venv; venv is now officially part of Python 3.
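Both root-free options side by side (`some-package` is a placeholder):

```bash
# Per-user installation, no root needed (lands in $HOME/.local on Linux):
python -m pip install --user some-package
# Or an isolated virtual environment (venv ships with Python 3):
python -m venv ~/my_venv
~/my_venv/bin/python -m pip install some-package
```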
You only need root permission when a library directly interacts with hardware; e.g., you need root permission to install the nvidia driver. You do not need root permission to, e.g., install a different version of Python, GCC, or CUDA (though a newer CUDA sometimes requires a newer driver). Doing these without root permission certainly requires some extra knowledge, though.
## Binary compatibility

Python itself is a binary that depends on other binary libraries. Each Python package may also contain binaries or depend on other binary libraries. Mixing binaries built from different sources (e.g. your system package manager vs. anaconda) in a single process creates potential binary compatibility issues.
Such issues can happen when you want to use `libA` and `libB` together, but they are built against different versions of another library `libC`, or built with different C++ compilers. (A C compiler, however, should produce binary-compatible code across compiler versions.)
Ideally you might expect some mechanism to avoid such conflicts. There is indeed a complicated set of symbol visibility and compiler ABI rules, but most libraries do not follow them correctly. The result of such incompatibility is often a segfault or some other mysterious error.
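One way to see what a compiled extension actually links against, on Linux (a sketch; the numpy module path is valid for numpy 1.x):

```bash
# List the shared libraries the extension resolves at load time.
# Dependencies pulled from different roots (e.g. /usr/lib vs
# ~/anaconda3/lib) are a red flag for mixed binaries.
ldd "$(python -c 'import numpy.core._multiarray_umath as m; print(m.__file__)')"
```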
In reality, here is how packages are built:

- Your OS's package manager (`apt`/`yum`/`pacman`, etc.) installs many binaries and libraries. They are built against the exact system packages they depend on, using the exact compiler installed by the package manager. They are all built in a nice, uniform environment that will not have any compatibility issues: all these packages can be mixed together.
- When you `pip install` a package, there are two possibilities:
  - Source distribution: the command compiles the source code, using whatever compiler and dependency libraries it finds, so compatibility depends on which compiler and libraries those are. Typically this is controlled by standard environment variables such as `$CC` and `$LIBRARY_PATH`, but it varies among packages.
  - Binary (wheel) distribution: the command downloads a pre-built binary. This means you need to confirm the binary was built in an environment compatible with the other packages you're using.
  Lots of binary packages on PyPI contain the word "manylinux": it means the package is built so that it's supposed to be compatible with most Linux environments. Typically, using a manylinux package should not lead to compatibility issues, although there are exceptions (e.g., some packages incorrectly mark themselves as manylinux). Also, a manylinux package may have suboptimal performance due to the compatibility requirements: such packages are often built with old compilers and an old instruction set.

  For binary packages without the "manylinux" tag, you can only wish for good luck: they usually work fine, but could stop working any day. There are a number of GitHub issues in different projects along the lines of "import libA causes import libB to crash", typically involving giant projects such as OpenCV, TensorFlow, and PyTorch.
- When you `conda install` a package that contains binaries, it's always pre-built. The official packages are built in anaconda's standard environment, and all the runtime dependencies of that standard environment are also packaged and distributed by anaconda. Anaconda provides an (almost) full runtime environment, including essential libraries such as `libstdc++` and `libgcc`. This means the conda world is just like your OS's package manager: if you use conda to install all libraries (and their dependencies), they are always compatible with each other.
That sounds nice, until you want to build a package yourself. Anaconda provides a full runtime environment, but usually not the build-time environment. Normally you'll still be building the package using your system's compiler and libraries (or those selected by your environment variables).
As long as you use a `python` from conda, you'll almost always run inside conda's runtime environment, using `libstdc++`, `libjpeg`, etc. from `anaconda/lib`. It's then possible that the package you build is not compatible with conda's runtime environment.
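To see which runtime libraries a conda `python` has actually mapped into its process (a Linux-only sketch):

```bash
# Load libstdc++ and print where it was resolved from; under a conda
# python this typically points into anaconda/lib, not /usr/lib.
python -c 'import ctypes, os; ctypes.CDLL("libstdc++.so.6"); os.system("grep libstdc++ /proc/%d/maps" % os.getpid())'
```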
I've frequently seen such failures. For example, `conda install cudatoolkit=10.1 pytorch` gives you a working PyTorch in a CUDA 10.1 runtime. It works fine until you build a custom CUDA extension: the extension will use the `nvcc` from your system, which may not be 10.1. That's why I personally avoid conda and use the system's Python whenever possible.
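To spot such a mismatch before building an extension (a sketch):

```bash
# The CUDA runtime conda installed:
conda list cudatoolkit
# The nvcc a custom extension build will actually invoke, from the system:
which nvcc
nvcc --version
```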