6. Revise tooling for Python dependency management
Date: 2022-02-25
Status
Superseded by 0007, but the context in this ADR is still useful
Context
At the time we revisited our dependency-management approach, Bedrock's Python dependencies were installed from hand-maintained `requirements/*.txt` files which (sensibly) included hashes, so that we could be sure about what our Python package installer, `pip`, was actually installing.
However, this process was onerous:
- We had a number of requirements files, `base`, `prod`, `dev`, `migration` (no longer required but still being processed at installation time) and `docs` - all of which had to be hand-maintained.
- Hashes needed to be generated when adding or updating a dependency. This was done with a specific tool, `hashin`, and needed to be done for each requirement (see the example after this list).
- When `pip` detects hashes in a requirements file, it automatically requires hashes for all packages it installs, including subdependencies of the dependencies mentioned in `requirements/*.txt`. This in turn meant that adding or updating a dep often required hashing-in one or more subdeps – and at worst, a change or niggle with `pip` would result in a new subdep being implicitly required, which would then fail to install because it was not hashed into the requirements file.
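As a rough illustration of that workflow (the package name and hashes below are placeholders, not Bedrock's actual pins):

```bash
# Illustrative only: package name and hashes are placeholders.

# Add or update a pin and write its hashes into the relevant file with hashin
# (one invocation per requirement):
hashin django==4.0.2 -r requirements/prod.txt

# ...which leaves an entry along these lines in requirements/prod.txt:
#   django==4.0.2 \
#       --hash=sha256:<hash-of-wheel> \
#       --hash=sha256:<hash-of-sdist>

# Because the file now contains hashes, pip insists on a hash for every
# package it installs, subdependencies included:
pip install --require-hashes -r requirements/prod.txt
```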
Other projects (both within MEAO and across Mozilla) used more sophisticated dependency-management tools, including:

- `pip-tools` - which draws requirements from an input file and generates a `requirements.txt` complete with hashes (see the sketch after this list)
- `pip-compile-multi` - which extends pip-tools' behaviour to support multiple output files and shared input files
- `poetry` - which combines a lockfile approach with a standalone virtual environment
- `pipenv` - which similarly combines a lockfile with a virtual environment
- `conda` - a language-agnostic package manager and environment-management system
- simply `pip`
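For comparison, the basic pip-tools workflow looks roughly like this sketch (file names and contents are illustrative): top-level requirements live in an `.in` file, and `pip-compile` resolves them into a fully pinned, hashed `requirements.txt`.

```bash
# Sketch of the pip-tools workflow; file names and contents are illustrative.
#
# requirements.in holds only the loosely-pinned, top-level deps, e.g.
#   django==4.0.2
#   django-csp
#
# pip-compile resolves the full dependency tree and writes a pinned,
# hashed lockfile:
pip-compile --generate-hashes --output-file requirements.txt requirements.in
```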
The ideal solution would support all of the following:
- Simple input file format/syntax
- Ability to pin dependencies
- Support for installing with hash-checking of packages
- Automatic hashing of requirements, rather than having to do it manually with `hashin` et al.
- Support for multiple build configurations (eg prod, dev, docs)
- Dependabot compatibility, so we still get alerts and updates
- An unopinionated approach to virtualenvs – it can work with and without them, so that developers can use the virtualenv tooling they prefer and we don't have to use a virtualenv in our containers if we don't want to
- Sufficiently active maintenance of the project
- Use/knowledge of the tooling elsewhere in the broader organisation
Decision
After evaluating the above, including `pip-tools`, `pip-compile-multi` and `poetry` in greater depth, `pip-compile-multi` was selected.

Significant factors were that it allows us to pin our top-level dependencies in a clutter-free input format, supports inheritance between files and multiple output files with ease, and automatically generates hashes for subdependencies.
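A minimal sketch of what this looks like in practice, using illustrative file names rather than Bedrock's exact layout: each environment has an `.in` input file, inputs can include one another with `-r`, and a single `pip-compile-multi` run regenerates all of the corresponding hashed `.txt` lockfiles.

```bash
# Illustrative layout, not Bedrock's exact files:
#
#   requirements/base.in   shared top-level deps, e.g. django==4.0.2
#   requirements/prod.in   starts with "-r base.in", then adds prod-only deps
#   requirements/dev.in    starts with "-r base.in", then adds dev/test tooling
#   requirements/docs.in   docs-only deps
#
# One run resolves every requirements/*.in into a fully pinned
# requirements/*.txt, generating hashes for the named environments:
pip-compile-multi --generate-hashes base --generate-hashes prod \
    --generate-hashes dev --generate-hashes docs
```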
Consequences
`pip-compile-multi` has been easily integrated into the Bedrock workflow, but there is one non-trivial downside: GitHub's Dependabot service does not play well with the combination of multiple requirements files and inheritance between them. As a result, it does not currently produce reliable updates (it either produces partial updates, or appears to ignore some requirements files entirely). See https://github.com/dependabot/dependabot-core/issues/536
Strictly, though, we don't need the convenience of Dependabot: we have a `make` command to identify stale deps, and recompiling is another single `make` command. Also, we're more likely to combine a batch of Dependabot PRs into one changeset (eg with `paul-mclendahand`) than to merge them straight to `master`/`main` one at a time. As long as we're getting GitHub security alerts for vulnerable dependencies, we'll be OK.
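The commands behind those `make` targets would be along these lines; this is a hypothetical sketch of the workflow rather than the real targets in Bedrock's Makefile.

```bash
# Hypothetical commands behind such make targets; the real target names and
# recipes in Bedrock's Makefile may differ.

# Identify stale deps: list installed packages with newer releases available.
pip list --outdated

# Recompile: regenerate all pinned, hashed requirements files in one go.
pip-compile-multi --generate-hashes base --generate-hashes prod \
    --generate-hashes dev --generate-hashes docs
```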
That said, if we did find we needed Dependabot compatibility, `pip-tools` and some extra legwork in the Makefile to deal with prod, dev and docs deps separately would likely be a viable alternative.