Finding and choosing files (index.py and PackageFinder)¶
The index.py module is a top-level module in pip responsible for deciding
what file to download and from where, given a requirement for a project. The
module’s functionality is largely exposed through and coordinated by the
module’s PackageFinder class.
Overview¶
Here is a rough description of the process that pip uses to choose what file to download for a package, given a requirement:
1. Access the various network and file system locations configured for pip that contain package files. These locations can include, for example, pip’s --index-url (with default https://pypi.org/simple/) and any configured --extra-index-url locations. Each of these locations is a PEP 503 “simple repository” page, which is an HTML page of anchor links.
2. Collect together all of the links (e.g. by parsing the anchor links from the HTML pages) and create Link objects from each of these. The LinkCollector class is responsible for both this step and the previous.
3. Determine which of the links are minimally relevant, using the LinkEvaluator class. Create an InstallationCandidate object (aka candidate for install) for each of these relevant links.
4. Further filter the collection of InstallationCandidate objects (using the CandidateEvaluator class) to a collection of “applicable” candidates.
5. If there are applicable candidates, choose the best candidate by sorting them (again using the CandidateEvaluator class).
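The five steps can be sketched end to end against a toy in-memory index. This is only an illustration of the flow, not pip’s implementation: the PAGES data, the parse_version() helper, the max_version argument, and the assumption that every file is named name-version.tar.gz are all invented for the example.

```python
from typing import List, NamedTuple, Optional

# Hypothetical stand-in for simple repository pages: location -> link hrefs.
PAGES = {
    "https://index.example/simple/demo/": [
        "demo-1.0.tar.gz",
        "demo-1.2.tar.gz",
        "other-9.9.tar.gz",
    ],
    "https://extra.example/simple/demo/": ["demo-2.0.tar.gz", "README.txt"],
}


class Candidate(NamedTuple):
    name: str
    version: tuple
    link: str


def parse_version(text: str) -> tuple:
    # Toy parser: handles only dotted integers such as "1.2".
    return tuple(int(part) for part in text.split("."))


def collect_links(locations: List[str]) -> List[str]:
    # Steps 1-2: visit every location and gather the raw links.
    return [loc + href for loc in locations for href in PAGES[loc]]


def evaluate_link(link: str, project: str) -> Optional[Candidate]:
    # Step 3: keep only links that look like an sdist of the project.
    filename = link.rsplit("/", 1)[-1]
    if not filename.endswith(".tar.gz"):
        return None
    name, _, version = filename[: -len(".tar.gz")].rpartition("-")
    if name != project:
        return None
    return Candidate(name, parse_version(version), link)


def find_best(project, locations, max_version=None) -> Optional[Candidate]:
    links = collect_links(locations)
    evaluated = (evaluate_link(link, project) for link in links)
    candidates = [c for c in evaluated if c is not None]
    # Step 4: filter to the candidates that satisfy the requirement.
    applicable = [
        c for c in candidates if max_version is None or c.version <= max_version
    ]
    # Step 5: the best candidate is the highest applicable version.
    return max(applicable, key=lambda c: c.version) if applicable else None
```

Calling find_best("demo", list(PAGES)) walks both locations, discards the unrelated files, and returns the candidate with the highest version.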
The remainder of this section is organized by documenting some of the
classes inside index.py, in the following order:
the main PackageFinder class,
the LinkCollector class,
the LinkEvaluator class,
the CandidateEvaluator class,
the CandidatePreferences class, and
the BestCandidateResult class.
The PackageFinder class¶
The PackageFinder class is the primary way through which code in pip
interacts with index.py. It is an umbrella class that encapsulates and
groups together various package-finding functionality.
The PackageFinder class is responsible for searching the network and file
system for what versions of a package pip can install, and also for deciding
which version is most preferred, given the user’s preferences, target Python
environment, etc.
The pip commands that use the PackageFinder class are:
The pip commands requiring use of the PackageFinder class generally
instantiate PackageFinder only once for the whole pip invocation. In
fact, pip creates this PackageFinder instance when command options
are first parsed.
With the exception of pip list, each of the above commands is
implemented as a Command class inheriting from RequirementCommand
(for example, pip download is implemented by DownloadCommand), and
the PackageFinder instance is created by calling the
RequirementCommand class’s _build_package_finder() method. pip list,
on the other hand, constructs its PackageFinder instance by
calling the ListCommand class’s _build_package_finder() method. (This
difference may simply be historical and may not actually be necessary.)
Each of these commands also uses the PackageFinder class for pip’s
“self-check” (i.e. to check whether a pip upgrade is available). In this
case, the PackageFinder instance is created by the
self_outdated_check.py module’s pip_self_version_check() function.
The PackageFinder class is responsible for doing all of the things listed
in the Overview section, such as fetching and parsing
PEP 503 simple repository HTML pages, evaluating which links in the simple
repository pages are relevant for each requirement, and further filtering and
sorting by preference the candidates for install coming from the relevant
links.
One of PackageFinder’s main top-level methods is find_best_candidate().
This method does the following two things:
1. Calls its find_all_candidates() method, which gathers all possible package links by reading and parsing the index URLs and locations provided by the user (the LinkCollector class’s collect_links() method), constructs a LinkEvaluator object to filter out some of those links, and then returns a list of InstallationCandidate objects (aka candidates for install). This corresponds to steps 1-3 of the Overview above.
2. Constructs a CandidateEvaluator object and uses that to determine the best candidate. It does this by calling the CandidateEvaluator class’s compute_best_candidate() method on the return value of find_all_candidates(). This corresponds to steps 4-5 of the Overview.
The LinkCollector class¶
The LinkCollector class is responsible for collecting the raw list of
“links” to package files (represented as Link objects). An instance of the
class accesses the various PEP 503 HTML “simple repository” pages, parses
their HTML, extracts the links from the anchor elements, and creates Link
objects from that information. The LinkCollector class is “unintelligent” in
that it doesn’t do any evaluation of whether the links are relevant to the
original requirement; it just collects them.
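The parsing half of that job can be sketched with the standard library alone. The AnchorParser class, the parse_links() helper, and the example URLs below are hypothetical names chosen for illustration; pip’s own collector does more (for example, it fetches the pages over the network), so treat this as a rough picture of “extract the anchors, resolve them, and do no filtering”.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorParser(HTMLParser):
    """Collect the href of every <a> element on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative hrefs against the URL of the page itself.
            self.links.append(urljoin(self.base_url, href))


def parse_links(page_url, html):
    # No relevance checks here: every anchor with an href is returned.
    parser = AnchorParser(page_url)
    parser.feed(html)
    return parser.links


PAGE = """
<html><body>
<a href="../../packages/demo-1.0.tar.gz#sha256=abc">demo-1.0.tar.gz</a>
<a href="https://files.example/demo-1.0-py3-none-any.whl">demo-1.0-py3-none-any.whl</a>
</body></html>
"""
links = parse_links("https://index.example/simple/demo/", PAGE)
```

Both anchors come back as absolute URLs, including the one whose href was relative to the page.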
The LinkCollector class takes into account the user’s --find-links,
--extra-index-url, and related options when deciding which locations to
collect links from. The class’s main method is the collect_links() method.
The PackageFinder class invokes this method as the first step of its
find_all_candidates() method.
The LinkCollector class is the only class in the index.py module that
makes network requests and is the only class in the module that depends
directly on PipSession, which stores pip’s configuration options and
state for making requests.
The LinkEvaluator class¶
The LinkEvaluator class contains the business logic for determining
whether a link (e.g. in a simple repository page) satisfies minimal
conditions to be a candidate for install (resulting in an
InstallationCandidate object). When making this determination, the
LinkEvaluator instance uses information such as the target Python
interpreter, as well as user preferences such as whether binary files are
allowed or preferred.
Specifically, the LinkEvaluator class has an evaluate_link() method
that returns whether a link is a candidate for install.
Instances of this class are created by the PackageFinder class’s
make_link_evaluator() method on a per-requirement basis.
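A per-link check of this kind might look like the following sketch. The ToyLinkEvaluator class, its fields, its (is_candidate, detail) return shape, and the rejection messages are assumptions made for the example rather than pip’s actual signature, and the wheel filename handling is simplified (it ignores the optional build tag).

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class ToyLinkEvaluator:
    project_name: str
    supported_tags: FrozenSet[str]  # e.g. {"py3-none-any"}
    allow_binary: bool = True

    def evaluate_link(self, filename: str) -> Tuple[bool, str]:
        """Return (is_candidate, detail); detail is a version or a reason."""
        if filename.endswith(".whl"):
            if not self.allow_binary:
                return False, "binary files are not allowed"
            parts = filename[: -len(".whl")].split("-")
            if len(parts) != 5:
                return False, "invalid wheel filename"
            name, version = parts[0], parts[1]
            tag = "-".join(parts[2:])
            if tag not in self.supported_tags:
                return False, "wheel is not compatible with the target Python"
        elif filename.endswith(".tar.gz"):
            name, _, version = filename[: -len(".tar.gz")].rpartition("-")
        else:
            return False, "not a recognized archive type"
        if name != self.project_name:
            return False, "wrong project name"
        return True, version
```

Each call looks at one filename in isolation, which is the property that distinguishes this stage from the candidate-level evaluation that follows it.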
The CandidateEvaluator class¶
The CandidateEvaluator class contains the business logic for evaluating
which InstallationCandidate objects should be preferred. This can be
viewed as a determination that is finer-grained than that performed by the
LinkEvaluator class.
In particular, the CandidateEvaluator class uses the whole set of
InstallationCandidate objects when making its determinations, as opposed
to evaluating each candidate in isolation, as LinkEvaluator does. For
example, whether a pre-release is eligible for selection or whether a file
whose hash doesn’t match is eligible depends on properties of the collection
as a whole.
The CandidateEvaluator class uses information like the list of PEP 425
tags compatible with the target Python interpreter, hashes provided by the
user, and other user preferences.
Specifically, the class has a get_applicable_candidates() method.
This accepts the InstallationCandidate objects resulting from the links
accepted by the LinkEvaluator class’s evaluate_link() method, and
it further filters them to a list of “applicable” candidates.
The CandidateEvaluator class also has a sort_best_candidate() method
that orders the applicable candidates by preference, and then returns the
best (i.e. most preferred).
Finally, the class has a compute_best_candidate() method that calls
get_applicable_candidates() followed by sort_best_candidate(), and
then returns a BestCandidateResult object encapsulating both the
intermediate and final results of the decision.
Instances of CandidateEvaluator are created by the PackageFinder
class’s make_candidate_evaluator() method on a per-requirement basis.
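The relationship between the three methods can be sketched as follows. The method names come from the text above, but everything else is an assumption for illustration: the ToyCandidate fields, the two boolean options, the particular pre-release rule (keep pre-releases only when explicitly allowed or when no final release exists), the sort key, and the plain tuple returned in place of a BestCandidateResult.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCandidate:
    version: tuple  # e.g. (1, 2); toy stand-in for a parsed version
    is_prerelease: bool = False
    is_wheel: bool = False


class ToyCandidateEvaluator:
    def __init__(self, allow_all_prereleases=False, prefer_binary=False):
        self.allow_all_prereleases = allow_all_prereleases
        self.prefer_binary = prefer_binary

    def get_applicable_candidates(self, candidates):
        if self.allow_all_prereleases:
            return list(candidates)
        finals = [c for c in candidates if not c.is_prerelease]
        # Eligibility of a pre-release depends on the whole collection:
        # in this sketch it survives only when no final release exists.
        return finals if finals else list(candidates)

    def _sort_key(self, candidate):
        if self.prefer_binary:
            # Any wheel outranks any sdist, regardless of version.
            return (candidate.is_wheel, candidate.version)
        # Otherwise version decides, and a wheel only breaks ties.
        return (candidate.version, candidate.is_wheel)

    def sort_best_candidate(self, candidates):
        return max(candidates, key=self._sort_key) if candidates else None

    def compute_best_candidate(self, candidates):
        applicable = self.get_applicable_candidates(candidates)
        return applicable, self.sort_best_candidate(applicable)
```

Because get_applicable_candidates() receives the full list, the same pre-release can be rejected in one call and accepted in another, depending on what else is available.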
The CandidatePreferences class¶
The CandidatePreferences class is a simple container class that groups
together some of the user preferences that PackageFinder uses to
construct CandidateEvaluator objects (via the PackageFinder class’s
make_candidate_evaluator() method).
A PackageFinder instance has a _candidate_prefs attribute whose value
is a CandidatePreferences instance. Since PackageFinder has a number
of responsibilities and options that control its behavior, grouping the
preferences specific to CandidateEvaluator helps maintainers know which
attributes are needed only for CandidateEvaluator.
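The grouping idea can be illustrated with a small sketch. Only the _candidate_prefs attribute and the make_candidate_evaluator() method name are taken from the text; the Toy* classes and the two preference fields (prefer_binary, allow_all_prereleases) are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCandidatePreferences:
    # Hypothetical preferences that matter only to candidate evaluation.
    prefer_binary: bool = False
    allow_all_prereleases: bool = False


@dataclass(frozen=True)
class ToyEvaluator:
    project_name: str
    prefer_binary: bool
    allow_all_prereleases: bool


class ToyFinder:
    def __init__(self, index_urls, candidate_prefs=None):
        # Finder-level options and evaluator-only preferences stay separate.
        self.index_urls = list(index_urls)
        self._candidate_prefs = candidate_prefs or ToyCandidatePreferences()

    def make_candidate_evaluator(self, project_name):
        # A fresh evaluator per requirement, configured from the shared prefs.
        prefs = self._candidate_prefs
        return ToyEvaluator(
            project_name=project_name,
            prefer_binary=prefs.prefer_binary,
            allow_all_prereleases=prefs.allow_all_prereleases,
        )
```

Keeping the preferences in one object means a reader of the finder can see at a glance which settings exist only to be handed on to the evaluator.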
The BestCandidateResult class¶
The BestCandidateResult class is a convenience “container” class that
encapsulates the result of finding the best candidate for a requirement.
(By “container” we mean an object that simply contains data and has no
business logic or state-changing methods of its own.) It stores not just the
final result but also intermediate values used to determine the result.
The class is the return type of both the CandidateEvaluator class’s
compute_best_candidate() method and the PackageFinder class’s
find_best_candidate() method.
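One reason to keep the intermediate values is that a caller can explain a failure, not just report it. The sketch below shows that idea with a hypothetical ToyBestCandidateResult whose field names, consistency checks, and describe() helper are assumptions for illustration; candidates are represented as plain version strings to keep it short.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyBestCandidateResult:
    all_candidates: Tuple[str, ...]
    applicable_candidates: Tuple[str, ...]
    best_candidate: Optional[str]

    def __post_init__(self):
        # Consistency checks between the intermediate and final values.
        assert set(self.applicable_candidates) <= set(self.all_candidates)
        if self.best_candidate is None:
            assert not self.applicable_candidates
        else:
            assert self.best_candidate in self.applicable_candidates


def describe(result: ToyBestCandidateResult) -> str:
    if result.best_candidate is not None:
        return f"using {result.best_candidate}"
    # The intermediate data lets the message say what was seen and rejected.
    seen = ", ".join(result.all_candidates) or "none"
    return f"no applicable candidate (versions found: {seen})"
```

For example, describe(ToyBestCandidateResult(("2.0rc1",), (), None)) reports that a version was found but none was applicable.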