Relationship Loading Techniques¶
A big part of SQLAlchemy is providing a wide range of control over how related
objects get loaded when querying. By “related objects” we refer to collections
or scalar associations configured on a mapper using relationship()
.
This behavior can be configured at mapper construction time using the
relationship.lazy
parameter to the relationship()
function, as well as by using options with the Query
object.
The loading of relationships falls into three categories; lazy loading, eager loading, and no loading. Lazy loading refers to objects are returned from a query without the related objects loaded at first. When the given collection or reference is first accessed on a particular object, an additional SELECT statement is emitted such that the requested collection is loaded.
Eager loading refers to objects returned from a query with the related
collection or scalar reference already loaded up front. The Query
achieves this either by augmenting the SELECT statement it would normally
emit with a JOIN to load in related rows simultaneously, or by emitting
additional SELECT statements after the primary one to load collections
or scalar references at once.
“No” loading refers to the disabling of loading on a given relationship, either that the attribute is empty and is just never loaded, or that it raises an error when it is accessed, in order to guard against unwanted lazy loads.
The primary forms of relationship loading are:
lazy loading - available via
lazy='select'
or thelazyload()
option, this is the form of loading that emits a SELECT statement at attribute access time to lazily load a related reference on a single object at a time. Lazy loading is detailed at Lazy Loading.joined loading - available via
lazy='joined'
or thejoinedload()
option, this form of loading applies a JOIN to the given SELECT statement so that related rows are loaded in the same result set. Joined eager loading is detailed at Joined Eager Loading.subquery loading - available via
lazy='subquery'
or thesubqueryload()
option, this form of loading emits a second SELECT statement which re-states the original query embedded inside of a subquery, then JOINs that subquery to the related table to be loaded to load all members of related collections / scalar references at once. Subquery eager loading is detailed at Subquery Eager Loading.select IN loading - available via
lazy='selectin'
or theselectinload()
option, this form of loading emits a second (or more) SELECT statement which assembles the primary key identifiers of the parent objects into an IN clause, so that all members of related collections / scalar references are loaded at once by primary key. Select IN loading is detailed at Select IN loading.raise loading - available via
lazy='raise'
,lazy='raise_on_sql'
, or theraiseload()
option, this form of loading is triggered at the same time a lazy load would normally occur, except it raises an ORM exception in order to guard against the application making unwanted lazy loads. An introduction to raise loading is at Preventing unwanted lazy loads using raiseload.no loading - available via
lazy='noload'
, or thenoload()
option; this loading style turns the attribute into an empty attribute that will never load or have any loading effect. “noload” is a fairly uncommon loader option.
Configuring Loader Strategies at Mapping Time¶
The loader strategy for a particular relationship can be configured
at mapping time to take place in all cases where an object of the mapped
type is loaded, in the absence of any query-level options that modify it.
This is configured using the relationship.lazy
parameter to
relationship()
; common values for this parameter
include select
, joined
, subquery
and selectin
.
For example, to configure a relationship to use joined eager loading when the parent object is queried:
class Parent(Base):
__tablename__ = 'parent'
id = Column(Integer, primary_key=True)
children = relationship("Child", lazy='joined')
Above, whenever a collection of Parent
objects are loaded, each
Parent
will also have its children
collection populated, using
rows fetched by adding a JOIN to the query for Parent
objects.
See Joined Eager Loading for background on this style of loading.
The default value of the relationship.lazy
argument is
"select"
, which indicates lazy loading. See Lazy Loading for
further background.
Relationship Loading with Loader Options¶
The other, and possibly more common way to configure loading strategies
is to set them up on a per-query basis against specific attributes using the
Query.options()
method. Very detailed
control over relationship loading is available using loader options;
the most common are
joinedload()
,
subqueryload()
, selectinload()
and lazyload()
. The option accepts either
the string name of an attribute against a parent, or for greater specificity
can accommodate a class-bound attribute directly:
# set children to load lazily
session.query(Parent).options(lazyload('children')).all()
# same, using class-bound attribute
session.query(Parent).options(lazyload(Parent.children)).all()
# set children to load eagerly with a join
session.query(Parent).options(joinedload('children')).all()
The loader options can also be “chained” using method chaining to specify how loading should occur further levels deep:
session.query(Parent).options(
joinedload(Parent.children).
subqueryload(Child.subelements)).all()
Chained loader options can be applied against a “lazy” loaded collection. This means that when a collection or association is lazily loaded upon access, the specified option will then take effect:
session.query(Parent).options(
lazyload(Parent.children).
subqueryload(Child.subelements)).all()
Above, the query will return Parent
objects without the children
collections loaded. When the children
collection on a particular
Parent
object is first accessed, it will lazy load the related
objects, but additionally apply eager loading to the subelements
collection on each member of children
.
Using method chaining, the loader style of each link in the path is explicitly
stated. To navigate along a path without changing the existing loader style
of a particular attribute, the defaultload()
method/function may be used:
session.query(A).options(
defaultload(A.atob).
joinedload(B.btoc)).all()
A similar approach can be used to specify multiple sub-options at once, using
the Load.options()
method:
session.query(A).options(
defaultload(A.atob).options(
joinedload(B.btoc),
joinedload(B.btod)
)).all()
New in version 1.3.6: added Load.options()
See also
Deferred Loading across Multiple Entities - illustrates examples of combining relationship and column-oriented loader options.
Note
The loader options applied to an object’s lazy-loaded collections are “sticky” to specific object instances, meaning they will persist upon collections loaded by that specific object for as long as it exists in memory. For example, given the previous example:
session.query(Parent).options(
lazyload(Parent.children).
subqueryload(Child.subelements)).all()
if the children
collection on a particular Parent
object loaded by
the above query is expired (such as when a Session
object’s
transaction is committed or rolled back, or Session.expire_all()
is
used), when the Parent.children
collection is next accessed in order to
re-load it, the Child.subelements
collection will again be loaded using
subquery eager loading.This stays the case even if the above Parent
object is accessed from a subsequent query that specifies a different set of
options.To change the options on an existing object without expunging it and
re-loading, they must be set explicitly in conjunction with the
Query.populate_existing()
method:
# change the options on Parent objects that were already loaded
session.query(Parent).populate_existing().options(
lazyload(Parent.children).
lazyload(Child.subelements)).all()
If the objects loaded above are fully cleared from the Session
,
such as due to garbage collection or that Session.expunge_all()
were used, the “sticky” options will also be gone and the newly created
objects will make use of new options if loaded again.
A future SQLAlchemy release may add more alternatives to manipulating the loader options on already-loaded objects.
Lazy Loading¶
By default, all inter-object relationships are lazy loading. The scalar or
collection attribute associated with a relationship()
contains a trigger which fires the first time the attribute is accessed. This
trigger typically issues a SQL call at the point of access
in order to load the related object or objects:
>>> jack.addresses
SELECT
addresses.id AS addresses_id,
addresses.email_address AS addresses_email_address,
addresses.user_id AS addresses_user_id
FROM addresses
WHERE ? = addresses.user_id
[5]
[<Address(u'jack@google.com')>, <Address(u'j25@yahoo.com')>]
The one case where SQL is not emitted is for a simple many-to-one relationship, when
the related object can be identified by its primary key alone and that object is already
present in the current Session
. For this reason, while lazy loading
can be expensive for related collections, in the case that one is loading
lots of objects with simple many-to-ones against a relatively small set of
possible target objects, lazy loading may be able to refer to these objects locally
without emitting as many SELECT statements as there are parent objects.
This default behavior of “load upon attribute access” is known as “lazy” or “select” loading - the name “select” because a “SELECT” statement is typically emitted when the attribute is first accessed.
Lazy loading can be enabled for a given attribute that is normally
configured in some other way using the lazyload()
loader option:
from sqlalchemy.orm import lazyload
# force lazy loading for an attribute that is set to
# load some other way normally
session.query(User).options(lazyload(User.addresses))
Preventing unwanted lazy loads using raiseload¶
The lazyload()
strategy produces an effect that is one of the most
common issues referred to in object relational mapping; the
N plus one problem, which states that for any N objects loaded,
accessing their lazy-loaded attributes means there will be N+1 SELECT
statements emitted. In SQLAlchemy, the usual mitigation for the N+1 problem
is to make use of its very capable eager load system. However, eager loading
requires that the attributes which are to be loaded be specified with the
Query
up front. The problem of code that may access other attributes
that were not eagerly loaded, where lazy loading is not desired, may be
addressed using the raiseload()
strategy; this loader strategy
replaces the behavior of lazy loading with an informative error being
raised:
from sqlalchemy.orm import raiseload
session.query(User).options(raiseload(User.addresses))
Above, a User
object loaded from the above query will not have
the .addresses
collection loaded; if some code later on attempts to
access this attribute, an ORM exception is raised.
raiseload()
may be used with a so-called “wildcard” specifier to
indicate that all relationships should use this strategy. For example,
to set up only one attribute as eager loading, and all the rest as raise:
session.query(Order).options(
joinedload(Order.items), raiseload('*'))
The above wildcard will apply to all relationships not just on Order
besides items
, but all those on the Item
objects as well. To set up
raiseload()
for only the Order
objects, specify a full
path with Load
:
from sqlalchemy.orm import Load
session.query(Order).options(
joinedload(Order.items), Load(Order).raiseload('*'))
Conversely, to set up the raise for just the Item
objects:
session.query(Order).options(
joinedload(Order.items).raiseload('*'))
See also
Joined Eager Loading¶
Joined eager loading is the most fundamental style of eager loading in the
ORM. It works by connecting a JOIN (by default
a LEFT OUTER join) to the SELECT statement emitted by a Query
and populates the target scalar/collection from the
same result set as that of the parent.
At the mapping level, this looks like:
class Address(Base):
# ...
user = relationship(User, lazy="joined")
Joined eager loading is usually applied as an option to a query, rather than
as a default loading option on the mapping, in particular when used for
collections rather than many-to-one-references. This is achieved
using the joinedload()
loader option:
>>> jack = session.query(User).\
... options(joinedload(User.addresses)).\
... filter_by(name='jack').all()
SELECT
addresses_1.id AS addresses_1_id,
addresses_1.email_address AS addresses_1_email_address,
addresses_1.user_id AS addresses_1_user_id,
users.id AS users_id, users.name AS users_name,
users.fullname AS users_fullname,
users.nickname AS users_nickname
FROM users
LEFT OUTER JOIN addresses AS addresses_1
ON users.id = addresses_1.user_id
WHERE users.name = ?
['jack']
The JOIN emitted by default is a LEFT OUTER JOIN, to allow for a lead object
that does not refer to a related row. For an attribute that is guaranteed
to have an element, such as a many-to-one
reference to a related object where the referencing foreign key is NOT NULL,
the query can be made more efficient by using an inner join; this is available
at the mapping level via the relationship.innerjoin
flag:
class Address(Base):
# ...
user_id = Column(ForeignKey('users.id'), nullable=False)
user = relationship(User, lazy="joined", innerjoin=True)
At the query option level, via the joinedload.innerjoin
flag:
session.query(Address).options(
joinedload(Address.user, innerjoin=True))
The JOIN will right-nest itself when applied in a chain that includes an OUTER JOIN:
>>> session.query(User).options(
... joinedload(User.addresses).
... joinedload(Address.widgets, innerjoin=True)).all()
SELECT
widgets_1.id AS widgets_1_id,
widgets_1.name AS widgets_1_name,
addresses_1.id AS addresses_1_id,
addresses_1.email_address AS addresses_1_email_address,
addresses_1.user_id AS addresses_1_user_id,
users.id AS users_id, users.name AS users_name,
users.fullname AS users_fullname,
users.nickname AS users_nickname
FROM users
LEFT OUTER JOIN (
addresses AS addresses_1 JOIN widgets AS widgets_1 ON
addresses_1.widget_id = widgets_1.id
) ON users.id = addresses_1.user_id
On older versions of SQLite, the above nested right JOIN may be re-rendered as a nested subquery. Older versions of SQLAlchemy would convert right-nested joins into subqueries in all cases.
Joined eager loading and result set batching¶
A central concept of joined eager loading when applied to collections is that
the Query
object must de-duplicate rows against the leading
entity being queried. Such as above,
if the User
object we loaded referred to three Address
objects, the
result of the SQL statement would have had three rows; yet the Query
returns only one User
object. As additional rows are received for a
User
object just loaded in a previous row, the additional columns that
refer to new Address
objects are directed into additional results within
the User.addresses
collection of that particular object.
This process is very transparent, however does imply that joined eager
loading is incompatible with “batched” query results, provided by the
Query.yield_per()
method, when used for collection loading. Joined
eager loading used for scalar references is however compatible with
Query.yield_per()
. The Query.yield_per()
method will result
in an exception thrown if a collection based joined eager loader is
in play.
To “batch” queries with arbitrarily large sets of result data while maintaining
compatibility with collection-based joined eager loading, emit multiple
SELECT statements, each referring to a subset of rows using the WHERE
clause, e.g. windowing. Alternatively, consider using “select IN” eager loading
which is potentially compatible with Query.yield_per()
, provided
that the database driver in use supports multiple, simultaneous cursors
(SQLite, PostgreSQL drivers, not MySQL drivers or SQL Server ODBC drivers).
The Zen of Joined Eager Loading¶
Since joined eager loading seems to have many resemblances to the use of
Query.join()
, it often produces confusion as to when and how it should
be used. It is critical to understand the distinction that while
Query.join()
is used to alter the results of a query, joinedload()
goes through great lengths to not alter the results of the query, and
instead hide the effects of the rendered join to only allow for related objects
to be present.
The philosophy behind loader strategies is that any set of loading schemes can
be applied to a particular query, and the results don’t change - only the
number of SQL statements required to fully load related objects and collections
changes. A particular query might start out using all lazy loads. After using
it in context, it might be revealed that particular attributes or collections
are always accessed, and that it would be more efficient to change the loader
strategy for these. The strategy can be changed with no other modifications
to the query, the results will remain identical, but fewer SQL statements would
be emitted. In theory (and pretty much in practice), nothing you can do to the
Query
would make it load a different set of primary or related
objects based on a change in loader strategy.
How joinedload()
in particular achieves this result of not impacting
entity rows returned in any way is that it creates an anonymous alias of the
joins it adds to your query, so that they can’t be referenced by other parts of
the query. For example, the query below uses joinedload()
to create a
LEFT OUTER JOIN from users
to addresses
, however the ORDER BY
added
against Address.email_address
is not valid - the Address
entity is not
named in the query:
>>> jack = session.query(User).\
... options(joinedload(User.addresses)).\
... filter(User.name=='jack').\
... order_by(Address.email_address).all()
SELECT
addresses_1.id AS addresses_1_id,
addresses_1.email_address AS addresses_1_email_address,
addresses_1.user_id AS addresses_1_user_id,
users.id AS users_id,
users.name AS users_name,
users.fullname AS users_fullname,
users.nickname AS users_nickname
FROM users
LEFT OUTER JOIN addresses AS addresses_1
ON users.id = addresses_1.user_id
WHERE users.name = ?
ORDER BY addresses.email_address <-- this part is wrong !
['jack']
Above, ORDER BY addresses.email_address
is not valid since addresses
is not in the
FROM list. The correct way to load the User
records and order by email
address is to use Query.join()
:
>>> jack = session.query(User).\
... join(User.addresses).\
... filter(User.name=='jack').\
... order_by(Address.email_address).all()
SELECT
users.id AS users_id,
users.name AS users_name,
users.fullname AS users_fullname,
users.nickname AS users_nickname
FROM users
JOIN addresses ON users.id = addresses.user_id
WHERE users.name = ?
ORDER BY addresses.email_address
['jack']
The statement above is of course not the same as the previous one, in that the
columns from addresses
are not included in the result at all. We can add
joinedload()
back in, so that there are two joins - one is that which we
are ordering on, the other is used anonymously to load the contents of the
User.addresses
collection:
>>> jack = session.query(User).\
... join(User.addresses).\
... options(joinedload(User.addresses)).\
... filter(User.name=='jack').\
... order_by(Address.email_address).all()
SELECT
addresses_1.id AS addresses_1_id,
addresses_1.email_address AS addresses_1_email_address,
addresses_1.user_id AS addresses_1_user_id,
users.id AS users_id, users.name AS users_name,
users.fullname AS users_fullname,
users.nickname AS users_nickname
FROM users JOIN addresses
ON users.id = addresses.user_id
LEFT OUTER JOIN addresses AS addresses_1
ON users.id = addresses_1.user_id
WHERE users.name = ?
ORDER BY addresses.email_address
['jack']
What we see above is that our usage of Query.join()
is to supply JOIN
clauses we’d like to use in subsequent query criterion, whereas our usage of
joinedload()
only concerns itself with the loading of the
User.addresses
collection, for each User
in the result. In this case,
the two joins most probably appear redundant - which they are. If we wanted to
use just one JOIN for collection loading as well as ordering, we use the
contains_eager()
option, described in Routing Explicit Joins/Statements into Eagerly Loaded Collections below. But
to see why joinedload()
does what it does, consider if we were
filtering on a particular Address
:
>>> jack = session.query(User).\
... join(User.addresses).\
... options(joinedload(User.addresses)).\
... filter(User.name=='jack').\
... filter(Address.email_address=='someaddress@foo.com').\
... all()
SELECT
addresses_1.id AS addresses_1_id,
addresses_1.email_address AS addresses_1_email_address,
addresses_1.user_id AS addresses_1_user_id,
users.id AS users_id, users.name AS users_name,
users.fullname AS users_fullname,
users.nickname AS users_nickname
FROM users JOIN addresses
ON users.id = addresses.user_id
LEFT OUTER JOIN addresses AS addresses_1
ON users.id = addresses_1.user_id
WHERE users.name = ? AND addresses.email_address = ?
['jack', 'someaddress@foo.com']
Above, we can see that the two JOINs have very different roles. One will match
exactly one row, that of the join of User
and Address
where
Address.email_address=='someaddress@foo.com'
. The other LEFT OUTER JOIN
will match all Address
rows related to User
, and is only used to
populate the User.addresses
collection, for those User
objects that are
returned.
By changing the usage of joinedload()
to another style of loading, we
can change how the collection is loaded completely independently of SQL used to
retrieve the actual User
rows we want. Below we change joinedload()
into subqueryload()
:
>>> jack = session.query(User).\
... join(User.addresses).\
... options(subqueryload(User.addresses)).\
... filter(User.name=='jack').\
... filter(Address.email_address=='someaddress@foo.com').\
... all()
SELECT
users.id AS users_id,
users.name AS users_name,
users.fullname AS users_fullname,
users.nickname AS users_nickname
FROM users
JOIN addresses ON users.id = addresses.user_id
WHERE
users.name = ?
AND addresses.email_address = ?
['jack', 'someaddress@foo.com']
# ... subqueryload() emits a SELECT in order
# to load all address records ...
When using joined eager loading, if the query contains a modifier that impacts the rows returned externally to the joins, such as when using DISTINCT, LIMIT, OFFSET or equivalent, the completed statement is first wrapped inside a subquery, and the joins used specifically for joined eager loading are applied to the subquery. SQLAlchemy’s joined eager loading goes the extra mile, and then ten miles further, to absolutely ensure that it does not affect the end result of the query, only the way collections and related objects are loaded, no matter what the format of the query is.
Subquery Eager Loading¶
Subqueryload eager loading is configured in the same manner as that of
joined eager loading; for the relationship.lazy
parameter,
we would specify "subquery"
rather than "joined"
, and for
the option we use the subqueryload()
option rather than the
joinedload()
option.
The operation of subquery eager loading is to emit a second SELECT statement for each relationship to be loaded, across all result objects at once. This SELECT statement refers to the original SELECT statement, wrapped inside of a subquery, so that we retrieve the same list of primary keys for the primary object being returned, then link that to the sum of all the collection members to load them at once:
>>> jack = session.query(User).\
... options(subqueryload(User.addresses)).\
... filter_by(name='jack').all()
SELECT
users.id AS users_id,
users.name AS users_name,
users.fullname AS users_fullname,
users.nickname AS users_nickname
FROM users
WHERE users.name = ?
('jack',)
SELECT
addresses.id AS addresses_id,
addresses.email_address AS addresses_email_address,
addresses.user_id AS addresses_user_id,
anon_1.users_id AS anon_1_users_id
FROM (
SELECT users.id AS users_id
FROM users
WHERE users.name = ?) AS anon_1
JOIN addresses ON anon_1.users_id = addresses.user_id
ORDER BY anon_1.users_id, addresses.id
('jack',)
The subqueryload strategy has many advantages over joined eager loading in the area of loading collections. First, it allows the original query to proceed without changing it at all, not introducing in particular a LEFT OUTER JOIN that may make it less efficient. Secondly, it allows for many collections to be eagerly loaded without producing a single query that has many JOINs in it, which can be even less efficient; each relationship is loaded in a fully separate query. Finally, because the additional query only needs to load the collection items and not the lead object, it can use an inner JOIN in all cases for greater query efficiency.
Disadvantages of subqueryload include that the complexity of the original
query is transferred to the relationship queries, which when combined with the
use of a subquery, can on some backends in some cases (notably MySQL) produce
significantly slow queries. Additionally, the subqueryload strategy can only
load the full contents of all collections at once, is therefore incompatible
with “batched” loading supplied by Query.yield_per()
, both for collection
and scalar relationships.
The newer style of loading provided by selectinload()
solves these
limitations of subqueryload()
.
See also
The Importance of Ordering¶
A query which makes use of subqueryload()
in conjunction with a
limiting modifier such as Query.first()
, Query.limit()
,
or Query.offset()
should always include Query.order_by()
against unique column(s) such as the primary key, so that the additional queries
emitted by subqueryload()
include
the same ordering as used by the parent query. Without it, there is a chance
that the inner query could return the wrong rows:
# incorrect, no ORDER BY
session.query(User).options(
subqueryload(User.addresses)).first()
# incorrect if User.name is not unique
session.query(User).options(
subqueryload(User.addresses)
).order_by(User.name).first()
# correct
session.query(User).options(
subqueryload(User.addresses)
).order_by(User.name, User.id).first()
See also
Why is ORDER BY required with LIMIT (especially with subqueryload())? - detailed example
Select IN loading¶
Select IN loading is similar in operation to subquery eager loading, however the SELECT statement which is emitted has a much simpler structure than that of subquery eager loading. In most cases, selectin loading is the most simple and efficient way to eagerly load collections of objects. The only scenario in which selectin eager loading is not feasible is when the model is using composite primary keys, and the backend database does not support tuples with IN, which currently includes SQL Server.
New in version 1.2.
“Select IN” eager loading is provided using the "selectin"
argument to
relationship.lazy
or by using the selectinload()
loader
option. This style of loading emits a SELECT that refers to the primary key
values of the parent object, or in the case of a many-to-one
relationship to the those of the child objects, inside of an IN clause, in
order to load related associations:
>>> jack = session.query(User).\
... options(selectinload('addresses')).\
... filter(or_(User.name == 'jack', User.name == 'ed')).all()
SELECT
users.id AS users_id,
users.name AS users_name,
users.fullname AS users_fullname,
users.nickname AS users_nickname
FROM users
WHERE users.name = ? OR users.name = ?
('jack', 'ed')
SELECT
addresses.id AS addresses_id,
addresses.email_address AS addresses_email_address,
addresses.user_id AS addresses_user_id
FROM addresses
WHERE addresses.user_id IN (?, ?)
(5, 7)
Above, the second SELECT refers to addresses.user_id IN (5, 7)
, where the
“5” and “7” are the primary key values for the previous two User
objects loaded; after a batch of objects are completely loaded, their primary
key values are injected into the IN
clause for the second SELECT.
Because the relationship between User
and Address
has a simple [1]
primary join condition and provides that the
primary key values for User
can be derived from Address.user_id
, the
statement has no joins or subqueries at all.
Changed in version 1.3: selectin loading can omit the JOIN for a simple one-to-many collection.
For simple [1] many-to-one loads, a JOIN is also not needed as the foreign key value from the parent object is used:
>>> session.query(Address).\
... options(selectinload('user')).all()
SELECT
addresses.id AS addresses_id,
addresses.email_address AS addresses_email_address,
addresses.user_id AS addresses_user_id
FROM addresses
SELECT
users.id AS users_id,
users.name AS users_name,
users.fullname AS users_fullname,
users.nickname AS users_nickname
FROM users
WHERE users.id IN (?, ?)
(1, 2)
Changed in version 1.3.6: selectin loading can also omit the JOIN for a simple many-to-one relationship.
Select IN loading also supports many-to-many relationships, where it currently will JOIN across all three tables to match rows from one side to the other.
Things to know about this kind of loading include:
The SELECT statement emitted by the “selectin” loader strategy, unlike that of “subquery”, does not require a subquery nor does it inherit any of the performance limitations of the original query; the lookup is a simple primary key lookup and should have high performance.
The special ordering requirements of subqueryload described at The Importance of Ordering also don’t apply to selectin loading; selectin is always linking directly to a parent primary key and can’t really return the wrong result.
“selectin” loading, unlike joined or subquery eager loading, always emits its SELECT in terms of the immediate parent objects just loaded, and not the original type of object at the top of the chain. So if eager loading many levels deep, “selectin” loading still will not require any JOINs for simple one-to-many or many-to-one relationships. In comparison, joined and subquery eager loading always refer to multiple JOINs up to the original parent.
The strategy emits a SELECT for up to 500 parent primary key values at a time, as the primary keys are rendered into a large IN expression in the SQL statement. Some databases like Oracle have a hard limit on how large an IN expression can be, and overall the size of the SQL string shouldn’t be arbitrarily large.
As “selectin” loading relies upon IN, for a mapping with composite primary keys, it must use the “tuple” form of IN, which looks like
WHERE (table.column_a, table.column_b) IN ((?, ?), (?, ?), (?, ?))
. This syntax is not currently supported on SQL Server and for SQLite requires at least version 3.15. There is no special logic in SQLAlchemy to check ahead of time which platforms support this syntax or not; if run against a non-supporting platform, the database will return an error immediately. An advantage to SQLAlchemy just running the SQL out for it to fail is that if a particular database does start supporting this syntax, it will work without any changes to SQLAlchemy (as was the case with SQLite).
In general, “selectin” loading is probably superior to “subquery” eager loading in most ways, save for the syntax requirement with composite primary keys and possibly that it may emit many SELECT statements for larger result sets. As always, developers should spend time looking at the statements and results generated by their applications in development to check that things are working efficiently.
What Kind of Loading to Use ?¶
Which type of loading to use typically comes down to optimizing the tradeoff
between number of SQL executions, complexity of SQL emitted, and amount of
data fetched. Lets take two examples, a relationship()
which references a collection, and a relationship()
that
references a scalar many-to-one reference.
One to Many Collection
When using the default lazy loading, if you load 100 objects, and then access a collection on each of them, a total of 101 SQL statements will be emitted, although each statement will typically be a simple SELECT without any joins.
When using joined loading, the load of 100 objects and their collections will emit only one SQL statement. However, the total number of rows fetched will be equal to the sum of the size of all the collections, plus one extra row for each parent object that has an empty collection. Each row will also contain the full set of columns represented by the parents, repeated for each collection item - SQLAlchemy does not re-fetch these columns other than those of the primary key, however most DBAPIs (with some exceptions) will transmit the full data of each parent over the wire to the client connection in any case. Therefore joined eager loading only makes sense when the size of the collections are relatively small. The LEFT OUTER JOIN can also be performance intensive compared to an INNER join.
When using subquery loading, the load of 100 objects will emit two SQL statements. The second statement will fetch a total number of rows equal to the sum of the size of all collections. An INNER JOIN is used, and a minimum of parent columns are requested, only the primary keys. So a subquery load makes sense when the collections are larger.
When multiple levels of depth are used with joined or subquery loading, loading collections-within- collections will multiply the total number of rows fetched in a cartesian fashion. Both joined and subquery eager loading always join from the original parent class; if loading a collection four levels deep, there will be four JOINs out to the parent. selectin loading on the other hand will always have exactly one JOIN to the immediate parent table.
Using selectin loading, the load of 100 objects will also emit two SQL statements, the second of which refers to the 100 primary keys of the objects loaded. selectin loading will however render at most 500 primary key values into a single SELECT statement; so for a lead collection larger than 500, there will be a SELECT statement emitted for each batch of 500 objects selected.
Using multiple levels of depth with selectin loading does not incur the “cartesian” issue that joined and subquery eager loading have; the queries for selectin loading have the best performance characteristics and the fewest number of rows. The only caveat is that there might be more than one SELECT emitted depending on the size of the lead result.
selectin loading, unlike joined (when using collections) and subquery eager loading (all kinds of relationships), is potentially compatible with result set batching provided by
Query.yield_per()
assuming an appropriate database driver, so may be able to allow batching for large result sets.
Many to One Reference
When using the default lazy loading, a load of 100 objects will like in the case of the collection emit as many as 101 SQL statements. However - there is a significant exception to this, in that if the many-to-one reference is a simple foreign key reference to the target’s primary key, each reference will be checked first in the current identity map using
Query.get()
. So here, if the collection of objects references a relatively small set of target objects, or the full set of possible target objects have already been loaded into the session and are strongly referenced, using the default of lazy=’select’ is by far the most efficient way to go.When using joined loading, the load of 100 objects will emit only one SQL statement. The join will be a LEFT OUTER JOIN, and the total number of rows will be equal to 100 in all cases. If you know that each parent definitely has a child (i.e. the foreign key reference is NOT NULL), the joined load can be configured with
relationship.innerjoin
set toTrue
, which is usually specified within therelationship()
. For a load of objects where there are many possible target references which may have not been loaded already, joined loading with an INNER JOIN is extremely efficient.Subquery loading will issue a second load for all the child objects, so for a load of 100 objects there would be two SQL statements emitted. There’s probably not much advantage here over joined loading, however, except perhaps that subquery loading can use an INNER JOIN in all cases whereas joined loading requires that the foreign key is NOT NULL.
Selectin loading will also issue a second load for all the child objects (and as stated before, for larger results it will emit a SELECT per 500 rows), so for a load of 100 objects there would be two SQL statements emitted. The query itself still has to JOIN to the parent table, so again there’s not too much advantage to selectin loading for many-to-one vs. joined eager loading save for the use of INNER JOIN in all cases.
Polymorphic Eager Loading¶
Specification of polymorphic options on a per-eager-load basis is supported.
See the section Eager Loading of Specific or Polymorphic Subtypes for examples
of the PropComparator.of_type()
method in conjunction with the
with_polymorphic()
function.
Wildcard Loading Strategies¶
Each of joinedload()
, subqueryload()
, lazyload()
,
selectinload()
,
noload()
, and raiseload()
can be used to set the default
style of relationship()
loading
for a particular query, affecting all relationship()
-mapped
attributes not otherwise
specified in the Query
. This feature is available by passing
the string '*'
as the argument to any of these options:
session.query(MyClass).options(lazyload('*'))
Above, the lazyload('*')
option will supersede the lazy
setting
of all relationship()
constructs in use for that query,
except for those which use the 'dynamic'
style of loading.
If some relationships specify
lazy='joined'
or lazy='subquery'
, for example,
using lazyload('*')
will unilaterally
cause all those relationships to use 'select'
loading, e.g. emit a
SELECT statement when each attribute is accessed.
The option does not supersede loader options stated in the
query, such as eagerload()
,
subqueryload()
, etc. The query below will still use joined loading
for the widget
relationship:
session.query(MyClass).options(
lazyload('*'),
joinedload(MyClass.widget)
)
If multiple '*'
options are passed, the last one overrides
those previously passed.
Per-Entity Wildcard Loading Strategies¶
A variant of the wildcard loader strategy is the ability to set the strategy
on a per-entity basis. For example, if querying for User
and Address
,
we can instruct all relationships on Address
only to use lazy loading
by first applying the Load
object, then specifying the *
as a
chained option:
session.query(User, Address).options(
Load(Address).lazyload('*'))
Above, all relationships on Address
will be set to a lazy load.
Routing Explicit Joins/Statements into Eagerly Loaded Collections¶
The behavior of joinedload()
is such that joins are
created automatically, using anonymous aliases as targets, the results of which
are routed into collections and
scalar references on loaded objects. It is often the case that a query already
includes the necessary joins which represent a particular collection or scalar
reference, and the joins added by the joinedload feature are redundant - yet
you’d still like the collections/references to be populated.
For this SQLAlchemy supplies the contains_eager()
option. This option is used in the same manner as the
joinedload()
option except it is assumed that the
Query
will specify the appropriate joins
explicitly. Below, we specify a join between User
and Address
and additionally establish this as the basis for eager loading of User.addresses
:
class User(Base):
__tablename__ = 'user'
id = Column(Integer, primary_key=True)
addresses = relationship("Address")
class Address(Base):
__tablename__ = 'address'
# ...
q = session.query(User).join(User.addresses).\
options(contains_eager(User.addresses))
If the “eager” portion of the statement is “aliased”, the path
should be specified using PropComparator.of_type()
, which allows
the specific aliased()
construct to be passed:
# use an alias of the Address entity
adalias = aliased(Address)
# construct a Query object which expects the "addresses" results
query = session.query(User).\
outerjoin(User.addresses.of_type(adalias)).\
options(contains_eager(User.addresses.of_type(adalias)))
# get results normally
r = query.all()
SELECT
users.user_id AS users_user_id,
users.user_name AS users_user_name,
adalias.address_id AS adalias_address_id,
adalias.user_id AS adalias_user_id,
adalias.email_address AS adalias_email_address,
(...other columns...)
FROM users
LEFT OUTER JOIN email_addresses AS email_addresses_1
ON users.user_id = email_addresses_1.user_id
The path given as the argument to contains_eager()
needs
to be a full path from the starting entity. For example if we were loading
Users->orders->Order->items->Item
, the option would be used as:
query(User).options(
contains_eager(User.orders).
contains_eager(Order.items))
Using contains_eager() to load a custom-filtered collection result¶
When we use contains_eager()
, we are constructing ourselves the
SQL that will be used to populate collections. From this, it naturally follows
that we can opt to modify what values the collection is intended to store,
by writing our SQL to load a subset of elements for collections or
scalar attributes.
As an example, we can load a User
object and eagerly load only particular
addresses into its .addresses
collection by filtering the joined data,
routing it using contains_eager()
, also using
Query.populate_existing()
to ensure any already-loaded collections
are overwritten:
q = session.query(User).\
join(User.addresses).\
filter(Address.email_address.like('%@aol.com')).\
options(contains_eager(User.addresses)).\
populate_existing()
The above query will load only User
objects which contain at
least Address
object that contains the substring 'aol.com'
in its
email
field; the User.addresses
collection will contain only
these Address
entries, and not any other Address
entries that are
in fact associated with the collection.
Tip
In all cases, the SQLAlchemy ORM does not overwrite already loaded
attributes and collections unless told to do so. As there is an
identity map in use, it is often the case that an ORM query is
returning objects that were in fact already present and loaded in memory.
Therefore, when using contains_eager()
to populate a collection
in an alternate way, it is usually a good idea to use
Query.populate_existing()
as illustrated above so that an
already-loaded collection is refreshed with the new data.
Query.populate_existing()
will reset all attributes that were
already present, including pending changes, so make sure all data is flushed
before using it. Using the Session
with its default behavior
of autoflush is sufficient.
Note
The customized collection we load using contains_eager()
is not “sticky”; that is, the next time this collection is loaded, it will
be loaded with its usual default contents. The collection is subject
to being reloaded if the object is expired, which occurs whenever the
Session.commit()
, Session.rollback()
methods are used
assuming default session settings, or the Session.expire_all()
or Session.expire()
methods are used.
Creating Custom Load Rules¶
Warning
This is an advanced technique! Great care and testing should be applied.
The ORM has various edge cases where the value of an attribute is locally
available, however the ORM itself doesn’t have awareness of this. There
are also cases when a user-defined system of loading attributes is desirable.
To support the use case of user-defined loading systems, a key function
set_committed_value()
is provided. This function is
basically equivalent to Python’s own setattr()
function, except that
when applied to a target object, SQLAlchemy’s “attribute history” system
which is used to determine flush-time changes is bypassed; the attribute
is assigned in the same way as if the ORM loaded it that way from the database.
The use of set_committed_value()
can be combined with another
key event known as InstanceEvents.load()
to produce attribute-population
behaviors when an object is loaded. One such example is the bi-directional
“one-to-one” case, where loading the “many-to-one” side of a one-to-one
should also imply the value of the “one-to-many” side. The SQLAlchemy ORM
does not consider backrefs when loading related objects, and it views a
“one-to-one” as just another “one-to-many”, that just happens to be one
row.
Given the following mapping:
from sqlalchemy import Integer, ForeignKey, Column
from sqlalchemy.orm import relationship, backref
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class A(Base):
__tablename__ = 'a'
id = Column(Integer, primary_key=True)
b_id = Column(ForeignKey('b.id'))
b = relationship(
"B",
backref=backref("a", uselist=False),
lazy='joined')
class B(Base):
__tablename__ = 'b'
id = Column(Integer, primary_key=True)
If we query for an A
row, and then ask it for a.b.a
, we will get
an extra SELECT:
>>> a1.b.a
SELECT a.id AS a_id, a.b_id AS a_b_id
FROM a
WHERE ? = a.b_id
This SELECT is redundant because b.a
is the same value as a1
. We
can create an on-load rule to populate this for us:
from sqlalchemy import event
from sqlalchemy.orm import attributes
@event.listens_for(A, "load")
def load_b(target, context):
if 'b' in target.__dict__:
attributes.set_committed_value(target.b, 'a', target)
Now when we query for A
, we will get A.b
from the joined eager load,
and A.b.a
from our event:
a1 = s.query(A).first()
SELECT
a.id AS a_id,
a.b_id AS a_b_id,
b_1.id AS b_1_id
FROM a
LEFT OUTER JOIN b AS b_1 ON b_1.id = a.b_id
LIMIT ? OFFSET ?
(1, 0)
assert a1.b.a is a1
Relationship Loader API¶
Object Name | Description |
---|---|
contains_alias(alias) |
Return a |
contains_eager(*keys, **kw) |
Indicate that the given attribute should be eagerly loaded from columns stated manually in the query. |
defaultload(*keys) |
Indicate an attribute should load using its default loader style. |
eagerload(*args, **kwargs) |
A synonym for |
eagerload_all(*args, **kwargs) |
A synonym for |
immediateload(*keys) |
Indicate that the given attribute should be loaded using an immediate load with a per-attribute SELECT statement. |
joinedload(*keys, **kw) |
Indicate that the given attribute should be loaded using joined eager loading. |
joinedload_all(*keys, **kw) |
Produce a standalone “all” option for
|
lazyload(*keys) |
Indicate that the given attribute should be loaded using “lazy” loading. |
Represents loader options which modify the state of a
|
|
noload(*keys) |
Indicate that the given relationship attribute should remain unloaded. |
raiseload(*keys, **kw) |
Indicate that the given relationship attribute should disallow lazy loads. |
selectinload(*keys) |
Indicate that the given attribute should be loaded using SELECT IN eager loading. |
selectinload_all(*keys) |
Produce a standalone “all” option for
|
subqueryload(*keys) |
Indicate that the given attribute should be loaded using subquery eager loading. |
subqueryload_all(*keys) |
Produce a standalone “all” option for
|
- function sqlalchemy.orm.contains_alias(alias)¶
Return a
MapperOption
that will indicate to theQuery
that the main table has been aliased.This is a seldom-used option to suit the very rare case that
contains_eager()
is being used in conjunction with a user-defined SELECT statement that aliases the parent table. E.g.:# define an aliased UNION called 'ulist' ulist = users.select(users.c.user_id==7).\ union(users.select(users.c.user_id>7)).\ alias('ulist') # add on an eager load of "addresses" statement = ulist.outerjoin(addresses).\ select().apply_labels() # create query, indicating "ulist" will be an # alias for the main table, "addresses" # property should be eager loaded query = session.query(User).options( contains_alias(ulist), contains_eager(User.addresses)) # then get results via the statement results = query.from_statement(statement).all()
- Parameters:
alias – is the string name of an alias, or a
Alias
object representing the alias.
- function sqlalchemy.orm.contains_eager(*keys, **kw)¶
Indicate that the given attribute should be eagerly loaded from columns stated manually in the query.
This function is part of the
Load
interface and supports both method-chained and standalone operation.The option is used in conjunction with an explicit join that loads the desired rows, i.e.:
sess.query(Order).\ join(Order.user).\ options(contains_eager(Order.user))
The above query would join from the
Order
entity to its relatedUser
entity, and the returnedOrder
objects would have theOrder.user
attribute pre-populated.It may also be used for customizing the entries in an eagerly loaded collection; queries will normally want to use the
Query.populate_existing()
method assuming the primary collection of parent objects may already have been loaded:sess.query(User).\ join(User.addresses).\ filter(Address.email_address.like('%@aol.com')).\ options(contains_eager(User.addresses)).\ populate_existing()
See the section Routing Explicit Joins/Statements into Eagerly Loaded Collections for complete usage details.
- function sqlalchemy.orm.defaultload(*keys)¶
Indicate an attribute should load using its default loader style.
This method is used to link to other loader options further into a chain of attributes without altering the loader style of the links along the chain. For example, to set joined eager loading for an element of an element:
session.query(MyClass).options( defaultload(MyClass.someattribute). joinedload(MyOtherClass.someotherattribute) )
defaultload()
is also useful for setting column-level options on a related class, namely that ofdefer()
andundefer()
:session.query(MyClass).options( defaultload(MyClass.someattribute). defer("some_column"). undefer("some_other_column") )
See also
Load.options()
- allows for complex hierarchical loader option structures with less verbosity than with individualdefaultload()
directives.
- function sqlalchemy.orm.eagerload(*args, **kwargs)¶
A synonym for
joinedload()
.
- function sqlalchemy.orm.eagerload_all(*args, **kwargs)¶
A synonym for
joinedload_all()
- function sqlalchemy.orm.immediateload(*keys)¶
Indicate that the given attribute should be loaded using an immediate load with a per-attribute SELECT statement.
The
immediateload()
option is superseded in general by theselectinload()
option, which performs the same task more efficiently by emitting a SELECT for all loaded objects.This function is part of the
Load
interface and supports both method-chained and standalone operation.
- function sqlalchemy.orm.joinedload(*keys, **kw)¶
Indicate that the given attribute should be loaded using joined eager loading.
This function is part of the
Load
interface and supports both method-chained and standalone operation.examples:
# joined-load the "orders" collection on "User" query(User).options(joinedload(User.orders)) # joined-load Order.items and then Item.keywords query(Order).options( joinedload(Order.items).joinedload(Item.keywords)) # lazily load Order.items, but when Items are loaded, # joined-load the keywords collection query(Order).options( lazyload(Order.items).joinedload(Item.keywords))
- Parameters:
innerjoin –
if
True
, indicates that the joined eager load should use an inner join instead of the default of left outer join:query(Order).options(joinedload(Order.user, innerjoin=True))
In order to chain multiple eager joins together where some may be OUTER and others INNER, right-nested joins are used to link them:
query(A).options( joinedload(A.bs, innerjoin=False). joinedload(B.cs, innerjoin=True) )
The above query, linking A.bs via “outer” join and B.cs via “inner” join would render the joins as “a LEFT OUTER JOIN (b JOIN c)”. When using older versions of SQLite (< 3.7.16), this form of JOIN is translated to use full subqueries as this syntax is otherwise not directly supported.
The
innerjoin
flag can also be stated with the term"unnested"
. This indicates that an INNER JOIN should be used, unless the join is linked to a LEFT OUTER JOIN to the left, in which case it will render as LEFT OUTER JOIN. For example, supposingA.bs
is an outerjoin:query(A).options( joinedload(A.bs). joinedload(B.cs, innerjoin="unnested") )
The above join will render as “a LEFT OUTER JOIN b LEFT OUTER JOIN c”, rather than as “a LEFT OUTER JOIN (b JOIN c)”.
Note
The “unnested” flag does not affect the JOIN rendered from a many-to-many association table, e.g. a table configured as
relationship.secondary
, to the target table; for correctness of results, these joins are always INNER and are therefore right-nested if linked to an OUTER join.Changed in version 1.0.0:
innerjoin=True
now impliesinnerjoin="nested"
, whereas in 0.9 it impliedinnerjoin="unnested"
. In order to achieve the pre-1.0 “unnested” inner join behavior, use the valueinnerjoin="unnested"
. See Right inner join nesting now the default for joinedload with innerjoin=True.
Note
The joins produced by
joinedload()
are anonymously aliased. The criteria by which the join proceeds cannot be modified, nor can theQuery
refer to these joins in any way, including ordering. See The Zen of Joined Eager Loading for further detail.To produce a specific SQL JOIN which is explicitly available, use
Query.join()
. To combine explicit JOINs with eager loading of collections, usecontains_eager()
; see Routing Explicit Joins/Statements into Eagerly Loaded Collections.
- function sqlalchemy.orm.joinedload_all(*keys, **kw)¶
Produce a standalone “all” option for
joinedload()
.Deprecated since version 0.9: The
joinedload_all()
function is deprecated, and will be removed in a future release. Please use method chaining withjoinedload()
instead, as in:session.query(MyClass).options( joinedload("someattribute").joinedload("anotherattribute") )
- function sqlalchemy.orm.lazyload(*keys)¶
Indicate that the given attribute should be loaded using “lazy” loading.
This function is part of the
Load
interface and supports both method-chained and standalone operation.
- class sqlalchemy.orm.Load(entity)¶
Represents loader options which modify the state of a
Query
in order to affect how various mapped attributes are loaded.The
Load
object is in most cases used implicitly behind the scenes when one makes use of a query option likejoinedload()
,defer()
, or similar. However, theLoad
object can also be used directly, and in some cases can be useful.To use
Load
directly, instantiate it with the target mapped class as the argument. This style of usage is useful when dealing with aQuery
that has multiple entities:myopt = Load(MyClass).joinedload("widgets")
The above
myopt
can now be used withQuery.options()
, where it will only take effect for theMyClass
entity:session.query(MyClass, MyOtherClass).options(myopt)
One case where
Load
is useful as public API is when specifying “wildcard” options that only take effect for a certain class:session.query(Order).options(Load(Order).lazyload('*'))
Above, all relationships on
Order
will be lazy-loaded, but other attributes on those descendant objects will load using their normal loader strategy.Class signature
class
sqlalchemy.orm.Load
(sqlalchemy.sql.expression.Generative
,sqlalchemy.orm.MapperOption
)
- function sqlalchemy.orm.noload(*keys)¶
Indicate that the given relationship attribute should remain unloaded.
This function is part of the
Load
interface and supports both method-chained and standalone operation.noload()
applies torelationship()
attributes; for column-based attributes, seedefer()
.See also
- function sqlalchemy.orm.raiseload(*keys, **kw)¶
Indicate that the given relationship attribute should disallow lazy loads.
A relationship attribute configured with
raiseload()
will raise anInvalidRequestError
upon access. The typical way this is useful is when an application is attempting to ensure that all relationship attributes that are accessed in a particular context would have been already loaded via eager loading. Instead of having to read through SQL logs to ensure lazy loads aren’t occurring, this strategy will cause them to raise immediately.- Parameters:
sql_only – if True, raise only if the lazy load would emit SQL, but not if it is only checking the identity map, or determining that the related value should just be None due to missing keys. When False, the strategy will raise for all varieties of lazyload.
This function is part of the
Load
interface and supports both method-chained and standalone operation.raiseload()
applies torelationship()
attributes only.New in version 1.1.
- function sqlalchemy.orm.selectinload(*keys)¶
Indicate that the given attribute should be loaded using SELECT IN eager loading.
This function is part of the
Load
interface and supports both method-chained and standalone operation.examples:
# selectin-load the "orders" collection on "User" query(User).options(selectinload(User.orders)) # selectin-load Order.items and then Item.keywords query(Order).options( selectinload(Order.items).selectinload(Item.keywords)) # lazily load Order.items, but when Items are loaded, # selectin-load the keywords collection query(Order).options( lazyload(Order.items).selectinload(Item.keywords))
New in version 1.2.
- function sqlalchemy.orm.selectinload_all(*keys)¶
Produce a standalone “all” option for
selectinload()
.Deprecated since version 0.9: The
selectinload_all()
function is deprecated, and will be removed in a future release. Please use method chaining withselectinload()
instead, as in:session.query(MyClass).options( selectinload("someattribute").selectinload("anotherattribute") )
- function sqlalchemy.orm.subqueryload(*keys)¶
Indicate that the given attribute should be loaded using subquery eager loading.
This function is part of the
Load
interface and supports both method-chained and standalone operation.examples:
# subquery-load the "orders" collection on "User" query(User).options(subqueryload(User.orders)) # subquery-load Order.items and then Item.keywords query(Order).options( subqueryload(Order.items).subqueryload(Item.keywords)) # lazily load Order.items, but when Items are loaded, # subquery-load the keywords collection query(Order).options( lazyload(Order.items).subqueryload(Item.keywords))
- function sqlalchemy.orm.subqueryload_all(*keys)¶
Produce a standalone “all” option for
subqueryload()
.Deprecated since version 0.9: The
subqueryload_all()
function is deprecated, and will be removed in a future release. Please use method chaining withsubqueryload()
instead, as in:session.query(MyClass).options( subqueryload("someattribute").subqueryload("anotherattribute") )