What’s New in SQLAlchemy 2.0?¶
Note for Readers
SQLAlchemy 2.0’s transition documents are separated into two documents - one which details major API shifts from the 1.x to 2.x series, and the other which details new features and behaviors relative to SQLAlchemy 1.4:
SQLAlchemy 2.0 - Major Migration Guide - 1.x to 2.x API shifts
What’s New in SQLAlchemy 2.0? - this document, new features and behaviors for SQLAlchemy 2.0
Readers who have not yet updated their 1.4 application to follow SQLAlchemy 2.0 engine and ORM conventions may navigate to SQLAlchemy 2.0 - Major Migration Guide for a guide to ensuring SQLAlchemy 2.0 compatibility, which is a prerequisite for having working code under version 2.0.
About this Document
This document describes changes between SQLAlchemy version 1.4 and SQLAlchemy version 2.0, independent of the major changes between 1.x style and 2.0 style usage. Readers should start with the SQLAlchemy 2.0 - Major Migration Guide document to get an overall picture of the major compatibility changes between the 1.x and 2.x series.
Aside from the major 1.x->2.x migration path, the next largest paradigm shift in SQLAlchemy 2.0 is deep integration with PEP 484 typing practices and current capabilities, particularly within the ORM. New type-driven ORM declarative styles inspired by Python dataclasses, as well as new integrations with dataclasses themselves, complement an overall approach that no longer requires stubs and also goes very far towards providing a type-aware method chain from SQL statement to result set.
The prominence of Python typing is significant not only so that type checkers like mypy can run without plugins; more significantly it allows IDEs like vscode and pycharm to take a much more active role in assisting with the composition of a SQLAlchemy application.
New Typing Support in Core and ORM - Stubs / Extensions no longer used¶
The approach to typing for Core and ORM has been completely reworked, compared
to the interim approach that was provided in version 1.4 via the
sqlalchemy2-stubs package. The new approach begins at the most fundamental
element in SQLAlchemy which is the Column
, or more
accurately the ColumnElement
that underlies all SQL
expressions that have a type. This expression-level typing then extends into the area of
statement construction, statement execution, and result sets, and finally into the ORM
where new declarative forms allow
for fully typed ORM models that integrate all the way from statement to
result set.
Tip
Typing support should be considered beta level software for the 2.0 series. Typing details are subject to change however significant backwards-incompatible changes are not planned.
SQL Expression / Statement / Result Set Typing¶
This section provides background and examples for SQLAlchemy’s new
SQL expression typing approach, which extends from base ColumnElement
constructs through SQL statements and result sets and into realm of ORM mapping.
Rationale and Overview¶
Tip
This section is an architectural discussion. Skip ahead to SQL Expression Typing - Examples to just see what the new typing looks like.
In sqlalchemy2-stubs, SQL expressions were typed as generics that then
referred to a TypeEngine
object such as Integer
,
DateTime
, or String
as their generic argument
(such as Column[Integer]
). This was itself a departure from what
the original Dropbox sqlalchemy-stubs package did, where
Column
and its foundational constructs were directly generic on
Python types, such as int
, datetime
and str
. It was hoped
that since Integer
/ DateTime
/ String
themselves
are generic against int
/ datetime
/ str
, there would be ways
to maintain both levels of information and to be able to extract the Python
type from a column expression via the TypeEngine
as an intermediary
construct. However, this is not the case, as PEP 484
doesn’t really have a rich enough feature set for this to be viable,
lacking capabilities such as
higher kinded TypeVars.
So after a deep assessment
of the current capabilities of PEP 484, SQLAlchemy 2.0 has realized the
original wisdom of sqlalchemy-stubs in this area and returned to linking
column expressions directly to Python types. This does mean that if one
has SQL expressions to different subtypes, like Column(VARCHAR)
vs.
Column(Unicode)
, the specifics of those two String
subtypes
is not carried along as the type only carries along str
,
but in practice this is usually not an issue and it is generally vastly more
useful that the Python type is immediately present, as it represents the
in-Python data one will be storing and receiving for this column directly.
Concretely, this means that an expression like Column('id', Integer)
is typed as Column[int]
. This allows for a viable pipeline of
SQLAlchemy construct -> Python datatype to be set up, without the need for
typing plugins. Crucially, it allows full interoperability with
the ORM’s paradigm of using select()
and Row
constructs that reference ORM mapped class types (e.g. a Row
containing instances of user-mapped instances, such as the User
and
Address
examples used in our tutorials). While Python typing currently has very limited
support for customization of tuple-types (where PEP 646, the first pep that
attempts to deal with tuple-like objects, was intentionally limited
in its functionality
and by itself is not yet viable for arbitrary tuple
manipulation),
a fairly decent approach has been devised that allows for basic
select()
-> Result
-> Row
typing
to function, including for ORM classes, where at the point at which a
Row
object is to be unpacked into individual column entries,
a small typing-oriented accessor is added that allows the individual Python
values to maintain the Python type linked to the SQL expression from which
they originated (translation: it works).
SQL Expression Typing - Examples¶
A brief tour of typing behaviors. Comments indicate what one would see hovering over the code in vscode (or roughly what typing tools would display when using the reveal_type() helper):
Simple Python Types Assigned to SQL Expressions
# (variable) str_col: ColumnClause[str] str_col = column("a", String) # (variable) int_col: ColumnClause[int] int_col = column("a", Integer) # (variable) expr1: ColumnElement[str] expr1 = str_col + "x" # (variable) expr2: ColumnElement[int] expr2 = int_col + 10 # (variable) expr3: ColumnElement[bool] expr3 = int_col == 15
Individual SQL expressions assigned to
select()
constructs, as well as any row-returning construct, including row-returning DML such asInsert
withInsert.returning()
, are packed into aTuple[]
type which retains the Python type for each element.# (variable) stmt: Select[Tuple[str, int]] stmt = select(str_col, int_col) # (variable) stmt: ReturningInsert[Tuple[str, int]] ins_stmt = insert(table("t")).returning(str_col, int_col)
The
Tuple[]
type from any row returning construct, when invoked with an.execute()
method, carries through toResult
andRow
. In order to unpack theRow
object as a tuple, theRow.tuple()
orRow.t
accessor essentially casts theRow
into the correspondingTuple[]
(though remains the sameRow
object at runtime).with engine.connect() as conn: # (variable) stmt: Select[Tuple[str, int]] stmt = select(str_col, int_col) # (variable) result: Result[Tuple[str, int]] result = conn.execute(stmt) # (variable) row: Row[Tuple[str, int]] | None row = result.first() if row is not None: # for typed tuple unpacking or indexed access, # use row.tuple() or row.t (this is the small typing-oriented accessor) strval, intval = row.t # (variable) strval: str strval # (variable) intval: int intval
Scalar values for single-column statements do the right thing with methods like
Connection.scalar()
,Result.scalars()
, etc.# (variable) data: Sequence[str] data = connection.execute(select(str_col)).scalars().all()
The above support for row-returning constructs works the best with ORM mapped classes, as a mapped class can list out specific types for its members. The example below sets up a class using new type-aware syntaxes, described in the following section:
from sqlalchemy.orm import DeclarativeBase from sqlalchemy.orm import Mapped from sqlalchemy.orm import mapped_column class Base(DeclarativeBase): pass class User(Base): __tablename__ = "user_account" id: Mapped[int] = mapped_column(primary_key=True) name: Mapped[str] addresses: Mapped[List["Address"]] = relationship() class Address(Base): __tablename__ = "address" id: Mapped[int] = mapped_column(primary_key=True) email_address: Mapped[str] user_id = mapped_column(ForeignKey("user_account.id"))
With the above mapping, the attributes are typed and express themselves all the way from statement to result set:
with Session(engine) as session: # (variable) stmt: Select[Tuple[int, str]] stmt_1 = select(User.id, User.name) # (variable) result_1: Result[Tuple[int, str]] result_1 = session.execute(stmt_1) # (variable) intval: int # (variable) strval: str intval, strval = result_1.one().t
Mapped classes themselves are also types, and behave the same way, such as a SELECT against two mapped classes:
with Session(engine) as session: # (variable) stmt: Select[Tuple[User, Address]] stmt_2 = select(User, Address).join_from(User, Address) # (variable) result_2: Result[Tuple[User, Address]] result_2 = session.execute(stmt_2) # (variable) user_obj: User # (variable) address_obj: Address user_obj, address_obj = result_2.one().t
When selecting mapped classes, constructs like
aliased
work as well, maintaining the column-level attributes of the original mapped class as well as the return type expected from a statement:with Session(engine) as session: # this is in fact an Annotated type, but typing tools don't # generally display this # (variable) u1: Type[User] u1 = aliased(User) # (variable) stmt: Select[Tuple[User, User, str]] stmt = select(User, u1, User.name).filter(User.id == 5) # (variable) result: Result[Tuple[User, User, str]] result = session.execute(stmt)
Core Table does not yet have a decent way to maintain typing of
Column
objects when accessing them via theTable.c
accessor.Since
Table
is set up as an instance of a class, and theTable.c
accessor typically accessesColumn
objects dynamically by name, there’s not yet an established typing approach for this; some alternative syntax would be needed.ORM classes, scalars, etc. work great.
The typical use case of selecting ORM classes, as scalars or tuples, all works, both 2.0 and 1.x style queries, getting back the exact type either by itself or contained within the appropriate container such as
Sequence[]
,List[]
orIterator[]
:# (variable) users1: Sequence[User] users1 = session.scalars(select(User)).all() # (variable) user: User user = session.query(User).one() # (variable) user_iter: Iterator[User] user_iter = iter(session.scalars(select(User)))
Legacy
Query
gains tuple typing as well.The typing support for
Query
goes well beyond what sqlalchemy-stubs or sqlalchemy2-stubs offered, where both scalar-object as well as tuple-typedQuery
objects will retain result level typing for most cases:# (variable) q1: RowReturningQuery[Tuple[int, str]] q1 = session.query(User.id, User.name) # (variable) rows: List[Row[Tuple[int, str]]] rows = q1.all() # (variable) q2: Query[User] q2 = session.query(User) # (variable) users: List[User] users = q2.all()
the catch - all stubs must be uninstalled¶
A key caveat with the typing support is that all SQLAlchemy stubs packages must be uninstalled for typing to work. When running mypy against a Python virtualenv, this is only a matter of uninstalling those packages. However, a SQLAlchemy stubs package is also currently part of typeshed, which itself is bundled into some typing tools such as Pylance, so it may be necessary in some cases to locate the files for these packages and delete them, if they are in fact interfering with the new typing working correctly.
Once SQLAlchemy 2.0 is released in final status, typeshed will remove SQLAlchemy from its own stubs source.
ORM Declarative Models¶
SQLAlchemy 1.4 introduced the first SQLAlchemy-native ORM typing support using a combination of sqlalchemy2-stubs and the Mypy Plugin. In SQLAlchemy 2.0, the Mypy plugin remains available, and has been updated to work with SQLAlchemy 2.0’s typing system. However, it should now be considered deprecated, as applications now have a straightforward path to adopting the new typing support that does not use plugins or stubs.
Overview¶
The fundamental approach for the new system is that mapped column declarations,
when using a fully Declarative model (that is,
not hybrid declarative or
imperative configurations, which are unchanged),
are first derived at runtime by inspecting the type annotation on the left side
of each attribute declaration, if present. Left hand type annotations are
expected to be contained within the
Mapped
generic type, otherwise the attribute is not considered
to be a mapped attribute. The attribute declaration may then refer to
the mapped_column()
construct on the right hand side, which is used
to provide additional Core-level schema information about the
Column
to be produced and mapped. This right hand side
declaration is optional if a Mapped
annotation is present on the
left side; if no annotation is present on the left side, then the
mapped_column()
may be used as an exact replacement for the
Column
directive where it will provide for more accurate (but
not exact) typing behavior of the attribute, even though no annotation is
present.
The approach is inspired by the approach of Python dataclasses which starts
with an annotation on the left, then allows for an optional
dataclasses.field()
specification on the right; the key difference from the
dataclasses approach is that SQLAlchemy’s approach is strictly opt-in,
where existing mappings that use Column
without any type
annotations continue to work as they always have, and the
mapped_column()
construct may be used as a direct replacement for
Column
without any explicit type annotations. Only for exact
attribute-level Python types to be present is the use of explicit annotations
with Mapped
required. These annotations may be used on an
as-needed, per-attribute basis for those attributes where specific types are
helpful; non-annotated attributes that use mapped_column()
will be
typed as Any
at the instance level.
Migrating an Existing Mapping¶
Transitioning to the new ORM approach begins as more verbose, but becomes more succinct than was previously possible as the available new features are used fully. The following steps detail a typical transition and then continue on to illustrate some more options.
Step one - declarative_base()
is superseded by DeclarativeBase
.¶
One observed limitation in Python typing is that there seems to be
no ability to have a class dynamically generated from a function which then
is understood by typing tools as a base for new classes. To solve this problem
without plugins, the usual call to declarative_base()
can be replaced
with using the DeclarativeBase
class, which produces the same
Base
object as usual, except that typing tools understand it:
from sqlalchemy.orm import DeclarativeBase
class Base(DeclarativeBase):
pass
Step two - replace Declarative use of Column
with mapped_column()
¶
The mapped_column()
is an ORM-typing aware construct that can
be swapped directly for the use of Column
. Given a
1.x style mapping as:
from sqlalchemy import Column
from sqlalchemy.orm import relationship
from sqlalchemy.orm import DeclarativeBase
class Base(DeclarativeBase):
pass
class User(Base):
__tablename__ = "user_account"
id = Column(Integer, primary_key=True)
name = Column(String(30), nullable=False)
fullname = Column(String)
addresses = relationship("Address", back_populates="user")
class Address(Base):
__tablename__ = "address"
id = Column(Integer, primary_key=True)
email_address = Column(String, nullable=False)
user_id = Column(ForeignKey("user_account.id"), nullable=False)
user = relationship("User", back_populates="addresses")
We replace Column
with mapped_column()
; no
arguments need to change:
from sqlalchemy.orm import DeclarativeBase
from sqlalchemy.orm import mapped_column
from sqlalchemy.orm import relationship
class Base(DeclarativeBase):
pass
class User(Base):
__tablename__ = "user_account"
id = mapped_column(Integer, primary_key=True)
name = mapped_column(String(30), nullable=False)
fullname = mapped_column(String)
addresses = relationship("Address", back_populates="user")
class Address(Base):
__tablename__ = "address"
id = mapped_column(Integer, primary_key=True)
email_address = mapped_column(String, nullable=False)
user_id = mapped_column(ForeignKey("user_account.id"), nullable=False)
user = relationship("User", back_populates="addresses")
The individual columns above are not yet typed with Python types,
and are instead typed as Mapped[Any]
; this is because we can declare any
column either with Optional
or not, and there’s no way to have a
“guess” in place that won’t cause typing errors when we type it
explicitly.
However, at this step, our above mapping has appropriate descriptor types set up for all attributes and may be used in queries as well as for instance-level manipulation, all of which will pass mypy –strict mode with no plugins.
Step three - apply exact Python types as needed using Mapped
.¶
This can be done for all attributes for which exact typing is desired;
attributes that are fine being left as Any
may be skipped. For
context we also illustrate Mapped
being used for a
relationship()
where we apply an exact type.
The mapping within this interim step
will be more verbose, however with proficiency, this step can
be combined with subsequent steps to update mappings more directly:
from typing import List
from typing import Optional
from sqlalchemy.orm import DeclarativeBase
from sqlalchemy.orm import Mapped
from sqlalchemy.orm import mapped_column
from sqlalchemy.orm import relationship
class Base(DeclarativeBase):
pass
class User(Base):
__tablename__ = "user_account"
id: Mapped[int] = mapped_column(Integer, primary_key=True)
name: Mapped[str] = mapped_column(String(30), nullable=False)
fullname: Mapped[Optional[str]] = mapped_column(String)
addresses: Mapped[List["Address"]] = relationship("Address", back_populates="user")
class Address(Base):
__tablename__ = "address"
id: Mapped[int] = mapped_column(Integer, primary_key=True)
email_address: Mapped[str] = mapped_column(String, nullable=False)
user_id: Mapped[int] = mapped_column(ForeignKey("user_account.id"), nullable=False)
user: Mapped["User"] = relationship("User", back_populates="addresses")
At this point, our ORM mapping is fully typed and will produce exact-typed
select()
, Query
and Result
constructs. We now can proceed to pare down redundancy in the mapping
declaration.
Step four - remove mapped_column()
directives where no longer needed¶
All nullable
parameters can be implied using Optional[]
; in
the absence of Optional[]
, nullable
defaults to False
. All SQL
types without arguments such as Integer
and String
can be expressed
as a Python annotation alone. A mapped_column()
directive with no
parameters can be removed entirely. relationship()
now derives its
class from the left hand annotation, supporting forward references as well
(as relationship()
has supported string-based forward references
for ten years already ;) ):
from typing import List
from typing import Optional
from sqlalchemy.orm import DeclarativeBase
from sqlalchemy.orm import Mapped
from sqlalchemy.orm import mapped_column
from sqlalchemy.orm import relationship
class Base(DeclarativeBase):
pass
class User(Base):
__tablename__ = "user_account"
id: Mapped[int] = mapped_column(primary_key=True)
name: Mapped[str] = mapped_column(String(30))
fullname: Mapped[Optional[str]]
addresses: Mapped[List["Address"]] = relationship(back_populates="user")
class Address(Base):
__tablename__ = "address"
id: Mapped[int] = mapped_column(primary_key=True)
email_address: Mapped[str]
user_id: Mapped[int] = mapped_column(ForeignKey("user_account.id"))
user: Mapped["User"] = relationship(back_populates="addresses")
Step five - make use of pep-593 Annotated
to package common directives into types¶
This is a radical new
capability that presents an alternative, or complementary approach, to
declarative mixins as a means to provide type
oriented configuration, and also replaces the need for
declared_attr
decorated functions in most cases.
First, the Declarative mapping allows the mapping of Python type to
SQL type, such as str
to String
, to be customized
using registry.type_annotation_map
. Using PEP 593
Annotated
allows us to create variants of a particular Python type so that
the same type, such as str
, may be used which each provide variants
of String
, as below where use of an Annotated
str
called
str50
will indicate String(50)
:
from typing_extensions import Annotated
from sqlalchemy.orm import DeclarativeBase
str50 = Annotated[str, 50]
# declarative base with a type-level override, using a type that is
# expected to be used in multiple places
class Base(DeclarativeBase):
type_annotation_map = {
str50: String(50),
}
Second, Declarative will extract full
mapped_column()
definitions from the left hand type if
Annotated[]
is used, by passing a mapped_column()
construct
as any argument to the Annotated[]
construct (credit to @adriangb01
for illustrating this idea). This capability may be extended in future releases
to also include relationship()
, composite()
and other
constructs, but currently is limited to mapped_column()
. The
example below adds additional Annotated
types in addition to our
str50
example to illustrate this feature:
from typing_extensions import Annotated
from typing import List
from typing import Optional
from sqlalchemy import ForeignKey
from sqlalchemy import String
from sqlalchemy.orm import DeclarativeBase
from sqlalchemy.orm import Mapped
from sqlalchemy.orm import mapped_column
from sqlalchemy.orm import relationship
# declarative base from previous example
str50 = Annotated[str, 50]
class Base(DeclarativeBase):
type_annotation_map = {
str50: String(50),
}
# set up mapped_column() overrides, using whole column styles that are
# expected to be used in multiple places
intpk = Annotated[int, mapped_column(primary_key=True)]
user_fk = Annotated[int, mapped_column(ForeignKey("user_account.id"))]
class User(Base):
__tablename__ = "user_account"
id: Mapped[intpk]
name: Mapped[str50]
fullname: Mapped[Optional[str]]
addresses: Mapped[List["Address"]] = relationship(back_populates="user")
class Address(Base):
__tablename__ = "address"
id: Mapped[intpk]
email_address: Mapped[str50]
user_id: Mapped[user_fk]
user: Mapped["User"] = relationship(back_populates="addresses")
Above, columns that are mapped with Mapped[str50]
, Mapped[intpk]
,
or Mapped[user_fk]
draw from both the
registry.type_annotation_map
as well as the
Annotated
construct directly in order to re-use pre-established typing
and column configurations.
Optional step - turn mapped classes into dataclasses¶
We can turn mapped classes into dataclasses, where a key advantage
is that we can build a strictly-typed __init__()
method with explicit
positional, keyword only, and default arguments, not to mention we get methods
such as __str__()
and __repr__()
for free. The next section
Native Support for Dataclasses Mapped as ORM Models illustrates further transformation of the above
model.
Typing is supported from step 3 onwards¶
With the above examples, any example from “step 3” on forward will include
that the attributes
of the model are typed
and will populate through to select()
, Query
,
and Row
objects:
# (variable) stmt: Select[Tuple[int, str]]
stmt = select(User.id, User.name)
with Session(e) as sess:
for row in sess.execute(stmt):
# (variable) row: Row[Tuple[int, str]]
print(row)
# (variable) users: Sequence[User]
users = sess.scalars(select(User)).all()
# (variable) users_legacy: List[User]
users_legacy = sess.query(User).all()
See also
Declarative Table with mapped_column() - Updated Declarative documentation for
Declarative generation and mapping of Table
columns.
Using Legacy Mypy-Typed Models¶
SQLAlchemy applications that use the Mypy plugin with
explicit annotations that don’t use Mapped
in their annotations
are subject to errors under the new system, as such annotations are flagged as
errors when using constructs such as relationship()
.
The section Migration to 2.0 Step Six - Add __allow_unmapped__ to explicitly typed ORM models illustrates how to temporarily disable these errors from being raised for a legacy ORM model that uses explicit annotations.
Native Support for Dataclasses Mapped as ORM Models¶
The new ORM Declarative features introduced above at
ORM Declarative Models introduced the
new mapped_column()
construct and illustrated type-centric
mapping with optional use of PEP 593 Annotated
. We can take
the mapping one step further by integrating this with Python
dataclasses. This new feature is made possible via PEP 681 which
allows for type checkers to recognize classes that are dataclass compatible,
or are fully dataclasses, but were declared through alternate APIs.
Using the dataclasses feature, mapped classes gain an __init__()
method
that supports positional arguments as well as customizable default values
for optional keyword arguments. As mentioned previously, dataclasses also
generate many useful methods such as __str__()
, __eq__()
. Dataclass
serialization methods such as
dataclasses.asdict() and
dataclasses.astuple()
also work, but don’t currently accommodate for self-referential structures, which
makes them less viable for mappings that have bidirectional relationships.
SQLAlchemy’s current integration approach converts the user-defined class into a real dataclass to provide runtime functionality; the feature makes use of the existing dataclass feature introduced in SQLAlchemy 1.4 at Python Dataclasses, attrs Supported w/ Declarative, Imperative Mappings to produce an equivalent runtime mapping with a fully integrated configuration style, which is also more correctly typed than was possible with the previous approach.
To support dataclasses in compliance with PEP 681, ORM constructs like
mapped_column()
and relationship()
accept additional
PEP 681 arguments init
, default
, and default_factory
which
are passed along to the dataclass creation process. These
arguments currently must be present in an explicit directive on the right side,
just as they would be used with dataclasses.field()
; they currently
can’t be local to an Annotated
construct on the left side. To support
the convenient use of Annotated
while still supporting dataclass
configuration, mapped_column()
can merge
a minimal set of right-hand arguments with that of an existing
mapped_column()
construct located on the left side within an Annotated
construct, so that most of the succinctness is maintained, as will be seen
below.
To enable dataclasses using class inheritance we make
use of the MappedAsDataclass
mixin, either directly on each class, or
on the Base
class, as illustrated below where we further modify the
example mapping from “Step 5” of ORM Declarative Models:
from typing_extensions import Annotated
from typing import List
from typing import Optional
from sqlalchemy import ForeignKey
from sqlalchemy import String
from sqlalchemy.orm import DeclarativeBase
from sqlalchemy.orm import Mapped
from sqlalchemy.orm import MappedAsDataclass
from sqlalchemy.orm import mapped_column
from sqlalchemy.orm import relationship
class Base(MappedAsDataclass, DeclarativeBase):
"""subclasses will be converted to dataclasses"""
intpk = Annotated[int, mapped_column(primary_key=True)]
str30 = Annotated[str, mapped_column(String(30))]
user_fk = Annotated[int, mapped_column(ForeignKey("user_account.id"))]
class User(Base):
__tablename__ = "user_account"
id: Mapped[intpk] = mapped_column(init=False)
name: Mapped[str30]
fullname: Mapped[Optional[str]] = mapped_column(default=None)
addresses: Mapped[List["Address"]] = relationship(
back_populates="user", default_factory=list
)
class Address(Base):
__tablename__ = "address"
id: Mapped[intpk] = mapped_column(init=False)
email_address: Mapped[str]
user_id: Mapped[user_fk] = mapped_column(init=False)
user: Mapped["User"] = relationship(back_populates="addresses", default=None)
The above mapping has used the @dataclasses.dataclass
decorator directly
on each mapped class at the same time that the declarative mapping was
set up, internally setting up each dataclasses.field()
directive as
indicated. User
/ Address
structures can be created using
positional arguments as configured:
>>> u1 = User("username", fullname="full name", addresses=[Address("email@address")])
>>> u1
User(id=None, name='username', fullname='full name', addresses=[Address(id=None, email_address='email@address', user_id=None, user=...)])
See also
Optimized ORM bulk insert now implemented for all backends other than MySQL¶
The dramatic performance improvement introduced in the 1.4 series and described
at ORM Batch inserts with psycopg2 now batch statements with RETURNING in most cases has now been generalized to all included backends that
support RETURNING, which is all backends other than MySQL: SQLite, MariaDB,
PostgreSQL (all drivers), and Oracle; SQL Server has support but is
temporarily disabled in version 2.0.9 [1]. While the original feature
was most critical for the psycopg2 driver which otherwise had major performance
issues when using cursor.executemany()
, the change is also critical for
other PostgreSQL drivers such as asyncpg, as when using RETURNING,
single-statement INSERT statements are still unacceptably slow, as well
as when using SQL Server that also seems to have very slow executemany
speed for INSERT statements regardless of whether or not RETURNING is used.
The performance of the new feature provides an almost across-the-board order of magnitude performance increase for basically every driver when INSERTing ORM objects that don’t have a pre-assigned primary key value, as indicated in the table below, in most cases specific to the use of RETURNING which is not normally supported with executemany().
The psycopg2 “fast execution helper” approach consists of transforming an
INSERT..RETURNING statement with a single parameter set into a single
statement that INSERTs many parameter sets, using multiple “VALUES…”
clauses so that it can accommodate many parameter sets at once.
Parameter sets are then typically batched into groups of 1000
or similar, so that no single INSERT statement is excessively large, and the
INSERT statement is then invoked for each batch of parameters, rather than
for each individual parameter set. Primary key values and server defaults
are returned by RETURNING, which continues to work as each statement execution
is invoked using cursor.execute()
, rather than cursor.executemany()
.
This allows many rows to be inserted in one statement while also being able to
return newly-generated primary key values as well as SQL and server defaults.
SQLAlchemy historically has always needed to invoke one statement per parameter
set, as it relied upon Python DBAPI Features such as cursor.lastrowid
which
do not support multiple rows.
With most databases now offering RETURNING (with the conspicuous exception of
MySQL, given that MariaDB supports it), the new change generalizes the psycopg2
“fast execution helper” approach to all dialects that support RETURNING, which
now includes SQlite and MariaDB, and for which no other approach for
“executemany plus RETURNING” is possible, which includes SQLite, MariaDB, and all
PG drivers. The cx_Oracle and oracledb drivers used for Oracle
support RETURNING with executemany natively, and this has also been implemented
to provide equivalent performance improvements. With SQLite and MariaDB now
offering RETURNING support, ORM use of cursor.lastrowid
is nearly a thing
of the past, with only MySQL still relying upon it.
For INSERT statements that don’t use RETURNING, traditional executemany() behavior is used for most backends, with the current exception of psycopg2, which has very slow executemany() performance overall and are still improved by the “insertmanyvalues” approach.
Benchmarks¶
SQLAlchemy includes a Performance Suite within
the examples/
directory, where we can make use of the bulk_insert
suite to benchmark INSERTs of many rows using both Core and ORM in different
ways.
For the tests below, we are inserting 100,000 objects, and in all cases we actually have 100,000 real Python ORM objects in memory, either created up front or generated on the fly. All databases other than SQLite are run over a local network connection, not localhost; this causes the “slower” results to be extremely slow.
Operations that are improved by this feature include:
unit of work flushes for objects added to the session using
Session.add()
andSession.add_all()
.The new ORM Bulk Insert Statement feature, which improves upon the experimental version of this feature first introduced in SQLAlchemy 1.4.
the
Session
“bulk” operations described at Bulk Operations, which are superseded by the above mentioned ORM Bulk Insert feature.
To get a sense of the scale of the operation, below are performance
measurements using the test_flush_no_pk
performance suite, which
historically represents SQLAlchemy’s worst-case INSERT performance task,
where objects that don’t have primary key values need to be INSERTed, and
then the newly generated primary key values must be fetched so that the
objects can be used for subsequent flush operations, such as establishment
within relationships, flushing joined-inheritance models, etc:
@Profiler.profile
def test_flush_no_pk(n):
"""INSERT statements via the ORM (batched with RETURNING if available),
fetching generated row id"""
session = Session(bind=engine)
for chunk in range(0, n, 1000):
session.add_all(
[
Customer(
name="customer name %d" % i,
description="customer description %d" % i,
)
for i in range(chunk, chunk + 1000)
]
)
session.flush()
session.commit()
This test can be run from any SQLAlchemy source tree as follows:
python -m examples.performance.bulk_inserts --test test_flush_no_pk
The table below summarizes performance measurements with the latest 1.4 series of SQLAlchemy compared to 2.0, both running the same test:
Driver |
SQLA 1.4 Time (secs) |
SQLA 2.0 Time (secs) |
sqlite+pysqlite2 (memory) |
6.204843 |
3.554856 |
postgresql+asyncpg (network) |
88.292285 |
4.561492 |
postgresql+psycopg (network) |
N/A (psycopg3) |
4.861368 |
mssql+pyodbc (network) |
158.396667 |
4.825139 |
oracle+cx_Oracle (network) |
92.603953 |
4.809520 |
mariadb+mysqldb (network) |
71.705197 |
4.075377 |
Note
Two additional drivers have no change in performance; the psycopg2 drivers, for which fast executemany was already implemented in SQLAlchemy 1.4, and MySQL, which continues to not offer RETURNING support:
Driver |
SQLA 1.4 Time (secs) |
SQLA 2.0 Time (secs) |
postgresql+psycopg2 (network) |
4.704876 |
4.699883 |
mysql+mysqldb (network) |
77.281997 |
76.132995 |
Summary of Changes¶
The following bullets list the individual changes made within 2.0 in order to get all drivers to this state:
RETURNING implemented for SQLite - #6195
RETURNING implemented for MariaDB - #7011
Fix multi-row RETURNING for Oracle - #6245
make insert() executemany() support RETURNING for as many dialects as possible, usually with VALUES() - #6047
Emit a warning when RETURNING w/ executemany is used for non-supporting backend (currently no RETURNING backend has this limitation) - #7907
The ORM
Mapper.eager_defaults
parameter now defaults to a a new setting"auto"
, which will enable “eager defaults” automatically for INSERT statements, when the backend in use supports RETURNING with “insertmanyvalues”. See Fetching Server-Generated Defaults for documentation.
See also
“Insert Many Values” Behavior for INSERT statements - Documentation and background on the new feature as well as how to configure it
ORM-enabled Insert, Upsert, Update and Delete Statements, with ORM RETURNING¶
SQLAlchemy 1.4 ported the features of the legacy Query
object to
2.0 style execution, which meant that the Select
construct
could be passed to Session.execute()
to deliver ORM results. Support
was also added for Update
and Delete
to be passed to
Session.execute()
, to the degree that they could provide
implementations of Query.update()
and Query.delete()
.
The major missing element has been support for the Insert
construct.
The 1.4 documentation addressed this with some recipes for “inserts” and “upserts”
with use of Select.from_statement()
to integrate RETURNING
into an ORM context. 2.0 now fully closes the gap by integrating direct support for
Insert
as an enhanced version of the Session.bulk_insert_mappings()
method, along with full ORM RETURNING support for all DML structures.
Bulk Insert with RETURNING¶
Insert
can be passed to Session.execute()
, with
or without Insert.returning()
, which when passed with a
separate parameter list will invoke the same process as was previously
implemented by
Session.bulk_insert_mappings()
, with additional enhancements. This will optimize the
batching of rows making use of the new fast insertmany
feature, while also adding support for
heterogenous parameter sets and multiple-table mappings like joined table
inheritance:
>>> users = session.scalars(
... insert(User).returning(User),
... [
... {"name": "spongebob", "fullname": "Spongebob Squarepants"},
... {"name": "sandy", "fullname": "Sandy Cheeks"},
... {"name": "patrick", "fullname": "Patrick Star"},
... {"name": "squidward", "fullname": "Squidward Tentacles"},
... {"name": "ehkrabs", "fullname": "Eugene H. Krabs"},
... ],
... )
>>> print(users.all())
[User(name='spongebob', fullname='Spongebob Squarepants'),
User(name='sandy', fullname='Sandy Cheeks'),
User(name='patrick', fullname='Patrick Star'),
User(name='squidward', fullname='Squidward Tentacles'),
User(name='ehkrabs', fullname='Eugene H. Krabs')]
RETURNING is supported for all of these use cases, where the ORM will construct a full result set from multiple statement invocations.
See also
Bulk UPDATE¶
In a similar manner as that of Insert
, passing the
Update
construct along with a parameter list that includes
primary key values to Session.execute()
will invoke the same process
as previously supported by the Session.bulk_update_mappings()
method. This feature does not however support RETURNING, as it uses
a SQL UPDATE statement that is invoked using DBAPI executemany:
>>> from sqlalchemy import update
>>> session.execute(
... update(User),
... [
... {"id": 1, "fullname": "Spongebob Squarepants"},
... {"id": 3, "fullname": "Patrick Star"},
... ],
... )
See also
INSERT / upsert … VALUES … RETURNING¶
When using Insert
with Insert.values()
, the set of
parameters may include SQL expressions. Additionally, upsert variants
such as those for SQLite, PostgreSQL and MariaDB are also supported.
These statements may now include Insert.returning()
clauses
with column expressions or full ORM entities:
>>> from sqlalchemy.dialects.sqlite import insert as sqlite_upsert
>>> stmt = sqlite_upsert(User).values(
... [
... {"name": "spongebob", "fullname": "Spongebob Squarepants"},
... {"name": "sandy", "fullname": "Sandy Cheeks"},
... {"name": "patrick", "fullname": "Patrick Star"},
... {"name": "squidward", "fullname": "Squidward Tentacles"},
... {"name": "ehkrabs", "fullname": "Eugene H. Krabs"},
... ]
... )
>>> stmt = stmt.on_conflict_do_update(
... index_elements=[User.name], set_=dict(fullname=stmt.excluded.fullname)
... )
>>> result = session.scalars(stmt.returning(User))
>>> print(result.all())
[User(name='spongebob', fullname='Spongebob Squarepants'),
User(name='sandy', fullname='Sandy Cheeks'),
User(name='patrick', fullname='Patrick Star'),
User(name='squidward', fullname='Squidward Tentacles'),
User(name='ehkrabs', fullname='Eugene H. Krabs')]
ORM UPDATE / DELETE with WHERE … RETURNING¶
SQLAlchemy 1.4 also had some modest support for the RETURNING feature to be
used with the update()
and delete()
constructs, when
used with Session.execute()
. This support has now been upgraded
to be fully native, including that the fetch
synchronization strategy
may also proceed whether or not explicit use of RETURNING is present:
>>> from sqlalchemy import update
>>> stmt = (
... update(User)
... .where(User.name == "squidward")
... .values(name="spongebob")
... .returning(User)
... )
>>> result = session.scalars(stmt, execution_options={"synchronize_session": "fetch"})
>>> print(result.all())
Improved synchronize_session
behavior for ORM UPDATE / DELETE¶
The default strategy for synchronize_session
is now a new value "auto"
. This strategy will attempt to use the
"evaluate"
strategy and then automatically fall back to the "fetch"
strategy. For all backends other than MySQL / MariaDB, "fetch"
uses
RETURNING to fetch UPDATE/DELETEd primary key identifiers within the
same statement, so is generally more efficient than previous versions
(in 1.4, RETURNING was only available for PostgreSQL, SQL Server).
See also
Summary of Changes¶
Listed tickets for new ORM DML with RETURNING features:
New “Write Only” relationship strategy supersedes “dynamic”¶
The lazy="dynamic"
loader strategy becomes legacy, in that it is hardcoded
to make use of legacy Query
. This loader strategy is both not
compatible with asyncio, and additionally has many behaviors that implicitly
iterate its contents, which defeat the original purpose of the “dynamic”
relationship as being for very large collections that should not be implicitly
fully loaded into memory at any time.
The “dynamic” strategy is now superseded by a new strategy
lazy="write_only"
. Configuration of “write only” may be achieved using
the relationship.lazy
parameter of relationship()
,
or when using type annotated mappings,
indicating the WriteOnlyMapped
annotation as the mapping style:
from sqlalchemy.orm import WriteOnlyMapped
class Base(DeclarativeBase):
pass
class Account(Base):
__tablename__ = "account"
id: Mapped[int] = mapped_column(primary_key=True)
identifier: Mapped[str]
account_transactions: WriteOnlyMapped["AccountTransaction"] = relationship(
cascade="all, delete-orphan",
passive_deletes=True,
order_by="AccountTransaction.timestamp",
)
class AccountTransaction(Base):
__tablename__ = "account_transaction"
id: Mapped[int] = mapped_column(primary_key=True)
account_id: Mapped[int] = mapped_column(
ForeignKey("account.id", ondelete="cascade")
)
description: Mapped[str]
amount: Mapped[Decimal]
timestamp: Mapped[datetime] = mapped_column(default=func.now())
The write-only-mapped collection resembles lazy="dynamic"
in that
the collection may be assigned up front, and also has methods such as
WriteOnlyCollection.add()
and WriteOnlyCollection.remove()
to modify the collection on an individual item basis:
new_account = Account(
identifier="account_01",
account_transactions=[
AccountTransaction(description="initial deposit", amount=Decimal("500.00")),
AccountTransaction(description="transfer", amount=Decimal("1000.00")),
AccountTransaction(description="withdrawal", amount=Decimal("-29.50")),
],
)
new_account.account_transactions.add(
AccountTransaction(description="transfer", amount=Decimal("2000.00"))
)
The bigger difference is on the database loading side, where the collection
has no ability to load objects from the database directly; instead,
SQL construction methods such as WriteOnlyCollection.select()
are used to
produce SQL constructs such as Select
which are then executed
using 2.0 style to load the desired objects in an explicit way:
account_transactions = session.scalars(
existing_account.account_transactions.select()
.where(AccountTransaction.amount < 0)
.limit(10)
).all()
The WriteOnlyCollection
also integrates with the new
ORM bulk dml features, including support for bulk INSERT
and UPDATE/DELETE with WHERE criteria, all including RETURNING support as
well. See the complete documentation at Write Only Relationships.
See also
New pep-484 / type annotated mapping support for Dynamic Relationships¶
Even though “dynamic” relationships are legacy in 2.0, as these patterns
are expected to have a long lifespan,
type annotated mapping support
is now added for “dynamic” relationships in the same way that its available
for the new lazy="write_only"
approach, using the DynamicMapped
annotation:
from sqlalchemy.orm import DynamicMapped
class Base(DeclarativeBase):
pass
class Account(Base):
__tablename__ = "account"
id: Mapped[int] = mapped_column(primary_key=True)
identifier: Mapped[str]
account_transactions: DynamicMapped["AccountTransaction"] = relationship(
cascade="all, delete-orphan",
passive_deletes=True,
order_by="AccountTransaction.timestamp",
)
class AccountTransaction(Base):
__tablename__ = "account_transaction"
id: Mapped[int] = mapped_column(primary_key=True)
account_id: Mapped[int] = mapped_column(
ForeignKey("account.id", ondelete="cascade")
)
description: Mapped[str]
amount: Mapped[Decimal]
timestamp: Mapped[datetime] = mapped_column(default=func.now())
The above mapping will provide an Account.account_transactions
collection
that is typed as returning the AppenderQuery
collection type,
including its element type, e.g. AppenderQuery[AccountTransaction]
. This
then allows iteration and queries to yield objects which are typed
as AccountTransaction
.
See also
Installation is now fully pep-517 enabled¶
The source distribution now includes a pyproject.toml
file to allow for
complete PEP 517 support. In particular this allows a local source build
using pip
to automatically install the Cython optional dependency.
C Extensions now ported to Cython¶
The SQLAlchemy C extensions have been replaced with all new extensions written in Cython. While Cython was evaluated back in 2010 when the C extensions were first created, the nature and focus of the C extensions in use today has changed quite a bit from that time. At the same time, Cython has apparently evolved significantly, as has the Python build / distribution toolchain which made it feasible for us to revisit it.
The move to Cython provides dramatic new advantages with no apparent downsides:
The Cython extensions that replace specific C extensions have all benchmarked as faster, often slightly, but sometimes significantly, than virtually all the C code that SQLAlchemy previously included. While this seems amazing, it appears to be a product of non-obvious optimizations within Cython’s implementation that would not be present in a direct Python to C port of a function, as was particularly the case for many of the custom collection types added to the C extensions.
Cython extensions are much easier to write, maintain and debug compared to raw C code, and in most cases are line-per-line equivalent to the Python code. It is expected that many more elements of SQLAlchemy will be ported to Cython in the coming releases which should open many new doors to performance improvements that were previously out of reach.
Cython is very mature and widely used, including being the basis of some of the prominent database drivers supported by SQLAlchemy including
asyncpg
,psycopg3
andasyncmy
.
Like the previous C extensions, the Cython extensions are pre-built within
SQLAlchemy’s wheel distributions which are automatically available to pip
from PyPi. Manual build instructions are also unchanged with the exception
of the Cython requirement.
See also
Major Architectural, Performance and API Enhancements for Database Reflection¶
The internal system by which Table
objects and their components are
reflected has been completely rearchitected to
allow high performance bulk reflection of thousands of tables at once for
participating dialects. Currently, the PostgreSQL and Oracle dialects
participate in the new architecture, where the PostgreSQL dialect can now
reflect a large series of Table
objects nearly three times faster,
and the Oracle dialect can now reflect a large series of Table
objects ten times faster.
The rearchitecture applies most directly to dialects that make use of SELECT queries against system catalog tables to reflect tables, and the remaining included dialect that can benefit from this approach will be the SQL Server dialect. The MySQL/MariaDB and SQLite dialects by contrast make use of non-relational systems to reflect database tables, and were not subject to a pre-existing performance issue.
The new API is backwards compatible with the previous system, and should require no changes to third party dialects to retain compatibility; third party dialects can also opt into the new system by implementing batched queries for schema reflection.
Along with this change, the API and behavior of the Inspector
object has been improved and enhanced with more consistent cross-dialect
behaviors as well as new methods and new performance features.
Performance Overview¶
The source distribution includes a script
test/perf/many_table_reflection.py
which benches both existing reflection
features as well as new ones. A limited set of its tests may be run on older
versions of SQLAlchemy, where here we use it to illustrate differences in
performance to invoke metadata.reflect()
to reflect 250 Table
objects at once over a local network connection:
Dialect |
Operation |
SQLA 1.4 Time (secs) |
SQLA 2.0 Time (secs) |
postgresql+psycopg2 |
|
8.2 |
3.3 |
oracle+cx_oracle |
|
60.4 |
6.8 |
Behavioral Changes for Inspector()
¶
For SQLAlchemy-included dialects for SQLite, PostgreSQL, MySQL/MariaDB,
Oracle, and SQL Server, the Inspector.has_table()
,
Inspector.has_sequence()
, Inspector.has_index()
,
Inspector.get_table_names()
and
Inspector.get_sequence_names()
now all behave consistently in terms
of caching: they all fully cache their result after being called the first
time for a particular Inspector
object. Programs that create or
drop tables/sequences while calling upon the same Inspector
object will not receive updated status after the state of the database has
changed. A call to Inspector.clear_cache()
or a new
Inspector
should be used when DDL changes are to be executed.
Previously, the Inspector.has_table()
,
Inspector.has_sequence()
methods did not implement caching nor did
the Inspector
support caching for these methods, while the
Inspector.get_table_names()
and
Inspector.get_sequence_names()
methods were, leading to inconsistent
results between the two types of method.
Behavior for third party dialects is dependent on whether or not they implement the “reflection cache” decorator for the dialect-level implementation of these methods.
New Methods and Improvements for Inspector()
¶
added a method
Inspector.has_schema()
that returns if a schema is present in the target databaseadded a method
Inspector.has_index()
that returns if a table has a particular index.Inspection methods such as
Inspector.get_columns()
that work on a single table at a time should now all consistently raiseNoSuchTableError
if a table or view is not found; this change is specific to individual dialects, so may not be the case for existing third-party dialects.Separated the handling of “views” and “materialized views”, as in real world use cases, these two constructs make use of different DDL for CREATE and DROP; this includes that there are now separate
Inspector.get_view_names()
andInspector.get_materialized_view_names()
methods.
Dialect support for psycopg 3 (a.k.a. “psycopg”)¶
Added dialect support for the psycopg 3
DBAPI, which despite the number “3” now goes by the package name psycopg
,
superseding the previous psycopg2
package that for the time being remains
SQLAlchemy’s “default” driver for the postgresql
dialects. psycopg
is a
completely reworked and modernized database adapter for PostgreSQL which
supports concepts such as prepared statements as well as Python asyncio.
psycopg
is the first DBAPI supported by SQLAlchemy which provides
both a pep-249 synchronous API as well as an asyncio driver. The same
psycopg
database URL may be used with the create_engine()
and create_async_engine()
engine-creation functions, and the
corresponding sync or asyncio version of the dialect will be selected
automatically.
See also
Dialect support for oracledb¶
Added dialect support for the oracledb DBAPI, which is the renamed, new major release of the popular cx_Oracle driver.
See also
New Conditional DDL for Constraints and Indexes¶
A new method Constraint.ddl_if()
and Index.ddl_if()
allows constructs such as CheckConstraint
, UniqueConstraint
and Index
to be rendered conditionally for a given
Table
, based on the same kinds of criteria that are accepted
by the DDLElement.execute_if()
method. In the example below,
the CHECK constraint and index will only be produced against a PostgreSQL
backend:
meta = MetaData()
my_table = Table(
"my_table",
meta,
Column("id", Integer, primary_key=True),
Column("num", Integer),
Column("data", String),
Index("my_pg_index", "data").ddl_if(dialect="postgresql"),
CheckConstraint("num > 5").ddl_if(dialect="postgresql"),
)
e1 = create_engine("sqlite://", echo=True)
meta.create_all(e1) # will not generate CHECK and INDEX
e2 = create_engine("postgresql://scott:tiger@localhost/test", echo=True)
meta.create_all(e2) # will generate CHECK and INDEX
DATE, TIME, DATETIME datatypes now support literal rendering on all backends¶
Literal rendering is now implemented for date and time types for backend specific compilation, including PostgreSQL and Oracle:
>>> import datetime
>>> from sqlalchemy import DATETIME
>>> from sqlalchemy import literal
>>> from sqlalchemy.dialects import oracle
>>> from sqlalchemy.dialects import postgresql
>>> date_literal = literal(datetime.datetime.now(), DATETIME)
>>> print(
... date_literal.compile(
... dialect=postgresql.dialect(), compile_kwargs={"literal_binds": True}
... )
... )
'2022-12-17 11:02:13.575789'
>>> print(
... date_literal.compile(
... dialect=oracle.dialect(), compile_kwargs={"literal_binds": True}
... )
... )
TO_TIMESTAMP('2022-12-17 11:02:13.575789', 'YYYY-MM-DD HH24:MI:SS.FF')
Previously, such literal rendering only worked when stringifying statements
without any dialect given; when attempting to render with a dialect-specific
type, a NotImplementedError
would be raised, up until
SQLAlchemy 1.4.45 where this became a CompileError
(part of
#8800).
The default rendering is modified ISO-8601 rendering (i.e. ISO-8601 with the T
converted to a space) when using literal_binds
with the SQL compilers
provided by the PostgreSQL, MySQL, MariaDB, MSSQL, Oracle dialects. For Oracle,
the ISO format is wrapped inside of an appropriate TO_DATE() function call.
The rendering for SQLite is unchanged as this dialect always included string
rendering for date values.
Context Manager Support for Result
, AsyncResult
¶
The Result
object now supports context manager use, which will
ensure the object and its underlying cursor is closed at the end of the block.
This is useful in particular with server side cursors, where it’s important that
the open cursor object is closed at the end of an operation, even if user-defined
exceptions have occurred:
with engine.connect() as conn:
with conn.execution_options(yield_per=100).execute(
text("select * from table")
) as result:
for row in result:
print(f"{row}")
With asyncio use, the AsyncResult
and AsyncConnection
have
been altered to provide for optional async context manager use, as in:
async with async_engine.connect() as conn:
async with conn.execution_options(yield_per=100).execute(
text("select * from table")
) as result:
for row in result:
print(f"{row}")
Behavioral Changes¶
This section covers behavioral changes made in SQLAlchemy 2.0 which are not otherwise part of the major 1.4->2.0 migration path; changes here are not expected to have significant effects on backwards compatibility.
New transaction join modes for Session
¶
The behavior of “joining an external transaction into a Session” has been
revised and improved, allowing explicit control over how the
Session
will accommodate an incoming Connection
that already has a transaction and possibly a savepoint already established.
The new parameter Session.join_transaction_mode
includes a
series of option values which can accommodate the existing transaction in
several ways, most importantly allowing a Session
to operate in a
fully transactional style using savepoints exclusively, while leaving the
externally initiated transaction non-committed and active under all
circumstances, allowing test suites to rollback all changes that take place
within tests.
The primary improvement this allows is that the recipe documented at
Joining a Session into an External Transaction (such as for test suites), which also changed from SQLAlchemy 1.3
to 1.4, is now simplified to no longer require explicit use of an event
handler or any mention of an explicit savepoint; by using
join_transaction_mode="create_savepoint"
, the Session
will
never affect the state of an incoming transaction, and will instead create a
savepoint (i.e. “nested transaction”) as its root transaction.
The following illustrates part of the example given at Joining a Session into an External Transaction (such as for test suites); see that section for a full example:
class SomeTest(TestCase):
def setUp(self):
# connect to the database
self.connection = engine.connect()
# begin a non-ORM transaction
self.trans = self.connection.begin()
# bind an individual Session to the connection, selecting
# "create_savepoint" join_transaction_mode
self.session = Session(
bind=self.connection, join_transaction_mode="create_savepoint"
)
def tearDown(self):
self.session.close()
# rollback non-ORM transaction
self.trans.rollback()
# return connection to the Engine
self.connection.close()
The default mode selected for Session.join_transaction_mode
is "conditional_savepoint"
, which uses "create_savepoint"
behavior
if the given Connection
is itself already on a savepoint.
If the given Connection
is in a transaction but not a
savepoint, the Session
will propagate “rollback” calls
but not “commit” calls, but will not begin a new savepoint on its own. This
behavior is chosen by default for its maximum compatibility with
older SQLAlchemy versions as well as that it does not start a new SAVEPOINT
unless the given driver is already making use of SAVEPOINT, as support
for SAVEPOINT varies not only with specific backend and driver but also
configurationally.
The following illustrates a case that worked in SQLAlchemy 1.3, stopped working in SQLAlchemy 1.4, and is now restored in SQLAlchemy 2.0:
engine = create_engine("...")
# setup outer connection with a transaction and a SAVEPOINT
conn = engine.connect()
trans = conn.begin()
nested = conn.begin_nested()
# bind a Session to that connection and operate upon it, including
# a commit
session = Session(conn)
session.connection()
session.commit()
session.close()
# assert both SAVEPOINT and transaction remain active
assert nested.is_active
nested.rollback()
trans.rollback()
Where above, a Session
is joined to a Connection
that has a savepoint started on it; the state of these two units remains
unchanged after the Session
has worked with the transaction. In
SQLAlchemy 1.3, the above case worked because the Session
would
begin a “subtransaction” upon the Connection
, which would
allow the outer savepoint / transaction to remain unaffected for simple cases
as above. Since subtransactions were deprecated in 1.4 and are now removed in
2.0, this behavior was no longer available. The new default behavior improves
upon the behavior of “subtransactions” by using a real, second SAVEPOINT
instead, so that even calls to Session.rollback()
prevent the
Session
from “breaking out” into the externally initiated
SAVEPOINT or transaction.
New code that is joining a transaction-started Connection
into
a Session
should however select a
Session.join_transaction_mode
explicitly, so that the desired
behavior is explicitly defined.
str(engine.url)
will obfuscate the password by default¶
To avoid leakage of database passwords, calling str()
on a
URL
will now enable the password obfuscation feature by default.
Previously, this obfuscation would be in place for __repr__()
calls
but not __str__()
. This change will impact applications and test suites
that attempt to invoke create_engine()
given the stringified URL
from another engine, such as:
>>> e1 = create_engine("postgresql+psycopg2://scott:tiger@localhost/test")
>>> e2 = create_engine(str(e1.url))
The above engine e2
will not have the correct password; it will have the
obfuscated string "***"
.
The preferred approach for the above pattern is to pass the
URL
object directly, there’s no need to stringify:
>>> e1 = create_engine("postgresql+psycopg2://scott:tiger@localhost/test")
>>> e2 = create_engine(e1.url)
Otherwise, for a stringified URL with cleartext password, use the
URL.render_as_string()
method, passing the
URL.render_as_string.hide_password
parameter
as False
:
>>> e1 = create_engine("postgresql+psycopg2://scott:tiger@localhost/test")
>>> url_string = e1.url.render_as_string(hide_password=False)
>>> e2 = create_engine(url_string)
Stricter rules for replacement of Columns in Table objects with same-names, keys¶
Stricter rules are in place for appending of Column
objects to
Table
objects, both moving some previous deprecation warnings to
exceptions, and preventing some previous scenarios that would cause
duplicate columns to appear in tables, when
Table.extend_existing
were set to True
, for both
programmatic Table
construction as well as during reflection
operations.
Under no circumstances should a
Table
object ever have two or moreColumn
objects with the same name, regardless of what .key they have. An edge case where this was still possible was identified and fixed.Adding a
Column
to aTable
that has the same name or key as an existingColumn
will always raiseDuplicateColumnError
(a new subclass ofArgumentError
in 2.0.0b4) unless additional parameters are present;Table.append_column.replace_existing
forTable.append_column()
, andTable.extend_existing
for construction of a same-namedTable
as an existing one, with or without reflection being used. Previously, there was a deprecation warning in place for this scenario.A warning is now emitted if a
Table
is created, that does includeTable.extend_existing
, where an incomingColumn
that has no separateColumn.key
would fully replace an existingColumn
that does have a key, which suggests the operation is not what the user intended. This can happen particularly during a secondary reflection step, such asmetadata.reflect(extend_existing=True)
. The warning suggests that theTable.autoload_replace
parameter be set toFalse
to prevent this. Previously, in 1.4 and earlier, the incoming column would be added in addition to the existing column. This was a bug and is a behavioral change in 2.0 (as of 2.0.0b4), as the previous key will no longer be present in the column collection when this occurs.
ORM Declarative Applies Column Orders Differently; Control behavior using sort_order
¶
Declarative has changed the system by which mapped columns that originate from mixin or abstract base classes are sorted along with the columns that are on the declared class itself to place columns from the declared class first, followed by mixin columns. The following mapping:
class Foo:
col1 = mapped_column(Integer)
col3 = mapped_column(Integer)
class Bar:
col2 = mapped_column(Integer)
col4 = mapped_column(Integer)
class Model(Base, Foo, Bar):
id = mapped_column(Integer, primary_key=True)
__tablename__ = "model"
Produces a CREATE TABLE as follows on 1.4:
CREATE TABLE model (
col1 INTEGER,
col3 INTEGER,
col2 INTEGER,
col4 INTEGER,
id INTEGER NOT NULL,
PRIMARY KEY (id)
)
Whereas on 2.0 it produces:
CREATE TABLE model (
id INTEGER NOT NULL,
col1 INTEGER,
col3 INTEGER,
col2 INTEGER,
col4 INTEGER,
PRIMARY KEY (id)
)
For the specific case above, this can be seen as an improvement, as the primary
key columns on the Model
are now where one would typically prefer. However,
this is no comfort for the application that defined models the other way
around, as:
class Foo:
id = mapped_column(Integer, primary_key=True)
col1 = mapped_column(Integer)
col3 = mapped_column(Integer)
class Model(Foo, Base):
col2 = mapped_column(Integer)
col4 = mapped_column(Integer)
__tablename__ = "model"
This now produces CREATE TABLE output as:
CREATE TABLE model (
col2 INTEGER,
col4 INTEGER,
id INTEGER NOT NULL,
col1 INTEGER,
col3 INTEGER,
PRIMARY KEY (id)
)
To solve this issue, SQLAlchemy 2.0.4 introduces a new parameter on
mapped_column()
called mapped_column.sort_order
,
which is an integer value, defaulting to 0
,
that can be set to a positive or negative value so that columns are placed
before or after other columns, as in the example below:
class Foo:
id = mapped_column(Integer, primary_key=True, sort_order=-10)
col1 = mapped_column(Integer, sort_order=-1)
col3 = mapped_column(Integer)
class Model(Foo, Base):
col2 = mapped_column(Integer)
col4 = mapped_column(Integer)
__tablename__ = "model"
The above model places “id” before all others and “col1” after “id”:
CREATE TABLE model (
id INTEGER NOT NULL,
col1 INTEGER,
col2 INTEGER,
col4 INTEGER,
col3 INTEGER,
PRIMARY KEY (id)
)
Future SQLAlchemy releases may opt to provide an explicit ordering hint for the
mapped_column
construct, as this ordering is ORM specific.
The Sequence
construct reverts to not having any explicit default “start” value; impacts MS SQL Server¶
Prior to SQLAlchemy 1.4, the Sequence
construct would emit only
simple CREATE SEQUENCE
DDL, if no additional arguments were specified:
>>> # SQLAlchemy 1.3 (and 2.0)
>>> from sqlalchemy import Sequence
>>> from sqlalchemy.schema import CreateSequence
>>> print(CreateSequence(Sequence("my_seq")))
CREATE SEQUENCE my_seq
However, as Sequence
support was added for MS SQL Server, where the
default start value is inconveniently set to -2**63
,
version 1.4 decided to default the DDL to emit a start value of 1, if
Sequence.start
were not otherwise provided:
>>> # SQLAlchemy 1.4 (only)
>>> from sqlalchemy import Sequence
>>> from sqlalchemy.schema import CreateSequence
>>> print(CreateSequence(Sequence("my_seq")))
CREATE SEQUENCE my_seq START WITH 1
This change has introduced other complexities, including that when
the Sequence.min_value
parameter is included, this default of
1
should in fact default to what Sequence.min_value
states, else a min_value that’s below the start_value may be seen as
contradictory. As looking at this issue started to become a bit of a
rabbit hole of other various edge cases, we decided to instead revert this
change and restore the original behavior of Sequence
which is
to have no opinion, and just emit CREATE SEQUENCE, allowing the database
itself to make its decisions on how the various parameters of SEQUENCE
should interact with each other.
Therefore, to ensure that the start value is 1 on all backends, the start value of 1 may be indicated explicitly, as below:
>>> # All SQLAlchemy versions
>>> from sqlalchemy import Sequence
>>> from sqlalchemy.schema import CreateSequence
>>> print(CreateSequence(Sequence("my_seq", start=1)))
CREATE SEQUENCE my_seq START WITH 1
Beyond all of that, for autogeneration of integer primary keys on modern
backends including PostgreSQL, Oracle, SQL Server, the Identity
construct should be preferred, which also works the same way in 1.4 and 2.0
with no changes in behavior.
“with_variant()” clones the original TypeEngine rather than changing the type¶
The TypeEngine.with_variant()
method, which is used to apply
alternate per-database behaviors to a particular type, now returns a copy of
the original TypeEngine
object with the variant information
stored internally, rather than wrapping it inside the Variant
class.
While the previous Variant
approach was able to maintain all the in-Python
behaviors of the original type using dynamic attribute getters, the improvement
here is that when calling upon a variant, the returned type remains an instance
of the original type, which works more smoothly with type checkers such as mypy
and pylance. Given a program as below:
import typing
from sqlalchemy import String
from sqlalchemy.dialects.mysql import VARCHAR
type_ = String(255).with_variant(VARCHAR(255, charset="utf8mb4"), "mysql", "mariadb")
if typing.TYPE_CHECKING:
reveal_type(type_)
A type checker like pyright will now report the type as:
info: Type of "type_" is "String"
In addition, as illustrated above, multiple dialect names may be passed for
single type, in particular this is helpful for the pair of "mysql"
and
"mariadb"
dialects which are considered separately as of SQLAlchemy 1.4.
Python division operator performs true division for all backends; added floor division¶
The Core expression language now supports both “true division” (i.e. the /
Python operator) and “floor division” (i.e. the //
Python operator)
including backend-specific behaviors to normalize different databases in this
regard.
Given a “true division” operation against two integer values:
expr = literal(5, Integer) / literal(10, Integer)
The SQL division operator on PostgreSQL for example normally acts as “floor division” when used against integers, meaning the above result would return the integer “0”. For this and similar backends, SQLAlchemy now renders the SQL using a form which is equivalent towards:
%(param_1)s / CAST(%(param_2)s AS NUMERIC)
With param_1=5
, param_2=10
, so that the return expression will be of type
NUMERIC, typically as the Python value decimal.Decimal("0.5")
.
Given a “floor division” operation against two integer values:
expr = literal(5, Integer) // literal(10, Integer)
The SQL division operator on MySQL and Oracle for example normally acts as “true division” when used against integers, meaning the above result would return the floating point value “0.5”. For these and similar backends, SQLAlchemy now renders the SQL using a form which is equivalent towards:
FLOOR(%(param_1)s / %(param_2)s)
With param_1=5, param_2=10, so that the return expression will be of type
INTEGER, as the Python value 0
.
The backwards-incompatible change here would be if an application using
PostgreSQL, SQL Server, or SQLite which relied on the Python “truediv” operator
to return an integer value in all cases. Applications which rely upon this
behavior should instead use the Python “floor division” operator //
for these operations, or for forwards compatibility when using a previous
SQLAlchemy version, the floor function:
expr = func.floor(literal(5, Integer) / literal(10, Integer))
The above form would be needed on any SQLAlchemy version prior to 2.0 in order to provide backend-agnostic floor division.
Session raises proactively when illegal concurrent or reentrant access is detected¶
The Session
can now trap more errors related to illegal concurrent
state changes within multithreaded or other concurrent scenarios as well as for
event hooks which perform unexpected state changes.
One error that’s been known to occur when a Session
is used in
multiple threads simultaneously is
AttributeError: 'NoneType' object has no attribute 'twophase'
, which is
completely cryptic. This error occurs when a thread calls
Session.commit()
which internally invokes the
SessionTransaction.close()
method to end the transactional context,
at the same time that another thread is in progress running a query
as from Session.execute()
. Within Session.execute()
,
the internal method that acquires a database connection for the current
transaction first begins by asserting that the session is “active”, but
after this assertion passes, the concurrent call to Session.close()
interferes with this state which leads to the undefined condition above.
The change applies guards to all state-changing methods surrounding the
SessionTransaction
object so that in the above case, the
Session.commit()
method will instead fail as it will seek to change
the state to one that is disallowed for the duration of the already-in-progress
method that wants to get the current connection to run a database query.
Using the test script illustrated at #7433, the previous error case looks like:
Traceback (most recent call last):
File "/home/classic/dev/sqlalchemy/test3.py", line 30, in worker
sess.execute(select(A)).all()
File "/home/classic/tmp/sqlalchemy/lib/sqlalchemy/orm/session.py", line 1691, in execute
conn = self._connection_for_bind(bind)
File "/home/classic/tmp/sqlalchemy/lib/sqlalchemy/orm/session.py", line 1532, in _connection_for_bind
return self._transaction._connection_for_bind(
File "/home/classic/tmp/sqlalchemy/lib/sqlalchemy/orm/session.py", line 754, in _connection_for_bind
if self.session.twophase and self._parent is None:
AttributeError: 'NoneType' object has no attribute 'twophase'
Where the _connection_for_bind()
method isn’t able to continue since
concurrent access placed it into an invalid state. Using the new approach, the
originator of the state change throws the error instead:
File "/home/classic/dev/sqlalchemy/lib/sqlalchemy/orm/session.py", line 1785, in close
self._close_impl(invalidate=False)
File "/home/classic/dev/sqlalchemy/lib/sqlalchemy/orm/session.py", line 1827, in _close_impl
transaction.close(invalidate)
File "<string>", line 2, in close
File "/home/classic/dev/sqlalchemy/lib/sqlalchemy/orm/session.py", line 506, in _go
raise sa_exc.InvalidRequestError(
sqlalchemy.exc.InvalidRequestError: Method 'close()' can't be called here;
method '_connection_for_bind()' is already in progress and this would cause
an unexpected state change to symbol('CLOSED')
The state transition checks intentionally don’t use explicit locks to detect
concurrent thread activity, instead relying upon simple attribute set / value
test operations that inherently fail when unexpected concurrent changes occur.
The rationale is that the approach can detect illegal state changes that occur
entirely within a single thread, such as an event handler that runs on session
transaction events calls a state-changing method that’s not expected, or under
asyncio if a particular Session
were shared among multiple
asyncio tasks, as well as when using patching-style concurrency approaches
such as gevent.
The SQLite dialect uses QueuePool for file-based databases¶
The SQLite dialect now defaults to QueuePool
when a file
based database is used. This is set along with setting the
check_same_thread
parameter to False
. It has been observed that the
previous approach of defaulting to NullPool
, which does not
hold onto database connections after they are released, did in fact have a
measurable negative performance impact. As always, the pool class is
customizable via the create_engine.poolclass
parameter.
See also
New Oracle FLOAT type with binary precision; decimal precision not accepted directly¶
A new datatype FLOAT
has been added to the Oracle dialect, to
accompany the addition of Double
and database-specific
DOUBLE
, DOUBLE_PRECISION
and
REAL
datatypes. Oracle’s FLOAT
accepts a so-called
“binary precision” parameter that per Oracle documentation is roughly a
standard “precision” value divided by 0.3103:
from sqlalchemy.dialects import oracle
Table("some_table", metadata, Column("value", oracle.FLOAT(126)))
A binary precision value of 126 is synonymous with using the
DOUBLE_PRECISION
datatype, and a value of 63 is equivalent
to using the REAL
datatype. Other precision values are
specific to the FLOAT
type itself.
The SQLAlchemy Float
datatype also accepts a “precision”
parameter, but this is decimal precision which is not accepted by
Oracle. Rather than attempting to guess the conversion, the Oracle dialect
will now raise an informative error if Float
is used with
a precision value against the Oracle backend. To specify a
Float
datatype with an explicit precision value for
supporting backends, while also supporting other backends, use
the TypeEngine.with_variant()
method as follows:
from sqlalchemy.types import Float
from sqlalchemy.dialects import oracle
Table(
"some_table",
metadata,
Column("value", Float(5).with_variant(oracle.FLOAT(16), "oracle")),
)
New RANGE / MULTIRANGE support and changes for PostgreSQL backends¶
RANGE / MULTIRANGE support has been fully implemented for psycopg2, psycopg3,
and asyncpg dialects. The new support uses a new SQLAlchemy-specific
Range
object that is agnostic of the different backends
and does not require the use of backend-specific imports or extension
steps. For multirange support, lists of Range
objects are used.
Code that used the previous psycopg2-specific types should be modified
to use Range
, which presents a compatible interface.
The Range
object also features comparison support which
mirrors that of PostgreSQL. Implemented so far are Range.contains()
and Range.contained_by()
methods which work in the same way as
the PostgreSQL @>
and <@
. Additional operator support may be added
in future releases.
See the documentation at Range and Multirange Types for background on using the new feature.
See also
match()
operator on PostgreSQL uses plainto_tsquery()
rather than to_tsquery()
¶
The Operators.match()
function now renders
col @@ plainto_tsquery(expr)
on the PostgreSQL backend, rather than
col @@ to_tsquery()
. plainto_tsquery()
accepts plain text whereas
to_tsquery()
accepts specialized query symbols, and is therefore less
cross-compatible with other backends.
All PostgreSQL search functions and operators are available through use of
func
to generate PostgreSQL-specific functions and
Operators.bool_op()
(a boolean-typed version of Operators.op()
)
to generate arbitrary operators, in the same manner as they are available
in previous versions. See the examples at Full Text Search.
Existing SQLAlchemy projects that make use of PG-specific directives within
Operators.match()
should make use of func.to_tsquery()
directly.
To render SQL in exactly the same form as would be present
in 1.4, see the version note at Simple plain text matching with match().