Transactions and concurrency

Transactions are a core feature of ZODB. Much has been written about transactions, and we won’t go into much detail here. Transactions provide two core benefits:

Atomicity

When a transaction executes, it succeeds or fails completely. If some data are updated and then an error occurs, causing the transaction to fail, the updates are rolled back automatically. The application using the transactional system doesn’t have to undo partial changes. This takes a significant burden from developers and increases the reliability of applications.

Concurrency

Transactions provide a way of managing concurrent updates to data. Different programs operate on the data independently, without having to use low-level techniques to moderate their access. Coordination and synchronization happen via transactions.

Using transactions

All activity in ZODB happens in the context of database connections and transactions. Here’s a simple example:

import ZODB, transaction
db = ZODB.DB(None) # Use a mapping storage
conn = db.open()

conn.root.x = 1
transaction.commit()

In the example above, we used transaction.commit() to commit a transaction, making the change to conn.root permanent. This is the most common way to use ZODB, at least historically.

If we decide we don’t want to commit a transaction, we can use abort:

conn.root.x = 2
transaction.abort() # conn.root.x goes back to 1

In this example, because we aborted the transaction, the value of conn.root.x was rolled back to 1.

There are a number of things going on here that deserve some explanation. When using transactions, there are three kinds of objects involved:

Transaction

Transactions represent units of work. Each transaction has a beginning and an end. Transactions provide the ITransaction interface.

Transaction manager

Transaction managers create transactions and provide APIs to start and end transactions. The transactions managed are always sequential. There is always exactly one active transaction associated with a transaction manager at any point in time. Transaction managers provide the ITransactionManager interface.

Data manager

Data managers manage data associated with transactions. ZODB connections are data managers. The details of how they interact with transactions aren’t important here.

Explicit transaction managers

ZODB connections have transaction managers associated with them when they’re opened. When we call the database open() method without an argument, a thread-local transaction manager is used. Each thread has its own transaction manager. When we called transaction.commit() above we were calling commit on the thread-local transaction manager.

Because we used a thread-local transaction manager, all of the work in the transaction needs to happen in the same thread. Similarly, only one transaction can be active in a thread.

If we want to run multiple simultaneous transactions in a single thread, or if we want to spread the work of a transaction over multiple threads [5], then we can create transaction managers ourselves and pass them to open():

my_transaction_manager = transaction.TransactionManager()
conn = db.open(my_transaction_manager)
conn.root.x = 2
my_transaction_manager.commit()

In this example, to commit our work, we called commit() on the transaction manager we created and passed to open().

Context managers

In the examples above, the transaction beginnings were implicit. Transactions were effectively [6] created when the transaction managers were created and when previous transactions were committed. We can create transactions explicitly using begin():

my_transaction_manager.begin()

A more modern [7] way to manage transaction boundaries is to use context managers and the Python with statement. Transaction managers are context managers, so we can use them with the with statement directly:

with my_transaction_manager as trans:
   trans.note(u"incrementing x")
   conn.root.x += 1

When used as a context manager, a transaction manager explicitly begins a new transaction and executes the code block. It then commits the transaction if the block completes without an error, or aborts it if the block raises one.

We used as trans above to get the transaction.

Databases provide the transaction() method to execute a code block as a transaction:

with db.transaction() as conn2:
   conn2.root.x += 1

This opens a connection, assigns it its own transaction manager, and executes the nested code in a transaction. We used as conn2 to get the connection. The transaction boundaries are defined by the with statement.

Getting a connection’s transaction manager

In the previous example, you may have wondered how one might get the current transaction. Every connection has an associated transaction manager, which is available as the transaction_manager attribute. So, for example, if we wanted to set a transaction note:

with db.transaction() as conn2:
   conn2.transaction_manager.get().note(u"incrementing x again")
   conn2.root.x += 1

Here, we used the get() method to get the current transaction.

Connection isolation

In the last few examples, we used a connection opened using transaction(). This connection was distinct from the original connection and used a different transaction manager. If we looked at the original connection, conn, we’d see that it still has the value for x that we last set through it:

>>> conn.root.x
3

This is because it’s still in the same transaction that was begun when a change was last committed against it. If we want to see changes, we have to begin a new transaction:

>>> trans = my_transaction_manager.begin()
>>> conn.root.x
5

ZODB uses a timestamp-based commit protocol that provides snapshot isolation. Whenever we look at ZODB data, we see its state as of the time the transaction began.

Conflict errors

As mentioned in the previous section, each connection sees and operates on a view of the database as of the transaction start time. If two connections modify the same object at the same time, one of the connections will get a conflict error when it tries to commit:

with db.transaction() as conn2:
   conn2.root.x += 1

conn.root.x = 9
my_transaction_manager.commit() # will raise a conflict error

If we executed this code, we’d get a ConflictError exception on the last line. After a conflict error is raised, we’d need to abort the transaction, or begin a new one, at which point we’d see the data as written by the other connection:

>>> my_transaction_manager.abort()
>>> conn.root.x
6

The timestamp-based approach used by ZODB is referred to as an optimistic approach, because it works best if there are no conflicts.

The best way to avoid conflicts is to design your application so that multiple connections don’t update the same object at the same time. This isn’t always easy.

Sometimes you may need to queue some operations that update shared data structures, like indexes, so the updates can be made by a dedicated thread or process, without making simultaneous updates.

Retrying transactions

The most common way to deal with conflict errors is to catch them and retry transactions. To do this manually involves code that looks something like this:

import transaction

max_attempts = 3
attempts = 0
while True:
    try:
        with transaction.manager:
            ...  # code that updates a database
    except transaction.interfaces.TransientError:
        attempts += 1
        if attempts == max_attempts:
            raise
    else:
        break

In the example above, we used transaction.manager to refer to the thread-local transaction manager, which we then used with the with statement. When a conflict error occurs, the transaction must be aborted before retrying the update. Using the transaction manager as a context manager in the with statement takes care of this for us.

The example above is rather tedious. There are a number of tools to automate transaction retry. The transaction package provides a context-manager-based mechanism for retrying transactions:

for attempt in transaction.manager.attempts():
    with attempt:
        ...  # code that updates a database

This is shorter and simpler [1].

For Python web frameworks, there are WSGI [2] middleware components, such as repoze.tm2, that align transaction boundaries with HTTP requests and retry transactions when there are transient errors.

For applications like queue workers or cron jobs, conflicts can sometimes be allowed to fail, letting other queue workers or subsequent cron-job runs retry the work.

Conflict resolution

ZODB provides a conflict-resolution framework for merging conflicting changes. When conflicts occur, conflict resolution is used, when possible, to resolve the conflicts without raising a ConflictError to the application.

Commonly used objects that implement conflict resolution are buckets and Length objects provided by the BTrees package.

The main data structures provided by BTrees, BTrees and TreeSets, spread their data over multiple objects. The leaf-level objects, called buckets, allow distinct keys to be updated without causing conflicts [3].

Length objects are conflict-free counters that merge changes by simply accumulating changes.

Caution

Conflict resolution weakens consistency. Resist the temptation to try to implement conflict resolution yourself. In the future, ZODB will provide greater control over conflict resolution, including the option of disabling it.

It’s generally best to avoid conflicts in the first place, if possible.

ZODB and atomicity

ZODB provides atomic transactions. When using ZODB, it’s important to align work with transactions. Once a transaction is committed, it can’t be rolled back [4] automatically. For applications, this implies that work that should be atomic shouldn’t be split over multiple transactions. This may seem somewhat obvious, but the rule can be broken in non-obvious ways. For example, a web API that splits logical operations over multiple web requests, as is often done in REST APIs, violates this rule.

Partial transaction error recovery using savepoints

A transaction can be split into multiple steps that can be rolled back individually. This is done by creating savepoints. Changes in a savepoint can be rolled back without rolling back an entire transaction:

import ZODB
db = ZODB.DB(None) # using a mapping storage
with db.transaction() as conn:
    conn.root.x = 1
    conn.root.y = 0
    savepoint = conn.transaction_manager.savepoint()
    conn.root.y = 2
    savepoint.rollback()

with db.transaction() as conn:
    print([conn.root.x, conn.root.y]) # prints [1, 0]

If we executed this code, it would print [1, 0], because while the changes made before the savepoint were committed, the change made after the savepoint was rolled back.

A secondary benefit of savepoints is that they save any changes made before the savepoint to a file, so that memory of changed objects can be freed if they aren’t used later in the transaction.

Concurrency, threads and processes

ZODB supports concurrency through transactions. Multiple programs [8] can operate independently in separate transactions. They synchronize at transaction boundaries.

The most common way to run ZODB is with each program running in its own thread. Usually the thread-local transaction manager is used.

You can use multiple threads per transaction and you can run multiple transactions in a single thread. To do this, you need to instantiate and use your own transaction manager, as described in Explicit transaction managers. To run multiple transactions simultaneously in a thread, you need to use a separate transaction manager for each transaction.

To spread a transaction over multiple threads, you need to keep in mind that database connections, transaction managers and transactions are not thread-safe. You have to prevent simultaneous access from multiple threads. For this reason, using multiple threads with a single transaction is not recommended, but it is possible with care.

Using multiple processes

Using multiple Python processes is a good way to scale an application horizontally, especially given Python’s global interpreter lock.

Some things to keep in mind when utilizing multiple processes:

  • If using the multiprocessing module, you can’t [9] share databases or connections between processes. When you launch a subprocess, you’ll need to re-instantiate your storage and database.

  • You’ll need to use a storage that supports multiple processes, such as ZEO, RelStorage, or NEO. None of the included storages do.