peewee - peewee Documentation
Peewee is a simple and small ORM. It has few (but expressive) concepts, making it easy to learn and intuitive to use.
Peewee's source code is hosted on GitHub.
New to peewee? The quickstart guide and example app covered below may help.
Most users will want to simply install the latest version, hosted on PyPI:
pip install peewee
Peewee comes with a couple of C extensions that will be built automatically if Cython is available.
The project is hosted at https://github.com/coleifer/peewee and can be installed using git:
git clone https://github.com/coleifer/peewee.git
cd peewee
python setup.py install
NOTE:
If you would like to build the SQLite extension in a git checkout, you can run:
# Build the C extension and place shared libraries alongside other modules.
python setup.py build_ext -i
You can test your installation by running the test suite.
python runtests.py
You can test specific features or specific database drivers using the runtests.py script. To view the available test runner options, use:
python runtests.py --help
NOTE:
-- install the hstore extension on the peewee_test postgres db.
CREATE EXTENSION hstore;
NOTE:
Peewee includes two SQLite-specific C extensions which provide additional functionality and improved performance for SQLite database users. Peewee will attempt to determine ahead-of-time if SQLite3 is installed, and only build the SQLite extensions if the SQLite shared-library is available on your system.
If, however, you receive errors like the following when attempting to install Peewee, you can explicitly disable the compilation of the SQLite C extensions by setting the NO_SQLITE environment variable.
fatal error: sqlite3.h: No such file or directory
Here is how to install Peewee with the SQLite extensions explicitly disabled:
$ NO_SQLITE=1 python setup.py install
This document presents a brief, high-level overview of Peewee's primary features. This guide will cover model definition, storing data, and retrieving data.
NOTE:
I strongly recommend opening an interactive shell session and running the code. That way you can get a feel for typing in queries.
Model classes, fields and model instances all map to database concepts:
Object         | Corresponds to...
Model class    | Database table
Field instance | Column on a table
Model instance | Row in a database table
When starting a project with peewee, it's typically best to begin with your data model, by defining one or more Model classes:
from peewee import *

db = SqliteDatabase('people.db')

class Person(Model):
    name = CharField()
    birthday = DateField()

    class Meta:
        database = db  # This model uses the "people.db" database.
NOTE:
Also note that we named our model Person instead of People. This is a convention you should follow -- even though the table will contain multiple people, we always name the class using the singular form.
There are lots of field types suitable for storing various types of data. Peewee handles converting between pythonic values and those used by the database, so you can use Python types in your code without having to worry.
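For example, here is a minimal sketch using the Person model above (the name and date are illustrative): a DateField accepts and returns datetime.date objects, while peewee handles the database representation.

from datetime import date

# Assign a Python date when creating the row...
p = Person(name='Ann', birthday=date(1990, 7, 4))
p.save()

# ...and read it back as a date, not a string.
print(p.birthday.year)  # 1990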
Things get interesting when we set up relationships between models using foreign key relationships. This is simple with peewee:
class Pet(Model):
    owner = ForeignKeyField(Person, backref='pets')
    name = CharField()
    animal_type = CharField()

    class Meta:
        database = db  # This model uses the "people.db" database.
Now that we have our models, let's connect to the database. Although it's not necessary to open the connection explicitly, it is good practice since it will reveal any errors with your database connection immediately, as opposed to some arbitrary time later when the first query is executed. It is also good to close the connection when you are done -- for instance, a web app might open a connection when it receives a request, and close the connection when it sends the response.
db.connect()
We'll begin by creating the tables in the database that will store our data. This will create the tables with the appropriate columns, indexes, sequences, and foreign key constraints:
db.create_tables([Person, Pet])
Let's begin by populating the database with some people. We will use the save() and create() methods to add and update people's records.
from datetime import date

uncle_bob = Person(name='Bob', birthday=date(1960, 1, 15))
uncle_bob.save()  # Bob is now stored in the database.
# Returns: 1
NOTE:
You can also add a person by calling the create() method, which returns a model instance:
grandma = Person.create(name='Grandma', birthday=date(1935, 3, 1))
herb = Person.create(name='Herb', birthday=date(1950, 5, 5))
To update a row, modify the model instance and call save() to persist the changes. Here we will change Grandma's name and then save the changes in the database:
grandma.name = 'Grandma L.'
grandma.save()  # Update grandma's name in the database.
# Returns: 1
Now we have stored 3 people in the database. Let's give them some pets. Grandma doesn't like animals in the house, so she won't have any, but Herb is an animal lover:
bob_kitty = Pet.create(owner=uncle_bob, name='Kitty', animal_type='cat')
herb_fido = Pet.create(owner=herb, name='Fido', animal_type='dog')
herb_mittens = Pet.create(owner=herb, name='Mittens', animal_type='cat')
herb_mittens_jr = Pet.create(owner=herb, name='Mittens Jr', animal_type='cat')
After a long full life, Mittens sickens and dies. We need to remove him from the database:
herb_mittens.delete_instance()  # He had a great life.
# Returns: 1
Uncle Bob decides that too many animals have been dying at Herb's house, so he adopts Fido:
herb_fido.owner = uncle_bob
herb_fido.save()
The real strength of our database is in how it allows us to retrieve data through queries. Relational databases are excellent for making ad-hoc queries.
Let's retrieve Grandma's record from the database. To get a single record from the database, use Select.get():
grandma = Person.select().where(Person.name == 'Grandma L.').get()
We can also use the equivalent shorthand Model.get():
grandma = Person.get(Person.name == 'Grandma L.')
Let's list all the people in the database:
for person in Person.select():
    print(person.name)

# prints:
# Bob
# Grandma L.
# Herb
Let's list all the cats and their owner's name:
query = Pet.select().where(Pet.animal_type == 'cat')
for pet in query:
    print(pet.name, pet.owner.name)

# prints:
# Kitty Bob
# Mittens Jr Herb
ATTENTION:
For an in-depth guide to working with relationships and joins, refer to the relationships documentation.
The query above has a hidden problem: accessing pet.owner.name issues an extra query for each pet returned. We can avoid the extra queries by selecting both Pet and Person, and adding a join.
query = (Pet
         .select(Pet, Person)
         .join(Person)
         .where(Pet.animal_type == 'cat'))

for pet in query:
    print(pet.name, pet.owner.name)

# prints:
# Kitty Bob
# Mittens Jr Herb
Let's get all the pets owned by Bob:
for pet in Pet.select().join(Person).where(Person.name == 'Bob'):
    print(pet.name)

# prints:
# Kitty
# Fido
We can do another cool thing here to get Bob's pets. Since we already have an object to represent Bob, we can do this instead:
for pet in Pet.select().where(Pet.owner == uncle_bob):
print(pet.name)
Let's make sure these are sorted alphabetically by adding an order_by() clause:
for pet in Pet.select().where(Pet.owner == uncle_bob).order_by(Pet.name):
    print(pet.name)

# prints:
# Fido
# Kitty
Let's list all the people now, youngest to oldest:
for person in Person.select().order_by(Person.birthday.desc()):
    print(person.name, person.birthday)

# prints:
# Bob 1960-01-15
# Herb 1950-05-05
# Grandma L. 1935-03-01
Peewee supports arbitrarily-nested expressions. Let's get all the people whose birthday was either before 1940 or after 1960:
d1940 = date(1940, 1, 1)
d1960 = date(1960, 1, 1)

query = (Person
         .select()
         .where((Person.birthday < d1940) | (Person.birthday > d1960)))

for person in query:
    print(person.name, person.birthday)

# prints:
# Bob 1960-01-15
# Grandma L. 1935-03-01
Now let's do the opposite. People whose birthday is between 1940 and 1960:
query = (Person
         .select()
         .where(Person.birthday.between(d1940, d1960)))

for person in query:
    print(person.name, person.birthday)

# prints:
# Herb 1950-05-05
Now let's list all the people and how many pets they have:
for person in Person.select():
    print(person.name, person.pets.count(), 'pets')

# prints:
# Bob 2 pets
# Grandma L. 0 pets
# Herb 1 pets
Once again we've run into a classic example of N+1 query behavior. In this case, we're executing an additional query for every Person returned by the original SELECT! We can avoid this by performing a JOIN and using a SQL function to aggregate the results.
query = (Person
         .select(Person, fn.COUNT(Pet.id).alias('pet_count'))
         .join(Pet, JOIN.LEFT_OUTER)  # Include people without pets.
         .group_by(Person)
         .order_by(Person.name))

for person in query:
    # "pet_count" becomes an attribute on the returned model instances.
    print(person.name, person.pet_count, 'pets')

# prints:
# Bob 2 pets
# Grandma L. 0 pets
# Herb 1 pets
Now let's list all the people and the names of all their pets. As you may have guessed, this could easily turn into another N+1 situation if we're not careful.
Before diving into the code, consider how this example is different from the earlier example where we listed all the pets and their owner's name. A pet can only have one owner, so when we performed the join from Pet to Person, there was always going to be a single match. The situation is different when we are joining from Person to Pet because a person may have zero pets or they may have several pets. Because we're using a relational database, if we were to do a join from Person to Pet then every person with multiple pets would be repeated, once for each pet.
It would look like this:
query = (Person
         .select(Person, Pet)
         .join(Pet, JOIN.LEFT_OUTER)
         .order_by(Person.name, Pet.name))

for person in query:
    # We need to check if they have a pet instance attached, since not all
    # people have pets.
    if hasattr(person, 'pet'):
        print(person.name, person.pet.name)
    else:
        print(person.name, 'no pets')

# prints:
# Bob Fido
# Bob Kitty
# Grandma L. no pets
# Herb Mittens Jr
Usually this type of duplication is undesirable. To accommodate the more common (and intuitive) workflow of listing a person and attaching a list of that person's pets, we can use a special method called prefetch():
query = Person.select().order_by(Person.name).prefetch(Pet)

for person in query:
    print(person.name)
    for pet in person.pets:
        print(' *', pet.name)

# prints:
# Bob
# * Kitty
# * Fido
# Grandma L.
# Herb
# * Mittens Jr
One last query. This will use a SQL function to find all people whose names start with either an upper or lower-case G:
expression = fn.Lower(fn.Substr(Person.name, 1, 1)) == 'g'

for person in Person.select().where(expression):
    print(person.name)

# prints:
# Grandma L.
This is just the basics! You can make your queries as complex as you like. Check the documentation on querying for more info.
We're done with our database, let's close the connection:
db.close()
In an actual application, there are some established patterns for how you would manage your database connection lifetime. For example, a web application will typically open a connection at start of request, and close the connection after generating the response. A connection pool can help eliminate latency associated with startup costs.
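As a hedged sketch of that last point, the playhouse pool extension can stand in for the plain database class (PooledSqliteDatabase is a real playhouse.pool class; the parameter values shown are illustrative):

from playhouse.pool import PooledSqliteDatabase

# Connections are returned to the pool on close() and reused, avoiding
# the cost of re-opening a connection for every request.
db = PooledSqliteDatabase('people.db', max_connections=8, stale_timeout=300)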
To learn about setting up your database, see the database documentation, which provides many examples. Peewee also supports configuring the database at run-time as well as setting or changing the database at any time.
If you already have a database, you can autogenerate peewee models using pwiz. For instance, if I have a postgresql database named charles_blog, I might run:
python -m pwiz -e postgresql charles_blog > blog_models.py
That's it for the quickstart. If you want to look at a full web-app, check out the example-app.
We'll be building a simple twitter-like site. The source code for the example can be found in the examples/twitter directory. You can also browse the source-code on github. There is also an example blog app if that's more to your liking, however it is not covered in this guide.
The example app uses the flask web framework which is very easy to get started with. If you don't have flask already, you will need to install it to run the example:
pip install flask
After ensuring that flask is installed, cd into the twitter example directory and execute the run_example.py script:
python run_example.py
The example app will be accessible at http://localhost:5000/
For simplicity all example code is contained within a single module, examples/twitter/app.py. For a guide on structuring larger Flask apps with peewee, check out Structuring Flask Apps.
In the spirit of the popular web framework Django, peewee uses declarative model definitions. If you're not familiar with Django, the idea is that you declare a model class for each table. The model class then defines one or more field attributes which correspond to the table's columns. For the twitter clone, there are just three models:
If you like UML, the project documentation includes a diagram of these tables and their relationships.
In order to create these models we need to instantiate a SqliteDatabase object. Then we define our model classes, specifying the columns as Field instances on the class.
# create a peewee database instance -- our models will use this database to
# persist information
database = SqliteDatabase(DATABASE)

# model definitions -- the standard "pattern" is to define a base model class
# that specifies which database to use. then, any subclasses will automatically
# use the correct storage.
class BaseModel(Model):
    class Meta:
        database = database

# the user model specifies its fields (or columns) declaratively, like django
class User(BaseModel):
    username = CharField(unique=True)
    password = CharField()
    email = CharField()
    join_date = DateTimeField()

# this model contains two foreign keys to user -- it essentially allows us to
# model a "many-to-many" relationship between users. by querying and joining
# on different columns we can expose who a user is "related to" and who is
# "related to" a given user
class Relationship(BaseModel):
    from_user = ForeignKeyField(User, backref='relationships')
    to_user = ForeignKeyField(User, backref='related_to')

    class Meta:
        # `indexes` is a tuple of 2-tuples, where the 2-tuples are
        # a tuple of column names to index and a boolean indicating
        # whether the index is unique or not.
        indexes = (
            # Specify a unique multi-column index on from/to-user.
            (('from_user', 'to_user'), True),
        )

# a dead simple one-to-many relationship: one user has 0..n messages, exposed by
# the foreign key. because we didn't specify, a user's messages will be accessible
# as a special attribute, User.messages
class Message(BaseModel):
    user = ForeignKeyField(User, backref='messages')
    content = TextField()
    pub_date = DateTimeField()
NOTE:
Peewee supports many different field types which map to the column types commonly supported by database engines. Conversion between Python types and those used in the database is handled transparently, allowing you to work with native Python values (strings, integers, dates, and so on) in your application.
In order to start using the models, it's necessary to create the tables. This is a one-time operation and can be done quickly using the interactive interpreter. We can create a small helper function to accomplish this:
def create_tables():
with database:
database.create_tables([User, Relationship, Message])
Open a python shell in the directory alongside the example app and execute the following:
>>> from app import *
>>> create_tables()
NOTE:
Every model has a create_table() classmethod which runs a SQL CREATE TABLE statement in the database. This method will create the table, including all columns, foreign-key constraints, indexes, and sequences. Usually this is something you'll only do once, whenever a new model is added.
Peewee provides a helper method Database.create_tables() which will resolve inter-model dependencies and call create_table() on each model, ensuring the tables are created in order.
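A brief sketch of both calls, using the models defined above:

# Create a single model's table...
User.create_table()

# ...or let Database.create_tables() resolve foreign-key dependencies
# and create every table in the correct order.
database.create_tables([User, Relationship, Message])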
NOTE:
Alternatively, you can use the schema migrations extension to alter your database schema using Python.
You may have noticed in the above model code that there is a class defined on the base model named Meta that sets the database attribute. Peewee allows every model to specify which database it uses. There are many Meta options you can specify which control the behavior of your model.
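For example, here is a hedged sketch of one such option; table_name is a real Meta option, while the UserProfile model itself is illustrative:

class UserProfile(BaseModel):
    bio = TextField()

    class Meta:
        # Store rows in a table named "profile" rather than the
        # default name derived from the class ("userprofile").
        table_name = 'profile'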
This is a peewee idiom:
DATABASE = 'tweepee.db'

# Create a database instance that will manage the connection and
# execute queries.
database = SqliteDatabase(DATABASE)

# Create a base-class all our models will inherit, which defines
# the database we'll be using.
class BaseModel(Model):
    class Meta:
        database = database
When developing a web application, it's common to open a connection when a request starts, and close it when the response is returned. You should always manage your connections explicitly. For instance, if you are using a connection pool, connections will only be recycled correctly if you call connect() and close().
We will tell flask that during the request/response cycle we need to create a connection to the database. Flask provides some handy decorators to make this a snap:
@app.before_request
def before_request():
    database.connect()

@app.after_request
def after_request(response):
    database.close()
    return response
The User model defines a few instance methods, following() and followers(), that encapsulate user-specific functionality:
These methods are similar in their implementation but with an important difference in the SQL JOIN and WHERE clauses:
def following(self):
    # Query other users through the "relationship" table.
    return (User
            .select()
            .join(Relationship, on=Relationship.to_user)
            .where(Relationship.from_user == self)
            .order_by(User.username))

def followers(self):
    return (User
            .select()
            .join(Relationship, on=Relationship.from_user)
            .where(Relationship.to_user == self)
            .order_by(User.username))
When a new user wants to join the site we need to make sure the username is available, and if so, create a new User record. Looking at the join() view, we can see that our application attempts to create the User using Model.create(). We defined the User.username field with a unique constraint, so if the username is taken the database will raise an IntegrityError.
try:
    with database.atomic():
        # Attempt to create the user. If the username is taken, due to the
        # unique constraint, the database will raise an IntegrityError.
        user = User.create(
            username=request.form['username'],
            # Note: hashlib's md5 requires bytes, so encode the password.
            password=md5(request.form['password'].encode('utf-8')).hexdigest(),
            email=request.form['email'],
            join_date=datetime.datetime.now())

    # Mark the user as being 'authenticated' by setting the session vars.
    auth_user(user)
    return redirect(url_for('homepage'))
except IntegrityError:
    flash('That username is already taken')
We will use a similar approach when a user wishes to follow someone. To indicate a following relationship, we create a row in the Relationship table pointing from one user to another. Due to the unique index on from_user and to_user, we will be sure not to end up with duplicate rows:
user = get_object_or_404(User, username=username)

try:
    with database.atomic():
        Relationship.create(
            from_user=get_current_user(),
            to_user=user)
except IntegrityError:
    pass
If you are logged-in and visit the twitter homepage, you will see tweets from the users that you follow. In order to implement this cleanly, we can use a subquery:
user = get_current_user()

messages = (Message
            .select()
            .where(Message.user.in_(user.following()))
            .order_by(Message.pub_date.desc()))
This code corresponds to the following SQL query:
SELECT t1."id", t1."user_id", t1."content", t1."pub_date"
FROM "message" AS t1
WHERE t1."user_id" IN (
    SELECT t2."id"
    FROM "user" AS t2
    INNER JOIN "relationship" AS t3
        ON t2."id" = t3."to_user_id"
    WHERE t3."from_user_id" = ?)
There are a couple other neat things going on in the example app that are worth mentioning briefly.
def object_list(template_name, qr, var_name='object_list', **kwargs):
    kwargs.update(
        page=int(request.args.get('page', 1)),
        pages=qr.count() // 20 + 1)  # Integer division for the page count.
    kwargs[var_name] = qr.paginate(kwargs['page'])
    return render_template(template_name, **kwargs)
def auth_user(user):
    session['logged_in'] = True
    session['user'] = user
    session['username'] = user.username
    flash('You are logged in as %s' % (user.username))

def login_required(f):
    @wraps(f)
    def inner(*args, **kwargs):
        if not session.get('logged_in'):
            return redirect(url_for('login'))
        return f(*args, **kwargs)
    return inner
def get_object_or_404(model, *expressions):
try:
return model.get(*expressions)
except model.DoesNotExist:
abort(404)
NOTE:
from playhouse.flask_utils import get_object_or_404, object_list
There are more examples included in the peewee examples directory.
Peewee contains helpers for working interactively from a Python interpreter or something like a Jupyter notebook. For this example, we'll assume that we have a pre-existing Sqlite database with the following simple schema:
CREATE TABLE IF NOT EXISTS "event" (
"id" INTEGER NOT NULL PRIMARY KEY,
"key" TEXT NOT NULL,
"timestamp" DATETIME NOT NULL,
"metadata" TEXT NOT NULL);
To experiment with querying this database from an interactive interpreter session, we would start our interpreter and import a few helpers from peewee and playhouse.reflection.
Our terminal session might look like this:
>>> from peewee import SqliteDatabase
>>> from playhouse.reflection import generate_models, print_model, print_table_sql
>>>
The generate_models() function will introspect the database and generate model classes for all the tables that are found. This is a handy way to get started and can save a lot of typing. The function returns a dictionary keyed by the table name, with the generated model as the corresponding value:
>>> db = SqliteDatabase('events.db')
>>> models = generate_models(db)
>>> list(models.items())
[('event', <Model: event>)]

>>> globals().update(models)  # Inject models into global namespace.
>>> event
<Model: event>
To take a look at the model definition, which lists the model's fields and data-type, we can use the print_model() function:
>>> print_model(event)
event
id AUTO
key TEXT
timestamp DATETIME
metadata TEXT
We can also generate a SQL CREATE TABLE for the introspected model, if you find that easier to read. This should match the actual table definition in the introspected database:
>>> print_table_sql(event)
CREATE TABLE IF NOT EXISTS "event" (
"id" INTEGER NOT NULL PRIMARY KEY,
"key" TEXT NOT NULL,
"timestamp" DATETIME NOT NULL,
"metadata" TEXT NOT NULL)
Now that we are familiar with the structure of the table we're working with, we can run some queries on the generated event model:
>>> for e in event.select().order_by(event.timestamp).limit(5):
...     print(e.key, e.timestamp)
...
e00 2019-01-01 00:01:00
e01 2019-01-01 00:02:00
e02 2019-01-01 00:03:00
e03 2019-01-01 00:04:00
e04 2019-01-01 00:05:00

>>> event.select(fn.MIN(event.timestamp), fn.MAX(event.timestamp)).scalar(as_tuple=True)
(datetime.datetime(2019, 1, 1, 0, 1), datetime.datetime(2019, 1, 1, 1, 0))

>>> event.select().count()  # Or, len(event)
60
For more information about these APIs and other similar reflection utilities, see the reflection section of the playhouse extensions document.
To generate an actual Python module containing model definitions for an existing database, you can use the command-line pwiz tool. Here is a quick example:
$ pwiz -e sqlite events.db > events.py
The events.py file will now be an importable module containing a database instance (referencing events.db) along with model definitions for any tables found in the database. pwiz does some additional nice things like introspecting indexes and adding proper flags for NULL/NOT NULL constraints, etc.
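As a hedged sketch of using the generated module (the exact names depend on your schema; for the single "event" table above, pwiz would emit an Event model class and a module-level database instance):

# Hypothetical usage of the pwiz-generated module.
from events import database, Event

database.connect()
print(Event.select().count())
database.close()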
The APIs discussed in this section, along with more low-level counterparts available on the Database instance, are covered in detail in the API documentation.
In order to continually improve, Peewee needs the help of developers like you. Whether it's contributing patches, submitting bug reports, or just asking and answering questions, you are helping to make Peewee a better library.
In this document I'll describe some of the ways you can help.
Do you have an idea for a new feature, or is there a clunky API you'd like to improve? Before coding it up and submitting a pull-request, open a new issue on GitHub describing your proposed changes. This doesn't have to be anything formal, just a description of what you'd like to do and why.
When you're ready, you can submit a pull-request with your changes. Successful patches will typically include tests that verify the new behavior or bug fix, along with any relevant documentation updates.
If you've found a bug, please check to see if it has already been reported, and if not create an issue on GitHub. The more information you include, the more quickly the bug will get fixed, so please try to include a description of the expected and actual behavior, the traceback if there was an exception, and sample code or a failing test-case that reproduces the issue.
If you have found a bug in the code and submit a failing test-case, then hats-off to you, you are a hero!
If you have questions about how to do something with peewee, then I recommend asking on StackOverflow or in the project's GitHub issues.
The Peewee Database object represents a connection to a database. The Database class is instantiated with all the information needed to open a connection to a database, and then can be used to open and close connections, execute queries, and manage transactions (and savepoints).
Peewee comes with support for SQLite, MySQL and Postgres. Each database class provides some basic, database-specific configuration options.
from peewee import *

# SQLite database using WAL journal mode and 64MB cache.
sqlite_db = SqliteDatabase('/path/to/app.db', pragmas={
    'journal_mode': 'wal',
    'cache_size': -1024 * 64})

# Connect to a MySQL database on network.
mysql_db = MySQLDatabase('my_app', user='app', password='db_password',
                         host='10.1.0.8', port=3306)

# Connect to a Postgres database.
pg_db = PostgresqlDatabase('my_app', user='postgres', password='secret',
                           host='10.1.0.9', port=5432)
Peewee provides advanced support for SQLite, Postgres and CockroachDB via database-specific extension modules. To use the extended-functionality, import the appropriate database-specific module and use the database class provided:
from playhouse.sqlite_ext import SqliteExtDatabase

# Use SQLite (will register a REGEXP function and set busy timeout to 3s).
db = SqliteExtDatabase('/path/to/app.db', regexp_function=True, timeout=3,
                       pragmas={'journal_mode': 'wal'})


from playhouse.postgres_ext import PostgresqlExtDatabase

# Use Postgres (and register hstore extension).
db = PostgresqlExtDatabase('my_app', user='postgres', register_hstore=True)


from playhouse.cockroachdb import CockroachDatabase

# Use CockroachDB.
db = CockroachDatabase('my_app', user='root', port=26257, host='10.1.0.8')
For more information on database extensions, see the playhouse documentation for the SQLite, Postgresql and CockroachDB extension modules.
The Database initialization method expects the name of the database as the first parameter. Subsequent keyword arguments are passed to the underlying database driver when establishing the connection, allowing you to pass vendor-specific parameters easily.
For instance, with Postgresql it is common to need to specify the host, user and password when creating your connection. These are not standard Peewee Database parameters, so they will be passed directly back to psycopg2 when creating connections:
db = PostgresqlDatabase(
'database_name', # Required by Peewee.
user='postgres', # Will be passed directly to psycopg2.
password='secret', # Ditto.
host='db.mysite.com') # Ditto.
As another example, the pymysql driver accepts a charset parameter which is not a standard Peewee Database parameter. To set this value, simply pass in charset alongside your other values:
db = MySQLDatabase('database_name', user='www-data', charset='utf8mb4')
Consult your database driver's documentation (psycopg2 for Postgres, MySQLdb or pymysql for MySQL, sqlite3 for SQLite) for the available parameters.
To connect to a Postgresql database, we will use PostgresqlDatabase. The first parameter is always the name of the database, and after that you can specify arbitrary psycopg2 parameters.
psql_db = PostgresqlDatabase('my_database', user='postgres')

class BaseModel(Model):
    """A base model that will use our Postgresql database"""
    class Meta:
        database = psql_db

class User(BaseModel):
    username = CharField()
The playhouse contains a Postgresql extension module which provides many postgres-specific features such as arrays, hstore, JSON fields, server-side cursors, and full-text search.
If you would like to use these awesome features, use the PostgresqlExtDatabase from the playhouse.postgres_ext module:
from playhouse.postgres_ext import PostgresqlExtDatabase

psql_db = PostgresqlExtDatabase('my_database', user='postgres')
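For instance, here is a brief, hedged sketch of the JSONField provided by playhouse.postgres_ext (the APIResponse model and its field are illustrative):

from peewee import Model
from playhouse.postgres_ext import PostgresqlExtDatabase, JSONField

ext_db = PostgresqlExtDatabase('my_database', user='postgres')

class APIResponse(Model):
    # JSON values are serialized and deserialized transparently.
    data = JSONField()

    class Meta:
        database = ext_db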
As of Peewee 3.9.7, the isolation level can be specified as an initialization parameter, using the symbolic constants in psycopg2.extensions:
from psycopg2.extensions import ISOLATION_LEVEL_SERIALIZABLE

db = PostgresqlDatabase('my_app', user='postgres', host='db-host',
                        isolation_level=ISOLATION_LEVEL_SERIALIZABLE)
NOTE:
The isolation level may also be set on a per-connection basis, using the underlying psycopg2 connection:

db = PostgresqlDatabase(...)
conn = db.connection()  # Returns current connection.

from psycopg2.extensions import ISOLATION_LEVEL_SERIALIZABLE
conn.set_isolation_level(ISOLATION_LEVEL_SERIALIZABLE)
To run this every time a connection is created, subclass and implement the _initialize_connection() hook, which is designed for this purpose:
class SerializedPostgresqlDatabase(PostgresqlDatabase):
def _initialize_connection(self, conn):
conn.set_isolation_level(ISOLATION_LEVEL_SERIALIZABLE)
Connect to CockroachDB (CRDB) using the CockroachDatabase database class, defined in playhouse.cockroachdb:
from playhouse.cockroachdb import CockroachDatabase

db = CockroachDatabase('my_app', user='root', port=26257, host='localhost')
CRDB provides client-side transaction retries, which are available using a special CockroachDatabase.run_transaction() helper-method. This method accepts a callable, which is responsible for executing any transactional statements that may need to be retried.
Simplest possible example of run_transaction():
def create_user(email):
    # Callable that accepts a single argument (the database instance) and
    # which is responsible for executing the transactional SQL.
    def callback(db_ref):
        return User.create(email=email)

    return db.run_transaction(callback, max_attempts=10)

huey = create_user('huey@example.com')
For more information, see the CockroachDB section of the playhouse documentation.
To connect to a SQLite database, we will use SqliteDatabase. The first parameter is the filename containing the database, or the string ':memory:' to create an in-memory database. After the database filename, you can specify a list of pragmas or any other arbitrary sqlite3 parameters.
sqlite_db = SqliteDatabase('my_app.db', pragmas={'journal_mode': 'wal'})

class BaseModel(Model):
    """A base model that will use our Sqlite database."""
    class Meta:
        database = sqlite_db

class User(BaseModel):
    username = TextField()
    # etc, etc
Peewee includes a SQLite extension module which provides many SQLite-specific features such as full-text search, json extension support, and much, much more. If you would like to use these awesome features, use the SqliteExtDatabase from the playhouse.sqlite_ext module:
from playhouse.sqlite_ext import SqliteExtDatabase

sqlite_db = SqliteExtDatabase('my_app.db', pragmas={
    'journal_mode': 'wal',     # WAL-mode.
    'cache_size': -64 * 1000,  # 64MB cache.
    'synchronous': 0})         # Let the OS manage syncing.
SQLite allows run-time configuration of a number of parameters through PRAGMA statements (SQLite documentation). These statements are typically run when a new database connection is created. To run one or more PRAGMA statements against new connections, you can specify them as a dictionary or a list of 2-tuples containing the pragma name and value:
db = SqliteDatabase('my_app.db', pragmas={
    'journal_mode': 'wal',
    'cache_size': 10000,  # 10000 pages, or ~40MB.
    'foreign_keys': 1})   # Enforce foreign-key constraints.
PRAGMAs may also be configured dynamically using either the pragma() method or the special properties exposed on the SqliteDatabase object:
# Set cache size to 64MB for *current connection*.
db.pragma('cache_size', -1024 * 64)

# Same as above.
db.cache_size = -1024 * 64

# Read the value of several pragmas:
print('cache_size:', db.cache_size)
print('foreign_keys:', db.foreign_keys)
print('journal_mode:', db.journal_mode)
print('page_size:', db.page_size)

# Set foreign_keys pragma on current connection *AND* on all
# connections opened subsequently.
db.pragma('foreign_keys', 1, permanent=True)
NOTE:
The following settings are what I use with SQLite for a typical web application database.
pragma                   | recommended setting | explanation
journal_mode             | wal                 | allow readers and writers to co-exist
cache_size               | -1 * data_size_kb   | set page-cache size in KiB, e.g. -32000 = 32MB
foreign_keys             | 1                   | enforce foreign-key constraints
ignore_check_constraints | 0                   | enforce CHECK constraints
synchronous              | 0                   | let OS handle fsync (use with caution)
Example database using the above options:
db = SqliteDatabase('my_app.db', pragmas={
'journal_mode': 'wal',
'cache_size': -1 * 64000, # 64MB
'foreign_keys': 1,
'ignore_check_constraints': 0,
'synchronous': 0})
SQLite can be extended with user-defined Python code. The SqliteDatabase class supports three types of user-defined extensions: functions, aggregates, and collations (table-value functions are additionally supported via the playhouse sqlite_ext module):
Example user-defined function:
db = SqliteDatabase('analytics.db')

from urllib.parse import urlparse

@db.func('hostname')
def hostname(url):
    if url is not None:
        return urlparse(url).netloc

# Call this function in our code:
# The following finds the most common hostnames of referrers by count:
query = (PageView
         .select(fn.hostname(PageView.referrer), fn.COUNT(PageView.id))
         .group_by(fn.hostname(PageView.referrer))
         .order_by(fn.COUNT(PageView.id).desc()))
Example user-defined aggregate:
from hashlib import md5

@db.aggregate('md5')
class MD5Checksum(object):
    def __init__(self):
        self.checksum = md5()

    def step(self, value):
        self.checksum.update(value.encode('utf-8'))

    def finalize(self):
        return self.checksum.hexdigest()

# Usage:
# The following computes an aggregate MD5 checksum for files broken
# up into chunks and stored in the database.
query = (FileChunk
         .select(FileChunk.filename, fn.MD5(FileChunk.data))
         .group_by(FileChunk.filename)
         .order_by(FileChunk.filename, FileChunk.sequence))
Example collation:
@db.collation('ireverse')
def collate_reverse(s1, s2):
    # Case-insensitive reverse.
    s1, s2 = s1.lower(), s2.lower()
    return (s1 < s2) - (s1 > s2)  # Equivalent to -cmp(s1, s2).

# To use this collation to sort books in reverse order...
Book.select().order_by(collate_reverse.collation(Book.title))

# Or...
Book.select().order_by(Book.title.asc(collation='ireverse'))
Example user-defined table-value function (see TableFunction and table_function for additional details):
from playhouse.sqlite_ext import TableFunction

db = SqliteDatabase('my_app.db')

@db.table_function('series')
class Series(TableFunction):
    columns = ['value']
    params = ['start', 'stop', 'step']

    def initialize(self, start=0, stop=None, step=1):
        """
        Table-functions declare an initialize() method, which is
        called with whatever arguments the user has called the
        function with.
        """
        self.start = self.current = start
        self.stop = stop or float('Inf')
        self.step = step

    def iterate(self, idx):
        """
        Iterate is called repeatedly by the SQLite database engine
        until the required number of rows has been read **or** the
        function raises a `StopIteration` signalling no more rows
        are available.
        """
        if self.current > self.stop:
            raise StopIteration

        ret, self.current = self.current, self.current + self.step
        return (ret,)

# Usage:
cursor = db.execute_sql('SELECT * FROM series(?, ?, ?)', (0, 5, 2))
for value, in cursor:
    print(value)

# Prints:
# 0
# 2
# 4
For more information, see the API documentation for func(), aggregate(), collation(), and table_function().
SQLite transactions can be opened in three different modes: deferred (the default), immediate, and exclusive.
Example specifying the locking mode:
db = SqliteDatabase('app.db')

with db.atomic('EXCLUSIVE'):
    do_something()

@db.atomic('IMMEDIATE')
def some_other_function():
    # This function is wrapped in an "IMMEDIATE" transaction.
    do_something_else()
For more information, see the SQLite locking documentation. To learn more about transactions in Peewee, see the Managing Transactions documentation.
Peewee also comes with an alternate SQLite database that uses apsw, an advanced Python SQLite driver. More information on APSW can be obtained on the APSW project website. APSW provides special features like virtual tables, virtual file-systems, BLOB I/O, and online backups.
If you would like to use APSW, use the APSWDatabase from the apsw_ext module:
from playhouse.apsw_ext import APSWDatabase

apsw_db = APSWDatabase('my_app.db')
To connect to a MySQL database, we will use MySQLDatabase. After the database name, you can specify arbitrary connection parameters that will be passed back to the driver (either MySQLdb or pymysql).
mysql_db = MySQLDatabase('my_database')

class BaseModel(Model):
    """A base model that will use our MySQL database"""
    class Meta:
        database = mysql_db

class User(BaseModel):
    username = CharField()
    # etc, etc
The error "2006: MySQL server has gone away" can occur when MySQL kills an idle database connection. This typically happens with web apps that do not explicitly manage database connections. What happens is your application starts, a connection is opened to handle the first query that executes, and, since that connection is never closed, it remains open, waiting for more queries.
To fix this, make sure you are explicitly connecting to the database when you need to execute queries, and close your connection when you are done. In a web-application, this typically means you will open a connection when a request comes in, and close the connection when you return a response.
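A minimal sketch, assuming the mysql_db instance and Message model defined elsewhere in this document; connection_context() opens a connection for the duration of the block and closes it afterwards:

# Scope the connection to the unit of work, so it is never left idle
# long enough for the server to kill it.
with mysql_db.connection_context():
    recent = list(Message.select().order_by(Message.pub_date.desc()).limit(10))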
See the Framework Integration section for examples of configuring common web frameworks to manage database connections.
The playhouse module db_url provides a helper connect() function that accepts a database URL and returns a Database instance.
Example code:
import os

from peewee import *
from playhouse.db_url import connect

# Connect to the database URL defined in the environment, falling
# back to a local Sqlite database if no database URL is specified.
db = connect(os.environ.get('DATABASE') or 'sqlite:///default.db')

class BaseModel(Model):
    class Meta:
        database = db
Example database URLs:
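A few illustrative URL forms follow (the scheme names are real db_url schemes; the hosts, credentials, and database names are placeholders):

sqlite:///my_app.db                                 # SQLite file.
sqlite:///:memory:                                  # In-memory SQLite.
postgresql://postgres:secret@localhost:5432/my_app  # Postgres.
mysql://app:db_password@10.1.0.8:3306/my_app        # MySQL.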
Sometimes the database connection settings are not known until run-time, when these values may be loaded from a configuration file or the environment. In these cases, you can defer the initialization of the database by specifying None as the database_name.
database = PostgresqlDatabase(None)  # Un-initialized database.

class SomeModel(Model):
    class Meta:
        database = database
If you try to connect or issue any queries while your database is uninitialized you will get an exception:
>>> database.connect()
Exception: Error, database not properly initialized before opening connection
To initialize your database, call the init() method with the database name and any additional keyword arguments:
database_name = input('What is the name of the db? ')
database.init(database_name, host='localhost', user='postgres')
For even more control over initializing your database, see the next section, Dynamically defining a database.
For even more control over how your database is defined/initialized, you can use the DatabaseProxy helper. DatabaseProxy objects act as a placeholder, and then at run-time you can swap it out for a different object. In the example below, we will swap out the database depending on how the app is configured:
database_proxy = DatabaseProxy()  # Create a proxy for our db.

class BaseModel(Model):
    class Meta:
        database = database_proxy  # Use proxy for our DB.

class User(BaseModel):
    username = CharField()

# Based on configuration, use a different database.
if app.config['DEBUG']:
    database = SqliteDatabase('local.db')
elif app.config['TESTING']:
    database = SqliteDatabase(':memory:')
else:
    database = PostgresqlDatabase('mega_production_db')

# Configure our proxy to use the db we specified in config.
database_proxy.initialize(database)
WARNING:
However, if it is only connection values that vary at run-time, such as the path to the database file, or the database host, you should instead use Database.init(). See Run-time database configuration for more details.
NOTE:
We have seen three ways that databases can be configured with Peewee:
# The usual way:
db = SqliteDatabase('my_app.db', pragmas={'journal_mode': 'wal'})


# Specify the details at run-time:
db = SqliteDatabase(None)
...
db.init(db_filename, pragmas={'journal_mode': 'wal'})


# Or use a placeholder:
db = DatabaseProxy()
...
db.initialize(SqliteDatabase('my_app.db', pragmas={'journal_mode': 'wal'}))
Peewee can also set or change the database for your model classes. This technique is used by the Peewee test suite to bind test model classes to various database instances when running the tests.
There are two sets of complementary methods: Database.bind() and Model.bind(), which bind permanently, and Database.bind_ctx() and Model.bind_ctx(), which bind for the duration of a context-managed block.
As an example, we'll declare two models without specifying any database:
class User(Model):
    username = TextField()

class Tweet(Model):
    user = ForeignKeyField(User, backref='tweets')
    content = TextField()
    timestamp = TimestampField()
Bind the models to a database at run-time:
postgres_db = PostgresqlDatabase('my_app', user='postgres')
sqlite_db = SqliteDatabase('my_app.db')

# At this point, the User and Tweet models are NOT bound to any database.
# Let's bind them to the Postgres database:
postgres_db.bind([User, Tweet])

# Now we will temporarily bind them to the sqlite database:
with sqlite_db.bind_ctx([User, Tweet]):
    # User and Tweet are now bound to the sqlite database.
    assert User._meta.database is sqlite_db

# User and Tweet are once again bound to the Postgres database.
assert User._meta.database is postgres_db
The Model.bind() and Model.bind_ctx() methods work the same for binding a given model class:
# Bind the user model to the sqlite db. By default, Peewee will also
# bind any models that are related to User via foreign-key as well.
User.bind(sqlite_db)

assert User._meta.database is sqlite_db
assert Tweet._meta.database is sqlite_db  # Related models bound too.

# Here we will temporarily bind *just* the User model to the postgres db.
with User.bind_ctx(postgres_db, bind_backrefs=False):
    assert User._meta.database is postgres_db
    assert Tweet._meta.database is sqlite_db  # Has not changed.

# And now User is back to being bound to the sqlite_db.
assert User._meta.database is sqlite_db
The Testing Peewee Applications section of this document also contains some examples of using the bind() methods.
To open a connection to a database, use the Database.connect() method:
>>> db = SqliteDatabase(':memory:')  # In-memory SQLite database.
>>> db.connect()
True
If we try to call connect() on an already-open database, we get an OperationalError:
>>> db.connect()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/charles/pypath/peewee.py", line 2390, in connect
    raise OperationalError('Connection already opened.')
peewee.OperationalError: Connection already opened.
To prevent this exception from being raised, we can call connect() with an additional argument, reuse_if_open:
>>> db.close()  # Close connection.
True
>>> db.connect()
True
>>> db.connect(reuse_if_open=True)
False
Note that the call to connect() returns False if the database connection was already open.
To close a connection, use the Database.close() method:
>>> db.close() True
Calling close() on an already-closed connection will not result in an exception, but will return False:
>>> db.connect()  # Open connection.
True
>>> db.close()  # Close connection.
True
>>> db.close()  # Connection already closed, returns False.
False
You can test whether the database is closed using the Database.is_closed() method:
>>> db.is_closed() True
It is not necessary to explicitly connect to the database before using it if the database is initialized with autoconnect=True (the default). However, managing connections explicitly is considered a best practice, so you may consider disabling the autoconnect behavior.
It is very helpful to be explicit about your connection lifetimes. If the connection fails, for instance, the exception will be caught when the connection is being opened, rather than some arbitrary time later when a query is executed. Furthermore, if using a connection pool, it is necessary to call connect() and close() to ensure connections are recycled properly.
For the best guarantee of correctness, disable autoconnect:
db = PostgresqlDatabase('my_app', user='postgres', autoconnect=False)
Peewee keeps track of the connection state using thread-local storage, making the Peewee Database object safe to use with multiple threads. Each thread will have its own connection, and as a result any given thread will only have a single connection open at a given time.
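A hedged sketch of what this means in practice (the worker function and thread count are illustrative):

import threading

from peewee import SqliteDatabase

db = SqliteDatabase('my_app.db')

def worker(n):
    # Each thread sees only its own connection state.
    db.connect(reuse_if_open=True)
    try:
        print('thread', n, 'connection open:', not db.is_closed())
    finally:
        db.close()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()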
The database object itself can be used as a context-manager, which opens a connection for the duration of the wrapped block of code. Additionally, a transaction is opened at the start of the wrapped block and committed before the connection is closed (unless an error occurs, in which case the transaction is rolled back).
>>> db.is_closed()
True
>>> with db:
...     print(db.is_closed())  # db is open inside context manager.
...
False
>>> db.is_closed()  # db is closed.
True
If you want to manage transactions separately, you can use the Database.connection_context() context manager.
>>> with db.connection_context():
...     # db connection is open.
...     pass
...
>>> db.is_closed()  # db connection is closed.
True
The connection_context() method can also be used as a decorator:
@db.connection_context()
def prepare_database():
    # DB connection will be managed by the decorator, which opens
    # a connection, calls function, and closes upon returning.
    db.create_tables(MODELS)  # Create schema.
    load_fixture_data(db)
To obtain a reference to the underlying DB-API 2.0 connection, use the Database.connection() method. This method will return the currently-open connection object, if one exists, otherwise it will open a new connection.
>>> db.connection() <sqlite3.Connection object at 0x7f94e9362f10>
Connection pooling is provided by the pool module, included in the playhouse extensions library. The pool supports a timeout after which stale connections will be recycled, as well as an upper bound on the number of open connections.
from playhouse.pool import PooledPostgresqlExtDatabase

db = PooledPostgresqlExtDatabase(
    'my_database',
    max_connections=8,
    stale_timeout=300,
    user='postgres')

class BaseModel(Model):
    class Meta:
        database = db
The following pooled database classes are available: PooledPostgresqlDatabase, PooledPostgresqlExtDatabase, PooledMySQLDatabase, PooledSqliteDatabase, and PooledSqliteExtDatabase.
For an in-depth discussion of peewee's connection pool, see the pool section of the playhouse documentation.
When writing tests for an application that uses Peewee, it may be desirable to use a special database for tests. Another common practice is to run tests against a clean database, which means ensuring tables are empty at the start of each test.
To bind your models to a database at run-time, you can use the Database.bind() and Model.bind() methods, or their context-manager counterparts Database.bind_ctx() and Model.bind_ctx().
Depending on your use-case, one of these options may make more sense. For the examples below, I will use Model.bind().
Example test-case setup:
# tests.py
import unittest

from my_app.models import EventLog, Relationship, Tweet, User

MODELS = [User, Tweet, EventLog, Relationship]

# Use an in-memory SQLite for tests.
test_db = SqliteDatabase(':memory:')

class BaseTestCase(unittest.TestCase):
    def setUp(self):
        # Bind model classes to test db. Since we have a complete list of
        # all models, we do not need to recursively bind dependencies.
        test_db.bind(MODELS, bind_refs=False, bind_backrefs=False)

        test_db.connect()
        test_db.create_tables(MODELS)

    def tearDown(self):
        # Not strictly necessary since SQLite in-memory databases only live
        # for the duration of the connection, and in the next step we close
        # the connection...but a good practice all the same.
        test_db.drop_tables(MODELS)

        # Close connection to db.
        test_db.close()

        # If we wanted, we could re-bind the models to their original
        # database here. But for tests this is probably not necessary.
As an aside, and speaking from experience, I recommend testing your application using the same database backend you use in production, so as to avoid any potential compatibility issues.
If you'd like to see some more examples of how to run tests using Peewee, check out Peewee's own test-suite.
gevent is recommended for doing asynchronous I/O with Postgresql or MySQL, since it allows the standard, synchronous database drivers to be used without a special-purpose "async" reimplementation of peewee.
Besides monkey-patching socket, no special steps are required if you are using MySQL with a pure Python driver like pymysql or are using mysql-connector in pure-python mode. MySQL drivers written in C will require special configuration which is beyond the scope of this document.
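A minimal sketch of the monkey-patching step, which must run before anything else imports the socket module:

# main.py -- hypothetical entry point.
from gevent import monkey
monkey.patch_all()

# Only now import modules that create sockets (pymysql, peewee, etc).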
For Postgres and psycopg2, which is a C extension, you can use the following code snippet to register event hooks that will make your connection async:
from gevent.socket import wait_read, wait_write
from psycopg2 import extensions

# Call this function after monkey-patching socket (etc).
def patch_psycopg2():
    extensions.set_wait_callback(_psycopg2_gevent_callback)

def _psycopg2_gevent_callback(conn, timeout=None):
    while True:
        state = conn.poll()
        if state == extensions.POLL_OK:
            break
        elif state == extensions.POLL_READ:
            wait_read(conn.fileno(), timeout=timeout)
        elif state == extensions.POLL_WRITE:
            wait_write(conn.fileno(), timeout=timeout)
        else:
            raise ValueError('poll() returned unexpected result')
SQLite, because it is embedded in the Python application itself, does not do any socket operations that would be a candidate for non-blocking. Async has no effect one way or the other on SQLite databases.
For web applications, it is common to open a connection when a request is received, and to close the connection when the response is delivered. In this section I will describe how to add hooks to your web app to ensure the database connection is handled properly.
These steps will ensure that regardless of whether you're using a simple SQLite database, or a pool of multiple Postgres connections, peewee will handle the connections correctly.
Flask and peewee are a great combo and my go-to for projects of any size. Flask provides two hooks which we will use to open and close our db connection. We'll open the connection when a request is received, then close it when the response is returned.
from flask import Flask
from peewee import *

database = SqliteDatabase('my_app.db')
app = Flask(__name__)

# This hook ensures that a connection is opened to handle any queries
# generated by the request.
@app.before_request
def _db_connect():
    database.connect()

# This hook ensures that the connection is closed when we've finished
# processing the request.
@app.teardown_request
def _db_close(exc):
    if not database.is_closed():
        database.close()
While it's less common to see peewee used with Django, it is actually very easy to use the two. To manage your peewee database connections with Django, the easiest way in my opinion is to add a middleware to your app. The middleware should be the very first in the list of middlewares, to ensure it runs first when a request is handled, and last when the response is returned.
If you have a django project named my_blog and your peewee database is defined in the module my_blog.db, you might add the following middleware class:
# middleware.py
from my_blog.db import database  # Import the peewee database instance.

def PeeweeConnectionMiddleware(get_response):
    def middleware(request):
        database.connect()
        try:
            response = get_response(request)
        finally:
            if not database.is_closed():
                database.close()
        return response
    return middleware


# Older Django < 1.10 middleware.
class PeeweeConnectionMiddleware(object):
    def process_request(self, request):
        database.connect()

    def process_response(self, request, response):
        if not database.is_closed():
            database.close()
        return response
To ensure this middleware gets executed, add it to your settings module:
# settings.py
MIDDLEWARE_CLASSES = (
    # Our custom middleware appears first in the list.
    'my_blog.middleware.PeeweeConnectionMiddleware',

    # These are the default Django 1.7 middlewares. Yours may differ,
    # but the important thing is that our Peewee middleware comes first.
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
)

# ... other Django settings ...
I haven't used bottle myself, but looking at the documentation I believe the following code should ensure the database connections are properly managed:
# app.py
from bottle import hook  #, route, etc, etc.
from peewee import *

db = SqliteDatabase('my-bottle-app.db')

@hook('before_request')
def _connect_db():
    db.connect()

@hook('after_request')
def _close_db():
    if not db.is_closed():
        db.close()

# Rest of your bottle app goes here.
See the documentation for application processors.
db = SqliteDatabase('my_webpy_app.db')

def connection_processor(handler):
    db.connect()
    try:
        return handler()
    finally:
        if not db.is_closed():
            db.close()

app.add_processor(connection_processor)
It looks like Tornado's RequestHandler class implements two hooks which can be used to open and close connections when a request is handled.
from tornado.web import RequestHandler

db = SqliteDatabase('my_db.db')

class PeeweeRequestHandler(RequestHandler):
    def prepare(self):
        db.connect()
        return super(PeeweeRequestHandler, self).prepare()

    def on_finish(self):
        if not db.is_closed():
            db.close()
        return super(PeeweeRequestHandler, self).on_finish()
In your app, instead of extending the default RequestHandler, now you can extend PeeweeRequestHandler.
Note that this does not address how to use peewee asynchronously with Tornado or another event loop.
The connection handling code can be placed in a middleware.
def peewee_middleware(request, following):
    db.connect()
    try:
        response = following(request)
    finally:
        if not db.is_closed():
            db.close()
    return response

app = WSGIApplication(middleware=[
    lambda x: peewee_middleware,
    # ... other middlewares ...
])
Thanks to GitHub user @tuukkamustonen for submitting this code.
The connection handling code can be placed in a middleware component.
import falcon
from peewee import *

database = SqliteDatabase('my_app.db')

class PeeweeConnectionMiddleware(object):
    def process_request(self, req, resp):
        database.connect()

    def process_response(self, req, resp, resource, req_succeeded):
        if not database.is_closed():
            database.close()

application = falcon.API(middleware=[
    PeeweeConnectionMiddleware(),
    # ... other middlewares ...
])
Set up a Request factory that handles database connection lifetime as follows:
from pyramid.request import Request

db = SqliteDatabase('pyramidapp.db')

class MyRequest(Request):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        db.connect()
        self.add_finished_callback(self.finish)

    def finish(self, request):
        if not db.is_closed():
            db.close()
In your application main() make sure MyRequest is used as request_factory:
def main(global_settings, **settings):
config = Configurator(settings=settings, ...)
config.set_request_factory(MyRequest)
See Publish/Subscribe pattern.
def _db_connect():
    db.connect()

def _db_close():
    if not db.is_closed():
        db.close()

cherrypy.engine.subscribe('before_request', _db_connect)
cherrypy.engine.subscribe('after_request', _db_close)
In Sanic, the connection handling code can be placed in request and response middleware.
# app.py
@app.middleware('request')
async def handle_request(request):
    db.connect()

@app.middleware('response')
async def handle_response(request, response):
    if not db.is_closed():
        db.close()
Similar to Flask, FastAPI provides two event-based hooks which we will use to open and close our db connection: one at application startup and one at shutdown.
from fastapi import FastAPI
from peewee import *

db = SqliteDatabase('my_app.db')
app = FastAPI()

# This hook ensures that a connection is opened when the app starts.
@app.on_event("startup")
def startup():
    db.connect()

# This hook ensures that the connection is closed when the app shuts down.
@app.on_event("shutdown")
def shutdown():
    if not db.is_closed():
        db.close()
Don't see your framework here? Please open a GitHub ticket and I'll see about adding a section, or better yet, submit a documentation pull-request.
SQL queries will typically be executed by calling execute() on a query constructed using the query-builder APIs (or by simply iterating over a query object in the case of a Select query). For cases where you wish to execute SQL directly, you can use the Database.execute_sql() method.
db = SqliteDatabase('my_app.db')
db.connect()

# Example of executing a simple query and ignoring the results.
db.execute_sql("ATTACH DATABASE ':memory:' AS cache;")

# Example of iterating over the results of a query using the cursor.
cursor = db.execute_sql('SELECT * FROM users WHERE status = ?', (ACTIVE,))
for row in cursor.fetchall():
    # Do something with row, which is a tuple containing column data.
    pass
Peewee provides several interfaces for working with transactions. The most general is the Database.atomic() method, which also supports nested transactions. atomic() blocks will be run in a transaction or savepoint, depending on the level of nesting.
If an exception occurs in a wrapped block, the current transaction/savepoint will be rolled back. Otherwise the statements will be committed at the end of the wrapped block.
NOTE:
Inside a block wrapped by atomic(), you can explicitly commit or roll back at any point by calling commit() or rollback() on the transaction; when you do so, a new transaction is started automatically:
with db.atomic() as transaction: # Opens new transaction.
try:
save_some_objects()
except ErrorSavingData:
# Because this block of code is wrapped with "atomic", a
# new transaction will begin automatically after the call
# to rollback().
transaction.rollback()
error_saving = True
create_report(error_saving=error_saving)
# Note: no need to call commit. Since this marks the end of the
# wrapped block of code, the `atomic` context manager will
# automatically call commit for us.
NOTE:
Using atomic as a context manager:
db = SqliteDatabase(':memory:')

with db.atomic() as txn:
# This is the outer-most level, so this block corresponds to
# a transaction.
User.create(username='charlie')
with db.atomic() as nested_txn:
# This block corresponds to a savepoint.
User.create(username='huey')
# This will roll back the above create() query.
nested_txn.rollback()
    User.create(username='mickey')

# When the block ends, the transaction is committed (assuming no error
# occurs). At that point there will be two users, "charlie" and "mickey".
You can use the atomic method to perform get or create operations as well:
try:
with db.atomic():
user = User.create(username=username)
    return 'Success'
except peewee.IntegrityError:
    return 'Failure: %s is already in use.' % username
Using atomic as a decorator:
@db.atomic()
def create_user(username):
# This statement will run in a transaction. If the caller is already
# running in an `atomic` block, then a savepoint will be used instead.
    return User.create(username=username)

create_user('charlie')
atomic() provides transparent nesting of transactions. When using atomic(), the outer-most call will be wrapped in a transaction, and any nested calls will use savepoints.
with db.atomic() as txn:
perform_operation()
with db.atomic() as nested_txn:
perform_another_operation()
Peewee supports nested transactions through the use of savepoints (for more information, see savepoint()).
If you wish to explicitly run code in a transaction, you can use transaction(). Like atomic(), transaction() can be used as a context manager or as a decorator.
If an exception occurs in a wrapped block, the transaction will be rolled back. Otherwise the statements will be committed at the end of the wrapped block.
db = SqliteDatabase(':memory:')

with db.transaction() as txn:
# Delete the user and their associated tweets.
user.delete_instance(recursive=True)
Transactions can be explicitly committed or rolled-back within the wrapped block. When this happens, a new transaction will be started.
with db.transaction() as txn:
User.create(username='mickey')
txn.commit() # Changes are saved and a new transaction begins.
User.create(username='huey')
# Roll back. "huey" will not be saved, but since "mickey" was already
# committed, that row will remain in the database.
    txn.rollback()

with db.transaction() as txn:
User.create(username='whiskers')
# Roll back changes, which removes "whiskers".
txn.rollback()
# Create a new row for "mr. whiskers" which will be implicitly committed
# at the end of the `with` block.
User.create(username='mr. whiskers')
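transaction() can also be applied as a decorator. Here is a minimal sketch, assuming a hypothetical Account model with a balance column (not part of the examples above):

@db.transaction()
def transfer(from_id, to_id, amount):
    # Both updates run in a single transaction; if either raises an
    # exception, the whole transfer is rolled back.
    Account.update(balance=Account.balance - amount).where(Account.id == from_id).execute()
    Account.update(balance=Account.balance + amount).where(Account.id == to_id).execute()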
NOTE:
Just as you can explicitly create transactions, you can also explicitly create savepoints using the savepoint() method. Savepoints must occur within a transaction, but can be nested arbitrarily deep.
with db.transaction() as txn:
with db.savepoint() as sp:
User.create(username='mickey')
with db.savepoint() as sp2:
User.create(username='zaizee')
sp2.rollback() # "zaizee" will not be saved, but "mickey" will be.
WARNING:
By default, Peewee operates in autocommit mode, such that any statements executed outside of a transaction are run in their own transaction. To group multiple statements into a transaction, Peewee provides the atomic() context-manager/decorator. This should cover all use-cases, but in the unlikely event you want to temporarily disable Peewee's transaction management completely, you can use the Database.manual_commit() context-manager/decorator.
Here is how you might emulate the behavior of the transaction() context manager:
with db.manual_commit():
db.begin() # Have to begin transaction explicitly.
try:
user.delete_instance(recursive=True)
except:
db.rollback() # Rollback! An error occurred.
raise
else:
try:
db.commit() # Commit changes.
except:
db.rollback()
raise
Again -- I don't anticipate anyone needing this, but it's here just in case.
The Python DB-API 2.0 spec describes several types of exceptions. Because most database drivers have their own implementations of these exceptions, Peewee simplifies things by providing its own wrappers around any implementation-specific exception classes. That way, you don't need to worry about importing any special exception classes; you can just use the ones from peewee.
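For example, here is a minimal sketch of catching a constraint violation through peewee's wrapper (assuming a User model with a unique username column, as elsewhere in this document):

from peewee import IntegrityError

try:
    User.create(username='charlie')
except IntegrityError:
    # Raised no matter which database driver is in use.
    print('That username is already taken.')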
NOTE:
All queries are logged to the peewee namespace using the standard library logging module. Queries are logged using the DEBUG level. If you're interested in doing something with the queries, you can simply register a handler.
# Print all queries to stderr.
import logging

logger = logging.getLogger('peewee')
logger.addHandler(logging.StreamHandler())
logger.setLevel(logging.DEBUG)
Peewee comes with built-in support for Postgres, MySQL and SQLite. These databases are very popular and run the gamut from fast, embeddable databases to heavyweight servers suitable for large-scale deployments. That being said, there are a ton of cool databases out there and adding support for your database-of-choice should be really easy, provided the driver supports the DB-API 2.0 spec.
The DB-API 2.0 spec should be familiar to you if you've used the standard library sqlite3 driver, psycopg2 or the like. Peewee relies on only a handful of parts of the spec.
These methods are generally wrapped up in higher-level abstractions and exposed by the Database, so even if your driver doesn't do these exactly you can still get a lot of mileage out of peewee. An example is the apsw sqlite driver in the "playhouse" module.
The first thing is to provide a subclass of Database that will open a connection.
from peewee import Database

import foodb  # Our fictional DB-API 2.0 driver.

class FooDatabase(Database):
def _connect(self, database, **kwargs):
return foodb.connect(database, **kwargs)
The Database provides a higher-level API and is responsible for executing queries, creating tables and indexes, and introspecting the database to get lists of tables. The above implementation is the absolute minimum needed, though some features will not work -- for best results you will want to additionally add a method for extracting a list of tables and indexes for a table from the database. We'll pretend that FooDB is a lot like MySQL and has special "SHOW" statements:
class FooDatabase(Database):
def _connect(self, database, **kwargs):
return foodb.connect(database, **kwargs)
def get_tables(self):
res = self.execute('SHOW TABLES;')
return [r[0] for r in res.fetchall()]
Other things the database handles that are not covered here include transaction management and introspection of tables, indexes and other metadata. Refer to the Database API reference or the source code for details.
NOTE:
Our new database can be used just like any of the other database subclasses:
from peewee import *
from foodb_ext import FooDatabase

db = FooDatabase('my_database', user='foo', password='secret')

class BaseModel(Model):
    class Meta:
        database = db

class Blog(BaseModel):
    title = CharField()
    contents = TextField()
    pub_date = DateTimeField()
Model classes, Field instances and model instances all map to database concepts:
Thing | Corresponds to... |
Model class | Database table |
Field instance | Column on a table |
Model instance | Row in a database table |
The following code shows the typical way you will define your database connection and model classes.
import datetime

from peewee import *

db = SqliteDatabase('my_app.db')

class BaseModel(Model):
    class Meta:
        database = db

class User(BaseModel):
    username = CharField(unique=True)

class Tweet(BaseModel):
    user = ForeignKeyField(User, backref='tweets')
    message = TextField()
    created_date = DateTimeField(default=datetime.datetime.now)
    is_published = BooleanField(default=True)
db = SqliteDatabase('my_app.db')
The db object will be used to manage the connections to the Sqlite database. In this example we're using SqliteDatabase, but you could also use one of the other database engines.
class BaseModel(Model):
class Meta:
database = db
It is good practice to define a base model class which establishes the database connection. This makes your code DRY as you will not have to specify the database for subsequent models.
Model configuration is kept namespaced in a special class called Meta. This convention is borrowed from Django. Meta configuration is passed on to subclasses, so our project's models will all subclass BaseModel. There are many different attributes you can configure using Model.Meta.
class User(BaseModel):
username = CharField(unique=True)
Model definition uses the declarative style seen in other popular ORMs like SQLAlchemy or Django. Note that we are extending the BaseModel class so the User model will inherit the database connection.
We have explicitly defined a single username column with a unique constraint. Because we have not specified a primary key, peewee will automatically add an auto-incrementing integer primary key field named id.
NOTE:
The Field class is used to describe the mapping of Model attributes to database columns. Each field type has a corresponding SQL storage class (i.e. varchar, int), and conversion between python data types and underlying storage is handled transparently.
When creating a Model class, fields are defined as class attributes. This should look familiar to users of the django framework. Here's an example:
class User(Model):
username = CharField()
join_date = DateTimeField()
about_me = TextField()
In the above example, because none of the fields are initialized with primary_key=True, an auto-incrementing primary key will automatically be created and named "id". Peewee uses AutoField to signify an auto-incrementing integer primary key, which implies primary_key=True.
There is one special type of field, ForeignKeyField, which allows you to represent foreign-key relationships between models in an intuitive way:
class Message(Model):
user = ForeignKeyField(User, backref='messages')
body = TextField()
send_date = DateTimeField(default=datetime.datetime.now)
This allows you to write code like the following:
>>> print(some_message.user.username)
Some User

>>> for message in some_user.messages:
...     print(message.body)
some message
another message
yet another message
NOTE:
For full documentation on fields, see the Fields API notes.
Field Type | Sqlite | Postgresql | MySQL |
AutoField | integer | serial | integer |
BigAutoField | integer | bigserial | bigint |
IntegerField | integer | integer | integer |
BigIntegerField | integer | bigint | bigint |
SmallIntegerField | integer | smallint | smallint |
IdentityField | not supported | int identity | not supported |
FloatField | real | real | real |
DoubleField | real | double precision | double precision |
DecimalField | decimal | numeric | numeric |
CharField | varchar | varchar | varchar |
FixedCharField | char | char | char |
TextField | text | text | text |
BlobField | blob | bytea | blob |
BitField | integer | bigint | bigint |
BigBitField | blob | bytea | blob |
UUIDField | text | uuid | varchar(40) |
BinaryUUIDField | blob | bytea | varbinary(16) |
DateTimeField | datetime | timestamp | datetime |
DateField | date | date | date |
TimeField | time | time | time |
TimestampField | integer | integer | integer |
IPField | integer | bigint | bigint |
BooleanField | integer | boolean | bool |
BareField | untyped | not supported | not supported |
ForeignKeyField | integer | integer | integer |
NOTE:
All field types accept a common set of initialization parameters (such as null, index, unique, column_name, default, primary_key, constraints and help_text). In addition, some field types take special parameters:
Field type | Special Parameters |
CharField | max_length |
FixedCharField | max_length |
DateTimeField | formats |
DateField | formats |
TimeField | formats |
TimestampField | resolution, utc |
DecimalField | max_digits, decimal_places, auto_round, rounding |
ForeignKeyField | model, field, backref, on_delete, on_update, deferrable, lazy_load |
BareField | adapt |
NOTE:
To add database (server-side) constraints, use the constraints parameter.
Peewee can provide default values for fields when objects are created. For example, to have an IntegerField default to zero rather than NULL, you could declare the field with a default value:
class Message(Model):
context = TextField()
read_count = IntegerField(default=0)
In some instances it may make sense for the default value to be dynamic. A common scenario is using the current date and time. Peewee allows you to specify a function in these cases, whose return value will be used when the object is created. Note we only provide the function, we do not actually call it:
class Message(Model):
context = TextField()
timestamp = DateTimeField(default=datetime.datetime.now)
NOTE:
def house_defaults():
    return {'beds': 0, 'baths': 0}

class House(Model):
    number = TextField()
    street = TextField()
    attributes = JSONField(default=house_defaults)
The database can also provide the default value for a field. While peewee does not explicitly provide an API for setting a server-side default value, you can use the constraints parameter to specify the server default:
class Message(Model):
context = TextField()
timestamp = DateTimeField(constraints=[SQL('DEFAULT CURRENT_TIMESTAMP')])
NOTE:
ForeignKeyField is a special field type that allows one model to reference another. Typically a foreign key will contain the primary key of the model it relates to (though you can reference a different column by specifying a field).
Foreign keys allow data to be normalized. In our example models, there is a foreign key from Tweet to User. This means that all the users are stored in their own table, as are the tweets, and the foreign key from tweet to user allows each tweet to point to a particular user object.
NOTE:
In peewee, accessing the value of a ForeignKeyField will return the entire related object, e.g.:
tweets = (Tweet
          .select(Tweet, User)
          .join(User)
          .order_by(Tweet.created_date.desc()))

for tweet in tweets:
    print(tweet.user.username, tweet.message)
NOTE:
If we did not select the User, though, then an additional query would be issued to fetch the associated User data:
tweets = Tweet.select().order_by(Tweet.created_date.desc())

for tweet in tweets:
    # WARNING: an additional query will be issued for EACH tweet
    # to fetch the associated User data.
    print(tweet.user.username, tweet.message)
Sometimes you only need the associated primary key value from the foreign key column. In this case, Peewee follows the convention established by Django, of allowing you to access the raw foreign key value by appending "_id" to the foreign key field's name:
tweets = Tweet.select()

for tweet in tweets:
    # Instead of "tweet.user", we will just get the raw ID value stored
    # in the column.
    print(tweet.user_id, tweet.message)
To prevent accidentally resolving a foreign-key and triggering an additional query, ForeignKeyField supports an initialization parameter lazy_load which, when disabled, behaves like the "_id" attribute. For example:
class Tweet(Model):
    # ... same fields, except we declare the user FK to have
    # lazy-load disabled:
    user = ForeignKeyField(User, backref='tweets', lazy_load=False)

for tweet in Tweet.select():
    print(tweet.user, tweet.message)

# With lazy-load disabled, accessing tweet.user will not perform an extra
# query and the user ID value is returned instead, e.g.:
#
# 1 tweet from user1
# 1 another from user1
# 2 tweet from user2

# However, if we eagerly load the related user object, then the user
# foreign key will behave like usual:
for tweet in Tweet.select(Tweet, User).join(User):
    print(tweet.user.username, tweet.message)

# user1 tweet from user1
# user1 another from user1
# user2 tweet from user2
ForeignKeyField allows a back-referencing property to be bound to the target model. Implicitly, this property will be named classname_set, where classname is the lowercase name of the class, but it can be overridden using the backref parameter:
class Message(Model):
    from_user = ForeignKeyField(User, backref='outbox')
    to_user = ForeignKeyField(User, backref='inbox')
    text = TextField()

for message in some_user.outbox:
    # We are iterating over all Messages whose from_user is some_user.
    print(message)

for message in some_user.inbox:
    # We are iterating over all Messages whose to_user is some_user.
    print(message)
The three fields devoted to working with dates and times have special properties which allow access to things like the year, month, hour, etc.
DateField has properties for the year, month and day.
TimeField has properties for the hour, minute and second.
DateTimeField has all of the above.
These properties can be used just like any other expression. Let's say we have an events calendar and want to highlight all the days in the current month that have an event attached:
# Get the current time.
now = datetime.datetime.now()

# Get days that have events for the current month.
Event.select(Event.event_date.day.alias('day')).where(
    (Event.event_date.year == now.year) &
    (Event.event_date.month == now.month))
NOTE:
The BitField and BigBitField are new as of 3.0.0. The former provides a subclass of IntegerField that is suitable for storing feature toggles as an integer bitmask. The latter is suitable for storing a bitmap for a large data-set, e.g. expressing membership or bitmap-type data.
As an example of using BitField, let's say we have a Post model and we wish to store certain True/False flags about the post. We could store all these feature toggles in their own BooleanField objects, or we could use BitField instead:
class Post(Model):
content = TextField()
flags = BitField()
is_favorite = flags.flag(1)
is_sticky = flags.flag(2)
is_minimized = flags.flag(4)
is_deleted = flags.flag(8)
Using these flags is quite simple:
>>> p = Post()
>>> p.is_sticky = True
>>> p.is_minimized = True
>>> print(p.flags)  # Prints 4 | 2 --> "6"
6
>>> p.is_favorite
False
>>> p.is_sticky
True
We can also use the flags on the Post class to build expressions in queries:
# Generates a WHERE clause that looks like:
# WHERE (post.flags & 1 != 0)
favorites = Post.select().where(Post.is_favorite)

# Query for sticky + favorite posts:
sticky_faves = Post.select().where(Post.is_sticky & Post.is_favorite)
Since the BitField is stored in an integer, there is a maximum of 64 flags you can represent (64 bits being the common size of an integer column). For storing arbitrarily large bitmaps, you can instead use BigBitField, which uses an automatically managed buffer of bytes, stored in a BlobField.
When bulk-updating one or more bits in a BitField, you can use bitwise operators to set or clear one or more bits:
# Set the 4th bit on all Post objects.
Post.update(flags=Post.flags | 8).execute()

# Clear the 1st and 3rd bits on all Post objects.
Post.update(flags=Post.flags & ~(1 | 4)).execute()
For simple operations, the flags provide handy set() and clear() methods for setting or clearing an individual bit:
# Set the "is_deleted" bit on all posts.
Post.update(flags=Post.is_deleted.set()).execute()

# Clear the "is_deleted" bit on all posts.
Post.update(flags=Post.is_deleted.clear()).execute()
Example usage:
class Bitmap(Model):
    data = BigBitField()

bitmap = Bitmap()

# Sets the ith bit, e.g. the 1st bit, the 11th bit, the 63rd, etc.
bits_to_set = (1, 11, 63, 31, 55, 48, 100, 99)
for bit_idx in bits_to_set:
    bitmap.data.set_bit(bit_idx)

# We can test whether a bit is set using "is_set":
assert bitmap.data.is_set(11)
assert not bitmap.data.is_set(12)

# We can clear a bit:
bitmap.data.clear_bit(11)
assert not bitmap.data.is_set(11)

# We can also "toggle" a bit. Recall that the 63rd bit was set earlier.
assert bitmap.data.toggle_bit(63) is False
assert bitmap.data.toggle_bit(63) is True
assert bitmap.data.is_set(63)
The BareField class is intended to be used only with SQLite. Since SQLite uses dynamic typing and data-types are not enforced, it can be perfectly fine to declare fields without any data-type. In those cases you can use BareField. It is also common for SQLite virtual tables to use meta-columns or untyped columns, so for those cases as well you may wish to use an untyped field (although for full-text search, you should use SearchField instead!).
BareField accepts a special parameter adapt. This parameter is a function that takes a value coming from the database and converts it into the appropriate Python type. For instance, if you have a virtual table with an un-typed column but you know that it will return int objects, you can specify adapt=int.
Example:
db = SqliteDatabase(':memory:')

class Junk(Model):
    anything = BareField()

    class Meta:
        database = db

# Store multiple data-types in the Junk.anything column:
Junk.create(anything='a string')
Junk.create(anything=12345)
Junk.create(anything=3.14159)
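As a brief sketch of the adapt parameter, suppose a virtual table exposes an untyped column that we know holds integers (the model and column names here are hypothetical):

class VirtualRow(Model):
    # Values read from the database are passed through int().
    rank = BareField(adapt=int)

    class Meta:
        database = db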
It is easy to add support for custom field types in peewee. In this example we will create a UUID field for postgresql (which has a native UUID column type).
To add a custom field type you need to first identify what type of column the field data will be stored in. If you just want to add python behavior atop, say, a decimal field (for instance to make a currency field) you would just subclass DecimalField. On the other hand, if the database offers a custom column type you will need to let peewee know. This is controlled by the Field.field_type attribute.
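For example, a currency field might be nothing more than a DecimalField with project-wide defaults baked in; a minimal sketch (the class name and defaults are illustrative):

class CurrencyField(DecimalField):
    def __init__(self, *args, **kwargs):
        # Two decimal places is typical for most currencies.
        kwargs.setdefault('max_digits', 10)
        kwargs.setdefault('decimal_places', 2)
        super().__init__(*args, **kwargs)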
NOTE:
Let's start by defining our UUID field:
class UUIDField(Field):
field_type = 'uuid'
We will store the UUIDs in a native UUID column. Since psycopg2 treats the data as a string by default, we will add two methods to the field: one to coerce Python UUID objects into hex strings for storage, and one to convert database values back into UUID objects:
import uuid

class UUIDField(Field):
    field_type = 'uuid'

    def db_value(self, value):
        return value.hex  # convert UUID to hex string.

    def python_value(self, value):
        return uuid.UUID(value)  # convert hex string to UUID
This step is optional. By default, the field_type value will be used for the column's data-type in the database schema. If you need to support multiple databases which use different data-types for your field-data, you need to let each database know how to map the uuid label to an actual uuid column type. Specify the overrides in the Database constructor:
# Postgres, we use UUID data-type.
db = PostgresqlDatabase('my_db', field_types={'uuid': 'uuid'})

# Sqlite doesn't have a UUID type, so we use text type.
db = SqliteDatabase('my_db', field_types={'uuid': 'text'})
That is it! Some fields support exotic operations; the postgresql HStoreField, for example, acts like a key/value store and has custom operators for things like contains and update. You can specify custom operations as well. For example code, check out the source code for HStoreField, in playhouse.postgres_ext.
Model classes implement a number of class- and instance-methods, for example Model.save() or Model.create(). If you declare a field whose name coincides with a model method, it could cause problems. Consider:
class LogEntry(Model):
event = TextField()
create = TimestampField() # Uh-oh.
update = TimestampField() # Uh-oh.
To avoid this problem while still using the desired column name in the database schema, explicitly specify the column_name while providing an alternative name for the field attribute:
class LogEntry(Model):
event = TextField()
create_ = TimestampField(column_name='create')
update_ = TimestampField(column_name='update')
In order to start using our models, it's necessary to open a connection to the database and create the tables first. Peewee will run the necessary CREATE TABLE queries, additionally creating any constraints and indexes.
# Connect to our database.
db.connect()

# Create the tables.
db.create_tables([User, Tweet])
NOTE:
NOTE:
After you have created your tables, if you choose to modify your database schema (by adding, removing or otherwise changing the columns) you will need to either drop the table and re-create it, or run ALTER TABLE queries to migrate the schema (the playhouse.migrate module can help with the latter).
In order not to pollute the model namespace, model-specific configuration is placed in a special class called Meta (a convention borrowed from the django framework):
from peewee import *

contacts_db = SqliteDatabase('contacts.db')

class Person(Model):
    name = CharField()

    class Meta:
        database = contacts_db
This instructs peewee that whenever a query is executed on Person to use the contacts database.
NOTE:
Once the class is defined, you should not access ModelClass.Meta, but instead use ModelClass._meta:
>>> Person.Meta
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'Person' has no attribute 'Meta'

>>> Person._meta
<peewee.ModelOptions object at 0x7f51a2f03790>
The ModelOptions class implements several methods which may be of use for retrieving model metadata (such as lists of fields, foreign key relationships, and more).
>>> Person._meta.fields
{'id': <peewee.AutoField object at 0x7f51a2e92750>,
 'name': <peewee.CharField object at 0x7f51a2f0a510>}

>>> Person._meta.primary_key
<peewee.AutoField object at 0x7f51a2e92750>

>>> Person._meta.database
<peewee.SqliteDatabase object at 0x7f519bff6dd0>
There are several options you can specify as Meta attributes. While most options are inheritable, some are table-specific and will not be inherited by subclasses.
Option | Meaning | Inheritable? |
database | database for model | yes |
table_name | name of the table to store data | no |
table_function | function to generate table name dynamically | yes |
indexes | a list of fields to index | yes |
primary_key | a CompositeKey instance | yes |
constraints | a list of table constraints | yes |
schema | the database schema for the model | yes |
only_save_dirty | when calling model.save(), only save dirty fields | yes |
options | dictionary of options for create table extensions | yes |
table_settings | list of setting strings to go after close parentheses | yes |
temporary | indicate temporary table | yes |
legacy_table_names | use legacy table name generation (enabled by default) | yes |
depends_on | indicate this table depends on another for creation | no |
without_rowid | indicate table should not have rowid (SQLite only) | no |
Here is an example showing inheritable versus non-inheritable attributes:
>>> db = SqliteDatabase(':memory:')
>>> class ModelOne(Model):
...     class Meta:
...         database = db
...         table_name = 'model_one_tbl'
...
>>> class ModelTwo(ModelOne):
...     pass
...
>>> ModelOne._meta.database is ModelTwo._meta.database
True
>>> ModelOne._meta.table_name == ModelTwo._meta.table_name
False
The Meta.primary_key attribute is used to specify either a CompositeKey or to indicate that the model has no primary key. Composite primary keys are discussed in more detail here: Composite primary keys.
To indicate that a model should not have a primary key, set primary_key = False.
Examples:
class BlogToTag(Model):
"""A simple "through" table for many-to-many relationship."""
blog = ForeignKeyField(Blog)
tag = ForeignKeyField(Tag)
class Meta:
        primary_key = CompositeKey('blog', 'tag')

class NoPrimaryKey(Model):
data = IntegerField()
class Meta:
primary_key = False
By default Peewee will automatically generate a table name based on the name of your model class. The way the table-name is generated depends on the value of Meta.legacy_table_names. By default, legacy_table_names=True so as to avoid breaking backwards-compatibility. However, if you wish to use the new and improved table-name generation, you can specify legacy_table_names=False.
This table shows the differences in how a model name is converted to a SQL table name, depending on the value of legacy_table_names:
Model name | legacy_table_names=True | legacy_table_names=False (new) |
User | user | user |
UserProfile | userprofile | user_profile |
APIResponse | apiresponse | api_response |
WebHTTPRequest | webhttprequest | web_http_request |
mixedCamelCase | mixedcamelcase | mixed_camel_case |
Name2Numbers3XYZ | name2numbers3xyz | name2_numbers3_xyz |
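For example, here is a minimal sketch opting a model into the new naming behavior (the model is illustrative):

class APIResponse(BaseModel):
    class Meta:
        # Generates the table name "api_response" rather than the
        # legacy "apiresponse".
        legacy_table_names = False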
ATTENTION:
In the next major release (Peewee 4.0), legacy_table_names will have a default value of False.
To explicitly specify the table name for a model class, use the table_name Meta option. This feature can be useful for dealing with pre-existing database schemas that may have used awkward naming conventions:
class UserProfile(Model):
class Meta:
table_name = 'user_profile_tbl'
If you wish to implement your own naming convention, you can specify the table_function Meta option. This function will be called with your model class and should return the desired table name as a string. Suppose our company specifies that table names should be lower-cased and end with "_tbl", we can implement this as a table function:
def make_table_name(model_class):
    model_name = model_class.__name__
    return model_name.lower() + '_tbl'

class BaseModel(Model):
    class Meta:
        table_function = make_table_name

class User(BaseModel):
    # table_name will be "user_tbl".
    pass

class UserProfile(BaseModel):
    # table_name will be "userprofile_tbl".
    pass
Peewee can create indexes on single or multiple columns, optionally including a UNIQUE constraint. Peewee also supports user-defined constraints on both models and fields.
Single column indexes are defined using field initialization parameters. The following example adds a unique index on the username field, and a normal index on the email field:
class User(Model):
username = CharField(unique=True)
email = CharField(index=True)
To add a user-defined constraint on a column, you can pass it in using the constraints parameter. You may wish to specify a default value as part of the schema, or add a CHECK constraint, for example:
class Product(Model):
name = CharField(unique=True)
price = DecimalField(constraints=[Check('price < 10000')])
created = DateTimeField(
constraints=[SQL("DEFAULT (datetime('now'))")])
Multi-column indexes may be defined as Meta attributes using a nested tuple. Each database index is a 2-tuple, the first part of which is a tuple of the names of the fields, the second part a boolean indicating whether the index should be unique.
class Transaction(Model):
from_acct = CharField()
to_acct = CharField()
amount = DecimalField()
date = DateTimeField()
class Meta:
indexes = (
# create a unique on from/to/date
(('from_acct', 'to_acct', 'date'), True),
# create a non-unique on from/to
(('from_acct', 'to_acct'), False),
)
NOTE:
class Meta:
indexes = (
(('first_name', 'last_name'), True), # Note the trailing comma!
)
Peewee supports a more structured API for declaring indexes on a model using the Model.add_index() method or by directly using the ModelIndex helper class.
Examples:
class Article(Model):
name = TextField()
timestamp = TimestampField()
status = IntegerField()
    flags = IntegerField()

# Add an index on "name" and "timestamp" columns.
Article.add_index(Article.name, Article.timestamp)

# Add a partial index on name and timestamp where status = 1.
Article.add_index(Article.name, Article.timestamp,
                  where=(Article.status == 1))

# Create a unique index on timestamp desc, status & 4.
idx = Article.index(
    Article.timestamp.desc(),
    Article.flags.bin_and(4),
    unique=True)
Article.add_index(idx)
WARNING:
# SQLite does not support parameterized CREATE INDEX queries, so
# we declare it manually.
Article.add_index(SQL('CREATE INDEX ...'))
See add_index() and ModelIndex for more details.
Peewee allows you to add arbitrary constraints to your Model, that will be part of the table definition when the schema is created.
For instance, suppose you have a people table with a composite primary key of two columns, the person's first and last name. You wish to have another table relate to the people table, and to do this, you will need to define a foreign key constraint:
class Person(Model):
first = CharField()
last = CharField()
class Meta:
        primary_key = CompositeKey('first', 'last')

class Pet(Model):
owner_first = CharField()
owner_last = CharField()
pet_name = CharField()
class Meta:
constraints = [SQL('FOREIGN KEY(owner_first, owner_last) '
'REFERENCES person(first, last)')]
You can also implement CHECK constraints at the table level:
class Product(Model):
name = CharField(unique=True)
price = DecimalField()
class Meta:
constraints = [Check('price < 10000')]
The AutoField is used to identify an auto-incrementing integer primary key. If you do not specify a primary key, Peewee will automatically create an auto-incrementing primary key named "id".
To specify an auto-incrementing ID using a different field name, you can write:
class Event(Model):
event_id = AutoField() # Event.event_id will be auto-incrementing PK.
name = CharField()
timestamp = DateTimeField(default=datetime.datetime.now)
metadata = BlobField()
You can identify a different field as the primary key, in which case an "id" column will not be created. In this example we will use a person's email address as the primary key:
class Person(Model):
email = CharField(primary_key=True)
name = TextField()
dob = DateField()
WARNING:
class MyModel(Model):
id = IntegerField(primary_key=True)
Peewee understands the above model declaration as a model with an integer primary key, but the value of that ID is determined by the application. To create an auto-incrementing integer primary key, you would instead write:
class MyModel(Model):
id = AutoField() # primary_key=True is implied.
Composite primary keys can be declared using CompositeKey. Note that doing this may cause issues with ForeignKeyField, as Peewee does not support the concept of a "composite foreign-key". As such, I've found it only advisable to use composite primary keys in a handful of situations, such as trivial many-to-many junction tables:
class Image(Model):
filename = TextField()
    mimetype = CharField()

class Tag(Model):
    label = CharField()

class ImageTag(Model):
    # Many-to-many relationship.
image = ForeignKeyField(Image)
tag = ForeignKeyField(Tag)
class Meta:
primary_key = CompositeKey('image', 'tag')
In the extremely rare case you wish to declare a model with no primary key, you can specify primary_key = False in the model Meta options.
If you would like to use a non-integer primary key (which I generally don't recommend), you can specify primary_key=True when creating a field. When you wish to create a new instance for a model using a non-autoincrementing primary key, you need to be sure to call save() specifying force_insert=True.
from peewee import *

class UUIDModel(Model):
id = UUIDField(primary_key=True)
Auto-incrementing IDs are, as their name says, automatically generated for you when you insert a new row into the database. When you call save(), peewee determines whether to do an INSERT versus an UPDATE based on the presence of a primary key value. Since, with our uuid example, the database driver won't generate a new ID, we need to specify it manually. When we call save() for the first time, we pass in force_insert=True:
# This works because .create() will specify `force_insert=True`.
obj1 = UUIDModel.create(id=uuid.uuid4())

# This will not work, however. Peewee will attempt to do an update:
obj2 = UUIDModel(id=uuid.uuid4())
obj2.save()  # WRONG

obj2.save(force_insert=True)  # CORRECT

# Once the object has been created, you can call save() normally.
obj2.save()
NOTE:
Peewee has very basic support for composite keys. In order to use a composite key, you must set the primary_key attribute of the model options to a CompositeKey instance:
class BlogToTag(Model):
"""A simple "through" table for many-to-many relationship."""
blog = ForeignKeyField(Blog)
tag = ForeignKeyField(Tag)
class Meta:
primary_key = CompositeKey('blog', 'tag')
WARNING:
Sometimes you do not want the database to automatically generate a value for the primary key, for instance when bulk loading relational data. To handle this on a one-off basis, you can simply tell peewee to turn off auto_increment during the import:
data = load_user_csv()  # load up a bunch of data

User._meta.auto_increment = False  # turn off auto incrementing IDs
with db.atomic():
    for row in data:
        u = User(id=row[0], username=row[1])
        u.save(force_insert=True)  # <-- force peewee to insert row

User._meta.auto_increment = True
Although a better way to accomplish the above, without resorting to hacks, is to use the Model.insert_many() API:
data = load_user_csv()
fields = [User.id, User.username]

with db.atomic():
    User.insert_many(data, fields=fields).execute()
If you always want to have control over the primary key, simply do not use the AutoField field type, but use a normal IntegerField (or other column type):
class User(BaseModel):
    id = IntegerField(primary_key=True)
    username = CharField()

>>> u = User.create(id=999, username='somebody')
>>> u.id
999
>>> User.get(User.username == 'somebody').id
999
If you wish to create a model with no primary key, you can specify primary_key = False in the inner Meta class:
class MyData(BaseModel):
timestamp = DateTimeField()
value = IntegerField()
class Meta:
primary_key = False
This will yield the following DDL:
CREATE TABLE "mydata" (
    "timestamp" DATETIME NOT NULL,
    "value" INTEGER NOT NULL
)
WARNING:
When creating a hierarchical structure it is necessary to create a self-referential foreign key which links a child object to its parent. Because the model class is not defined at the time you instantiate the self-referential foreign key, use the special string 'self' to indicate a self-referential foreign key:
class Category(Model):
name = CharField()
parent = ForeignKeyField('self', null=True, backref='children')
As you can see, the foreign key points upward to the parent object and the back-reference is named children.
ATTENTION:
When querying against a model that contains a self-referential foreign key you may sometimes need to perform a self-join. In those cases you can use Model.alias() to create a table reference. Here is how you might query the category and parent model using a self-join:
Parent = Category.alias()
GrandParent = Category.alias()

query = (Category
         .select(Category, Parent)
         .join(Parent, on=(Category.parent == Parent.id))
         .join(GrandParent, on=(Parent.parent == GrandParent.id))
         .where(GrandParent.name == 'some category')
         .order_by(Category.name))
Sometimes it happens that you will create a circular dependency between two tables.
NOTE:
Adding circular foreign keys with peewee is a bit tricky because at the time you are defining either foreign key, the model it points to will not have been defined yet, causing a NameError.
class User(Model):
username = CharField()
    favorite_tweet = ForeignKeyField(Tweet, null=True)  # NameError!!

class Tweet(Model):
message = TextField()
user = ForeignKeyField(User, backref='tweets')
One option is to simply use an IntegerField to store the raw ID:
class User(Model):
username = CharField()
favorite_tweet_id = IntegerField(null=True)
By using DeferredForeignKey we can get around the problem and still use a foreign key field:
class User(Model):
username = CharField()
# Tweet has not been defined yet so use the deferred reference.
    favorite_tweet = DeferredForeignKey('Tweet', null=True)

class Tweet(Model):
    message = TextField()
    user = ForeignKeyField(User, backref='tweets')

# Now that Tweet is defined, "favorite_tweet" has been converted into
# a ForeignKeyField.
print(User.favorite_tweet)
# <ForeignKeyField: "user"."favorite_tweet">
There is one more quirk to watch out for, though. When you call create_tables(), you will again encounter the same issue. For this reason peewee will not automatically create a foreign key constraint for any deferred foreign keys.
To create the tables and the foreign-key constraint, you can use the SchemaManager.create_foreign_key() method to create the constraint after creating the tables:
# Will create the User and Tweet tables, but does *not* create a
# foreign-key constraint on User.favorite_tweet.
db.create_tables([User, Tweet])

# Create the foreign-key constraint:
User._schema.create_foreign_key(User.favorite_tweet)
NOTE:
This section will cover the basic CRUD operations commonly performed on a relational database: creating, reading, updating and deleting records.
NOTE:
You can use Model.create() to create a new model instance. This method accepts keyword arguments, where the keys correspond to the names of the model's fields. A new instance is returned and a row is added to the table.
>>> User.create(username='Charlie')
<__main__.User object at 0x2529350>
This will INSERT a new row into the database. The primary key will automatically be retrieved and stored on the model instance.
Alternatively, you can build up a model instance programmatically and then call save():
>>> user = User(username='Charlie')
>>> user.save()  # save() returns the number of rows modified.
1
>>> user.id
1
>>> huey = User()
>>> huey.username = 'Huey'
>>> huey.save()
1
>>> huey.id
2
When a model has a foreign key, you can directly assign a model instance to the foreign key field when creating a new record.
>>> tweet = Tweet.create(user=huey, message='Hello!')
You can also use the value of the related object's primary key:
>>> tweet = Tweet.create(user=2, message='Hello again!')
If you simply wish to insert data and do not need to create a model instance, you can use Model.insert():
>>> User.insert(username='Mickey').execute()
3
After executing the insert query, the primary key of the new row is returned.
NOTE:
There are a couple of ways you can load lots of data quickly. The naive approach is to simply call Model.create() in a loop:
data_source = [
    {'field1': 'val1-1', 'field2': 'val1-2'},
    {'field1': 'val2-1', 'field2': 'val2-2'},
    # ...
]

for data_dict in data_source:
    MyModel.create(**data_dict)
The above approach is slow for a couple of reasons:
1. Without wrapping the loop in a transaction, each call to create() runs in its own transaction, which can be very slow.
2. There is a fair amount of Python logic in the way, and each INSERT query must be generated and parsed into SQL.
3. That is a lot of data (in terms of raw bytes of SQL) being sent to the database for parsing.
4. We are retrieving the last insert id, which causes an additional query to be executed in some cases.
You can get a significant speedup by simply wrapping this in a transaction with atomic().
# This is much faster.
with db.atomic():
    for data_dict in data_source:
        MyModel.create(**data_dict)
The above code still suffers from points 2, 3 and 4. We can get another big boost by using insert_many(). This method accepts a list of tuples or dictionaries, and inserts multiple rows in a single query:
data_source = [
    {'field1': 'val1-1', 'field2': 'val1-2'},
    {'field1': 'val2-1', 'field2': 'val2-2'},
    # ...
]

# Fastest way to INSERT multiple rows.
MyModel.insert_many(data_source).execute()
The insert_many() method also accepts a list of row-tuples, provided you also specify the corresponding fields:
# We can INSERT tuples as well...
data = [('val1-1', 'val1-2'),
        ('val2-1', 'val2-2'),
        ('val3-1', 'val3-2')]

# But we need to indicate which fields the values correspond to.
MyModel.insert_many(data, fields=[MyModel.field1, MyModel.field2]).execute()
It is also a good practice to wrap the bulk insert in a transaction:
# You can, of course, wrap this in a transaction as well:
with db.atomic():
    MyModel.insert_many(data, fields=fields).execute()
NOTE:
Depending on the number of rows in your data source, you may need to break it up into chunks. SQLite, in particular, typically has a limit of 999 or 32766 variables per query, so the batch size would be 999 // row_length or 32766 // row_length.
You can write a loop to batch your data into chunks (in which case it is strongly recommended you use a transaction):
# Insert rows 100 at a time.
with db.atomic():
    for idx in range(0, len(data_source), 100):
        MyModel.insert_many(data_source[idx:idx+100]).execute()
Peewee comes with a chunked() helper function which you can use for efficiently chunking a generic iterable into a series of batch-sized iterables:
from peewee import chunked

# Insert rows 100 at a time.
with db.atomic():
    for batch in chunked(data_source, 100):
        MyModel.insert_many(batch).execute()
The Model.bulk_create() method behaves much like Model.insert_many(), but instead it accepts a list of unsaved model instances to insert, and it optionally accepts a batch-size parameter. To use the bulk_create() API:
# Read list of usernames from a file, for example.
with open('user_list.txt') as fh:
    # Create a list of unsaved User instances.
    users = [User(username=line.strip()) for line in fh.readlines()]

# Wrap the operation in a transaction and batch INSERT the users
# 100 at a time.
with db.atomic():
    User.bulk_create(users, batch_size=100)
NOTE:
In addition, Peewee also offers Model.bulk_update(), which can efficiently update one or more columns on a list of models. For example:
# First, create 3 users with usernames u1, u2, u3.
u1, u2, u3 = [User.create(username='u%s' % i) for i in (1, 2, 3)]

# Now we'll modify the user instances.
u1.username = 'u1-x'
u2.username = 'u2-y'
u3.username = 'u3-z'

# Update all three users with a single UPDATE query.
User.bulk_update([u1, u2, u3], fields=[User.username])
NOTE:
with database.atomic():
User.bulk_update(list_of_users, fields=['username'], batch_size=50)
Alternatively, you can use the Database.batch_commit() helper to process chunks of rows inside batch-sized transactions. This method also provides a workaround for databases besides Postgresql, when the primary-key of the newly-created rows must be obtained.
# List of row data to insert.
row_data = [{'username': 'u1'}, {'username': 'u2'}, ...]

# Assume there are 789 items in row_data. The following code will result in
# 8 total transactions (7x100 rows + 1x89 rows).
for row in db.batch_commit(row_data, 100):
    User.create(**row)
If the data you would like to bulk load is stored in another table, you can also create INSERT queries whose source is a SELECT query. Use the Model.insert_from() method:
res = (TweetArchive
.insert_from(
Tweet.select(Tweet.user, Tweet.message),
fields=[TweetArchive.user, TweetArchive.message])
.execute())
The above query is equivalent to the following SQL:
INSERT INTO "tweet_archive" ("user_id", "message")
SELECT "user_id", "message" FROM "tweet";
Once a model instance has a primary key, any subsequent call to save() will result in an UPDATE rather than another INSERT. The model's primary key will not change:
>>> user.save()  # save() returns the number of rows modified.
1
>>> user.id
1
>>> user.save()
>>> user.id
1
>>> huey.save()
1
>>> huey.id
2
If you want to update multiple records, issue an UPDATE query. The following example will update all Tweet objects, marking them as published, if they were created before today. Model.update() accepts keyword arguments where the keys correspond to the model's field names:
>>> today = datetime.today()
>>> query = Tweet.update(is_published=True).where(Tweet.creation_date < today)
>>> query.execute()  # Returns the number of rows that were updated.
4
For more information, see the documentation on Model.update(), Update and Model.bulk_update().
NOTE:
Peewee allows you to perform atomic updates. Let's suppose we need to update some counters. The naive approach would be to write something like this:
>>> for stat in Stat.select().where(Stat.url == request.url):
...     stat.counter += 1
...     stat.save()
Do not do this! Not only is this slow, but it is also vulnerable to race conditions if multiple processes are updating the counter at the same time.
Instead, you can update the counters atomically using update():
>>> query = Stat.update(counter=Stat.counter + 1).where(Stat.url == request.url)
>>> query.execute()
You can make these update statements as complex as you like. Let's give all our employees a bonus equal to their previous bonus plus 10% of their salary:
>>> query = Employee.update(bonus=(Employee.bonus + (Employee.salary * .1)))
>>> query.execute()  # Give everyone a bonus!
We can even use a subquery to update the value of a column. Suppose we had a denormalized column on the User model that stored the number of tweets a user had made, and we updated this value periodically. Here is how you might write such a query:
>>> subquery = Tweet.select(fn.COUNT(Tweet.id)).where(Tweet.user == User.id)
>>> update = User.update(num_tweets=subquery)
>>> update.execute()
Peewee provides support for varying types of upsert functionality. With SQLite prior to 3.24.0, and with MySQL, Peewee offers replace(), which allows you to insert a record or, in the event of a constraint violation, replace the existing record.
Example of using replace() and on_conflict_replace():
class User(Model):
    username = TextField(unique=True)
    last_login = DateTimeField(null=True)

# Insert or update the user. The "last_login" value will be updated
# regardless of whether the user existed previously.
user_id = (User
           .replace(username='the-user', last_login=datetime.now())
           .execute())

# This query is equivalent:
user_id = (User
           .insert(username='the-user', last_login=datetime.now())
           .on_conflict_replace()
           .execute())
NOTE:
MySQL supports upsert via the ON DUPLICATE KEY UPDATE clause. For example:
class User(Model):
    username = TextField(unique=True)
    last_login = DateTimeField(null=True)
    login_count = IntegerField()

# Insert a new user.
User.create(username='huey', login_count=0)

# Simulate the user logging in. The login count and timestamp will be
# either created or updated correctly.
now = datetime.now()
rowid = (User
         .insert(username='huey', last_login=now, login_count=1)
         .on_conflict(
             preserve=[User.last_login],  # Use the value we would have inserted.
             update={User.login_count: User.login_count + 1})
         .execute())
In the above example, we could safely invoke the upsert query as many times as we wanted. The login count will be incremented atomically, the last login column will be updated, and no duplicate rows will be created.
Postgresql and SQLite (3.24.0 and newer) provide a different syntax that allows for more granular control over which constraint violation should trigger the conflict resolution, and what values should be updated or preserved.
Example of using on_conflict() to perform a Postgresql-style upsert (or SQLite 3.24+):
class User(Model):
    username = TextField(unique=True)
    last_login = DateTimeField(null=True)
    login_count = IntegerField()

# Insert a new user.
User.create(username='huey', login_count=0)

# Simulate the user logging in. The login count and timestamp will be
# either created or updated correctly.
now = datetime.now()
rowid = (User
         .insert(username='huey', last_login=now, login_count=1)
         .on_conflict(
             conflict_target=[User.username],  # Which constraint?
             preserve=[User.last_login],  # Use the value we would have inserted.
             update={User.login_count: User.login_count + 1})
         .execute())
In the above example, we could safely invoke the upsert query as many times as we wanted. The login count will be incremented atomically, the last login column will be updated, and no duplicate rows will be created.
NOTE:
Here is a more advanced (if contrived) example using the EXCLUDED namespace. The EXCLUDED helper allows us to reference values in the conflicting data. For our example, we'll assume a simple table mapping a unique key (string) to a value (integer):
class KV(Model):
    key = CharField(unique=True)
    value = IntegerField()

# Create one row.
KV.create(key='k1', value=1)

# Demonstrate usage of EXCLUDED.
# Here we will attempt to insert a new value for a given key. If that
# key already exists, then we will update its value with the *sum* of its
# original value and the value we attempted to insert -- provided that
# the new value is larger than the original value.
query = (KV.insert(key='k1', value=10)
         .on_conflict(conflict_target=[KV.key],
                      update={KV.value: KV.value + EXCLUDED.value},
                      where=(EXCLUDED.value > KV.value)))

# Executing the above query will result in the following data being
# present in the "kv" table:
# (key='k1', value=11)
query.execute()

# If we attempted to execute the query *again*, then nothing would be
# updated, as the new value (10) is now less than the value in the
# original row (11).
For more information, see Insert.on_conflict() and OnConflict.
To delete a single model instance, you can use the Model.delete_instance() shortcut. delete_instance() will delete the given model instance and can optionally delete any dependent objects recursively (by specifying recursive=True).
>>> user = User.get(User.id == 1)
>>> user.delete_instance()  # Returns the number of rows deleted.
1

>>> User.get(User.id == 1)
UserDoesNotExist: instance matching query does not exist:
SQL: SELECT t1."id", t1."username" FROM "user" AS t1 WHERE t1."id" = ?
PARAMS: [1]
To delete an arbitrary set of rows, you can issue a DELETE query. The following will delete all Tweet objects that are over one year old:
>>> query = Tweet.delete().where(Tweet.creation_date < one_year_ago)
>>> query.execute()  # Returns the number of rows deleted.
7
For more information, see the documentation on Model.delete_instance() and Model.delete().
You can use the Model.get() method to retrieve a single instance matching the given query. For primary-key lookups, you can also use the shortcut method Model.get_by_id().
This method is a shortcut that calls Model.select() with the given query, but limits the result set to a single row. Additionally, if no model matches the given query, a DoesNotExist exception will be raised.
>>> User.get(User.id == 1)
<__main__.User object at 0x25294d0>

>>> User.get_by_id(1)  # Same as above.
<__main__.User object at 0x252df10>

>>> User[1]  # Also same as above.
<__main__.User object at 0x252dd10>

>>> User.get(User.id == 1).username
u'Charlie'

>>> User.get(User.username == 'Charlie')
<__main__.User object at 0x2529410>

>>> User.get(User.username == 'nobody')
UserDoesNotExist: instance matching query does not exist:
SQL: SELECT t1."id", t1."username" FROM "user" AS t1 WHERE t1."username" = ?
PARAMS: ['nobody']
For more advanced operations, you can use SelectBase.get(). The following query retrieves the latest tweet from the user named charlie:
>>> (Tweet
...  .select()
...  .join(User)
...  .where(User.username == 'charlie')
...  .order_by(Tweet.created_date.desc())
...  .get())
<__main__.Tweet object at 0x2623410>
For more information, see the documentation on Model.get(), Model.get_by_id(), Model.select() and SelectBase.get().
Peewee has one helper method for performing "get/create" type operations: Model.get_or_create(), which first attempts to retrieve the matching row. Failing that, a new row will be created.
For "create or get" type logic, typically one would rely on a unique constraint or primary key to prevent the creation of duplicate objects. As an example, let's say we wish to implement registering a new user account using the example User model. The User model has a unique constraint on the username field, so we will rely on the database's integrity guarantees to ensure we don't end up with duplicate usernames:
try:
    with db.atomic():
        return User.create(username=username)
except peewee.IntegrityError:
    # `username` is a unique column, so this username already exists,
    # making it safe to call .get().
    return User.get(User.username == username)
You can easily encapsulate this type of logic as a classmethod on your own Model classes.
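For example, here is a minimal sketch of such a classmethod (the method name create_or_get is illustrative); it returns a 2-tuple of the instance and a boolean indicating whether a new row was created:

class User(BaseModel):
    username = CharField(unique=True)

    @classmethod
    def create_or_get(cls, username):
        try:
            # Attempt creation first, relying on the unique constraint.
            with db.atomic():
                return cls.create(username=username), True
        except peewee.IntegrityError:
            # The username already exists, so fetch the existing row.
            return cls.get(cls.username == username), False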
The above example first attempts creation, then falls back to retrieval, relying on the database to enforce a unique constraint. If you prefer to attempt to retrieve the record first, you can use get_or_create(). This method is implemented along the same lines as the Django function of the same name. You can use the Django-style keyword argument filters to specify your WHERE conditions. The function returns a 2-tuple containing the instance and a boolean value indicating if the object was created.
Here is how you might implement user account creation using get_or_create():
user, created = User.get_or_create(username=username)
Suppose we have a different model Person and would like to get or create a person object. The only conditions we care about when retrieving the Person are their first and last names, but if we end up needing to create a new record, we will also specify their date-of-birth and favorite color:
person, created = Person.get_or_create(
first_name=first_name,
last_name=last_name,
defaults={'dob': dob, 'favorite_color': 'green'})
Any keyword argument passed to get_or_create() will be used in the get() portion of the logic, except for the defaults dictionary, which will be used to populate values on newly-created instances.
For more details read the documentation for Model.get_or_create().
We can use Model.select() to retrieve rows from the table. When you construct a SELECT query, the database will return any rows that correspond to your query. Peewee allows you to iterate over these rows, as well as use indexing and slicing operations:
>>> query = User.select()
>>> [user.username for user in query]
['Charlie', 'Huey', 'Peewee']

>>> query[1]
<__main__.User at 0x7f83e80f5550>

>>> query[1].username
'Huey'

>>> query[:2]
[<__main__.User at 0x7f83e80f53a8>, <__main__.User at 0x7f83e80f5550>]
Select queries are smart, in that you can iterate, index and slice the query multiple times but the query is only executed once.
In the following example, we will simply call select() and iterate over the return value, which is an instance of Select. This will return all the rows in the User table:
>>> for user in User.select():
...     print(user.username)
...
Charlie
Huey
Peewee
NOTE:
When iterating over a model that contains a foreign key, be careful with the way you access values on related models. Accidentally resolving a foreign key or iterating over a back-reference can cause N+1 query behavior.
When you create a foreign key, such as Tweet.user, you can use the backref to create a back-reference (User.tweets). Back-references are exposed as Select instances:
>>> tweet = Tweet.get()
>>> tweet.user  # Accessing a foreign key returns the related model.
<tw.User at 0x7f3ceb017f50>
>>> user = User.get()
>>> user.tweets  # Accessing a back-reference returns a query.
<peewee.ModelSelect at 0x7f73db3bafd0>
You can iterate over the user.tweets back-reference just like any other Select:
>>> for tweet in user.tweets:
...     print(tweet.message)
...
hello world
this is fun
look at this picture of my food
In addition to returning model instances, Select queries can return dictionaries, tuples and namedtuples. Depending on your use-case, you may find it easier to work with rows as dictionaries, for example:
>>> query = User.select().dicts()
>>> for row in query:
...     print(row)
{'id': 1, 'username': 'Charlie'}
{'id': 2, 'username': 'Huey'}
{'id': 3, 'username': 'Peewee'}
See namedtuples(), tuples(), dicts() for more information.
By default peewee will cache the rows returned when iterating over a Select query. This is an optimization to allow multiple iterations as well as indexing and slicing without causing additional queries. This caching can be problematic, however, when you plan to iterate over a large number of rows.
To reduce the amount of memory used by peewee when iterating over a query, use the iterator() method. This method allows you to iterate without caching each model returned, using much less memory when iterating over large result sets.
# Let's assume we've got 10 million stat objects to dump to a csv file.
stats = Stat.select()

# Our imaginary serializer class
serializer = CSVSerializer()

# Loop over all the stats and serialize.
for stat in stats.iterator():
    serializer.serialize_object(stat)
For simple queries you can see further speed improvements by returning rows as dictionaries, namedtuples or tuples. The following methods can be used on any Select query to change the result row type: dicts(), namedtuples() and tuples().
Don't forget to append the iterator() method call to also reduce memory consumption. For example, the above code might look like:
# Let's assume we've got 10 million stat objects to dump to a csv file.
stats = Stat.select()

# Our imaginary serializer class
serializer = CSVSerializer()

# Loop over all the stats (rendered as tuples, without caching) and serialize.
for stat_tuple in stats.tuples().iterator():
    serializer.serialize_tuple(stat_tuple)
When iterating over a large number of rows that contain columns from multiple tables, peewee will reconstruct the model graph for each row returned. This operation can be slow for complex graphs. For example, if we were selecting a list of tweets along with the username and avatar of the tweet's author, Peewee would have to create two objects for each row (a tweet and a user). In addition to the above row-types, there is a fourth method objects() which will return the rows as model instances, but will not attempt to resolve the model graph.
For example:
query = (Tweet
         .select(Tweet, User)  # Select tweet and user data.
         .join(User))

# Note that the user columns are stored in a separate User instance
# accessible at tweet.user:
for tweet in query:
    print(tweet.user.username, tweet.content)

# Using ".objects()" will not create the tweet.user object and assigns all
# user attributes to the tweet instance:
for tweet in query.objects():
    print(tweet.username, tweet.content)
For maximum performance, you can execute queries and then iterate over the results using the underlying database cursor. Database.execute() accepts a query object, executes the query, and returns a DB-API 2.0 Cursor object. The cursor will return the raw row-tuples:
query = Tweet.select(Tweet.content, User.username).join(User)
cursor = database.execute(query)
for (content, username) in cursor:
    print(username, '->', content)
You can filter for particular records using normal python operators. Peewee supports a wide variety of query operators.
>>> user = User.get(User.username == 'Charlie')
>>> for tweet in Tweet.select().where(Tweet.user == user, Tweet.is_published == True):
...     print(tweet.user.username, '->', tweet.message)
...
Charlie -> hello world
Charlie -> this is fun

>>> for tweet in Tweet.select().where(Tweet.created_date < datetime.datetime(2011, 1, 1)):
...     print(tweet.message, tweet.created_date)
...
Really old tweet 2010-01-01 00:00:00
You can also filter across joins:
>>> for tweet in Tweet.select().join(User).where(User.username == 'Charlie'):
...     print(tweet.message)
...
hello world
this is fun
look at this picture of my food
If you want to express a complex query, use parentheses and Python's bitwise or (|) and and (&) operators:
>>> Tweet.select().join(User).where(
...     (User.username == 'Charlie') |
...     (User.username == 'Peewee Herman'))
NOTE:
Check out the table of query operations to see what types of queries are possible.
NOTE:
You can also nest queries, for example tweets by users whose username starts with "a":
# get users whose username starts with "a"
a_users = User.select().where(fn.Lower(fn.Substr(User.username, 1, 1)) == 'a')

# the ".in_()" method signifies an "IN" query
a_user_tweets = Tweet.select().where(Tweet.user.in_(a_users))
NOTE:
Get active users:
User.select().where(User.active == True)
Get users who are either staff or superusers:
User.select().where(
(User.is_staff == True) | (User.is_superuser == True))
Get tweets by user named "charlie":
Tweet.select().join(User).where(User.username == 'charlie')
Get tweets by staff or superusers (assumes FK relationship):
Tweet.select().join(User).where(
(User.is_staff == True) | (User.is_superuser == True))
Get tweets by staff or superusers using a subquery:
staff_super = User.select(User.id).where(
    (User.is_staff == True) | (User.is_superuser == True))
Tweet.select().where(Tweet.user.in_(staff_super))
To return rows in order, use the order_by() method:
>>> for t in Tweet.select().order_by(Tweet.created_date):
...     print(t.pub_date)
...
2010-01-01 00:00:00
2011-06-07 14:08:48
2011-06-07 14:12:57

>>> for t in Tweet.select().order_by(Tweet.created_date.desc()):
...     print(t.pub_date)
...
2011-06-07 14:12:57
2011-06-07 14:08:48
2010-01-01 00:00:00
You can also use + and - prefix operators to indicate ordering:
# The following queries are equivalent:
Tweet.select().order_by(Tweet.created_date.desc())
Tweet.select().order_by(-Tweet.created_date)  # Note the "-" prefix.

# Similarly you can use "+" to indicate ascending order, though ascending
# is the default when no ordering is otherwise specified.
User.select().order_by(+User.username)
You can also order across joins. Assuming you want to order tweets by the username of the author, then by created_date:
query = (Tweet
.select()
.join(User)
.order_by(User.username, Tweet.created_date.desc()))
SELECT t1."id", t1."user_id", t1."message", t1."is_published", t1."created_date" FROM "tweet" AS t1 INNER JOIN "user" AS t2
ON t1."user_id" = t2."id" ORDER BY t2."username", t1."created_date" DESC
When sorting on a calculated value, you can either include the necessary SQL expressions, or reference the alias assigned to the value. Here are two examples illustrating these methods:
# Let's start with our base query. We want to get all usernames and the number of
# tweets they've made. We wish to sort this list from users with most tweets to
# users with fewest tweets.
query = (User
.select(User.username, fn.COUNT(Tweet.id).alias('num_tweets'))
.join(Tweet, JOIN.LEFT_OUTER)
.group_by(User.username))
You can order using the same COUNT expression used in the select clause. In the example below we are ordering by the COUNT() of tweet ids descending:
query = (User
.select(User.username, fn.COUNT(Tweet.id).alias('num_tweets'))
.join(Tweet, JOIN.LEFT_OUTER)
.group_by(User.username)
.order_by(fn.COUNT(Tweet.id).desc()))
Alternatively, you can reference the alias assigned to the calculated value in the select clause. This method has the benefit of being a bit easier to read. Note that we are not referring to the named alias directly, but are wrapping it using the SQL helper:
query = (User
.select(User.username, fn.COUNT(Tweet.id).alias('num_tweets'))
.join(Tweet, JOIN.LEFT_OUTER)
.group_by(User.username)
.order_by(SQL('num_tweets').desc()))
Or, to do things the "peewee" way:
ntweets = fn.COUNT(Tweet.id)
query = (User
         .select(User.username, ntweets.alias('num_tweets'))
         .join(Tweet, JOIN.LEFT_OUTER)
         .group_by(User.username)
         .order_by(ntweets.desc()))
Occasionally you may want to pull a random record from the database. You can accomplish this by ordering by the random or rand function (depending on your database):
Postgresql and Sqlite use the Random function:
# Pick 5 lucky winners:
LotteryNumber.select().order_by(fn.Random()).limit(5)
MySQL uses Rand:
# Pick 5 lucky winners:
LotteryNumber.select().order_by(fn.Rand()).limit(5)
The paginate() method makes it easy to grab a page of records. paginate() takes two parameters: page_number and items_per_page.
ATTENTION:
Page numbers are 1-based; the first page of results is page 1.
>>> for tweet in Tweet.select().order_by(Tweet.id).paginate(2, 10):
...     print(tweet.message)
...
tweet 10
tweet 11
tweet 12
tweet 13
tweet 14
tweet 15
tweet 16
tweet 17
tweet 18
tweet 19
If you would like more granular control, you can always use limit() and offset().
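For instance, here is a sketch of the same second page of ten tweets expressed with limit() and offset() directly:

# Equivalent to .paginate(2, 10): skip the first 10 rows, return the next 10.
query = (Tweet
         .select()
         .order_by(Tweet.id)
         .limit(10)
         .offset(10))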
You can count the number of rows in any select query:
>>> Tweet.select().count()
100
>>> Tweet.select().where(Tweet.id > 50).count()
50
Peewee will wrap your query in an outer query that performs a count, which results in SQL like:
SELECT COUNT(1) FROM ( ... your query ... );
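If you prefer to avoid the wrapped subquery, you can express the aggregate yourself and read the value back with scalar(). A brief sketch:

# Count the number of distinct users who have tweeted.
n = Tweet.select(fn.COUNT(Tweet.user.distinct())).scalar()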
Suppose you have some users and want to get a list of them along with the count of tweets in each.
query = (User
.select(User, fn.Count(Tweet.id).alias('count'))
.join(Tweet, JOIN.LEFT_OUTER)
.group_by(User))
The resulting query will return User objects with all their normal attributes plus an additional attribute count which will contain the count of tweets for each user. We use a left outer join to include users who have no tweets.
Let's assume you have a tagging application and want to find tags that have a certain number of related objects. For this example we'll use some different models in a many-to-many configuration:
class Photo(Model):
    image = CharField()

class Tag(Model):
    name = CharField()

class PhotoTag(Model):
    photo = ForeignKeyField(Photo)
    tag = ForeignKeyField(Tag)
Now say we want to find tags that have at least 5 photos associated with them:
query = (Tag
.select()
.join(PhotoTag)
.join(Photo)
.group_by(Tag)
.having(fn.Count(Photo.id) > 5))
This query is equivalent to the following SQL:
SELECT t1."id", t1."name" FROM "tag" AS t1 INNER JOIN "phototag" AS t2 ON t1."id" = t2."tag_id" INNER JOIN "photo" AS t3 ON t2."photo_id" = t3."id" GROUP BY t1."id", t1."name" HAVING Count(t3."id") > 5
Suppose we want to grab the associated count and store it on the tag:
query = (Tag
.select(Tag, fn.Count(Photo.id).alias('count'))
.join(PhotoTag)
.join(Photo)
.group_by(Tag)
.having(fn.Count(Photo.id) > 5))
You can retrieve scalar values by calling Query.scalar(). For instance:
>>> PageView.select(fn.Count(fn.Distinct(PageView.url))).scalar()
100
You can retrieve multiple scalar values by passing as_tuple=True:
>>> Employee.select(
...     fn.Min(Employee.salary), fn.Max(Employee.salary)
... ).scalar(as_tuple=True)
(30000, 50000)
A window function refers to an aggregate function that operates on a sliding window of data that is being processed as part of a SELECT query. Window functions make it possible to do things like: perform aggregations against subsets of a result-set; calculate a running total; rank results; and compare a row's value against values in the preceding (or following) rows.
peewee comes with support for SQL window functions, which can be created by calling Function.over() and passing in your partitioning or ordering parameters.
For the following examples, we'll use the following model and sample data:
class Sample(Model):
    counter = IntegerField()
    value = FloatField()

data = [(1, 10),
        (1, 20),
        (2, 1),
        (2, 3),
        (3, 100)]
Sample.insert_many(data, fields=[Sample.counter, Sample.value]).execute()
Our sample table now contains:
id | counter | value |
1 | 1 | 10.0 |
2 | 1 | 20.0 |
3 | 2 | 1.0 |
4 | 2 | 3.0 |
5 | 3 | 100.0 |
Let's calculate a running sum of the value field. In order for it to be a "running" sum, we need it to be ordered, so we'll order with respect to the Sample's id field:
query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.SUM(Sample.value).over(order_by=[Sample.id]).alias('total'))

for sample in query:
    print(sample.counter, sample.value, sample.total)

# 1    10.    10.
# 1    20.    30.
# 2    1.     31.
# 2    3.     34.
# 3    100.   134.
For another example, we'll calculate the difference between the current value and the previous value, when ordered by the id:
difference = Sample.value - fn.LAG(Sample.value, 1).over(order_by=[Sample.id])
query = Sample.select(
    Sample.counter,
    Sample.value,
    difference.alias('diff'))

for sample in query:
    print(sample.counter, sample.value, sample.diff)

# 1    10.    NULL
# 1    20.    10.    -- (20 - 10)
# 2    1.     -19.   -- (1 - 20)
# 2    3.     2.     -- (3 - 1)
# 3    100.   97.    -- (100 - 3)
Let's calculate the average value for each distinct "counter" value. Notice that there are three possible values for the counter field (1, 2, and 3). We can do this by calculating the AVG() of the value column over a window that is partitioned depending on the counter field:
query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.AVG(Sample.value).over(partition_by=[Sample.counter]).alias('cavg'))

for sample in query:
    print(sample.counter, sample.value, sample.cavg)

# 1    10.    15.
# 1    20.    15.
# 2    1.     2.
# 2    3.     2.
# 3    100.   100.
We can use ordering within partitions by specifying both the order_by and partition_by parameters. For an example, let's rank the samples by value within each distinct counter group.
query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.RANK().over(
        order_by=[Sample.value],
        partition_by=[Sample.counter]).alias('rank'))

for sample in query:
    print(sample.counter, sample.value, sample.rank)

# 1    10.    1
# 1    20.    2
# 2    1.     1
# 2    3.     2
# 3    100.   1
By default, window functions are evaluated using an unbounded preceding start for the window, and the current row as the end. We can change the bounds of the window our aggregate functions operate on by specifying a start and/or end in the call to Function.over(). Additionally, Peewee comes with helper-methods on the Window object for generating the appropriate boundary references: Window.CURRENT_ROW, an attribute referencing the current row; Window.preceding(), which accepts the number of preceding rows to include (or, with no argument, all preceding rows); and Window.following(), which accepts the number of following rows to include (or, with no argument, all following rows).
To examine how boundaries work, we'll calculate a running total of the value column, ordered with respect to id, but we'll only look at the running total of the current row and its two preceding rows:
query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.SUM(Sample.value).over(
        order_by=[Sample.id],
        start=Window.preceding(2),
        end=Window.CURRENT_ROW).alias('rsum'))

for sample in query:
    print(sample.counter, sample.value, sample.rsum)

# 1    10.    10.
# 1    20.    30.    -- (20 + 10)
# 2    1.     31.    -- (1 + 20 + 10)
# 2    3.     24.    -- (3 + 1 + 20)
# 3    100.   104.   -- (100 + 3 + 1)
NOTE:
Let's look at another example. In this example we will calculate the "opposite" of a running total, in which the total sum of all values is decreased by the value of the samples, ordered by id. To accomplish this, we'll calculate the sum from the current row to the last row.
query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.SUM(Sample.value).over(
        order_by=[Sample.id],
        start=Window.CURRENT_ROW,
        end=Window.following()).alias('rsum'))

# 1    10.    134.   -- (10 + 20 + 1 + 3 + 100)
# 1    20.    124.   -- (20 + 1 + 3 + 100)
# 2    1.     104.   -- (1 + 3 + 100)
# 2    3.     103.   -- (3 + 100)
# 3    100.   100.   -- (100)
Aggregate functions may also support filter functions (Postgres and Sqlite 3.25+), which get translated into a FILTER (WHERE...) clause. Filter expressions are added to an aggregate function with the Function.filter() method.
For an example, we will calculate the running sum of the value field with respect to the id, but we will filter out any samples whose counter=2.
query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.SUM(Sample.value).filter(Sample.counter != 2).over(
        order_by=[Sample.id]).alias('csum'))

for sample in query:
    print(sample.counter, sample.value, sample.csum)

# 1    10.    10.
# 1    20.    30.
# 2    1.     30.
# 2    3.     30.
# 3    100.   130.
NOTE:
If you intend to use the same window definition for multiple aggregates, you can create a Window object. The Window object takes the same parameters as Function.over(), and can be passed to the over() method in-place of the individual parameters.
Here we'll declare a single window, ordered with respect to the sample id, and call several window functions using that window definition:
win = Window(order_by=[Sample.id])
query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.LEAD(Sample.value).over(win),
    fn.LAG(Sample.value).over(win),
    fn.SUM(Sample.value).over(win)
).window(win)  # Include our window definition in query.

for row in query.tuples():
    print(row)

# counter  value  lead()  lag()  sum()
# 1        10.    20.     NULL   10.
# 1        20.    1.      10.    30.
# 2        1.     3.      20.    31.
# 2        3.     100.    1.     34.
# 3        100.   NULL    3.     134.
In the previous example, we saw how to declare a Window definition and re-use it for multiple different aggregations. You can include as many window definitions as you need in your queries, but it is necessary to ensure each window has a unique alias:
w1 = Window(order_by=[Sample.id]).alias('w1')
w2 = Window(partition_by=[Sample.counter]).alias('w2')
query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.SUM(Sample.value).over(w1).alias('rsum'),  # Running total.
    fn.AVG(Sample.value).over(w2).alias('cavg')   # Avg per category.
).window(w1, w2)  # Include our window definitions.

for sample in query:
    print(sample.counter, sample.value, sample.rsum, sample.cavg)

# counter  value  rsum  cavg
# 1        10.    10.   15.
# 1        20.    30.   15.
# 2        1.     31.   2.
# 2        3.     34.   2.
# 3        100    134.  100.
Similarly, if you have multiple window definitions that share similar definitions, it is possible to extend a previously-defined window definition. For example, here we will be partitioning the data-set by the counter value, so we'll be doing our aggregations with respect to the counter. Then we'll define a second window that extends this partitioning, and adds an ordering clause:
w1 = Window(partition_by=[Sample.counter]).alias('w1')

# By extending w1, this window definition will also be partitioned
# by "counter".
w2 = Window(extends=w1, order_by=[Sample.value.desc()]).alias('w2')

query = (Sample
         .select(Sample.counter, Sample.value,
                 fn.SUM(Sample.value).over(w1).alias('group_sum'),
                 fn.RANK().over(w2).alias('revrank'))
         .window(w1, w2)
         .order_by(Sample.id))

for sample in query:
    print(sample.counter, sample.value, sample.group_sum, sample.revrank)

# counter  value  group_sum  revrank
# 1        10.    30.        2
# 1        20.    30.        1
# 2        1.     4.         2
# 2        3.     4.         1
# 3        100.   100.       1
Depending on the frame type, the database will process ordered groups differently. Let's create two additional Sample rows to visualize the difference:
>>> Sample.create(counter=1, value=20.)
<Sample 6>
>>> Sample.create(counter=2, value=1.)
<Sample 7>
Our table now contains:
id | counter | value |
1 | 1 | 10.0 |
2 | 1 | 20.0 |
3 | 2 | 1.0 |
4 | 2 | 3.0 |
5 | 3 | 100.0 |
6 | 1 | 20.0 |
7 | 2 | 1.0 |
Let's examine the difference by calculating a "running sum" of the samples, ordered with respect to the counter and value fields. To specify the frame type, we can use either Window.RANGE or Window.ROWS.
The behavior of RANGE, when there are logical duplicates, may lead to unexpected results:
query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.SUM(Sample.value).over(
        order_by=[Sample.counter, Sample.value],
        frame_type=Window.RANGE).alias('rsum'))

for sample in query.order_by(Sample.counter, Sample.value):
    print(sample.counter, sample.value, sample.rsum)

# counter  value  rsum
# 1        10.    10.
# 1        20.    50.
# 1        20.    50.
# 2        1.     52.
# 2        1.     52.
# 2        3.     55.
# 3        100.   155.
With the inclusion of the new rows we now have some rows that have duplicate category and value values. The RANGE frame type causes these duplicates to be evaluated together rather than separately.
The more expected result can be achieved by using ROWS as the frame-type:
query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.SUM(Sample.value).over(
        order_by=[Sample.counter, Sample.value],
        frame_type=Window.ROWS).alias('rsum'))

for sample in query.order_by(Sample.counter, Sample.value):
    print(sample.counter, sample.value, sample.rsum)

# counter  value  rsum
# 1        10.    10.
# 1        20.    30.
# 1        20.    50.
# 2        1.     51.
# 2        1.     52.
# 2        3.     55.
# 3        100.   155.
Peewee uses these rules for determining what frame-type to use: if the user specifies a frame_type, that frame type will be used; if start and/or end boundaries are specified, Peewee will default to using ROWS; otherwise, Peewee will use the database default, which is RANGE.
The Window.GROUPS frame type looks at the window range specification in terms of groups of rows, based on the ordering term(s). Using GROUPS, we can define the frame so it covers distinct groupings of rows. Let's look at an example:
query = (Sample
         .select(Sample.counter, Sample.value,
                 fn.SUM(Sample.value).over(
                     order_by=[Sample.counter, Sample.value],
                     frame_type=Window.GROUPS,
                     start=Window.preceding(1)).alias('gsum'))
         .order_by(Sample.counter, Sample.value))

for sample in query:
    print(sample.counter, sample.value, sample.gsum)

# counter  value  gsum
# 1        10     10
# 1        20     50    (10) + (20+20)
# 1        20     50    (10) + (20+20)
# 2        1      42    (20+20) + (1+1)
# 2        1      42    (20+20) + (1+1)
# 2        3      5     (1+1) + 3
# 3        100    103   (3) + 100
As you can hopefully infer, the window is grouped by its ordering term, which is (counter, value). We are looking at a window that extends between one previous group and the current group.
NOTE:
For general information on window functions, read the postgres window functions tutorial
Additionally, the postgres docs and the sqlite docs contain a lot of good information.
Sometimes you do not need the overhead of creating model instances and simply want to iterate over the row data without needing all the APIs provided by Model. To do this, use the tuples() or dicts() methods:
stats = (Stat
         .select(Stat.url, fn.Count(Stat.url))
         .group_by(Stat.url)
         .tuples())

# iterate over a list of 2-tuples containing the url and count
for stat_url, stat_count in stats:
    print(stat_url, stat_count)
Similarly, you can return the rows from the cursor as dictionaries using dicts():
stats = (Stat
         .select(Stat.url, fn.Count(Stat.url).alias('ct'))
         .group_by(Stat.url)
         .dicts())

# iterate over dictionaries containing the url and count
for stat in stats:
    print(stat['url'], stat['ct'])
PostgresqlDatabase supports a RETURNING clause on UPDATE, INSERT and DELETE queries. Specifying a RETURNING clause allows you to iterate over the rows accessed by the query.
By default, the return values upon execution of the different queries are: INSERT returns the primary key of the newly-inserted row; UPDATE returns the number of rows modified; and DELETE returns the number of rows deleted.
When a RETURNING clause is used, the return value upon executing a query will be an iterable cursor object.
Postgresql's RETURNING clause makes it possible to return data from the rows inserted or modified by a query.
For example, let's say you have an Update that deactivates all user accounts whose registration has expired. After deactivating them, you want to send each user an email letting them know their account was deactivated. Rather than writing two queries, a SELECT and an UPDATE, you can do this in a single UPDATE query with a RETURNING clause:
query = (User
         .update(is_active=False)
         .where(User.registration_expired == True)
         .returning(User))

# Send an email to every user that was deactivated.
for deactivated_user in query.execute():
    send_deactivation_email(deactivated_user.email)
The RETURNING clause is also available on Insert and Delete. When used with INSERT, the newly-created rows will be returned. When used with DELETE, the deleted rows will be returned.
The only limitation of the RETURNING clause is that it can only consist of columns from tables listed in the query's FROM clause. To select all columns from a particular table, you can simply pass in the Model class.
As another example, let's add a user and set their creation-date to the server-generated current timestamp. We'll create and retrieve the new user's ID, Email and the creation timestamp in a single query:
query = (User
         .insert(email='foo@bar.com', created=fn.now())
         .returning(User))  # Shorthand for all columns on User.

# When using RETURNING, execute() returns a cursor.
cursor = query.execute()

# Get the user object we just inserted and log the data:
user = cursor[0]
logger.info('Created user %s (id=%s) at %s', user.email, user.id, user.created)
By default the cursor will return Model instances, but you can specify a different row type:
data = [{'username': 'charlie'}, {'username': 'huey'}, {'username': 'mickey'}]
query = (User
         .insert_many(data)
         .returning(User.id, User.username)
         .dicts())

for new_user in query.execute():
    print('Added user "%s", id=%s' % (new_user['username'], new_user['id']))
Just as with Select queries, you can specify various result row types.
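For instance, the insert above could return plain tuples instead; a brief sketch re-using the data list from the previous example:

query = (User
         .insert_many(data)
         .returning(User.id, User.username)
         .tuples())

# Each row unpacks directly into the selected columns.
for user_id, username in query.execute():
    print('Added user "%s", id=%s' % (username, user_id))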
Peewee supports the inclusion of common table expressions (CTEs) in all types of queries. CTEs may be useful for:
To declare a Select query for use as a CTE, use the cte() method, which wraps the query in a CTE object. To indicate that a CTE should be included as part of a query, use the Query.with_cte() method, passing a list of CTE objects.
For an example, let's say we have some data points that consist of a key and a floating-point value. Let's define our model and populate some test data:
class Sample(Model):
    key = TextField()
    value = FloatField()

data = (
    ('a', (1.25, 1.5, 1.75)),
    ('b', (2.1, 2.3, 2.5, 2.7, 2.9)),
    ('c', (3.5, 3.5)))

# Populate data.
for key, values in data:
    Sample.insert_many([(key, value) for value in values],
                       fields=[Sample.key, Sample.value]).execute()
Let's use a CTE to calculate, for each distinct key, which values were above-average for that key.
# First we'll declare the query that will be used as a CTE. This query
# simply determines the average value for each key.
cte = (Sample
       .select(Sample.key, fn.AVG(Sample.value).alias('avg_value'))
       .group_by(Sample.key)
       .cte('key_avgs', columns=('key', 'avg_value')))

# Now we'll query the sample table, using our CTE to find rows whose value
# exceeds the average for the given key. We'll calculate how far above the
# average the given sample's value is, as well.
query = (Sample
         .select(Sample.key, Sample.value)
         .join(cte, on=(Sample.key == cte.c.key))
         .where(Sample.value > cte.c.avg_value)
         .order_by(Sample.value)
         .with_cte(cte))
We can iterate over the samples returned by the query to see which samples had above-average values for their given group:
>>> for sample in query:
...     print(sample.key, sample.value)
# 'a', 1.75
# 'b', 2.7
# 'b', 2.9
For a more complete example, let's consider the following query which uses multiple CTEs to find per-product sales totals in only the top sales regions. Our model looks like this:
class Order(Model):
region = TextField()
amount = FloatField()
product = TextField()
quantity = IntegerField()
Here is how the query might be written in SQL. This example can be found in the postgresql documentation.
WITH regional_sales AS (
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region
), top_regions AS (
SELECT region
FROM regional_sales
WHERE total_sales > (SELECT SUM(total_sales) / 10 FROM regional_sales)
)
SELECT region,
       product,
       SUM(quantity) AS product_units,
       SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;
With Peewee, we would write:
reg_sales = (Order
             .select(Order.region,
                     fn.SUM(Order.amount).alias('total_sales'))
             .group_by(Order.region)
             .cte('regional_sales'))

top_regions = (reg_sales
               .select(reg_sales.c.region)
               .where(reg_sales.c.total_sales > (
                   reg_sales.select(fn.SUM(reg_sales.c.total_sales) / 10)))
               .cte('top_regions'))

query = (Order
         .select(Order.region,
                 Order.product,
                 fn.SUM(Order.quantity).alias('product_units'),
                 fn.SUM(Order.amount).alias('product_sales'))
         .where(Order.region.in_(top_regions.select(top_regions.c.region)))
         .group_by(Order.region, Order.product)
         .with_cte(reg_sales, top_regions))
Peewee supports recursive CTEs. Recursive CTEs can be useful when, for example, you have a tree data-structure represented by a parent-link foreign key. Suppose, for example, that we have a hierarchy of categories for an online bookstore. We wish to generate a table showing all categories and their absolute depths, along with the path from the root to the category.
We'll assume the following model definition, in which each category has a foreign-key to its immediate parent category:
class Category(Model):
name = TextField()
parent = ForeignKeyField('self', backref='children', null=True)
To list all categories along with their depth and parents, we can use a recursive CTE:
# Define the base case of our recursive CTE. This will be categories that
# have a null parent foreign-key.
Base = Category.alias()
level = Value(1).alias('level')
path = Base.name.alias('path')
base_case = (Base
             .select(Base.id, Base.name, Base.parent, level, path)
             .where(Base.parent.is_null())
             .cte('base', recursive=True))

# Define the recursive terms.
RTerm = Category.alias()
rlevel = (base_case.c.level + 1).alias('level')
rpath = base_case.c.path.concat('->').concat(RTerm.name).alias('path')
recursive = (RTerm
             .select(RTerm.id, RTerm.name, RTerm.parent, rlevel, rpath)
             .join(base_case, on=(RTerm.parent == base_case.c.id)))

# The recursive CTE is created by taking the base case and UNION ALL with
# the recursive term.
cte = base_case.union_all(recursive)

# We will now query from the CTE to get the categories, their levels, and
# their paths.
query = (cte
         .select_from(cte.c.name, cte.c.level, cte.c.path)
         .order_by(cte.c.path))

# We can now iterate over a list of all categories and print their names,
# absolute levels, and path from root -> category.
for category in query:
    print(category.name, category.level, category.path)

# Example output:
# root, 1, root
# p1, 2, root->p1
# c1-1, 3, root->p1->c1-1
# c1-2, 3, root->p1->c1-2
# p2, 2, root->p2
# c2-1, 3, root->p2->c2-1
This section has been moved into its own document: relationships.
The following types of comparisons are supported by peewee:
Comparison | Meaning |
== | x equals y |
< | x is less than y |
<= | x is less than or equal to y |
> | x is greater than y |
>= | x is greater than or equal to y |
!= | x is not equal to y |
<< | x IN y, where y is a list or query |
>> | x IS y, where y is None/NULL |
% | x LIKE y where y may contain wildcards |
** | x ILIKE y where y may contain wildcards |
^ | x XOR y |
~ | Unary negation (e.g., NOT x) |
Because I ran out of operators to override, there are some additional query operations available as methods:
Method | Meaning |
.in_(value) | IN lookup (identical to <<). |
.not_in(value) | NOT IN lookup. |
.is_null(is_null) | IS NULL or IS NOT NULL. Accepts boolean param. |
.contains(substr) | Wild-card search for substring. |
.startswith(prefix) | Search for values beginning with prefix. |
.endswith(suffix) | Search for values ending with suffix. |
.between(low, high) | Search for values between low and high. |
.regexp(exp) | Regular expression match (case-sensitive). |
.iregexp(exp) | Regular expression match (case-insensitive). |
.bin_and(value) | Binary AND. |
.bin_or(value) | Binary OR. |
.concat(other) | Concatenate two strings or objects using ||. |
.distinct() | Mark column for DISTINCT selection. |
.collate(collation) | Specify column with the given collation. |
.cast(type) | Cast the value of the column to the given type. |
To combine clauses using logical operators, use:
Operator | Meaning | Example |
& | AND | (User.is_active == True) & (User.is_admin == True) |
| (pipe) | OR | (User.is_admin) | (User.is_superuser) |
~ | NOT (unary negation) | ~(User.username.contains('admin')) |
Here is how you might use some of these query operators:
# Find the user whose username is "charlie".
User.select().where(User.username == 'charlie')

# Find the users whose username is in [charlie, huey, mickey]
User.select().where(User.username.in_(['charlie', 'huey', 'mickey']))

Employee.select().where(Employee.salary.between(50000, 60000))

Employee.select().where(Employee.name.startswith('C'))

Blog.select().where(Blog.title.contains(search_string))
Here is how you might combine expressions. Comparisons can be arbitrarily complex.
NOTE:
# Find any users who are active administrators.
User.select().where(
    (User.is_admin == True) &
    (User.is_active == True))

# Find any users who are either administrators or super-users.
User.select().where(
    (User.is_admin == True) |
    (User.is_superuser == True))

# Find any Tweets by users who are not admins (NOT IN).
admins = User.select().where(User.is_admin == True)
non_admin_tweets = Tweet.select().where(Tweet.user.not_in(admins))

# Find any users who are not my friends (strangers).
friends = User.select().where(User.username.in_(['charlie', 'huey', 'mickey']))
strangers = User.select().where(User.id.not_in(friends))
WARNING:
Although you may be tempted to use Python's in, and, or and not operators in your query expressions, these will not work. The return value of an in expression is always coerced to a boolean, and and, or and not likewise treat their arguments as boolean values and cannot be overloaded.
So just remember:
Use .in_() and .not_in() instead of in and not in.
Use & instead of and.
Use | instead of or.
Use ~ instead of not.
Use .is_null() instead of is None or == None.
Don't forget to wrap your comparisons in parentheses when using logical operators.
For more examples, see the Expressions section.
NOTE:
Because SQLite's LIKE operation is case-insensitive by default, peewee will use the SQLite GLOB operation for case-sensitive searches. The glob operation uses asterisks for wildcards as opposed to the usual percent-sign. If you are using SQLite and want case-sensitive partial string matching, remember to use asterisks for the wildcard.
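To illustrate, a sketch using the User model (this assumes, per the peewee source, that on SQLite the "%" operator is translated to GLOB and the "**" operator to LIKE):

# Case-sensitive on SQLite: "%" becomes GLOB, so use asterisk wildcards.
User.select().where(User.username % 'Cha*')

# Case-insensitive on SQLite: "**" becomes LIKE, with the usual percent-sign.
User.select().where(User.username ** 'cha%')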
Because of the way SQL handles NULL, there are some special operations available for expressing IS NULL, IS NOT NULL, IN, and NOT IN.
While it would be possible to use the IS NULL and IN operators with the negation operator (~), sometimes to get the correct semantics you will need to explicitly use IS NOT NULL and NOT IN.
The simplest way to use IS NULL and IN is to use the operator overloads:
# Get all User objects whose last login is NULL.
User.select().where(User.last_login >> None)

# Get users whose username is in the given list.
usernames = ['charlie', 'huey', 'mickey']
User.select().where(User.username << usernames)
If you don't like operator overloads, you can call the Field methods instead:
# Get all User objects whose last login is NULL.
User.select().where(User.last_login.is_null(True))

# Get users whose username is in the given list.
usernames = ['charlie', 'huey', 'mickey']
User.select().where(User.username.in_(usernames))
To negate the above queries, you can use unary negation, but for the correct semantics you may need to use the special IS NOT and NOT IN operators:
# Get all User objects whose last login is *NOT* NULL.
User.select().where(User.last_login.is_null(False))

# Using unary negation instead.
User.select().where(~(User.last_login >> None))

# Get users whose username is *NOT* in the given list.
usernames = ['charlie', 'huey', 'mickey']
User.select().where(User.username.not_in(usernames))

# Using unary negation instead.
User.select().where(~(User.username << usernames))
Because I ran out of python operators to overload, there are some missing operators in peewee, for instance modulo. If you find that you need to support an operator that is not in the table above, it is very easy to add your own.
Here is how you might add support for modulo in SQLite:
from peewee import *
from peewee import Expression  # The building block for expressions.

def mod(lhs, rhs):
    return Expression(lhs, '%', rhs)
Now you can use this custom operator to build richer queries:
# Users with even ids.
User.select().where(mod(User.id, 2) == 0)
For more examples check out the source to the playhouse.postgresql_ext module, as it contains numerous operators specific to postgresql's hstore.
Peewee is designed to provide a simple, expressive, and pythonic way of constructing SQL queries. This section will provide a quick overview of some common types of expressions.
There are two primary types of objects that can be composed to create expressions:
We will assume a simple "User" model with fields for username and other things. It looks like this:
class User(Model):
username = CharField()
is_admin = BooleanField()
is_active = BooleanField()
last_login = DateTimeField()
login_count = IntegerField()
failed_logins = IntegerField()
Comparisons use the Query operators:
# username is equal to 'charlie'
User.username == 'charlie'

# user has logged in less than 5 times
User.login_count < 5
Comparisons can be combined using the bitwise and (&) and or (|) operators. Operator precedence is controlled by Python, and comparisons can be nested to an arbitrary depth:
# User is both an admin and has logged in today
(User.is_admin == True) & (User.last_login >= today)

# User's username is either charlie or charles
(User.username == 'charlie') | (User.username == 'charles')
Comparisons can be used with functions as well:
# user's username starts with a 'g' or a 'G':
fn.Lower(fn.Substr(User.username, 1, 1)) == 'g'
We can do some fairly interesting things, as expressions can be compared against other expressions. Expressions also support arithmetic operations:
# users who entered the incorrect password more than half the time and have
# logged in at least 10 times
(User.failed_logins > (User.login_count * .5)) & (User.login_count > 10)
Expressions allow us to do atomic updates:
# when a user logs in we want to increment their login count:
User.update(login_count=User.login_count + 1).where(User.id == user_id)
Expressions can be used in all parts of a query, so experiment!
Many databases support row values, which are similar to Python tuple objects. In Peewee, it is possible to use row-values in expressions via Tuple. For example,
# If for some reason your schema stores dates in separate columns ("year",
# "month" and "day"), you can use row-values to find all rows that happened
# in a given month:
Tuple(Event.year, Event.month) == (2019, 1)
The more common use for row-values is to compare against multiple columns from a subquery in a single expression. There are other ways to express these types of queries, but row-values may offer a concise and readable approach.
For example, assume we have a table "EventLog" which contains an event type, an event source, and some metadata. We also have an "IncidentLog", which has incident type, incident source, and metadata columns. We can use row-values to correlate incidents with certain events:
class EventLog(Model):
    event_type = TextField()
    source = TextField()
    data = TextField()
    timestamp = TimestampField()

class IncidentLog(Model):
    incident_type = TextField()
    source = TextField()
    traceback = TextField()
    timestamp = TimestampField()

# Get a list of all the incident types and sources that have occurred today.
incidents = (IncidentLog
             .select(IncidentLog.incident_type, IncidentLog.source)
             .where(IncidentLog.timestamp >= datetime.date.today()))

# Find all events that correlate with the type and source of the
# incidents that occurred today.
events = (EventLog
          .select()
          .where(Tuple(EventLog.event_type, EventLog.source).in_(incidents))
          .order_by(EventLog.timestamp))
Other ways to express this type of query would be to use a join or to join on a subquery. The above example is there just to give you an idea of how Tuple might be used.
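For comparison, here is a sketch of one join-based alternative; the 'incidents' subquery alias is our own, and the .c accessor maps to the columns selected in the subquery:

subq = incidents.alias('incidents')
events = (EventLog
          .select()
          .join(subq, on=(
              (EventLog.event_type == subq.c.incident_type) &
              (EventLog.source == subq.c.source)))
          .order_by(EventLog.timestamp))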
You can also use row-values to update multiple columns in a table, when the new data is derived from a subquery. For an example, see here.
SQL functions, like COUNT() or SUM(), can be expressed using the fn() helper:
# Get all users and the number of tweets they've authored. Sort the
# results from most tweets -> fewest tweets.
query = (User
         .select(User, fn.COUNT(Tweet.id).alias('tweet_count'))
         .join(Tweet, JOIN.LEFT_OUTER)
         .group_by(User)
         .order_by(fn.COUNT(Tweet.id).desc()))

for user in query:
    print('%s -- %s tweets' % (user.username, user.tweet_count))
The fn helper exposes any SQL function as if it were a method. The parameters can be fields, values, subqueries, or even nested functions.
Suppose you want to get a list of all users whose username begins with a. There are a couple ways to do this, but one method might be to use some SQL functions like LOWER and SUBSTR. To use arbitrary SQL functions, use the special fn() object to construct queries:
# Select the user's id, username and the first letter of their username, lower-cased
first_letter = fn.LOWER(fn.SUBSTR(User.username, 1, 1))
query = User.select(User, first_letter.alias('first_letter'))

# Alternatively we could select only users whose username begins with 'a'
a_users = User.select().where(first_letter == 'a')

>>> for user in a_users:
...     print(user.username)
There are times when you may want to simply pass in some arbitrary sql. You can do this using the special SQL class. One use-case is when referencing an alias:
# We'll query the user table and annotate it with a count of tweets for
# the given user
query = (User
         .select(User, fn.Count(Tweet.id).alias('ct'))
         .join(Tweet)
         .group_by(User))

# Now we will order by the count, which was aliased to "ct"
query = query.order_by(SQL('ct'))

# You could, of course, also write this as:
query = query.order_by(fn.COUNT(Tweet.id))
There are two ways to execute hand-crafted SQL statements with peewee: Database.execute_sql(), for executing any type of query, and Model.raw(), for executing SELECT queries that return model instances.
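A short sketch of both, assuming the User model and db database from earlier (note that the parameter placeholder depends on your driver: '?' for SQLite, '%s' for Postgres/MySQL):

# Model.raw(): a hand-written SELECT that still returns User instances.
query = User.raw('SELECT id, username FROM user WHERE username = ?', 'charlie')
for user in query:
    print(user.username)

# Database.execute_sql(): returns a raw cursor, for any type of statement.
cursor = db.execute_sql('SELECT COUNT(*) FROM user')
(count,) = cursor.fetchone()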
By default peewee will parameterize queries, so any parameters passed in by the user will be escaped. The only exception to this rule is if you are writing a raw SQL query or are passing in a SQL object which may contain untrusted data. To mitigate this, ensure that any user-defined data is passed in as a query parameter and not part of the actual SQL query:
# Bad! DO NOT DO THIS!
query = MyModel.raw('SELECT * FROM my_table WHERE data = %s' % (user_data,))

# Good. `user_data` will be treated as a parameter to the query.
query = MyModel.raw('SELECT * FROM my_table WHERE data = %s', user_data)

# Bad! DO NOT DO THIS!
query = MyModel.select().where(SQL('Some SQL expression %s' % user_data))

# Good. `user_data` will be treated as a parameter.
query = MyModel.select().where(SQL('Some SQL expression %s', user_data))
NOTE:
In this document we'll cover how Peewee handles relationships between models.
We'll use the following model definitions for our examples:
import datetime
from peewee import *

db = SqliteDatabase(':memory:')

class BaseModel(Model):
    class Meta:
        database = db

class User(BaseModel):
    username = TextField()

class Tweet(BaseModel):
    content = TextField()
    timestamp = DateTimeField(default=datetime.datetime.now)
    user = ForeignKeyField(User, backref='tweets')

class Favorite(BaseModel):
    user = ForeignKeyField(User, backref='favorites')
    tweet = ForeignKeyField(Tweet, backref='favorites')
Peewee uses ForeignKeyField to define foreign-key relationships between models. Every foreign-key field has an implied back-reference, which is exposed as a pre-filtered Select query using the provided backref attribute.
To follow along with the examples, let's populate this database with some test data:
def populate_test_data():
db.create_tables([User, Tweet, Favorite])
data = (
('huey', ('meow', 'hiss', 'purr')),
('mickey', ('woof', 'whine')),
('zaizee', ()))
for username, tweets in data:
user = User.create(username=username)
for tweet in tweets:
Tweet.create(user=user, content=tweet)
# Populate a few favorites for our users, such that:
favorite_data = (
('huey', ['whine']),
('mickey', ['purr']),
('zaizee', ['meow', 'purr']))
for username, favorites in favorite_data:
user = User.get(User.username == username)
for content in favorites:
tweet = Tweet.get(Tweet.content == content)
Favorite.create(user=user, tweet=tweet)
This gives us the following:
User | Tweet | Favorited by |
huey | meow | zaizee |
huey | hiss | |
huey | purr | mickey, zaizee |
mickey | woof | |
mickey | whine | huey |
ATTENTION:
import logging

logger = logging.getLogger('peewee')
logger.addHandler(logging.StreamHandler())
logger.setLevel(logging.DEBUG)
NOTE:
# Ensure foreign-key constraints are enforced. db = SqliteDatabase('my_app.db', pragmas={'foreign_keys': 1})
As an exercise in learning how to perform joins with Peewee, let's write a query to print out all the tweets by "huey". To do this we'll select from the Tweet model and join on the User model, so we can then filter on the User.username field:
>>> query = Tweet.select().join(User).where(User.username == 'huey')
>>> for tweet in query:
...     print(tweet.content)
...
meow
hiss
purr
NOTE:
The following code is equivalent, but more explicit:
query = (Tweet
.select()
.join(User, on=(Tweet.user == User.id))
.where(User.username == 'huey'))
If we already had a reference to the User object for "huey", we could use the User.tweets back-reference to list all of huey's tweets:
>>> huey = User.get(User.username == 'huey')
>>> for tweet in huey.tweets:
...     print(tweet.content)
...
meow
hiss
purr
Taking a closer look at huey.tweets, we can see that it is just a simple pre-filtered SELECT query:
>>> huey.tweets
<peewee.ModelSelect at 0x7f0483931fd0>
>>> huey.tweets.sql()
('SELECT "t1"."id", "t1"."content", "t1"."timestamp", "t1"."user_id"
FROM "tweet" AS "t1" WHERE ("t1"."user_id" = ?)', [1])
Let's take another look at joins by querying the list of users and getting the count of how many tweets they've authored that were favorited. This will require us to join twice: from user to tweet, and from tweet to favorite. We'll add the additional requirement that users should be included who have not created any tweets, as well as users whose tweets have not been favorited. The query, expressed in SQL, would be:
SELECT user.username, COUNT(favorite.id)
FROM user
LEFT OUTER JOIN tweet ON tweet.user_id = user.id
LEFT OUTER JOIN favorite ON favorite.tweet_id = tweet.id
GROUP BY user.username
NOTE:
Peewee has a concept of a join context, meaning that whenever we call the join() method, we are implicitly joining on the previously-joined model (or if this is the first call, the model we are selecting from). Since we are joining straight through, from user to tweet, then from tweet to favorite, we can simply write:
query = (User
.select(User.username, fn.COUNT(Favorite.id).alias('count'))
.join(Tweet, JOIN.LEFT_OUTER) # Joins user -> tweet.
.join(Favorite, JOIN.LEFT_OUTER) # Joins tweet -> favorite.
.group_by(User.username))
Iterating over the results:
>>> for user in query:
...     print(user.username, user.count)
...
huey 3
mickey 1
zaizee 0
For a more complicated example involving multiple joins and switching join contexts, let's find all the tweets by Huey and the number of times they've been favorited. To do this we'll need to perform two joins and we'll also use an aggregate function to calculate the favorite count.
Here is how we would write this query in SQL:
SELECT tweet.content, COUNT(favorite.id)
FROM tweet
INNER JOIN user ON tweet.user_id = user.id
LEFT OUTER JOIN favorite ON favorite.tweet_id = tweet.id
WHERE user.username = 'huey'
GROUP BY tweet.content;
NOTE:
With Peewee, the resulting Python code looks very similar to what we would write in SQL:
query = (Tweet
.select(Tweet.content, fn.COUNT(Favorite.id).alias('count'))
.join(User) # Join from tweet -> user.
.switch(Tweet) # Move "join context" back to tweet.
.join(Favorite, JOIN.LEFT_OUTER) # Join from tweet -> favorite.
.where(User.username == 'huey')
.group_by(Tweet.content))
Note the call to switch() - that instructs Peewee to set the join context back to Tweet. If we had omitted the explicit call to switch, Peewee would have used User (the last model we joined) as the join context and constructed the join from User to Favorite using the Favorite.user foreign-key, which would have given us incorrect results.
If we wanted to omit the join-context switching we could instead use the join_from() method. The following query is equivalent to the previous one:
query = (Tweet
.select(Tweet.content, fn.COUNT(Favorite.id).alias('count'))
.join_from(Tweet, User) # Join tweet -> user.
.join_from(Tweet, Favorite, JOIN.LEFT_OUTER) # Join tweet -> favorite.
.where(User.username == 'huey')
.group_by(Tweet.content))
We can iterate over the results of the above query to print the tweet's content and the favorite count:
>>> for tweet in query:
...     print('%s favorited %d times' % (tweet.content, tweet.count))
...
meow favorited 1 times
hiss favorited 0 times
purr favorited 2 times
Suppose we wish to list all the tweets in the database along with the username of their author. You might try writing this:
>>> for tweet in Tweet.select():
...     print(tweet.user.username, '->', tweet.content)
...
huey -> meow
huey -> hiss
huey -> purr
mickey -> woof
mickey -> whine
There is a big problem with the above loop: it executes an additional query for every tweet to look up the tweet.user foreign-key. For our small table the performance penalty isn't obvious, but we would find the delays grew as the number of rows increased.
If you're familiar with SQL, you might remember that it's possible to SELECT from multiple tables, allowing us to get the tweet content and the username in a single query:
SELECT tweet.content, user.username
FROM tweet
INNER JOIN user ON tweet.user_id = user.id;
Peewee makes this quite easy. In fact, we only need to modify our query a little bit. We tell Peewee we wish to select Tweet.content as well as the User.username field, then we include a join from tweet to user. To make it a bit more obvious that it's doing the correct thing, we can ask Peewee to return the rows as dictionaries.
>>> for row in Tweet.select(Tweet.content, User.username).join(User).dicts():
...     print(row)
...
{'content': 'meow', 'username': 'huey'}
{'content': 'hiss', 'username': 'huey'}
{'content': 'purr', 'username': 'huey'}
{'content': 'woof', 'username': 'mickey'}
{'content': 'whine', 'username': 'mickey'}
Now we'll leave off the call to ".dicts()" and return the rows as Tweet objects. Notice that Peewee assigns the username value to tweet.user.username -- NOT tweet.username! Because there is a foreign-key from tweet to user, and we have selected fields from both models, Peewee will reconstruct the model-graph for us:
>>> for tweet in Tweet.select(Tweet.content, User.username).join(User):
...     print(tweet.user.username, '->', tweet.content)
...
huey -> meow
huey -> hiss
huey -> purr
mickey -> woof
mickey -> whine
If we wish to, we can control where Peewee puts the joined User instance in the above query, by specifying an attr in the join() method:
>>> query = Tweet.select(Tweet.content, User.username).join(User, attr='author')
>>> for tweet in query:
...     print(tweet.author.username, '->', tweet.content)
...
huey -> meow
huey -> hiss
huey -> purr
mickey -> woof
mickey -> whine
Conversely, if we simply wish all attributes we select to be attributes of the Tweet instance, we can add a call to objects() at the end of our query (similar to how we called dicts()):
>>> for tweet in query.objects():
...     print(tweet.username, '->', tweet.content)
...
huey -> meow
(etc)
As a more complex example, we will write a single query that selects all the favorites, along with the user who created each favorite, the tweet that was favorited, and that tweet's author.
In SQL we would write:
SELECT owner.username, tweet.content, author.username AS author
FROM favorite
INNER JOIN user AS owner ON (favorite.user_id = owner.id)
INNER JOIN tweet ON (favorite.tweet_id = tweet.id)
INNER JOIN user AS author ON (tweet.user_id = author.id);
Note that we are selecting from the user table twice - once in the context of the user who created the favorite, and again as the author of the tweet.
With Peewee, we use Model.alias() to alias a model class so it can be referenced twice in a single query:
Owner = User.alias()
query = (Favorite
         .select(Favorite, Tweet.content, User.username, Owner.username)
         .join(Owner)       # Join favorite -> user (owner of favorite).
         .switch(Favorite)
         .join(Tweet)       # Join favorite -> tweet
         .join(User))       # Join tweet -> user
We can iterate over the results and access the joined values in the following way. Note how Peewee has resolved the fields from the various models we selected and reconstructed the model graph:
>>> for fav in query:
...     print(fav.user.username, 'liked', fav.tweet.content, 'by', fav.tweet.user.username)
...
huey liked whine by mickey
mickey liked purr by huey
zaizee liked meow by huey
zaizee liked purr by huey
Peewee allows you to join on any table-like object, including subqueries or common table expressions (CTEs). To demonstrate joining on a subquery, let's query for all users and their latest tweet.
Here is the SQL:
SELECT tweet.*, user.*
FROM tweet
INNER JOIN (
    SELECT latest.user_id, MAX(latest.timestamp) AS max_ts
    FROM tweet AS latest
    GROUP BY latest.user_id) AS latest_query
    ON ((tweet.user_id = latest_query.user_id) AND (tweet.timestamp = latest_query.max_ts))
INNER JOIN user ON (tweet.user_id = user.id)
We'll do this by creating a subquery which selects each user and the timestamp of their latest tweet. Then we can query the tweets table in the outer query and join on the user and timestamp combination from the subquery.
# Define our subquery first. We'll use an alias of the Tweet model, since
# we will be querying from the Tweet model directly in the outer query.
Latest = Tweet.alias()
latest_query = (Latest
                .select(Latest.user, fn.MAX(Latest.timestamp).alias('max_ts'))
                .group_by(Latest.user)
                .alias('latest_query'))

# Our join predicate will ensure that we match tweets based on their
# timestamp *and* user_id.
predicate = ((Tweet.user == latest_query.c.user_id) &
             (Tweet.timestamp == latest_query.c.max_ts))

# We put it all together, querying from tweet and joining on the subquery
# using the above predicate.
query = (Tweet
         .select(Tweet, User)               # Select all columns from tweet and user.
         .join(latest_query, on=predicate)  # Join tweet -> subquery.
         .join_from(Tweet, User))           # Join from tweet -> user.
Iterating over the query, we can see each user and their latest tweet.
>>> for tweet in query:
...     print(tweet.user.username, '->', tweet.content)
...
huey -> purr
mickey -> whine
There are a couple things you may not have seen before in the code we used to create the query in this section: we used join_from() to explicitly specify the join context, and we referenced columns in the subquery using the magic .c attribute, e.g. latest_query.c.max_ts.
In the previous section we joined on a subquery, but we could just as easily have used a common-table expression (CTE). We will repeat the same query as before, listing users and their latest tweets, but this time we will do it using a CTE.
Here is the SQL:
WITH latest AS (
    SELECT user_id, MAX(timestamp) AS max_ts
    FROM tweet
    GROUP BY user_id)
SELECT tweet.*, user.*
FROM tweet
INNER JOIN latest
    ON ((latest.user_id = tweet.user_id) AND (latest.max_ts = tweet.timestamp))
INNER JOIN user
    ON (tweet.user_id = user.id)
This example looks very similar to the previous example with the subquery:
# Define our CTE first. We'll use an alias of the Tweet model, since
# we will be querying from the Tweet model directly in the main query.
Latest = Tweet.alias()
cte = (Latest
       .select(Latest.user, fn.MAX(Latest.timestamp).alias('max_ts'))
       .group_by(Latest.user)
       .cte('latest'))

# Our join predicate will ensure that we match tweets based on their
# timestamp *and* user_id.
predicate = ((Tweet.user == cte.c.user_id) &
             (Tweet.timestamp == cte.c.max_ts))

# We put it all together, querying from tweet and joining on the CTE
# using the above predicate.
query = (Tweet
         .select(Tweet, User)      # Select all columns from tweet and user.
         .join(cte, on=predicate)  # Join tweet -> CTE.
         .join_from(Tweet, User)   # Join from tweet -> user.
         .with_cte(cte))
We can iterate over the result-set, which consists of the latest tweets for each user:
>>> for tweet in query:
...     print(tweet.user.username, '->', tweet.content)
...
huey -> purr
mickey -> whine
NOTE:
When there are multiple foreign keys to the same model, it is good practice to explicitly specify which field you are joining on.
Referring back to the example app's models, consider the Relationship model, which is used to denote when one user follows another. Here is the model definition:
class Relationship(BaseModel):
from_user = ForeignKeyField(User, backref='relationships')
to_user = ForeignKeyField(User, backref='related_to')
class Meta:
indexes = (
# Specify a unique multi-column index on from/to-user.
(('from_user', 'to_user'), True),
)
Since there are two foreign keys to User, we should always specify which field we are using in a join.
For example, to determine which users I am following, I would write:
(User
.select()
.join(Relationship, on=Relationship.to_user)
.where(Relationship.from_user == charlie))
On the other hand, if I wanted to determine which users are following me, I would instead join on the from_user column and filter on the relationship's to_user:
(User
.select()
.join(Relationship, on=Relationship.from_user)
.where(Relationship.to_user == charlie))
If a foreign key does not exist between two tables you can still perform a join, but you must manually specify the join predicate.
In the following example, there is no explicit foreign-key between User and ActivityLog, but there is an implied relationship between the ActivityLog.object_id field and User.id. Rather than joining on a specific Field, we will join using an Expression.
user_log = (User
            .select(User, ActivityLog)
            .join(ActivityLog, on=(User.id == ActivityLog.object_id), attr='log')
            .where(
                (ActivityLog.activity_type == 'user_activity') &
                (User.username == 'charlie')))

for user in user_log:
    print(user.username, user.log.description)

#### Print something like ####
# charlie logged in
# charlie posted a tweet
# charlie retweeted
# charlie posted a tweet
# charlie logged out
NOTE:
In the query above, we passed attr='log' to the join() method to control the attribute peewee assigns the joined instance to:

join(ActivityLog, on=(User.id == ActivityLog.object_id), attr='log')

Then when iterating over the query, we were able to directly access the joined ActivityLog without incurring an additional query:
for user in user_log:
print(user.username, user.log.description)
Peewee supports constructing queries containing a self-join.
To join on the same model (table) twice, it is necessary to create a model alias to represent the second instance of the table in a query. Consider the following model:
class Category(Model):
name = CharField()
parent = ForeignKeyField('self', backref='children')
What if we wanted to query all categories whose parent category is Electronics? One way would be to perform a self-join:
Parent = Category.alias()
query = (Category
         .select()
         .join(Parent, on=(Category.parent == Parent.id))
         .where(Parent.name == 'Electronics'))
When performing a join that uses a ModelAlias, it is necessary to specify the join condition using the on keyword argument. In this case we are joining the category with its parent category.
Another less common approach involves the use of subqueries. Here is another way we might construct a query to get all the categories whose parent category is Electronics using a subquery:
Parent = Category.alias()
join_query = Parent.select().where(Parent.name == 'Electronics')

# Subqueries used as JOINs need to have an alias.
join_query = join_query.alias('jq')

query = (Category
         .select()
         .join(join_query, on=(Category.parent == join_query.c.id)))
This will generate the following SQL query:
SELECT t1."id", t1."name", t1."parent_id"
FROM "category" AS t1
INNER JOIN (
    SELECT t2."id"
    FROM "category" AS t2
    WHERE (t2."name" = ?)) AS jq
ON (t1."parent_id" = "jq"."id")
To access the id value from the subquery, we use the .c magic lookup which will generate the appropriate SQL expression:
Category.parent == join_query.c.id # Becomes: (t1."parent_id" = "jq"."id")
Peewee provides a field for representing many-to-many relationships, much like Django does. This feature was added due to many requests from users, but I strongly advocate against using it, since it conflates the idea of a field with a junction table and hidden joins. It's just a nasty hack to provide convenient accessors.
To implement many-to-many correctly with peewee, you will therefore create the intermediary table yourself and query through it:
class Student(Model):
    name = CharField()

class Course(Model):
    name = CharField()

class StudentCourse(Model):
    student = ForeignKeyField(Student)
    course = ForeignKeyField(Course)
To query, let's say we want to find students who are enrolled in math class:
query = (Student
         .select()
         .join(StudentCourse)
         .join(Course)
         .where(Course.name == 'math'))

for student in query:
    print(student.name)
To query what classes a given student is enrolled in:
courses = (Course
           .select()
           .join(StudentCourse)
           .join(Student)
           .where(Student.name == 'da vinci'))

for course in courses:
    print(course.name)
To efficiently iterate over a many-to-many relation, i.e., list all students and their respective courses, we will query the through model StudentCourse and precompute the Student and Course:
query = (StudentCourse
.select(StudentCourse, Student, Course)
.join(Course)
.switch(StudentCourse)
.join(Student)
.order_by(Student.name))
To print a list of students and their courses you might do the following:
for student_course in query:
print(student_course.student.name, '->', student_course.course.name)
Since we selected all fields from Student and Course in the select clause of the query, these foreign key traversals are "free" and we've done the whole iteration with just 1 query.
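If you want the rows grouped per student rather than flat, you can group the result-set in Python without issuing any additional queries. Here is a minimal sketch using itertools.groupby, relying on the order_by(Student.name) from the query above (grouping by the student instance itself, rather than the name, keeps distinct students separate):

from itertools import groupby

# Rows arrive sorted by student, so consecutive rows belonging to the
# same student can be grouped together with zero extra queries.
for student, rows in groupby(query, key=lambda sc: sc.student):
    print(student.name)
    for student_course in rows:
        print('  ', student_course.course.name)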
The ManyToManyField provides a field-like API over many-to-many fields. For all but the simplest many-to-many situations, you're better off using the standard peewee APIs. But, if your models are very simple and your querying needs are not very complex, ManyToManyField may work.
Modeling students and courses using ManyToManyField:
from peewee import *

db = SqliteDatabase('school.db')

class BaseModel(Model):
    class Meta:
        database = db

class Student(BaseModel):
    name = CharField()

class Course(BaseModel):
    name = CharField()
    students = ManyToManyField(Student, backref='courses')

StudentCourse = Course.students.get_through_model()

db.create_tables([
    Student,
    Course,
    StudentCourse])

# Get all classes that "huey" is enrolled in:
huey = Student.get(Student.name == 'Huey')
for course in huey.courses.order_by(Course.name):
    print(course.name)

# Get all students in "English 101":
engl_101 = Course.get(Course.name == 'English 101')
for student in engl_101.students:
    print(student.name)

# When adding objects to a many-to-many relationship, we can pass
# in either a single model instance, a list of models, or even a
# query of models:
huey.courses.add(Course.select().where(Course.name.contains('English')))

engl_101.students.add(Student.get(Student.name == 'Mickey'))
engl_101.students.add([
    Student.get(Student.name == 'Charlie'),
    Student.get(Student.name == 'Zaizee')])

# The same rules apply for removing items from a many-to-many:
huey.courses.remove(Course.select().where(Course.name.startswith('CS')))

engl_101.students.remove(huey)

# Calling .clear() will remove all associated objects:
cs_150.students.clear()
WARNING:
A ManyToManyField, despite its name, is not a field in the usual sense. Instead of being a column on a table, the many-to-many field covers the fact that behind-the-scenes there's actually a separate table with two foreign-key pointers (the through table).
Therefore, when a subclass is created that inherits a many-to-many field, what actually needs to be inherited is the through table. Because of the potential for subtle bugs, Peewee does not attempt to automatically subclass the through model and modify its foreign-key pointers. As a result, many-to-many fields typically will not work with inheritance.
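If you do need many-to-many behavior on a subclass, one workaround is to declare the ManyToManyField on the subclass itself, so that a fresh through table (with correctly-pointed foreign keys) is created. A minimal sketch, where BaseCourse and OnlineCourse are hypothetical models re-using the Student model and db from the example above:

class BaseCourse(BaseModel):
    name = CharField()

class OnlineCourse(BaseCourse):
    # Declared on the subclass itself, so the implicit through table's
    # foreign-keys point at OnlineCourse rather than BaseCourse.
    students = ManyToManyField(Student, backref='online_courses')

db.create_tables([
    Student,
    OnlineCourse,
    OnlineCourse.students.get_through_model()])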
For more examples, see the relationships documentation and the ManyToManyField API documentation.
The N+1 problem refers to a situation where an application performs a query, then for each row of the result set, the application performs at least one other query (another way to conceptualize this is as a nested loop). In many cases, these n queries can be avoided through the use of a SQL join or subquery. The database itself may do a nested loop, but it will usually be more performant than doing n queries in your application code, which involves latency communicating with the database and may not take advantage of indices or other optimizations employed by the database when joining or executing a subquery.
Peewee provides several APIs for mitigating N+1 query behavior. Recollecting the models used throughout this document, User and Tweet, this section will try to outline some common N+1 scenarios, and how peewee can help you avoid them.
The twitter timeline displays a list of tweets from multiple users. In addition to the tweet's content, the username of the tweet's author is also displayed. The N+1 scenario here would be: fetch the 10 most recent tweets, then for each tweet, issue an additional query to fetch the tweet's author.
By selecting both tables and using a join, peewee makes it possible to accomplish this in a single query:
query = (Tweet
         .select(Tweet, User)        # Note that we are selecting both models.
         .join(User)                 # Use an INNER join because every tweet has an author.
         .order_by(Tweet.id.desc())  # Get the most recent tweets.
         .limit(10))

for tweet in query:
    print(tweet.user.username, '-', tweet.message)
Without the join, accessing tweet.user.username would trigger a query to resolve the foreign key tweet.user and retrieve the associated user. But since we have selected and joined on User, peewee will automatically resolve the foreign-key for us.
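For contrast, here is a sketch of the N+1 version. Without selecting and joining User, each iteration triggers an extra query to resolve the tweet's author:

query = Tweet.select().order_by(Tweet.id.desc()).limit(10)
for tweet in query:
    # tweet.user was not selected, so peewee lazily fetches the
    # author here -- one additional query per tweet.
    print(tweet.user.username, '-', tweet.message)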
Let's say you want to build a page that shows several users and all of their tweets. The N+1 scenario would be: fetch the users, then for each user, issue an additional query to list that user's tweets.
This situation is similar to the previous example, but there is one important difference: when we selected tweets, they only have a single associated user, so we could directly assign the foreign key. The reverse is not true, however, as one user may have any number of tweets (or none at all).
Peewee provides an approach to avoiding O(n) queries in this situation. Fetch users first, then fetch all the tweets associated with those users. Once peewee has the big list of tweets, it will assign them out, matching them with the appropriate user. This method is usually faster but will involve a query for each table being selected.
peewee supports pre-fetching related data using sub-queries. This method requires the use of a special API, prefetch(). Prefetch, as its name implies, will eagerly load the appropriate tweets for the given users using subqueries. This means instead of O(n) queries for n rows, we will do O(k) queries for k tables.
Here is an example of how we might fetch several users and any tweets they created within the past week.
week_ago = datetime.date.today() - datetime.timedelta(days=7)

users = User.select()
tweets = (Tweet
          .select()
          .where(Tweet.timestamp >= week_ago))

# This will perform two queries.
users_with_tweets = prefetch(users, tweets)

for user in users_with_tweets:
    print(user.username)
    for tweet in user.tweets:
        print('  ', tweet.message)
NOTE:
prefetch() can be used to query an arbitrary number of tables. Check the API documentation for more examples.
There are a few things to consider when using prefetch(); most importantly, each table being selected requires its own query, so prefetch() is best suited to cases that would otherwise degenerate into one query per row.
This document specifies Peewee's APIs.
The Database is responsible for:
Examples:
# Sqlite database using WAL-mode and 32MB page-cache.
db = SqliteDatabase('app.db', pragmas={
    'journal_mode': 'wal',
    'cache_size': -32 * 1000})

# Postgresql database on remote host.
db = PostgresqlDatabase('my_app', user='postgres', host='10.1.0.3',
                        password='secret')
Deferred initialization example:
db = PostgresqlDatabase(None)

class BaseModel(Model):
    class Meta:
        database = db

# Read database connection info from env, for example:
db_name = os.environ['DATABASE']
db_host = os.environ['PGHOST']

# Initialize database.
db.init(db_name, host=db_host, user='postgres')
Initialize a deferred database. See deferring_initialization for more info.
Additionally, any SQL executed within the wrapped block will be executed in a transaction.
Example:
def on_app_startup():
# When app starts up, create the database tables, being sure
# the connection is closed upon completion.
with database.connection_context():
database.create_tables(APP_MODELS)
Open a connection to the database.
Close the connection to the database.
Return a cursor object on the current connection. If a connection is not open, one will be opened. The cursor will be whatever the underlying database-driver uses to encapsulate a database cursor.
Execute a SQL query and return a cursor over the results.
Execute a SQL query by compiling a Query instance and executing the resulting SQL.
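A small sketch contrasting the two methods, assuming the User model from earlier examples (the table and column names are illustrative):

# Execute raw SQL, getting back a cursor over the results:
cursor = db.execute_sql('SELECT username FROM users WHERE active = ?', (True,))
for username, in cursor.fetchall():
    print(username)

# Or compile and execute a peewee Query instance:
query = User.select().where(User.active == True)
cursor = db.execute(query)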
Calls to atomic() can be nested.
atomic() can also be used as a decorator.
Example code:
with db.atomic() as txn:
perform_operation()
with db.atomic() as nested_txn:
perform_another_operation()
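Since atomic() can also serve as a decorator, the following sketch is equivalent to wrapping the function body in the context-manager form shown above:

@db.atomic()
def create_user(username):
    # All statements in this function run in a transaction (or in a
    # savepoint, if a transaction is already open).
    return User.create(username=username)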
Transactions and save-points can be explicitly committed or rolled-back within the wrapped block. If this occurs, a new transaction or savepoint is begun after the commit/rollback.
Example:
with db.atomic() as txn:
    User.create(username='mickey')
    txn.commit()  # Changes are saved and a new transaction begins.

    User.create(username='huey')
    txn.rollback()  # "huey" will not be saved.

    User.create(username='zaizee')

# Print the usernames of all users.
print([u.username for u in User.select()])

# Prints ["mickey", "zaizee"]
Example:
with db.manual_commit():
db.begin() # Begin transaction explicitly.
try:
user.delete_instance(recursive=True)
except:
db.rollback() # Rollback -- an error occurred.
raise
else:
try:
db.commit() # Attempt to commit changes.
except:
db.rollback() # Error committing, rollback.
raise
The above code is equivalent to the following:
with db.atomic():
user.delete_instance(recursive=True)
NOTE:
The session_start() method should only be used if the sequence of operations does not easily lend itself to wrapping using either a context-manager or decorator.
The purpose of this method is to simplify batching large operations, such as inserts, updates, etc. You pass in an iterable and the number of items-per-batch, and the items will be returned by an equivalent iterator that wraps each batch in a transaction.
Example:
# Some list or iterable containing data to insert.
row_data = [{'username': 'u1'}, {'username': 'u2'}, ...]

# Insert all data, committing every 100 rows. If, for example,
# there are 789 items in the list, then there will be a total of
# 8 transactions (7x100 and 1x89).
for row in db.batch_commit(row_data, 100):
    User.create(**row)
An alternative that may be more efficient is to batch the data into a multi-value INSERT statement (for example, using Model.insert_many()):
with db.atomic():
for idx in range(0, len(row_data), 100):
# Insert 100 rows at a time.
rows = row_data[idx:idx + 100]
User.insert_many(rows).execute()
Return a list of IndexMetadata tuples.
Example:
print(db.get_indexes('entry'))

[IndexMetadata(
name='entry_public_list',
sql='CREATE INDEX "entry_public_list" ...',
columns=['timestamp'],
unique=False,
table='entry'),
IndexMetadata(
name='entry_slug',
sql='CREATE UNIQUE INDEX "entry_slug" ON "entry" ("slug")',
columns=['slug'],
unique=True,
table='entry')]
Return a list of ColumnMetadata tuples.
Example:
print(db.get_columns('entry'))

[ColumnMetadata(
name='id',
data_type='INTEGER',
null=False,
primary_key=True,
table='entry'),
ColumnMetadata(
name='title',
data_type='TEXT',
null=False,
primary_key=False,
table='entry'),
...]
Return a list of column names that comprise the primary key.
Example:
print(db.get_primary_keys('entry'))

['id']
Return a list of ForeignKeyMetadata tuples for keys present on the table.
Example:
print(db.get_foreign_keys('entrytag'))

[ForeignKeyMetadata(
column='entry_id',
dest_table='entry',
dest_column='id',
table='entrytag'),
...]
Return a list of ViewMetadata tuples for VIEWs present in the database.
Example:
print(db.get_views())

[ViewMetadata(
name='entries_public',
sql='CREATE VIEW entries_public AS SELECT ... '),
...]
Create tables, indexes and associated metadata for the given list of models.
Dependencies are resolved so that tables are created in the appropriate order.
Drop tables, indexes and associated metadata for the given list of models.
Dependencies are resolved so that tables are dropped in the appropriate order.
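For example, a minimal sketch using the User and Tweet models from earlier in this document:

# Tables are created in dependency order (User before Tweet, since
# Tweet has a foreign-key to User).
db.create_tables([Tweet, User])

# Dropped in the reverse (safe) order.
db.drop_tables([Tweet, User])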
Bind the given list of models, and specified relations, to the database.
Create a context-manager that binds (associates) the given models with the current database for the duration of the wrapped block.
Example:
MODELS = (User, Account, Note)

# Bind the given models to the db for the duration of wrapped block.
def use_test_database(fn):
    @wraps(fn)
    def inner(self):
        with test_db.bind_ctx(MODELS):
            test_db.create_tables(MODELS)
            try:
                fn(self)
            finally:
                test_db.drop_tables(MODELS)
    return inner

class TestSomething(TestCase):
    @use_test_database
    def test_something(self):
        # ... models are bound to test database ...
        pass
Provides a compatible interface for extracting a portion of a datetime.
Provides a compatible interface for truncating a datetime to the given resolution.
A compatible interface for calling the appropriate random number generation function provided by the database. For Postgres and Sqlite, this is equivalent to fn.random(), for MySQL fn.rand().
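For example, a short sketch that selects rows in random order, re-using the User model from earlier examples:

# Select 5 users at random; compiles to ORDER BY random() on
# Postgres/Sqlite and ORDER BY rand() on MySQL.
query = User.select().order_by(db.random()).limit(5)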
Sqlite database implementation. SqliteDatabase that provides some advanced features only offered by Sqlite.
Example of initializing a database and configuring some PRAGMAs:
db = SqliteDatabase('my_app.db', pragmas=(
    ('cache_size', -16000),    # 16MB
    ('journal_mode', 'wal')))  # Use write-ahead-log journal mode.

# Alternatively, pragmas can be specified using a dictionary.
db = SqliteDatabase('my_app.db', pragmas={'journal_mode': 'wal'})
Execute a PRAGMA query once on the active connection. If a value is not specified, then the current value will be returned.
If permanent is specified, then the PRAGMA query will also be executed whenever a new connection is opened, ensuring it is always in-effect.
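A short sketch of reading and writing PRAGMAs (the specific PRAGMA names here are just examples):

# Read the current journal mode.
journal_mode = db.pragma('journal_mode')

# Set a PRAGMA on the current connection only.
db.pragma('cache_size', -1024 * 64)

# Set a PRAGMA and re-apply it whenever a new connection is opened.
db.pragma('foreign_keys', 1, permanent=True)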
Register a user-defined aggregate function.
The function will be registered each time a new connection is opened. Additionally, if a connection is already open, the aggregate will be registered with the open connection.
Class decorator to register a user-defined aggregate function.
Example:
@db.aggregate('md5')
class MD5(object):
    def initialize(self):
        self.md5 = hashlib.md5()

    def step(self, value):
        self.md5.update(value)

    def finalize(self):
        return self.md5.hexdigest()


@db.aggregate()
class Product(object):
    '''Like SUM() except calculates cumulative product.'''
    def __init__(self):
        self.product = 1

    def step(self, value):
        self.product *= value

    def finalize(self):
        return self.product
Register a user-defined collation. The collation will be registered each time a new connection is opened. Additionally, if a connection is already open, the collation will be registered with the open connection.
Decorator to register a user-defined collation.
Example:
@db.collation('reverse')
def collate_reverse(s1, s2):
    return -((s1 > s2) - (s1 < s2))  # i.e. -cmp(s1, s2) in Python 2 terms.

# Usage:
Book.select().order_by(collate_reverse.collation(Book.title))

# Equivalent:
Book.select().order_by(Book.title.asc(collation='reverse'))
As you might have noticed, the original collate_reverse function has a special attribute called collation attached to it. This extra attribute provides a shorthand way to generate the SQL necessary to use our custom collation.
Register a user-defined scalar function. The function will be registered each time a new connection is opened. Additionally, if a connection is already open, the function will be registered with the open connection.
Decorator to register a user-defined scalar function.
Example:
@db.func('title_case')
def title_case(s):
    return s.title() if s else ''

# Usage:
title_case_books = Book.select(fn.title_case(Book.title))
Register a user-defined window function.
The window function will be registered each time a new connection is opened. Additionally, if a connection is already open, the window function will be registered with the open connection.
Class decorator to register a user-defined window function. Window functions must define the following methods:
Example:
@db.window_function('my_sum')
class MySum(object):
    def __init__(self):
        self._value = 0

    def step(self, value):
        self._value += value

    def inverse(self, value):
        self._value -= value

    def value(self):
        return self._value

    def finalize(self):
        return self._value
Example:
from playhouse.sqlite_ext import TableFunction

@db.table_function('series')
class Series(TableFunction):
    columns = ['value']
    params = ['start', 'stop', 'step']

    def initialize(self, start=0, stop=None, step=1):
        """
        Table-functions declare an initialize() method, which is
        called with whatever arguments the user has called the
        function with.
        """
        self.start = self.current = start
        self.stop = stop or float('Inf')
        self.step = step

    def iterate(self, idx):
        """
        Iterate is called repeatedly by the SQLite database engine
        until the required number of rows has been read **or** the
        function raises a `StopIteration` signalling no more rows
        are available.
        """
        if self.current > self.stop:
            raise StopIteration

        ret, self.current = self.current, self.current + self.step
        return (ret,)

# Usage:
cursor = db.execute_sql('SELECT * FROM series(?, ?, ?)', (0, 5, 2))
for value, in cursor:
    print(value)

# Prints:
# 0
# 2
# 4
Unregister the user-defined aggregate function.
Unregister the user-defined collation.
Unregister the user-defined scalar function.
For example, if you've compiled the closure table extension and wish to use it in your application, you might write:
db = SqliteExtDatabase('my_app.db')
db.load_extension('closure')
Register another database file that will be attached to every database connection. If the main database is currently connected, the new database will be attached on the open connection.
Unregister another database file that was attached previously with a call to attach(). If the main database is currently connected, the attached database will be detached from the open connection.
Create a transaction context-manager using the specified locking strategy (defaults to DEFERRED).
Additional optional keyword-parameters:
Set the timezone on the current connection. If no connection is open, then one will be opened.
query = MyModel.select()
new_query = query.where(MyModel.field == 'value')
User = Table('users')
query = (User
         .select(User.c.username)
         .where(User.c.active == True)
         .order_by(User.c.username))
Create a Select query on the table. If the table explicitly declares columns and no columns are provided, then by default all the table's defined columns will be selected.
Join type may be one of JOIN.INNER, JOIN.LEFT_OUTER, JOIN.RIGHT_OUTER, JOIN.FULL, JOIN.FULL_OUTER, or JOIN.CROSS.
Convenience method for calling join() using a LEFT OUTER join.
NOTE:
When columns are not explicitly defined, tables have a special attribute "c" which is a factory that provides access to table columns dynamically.
Example:
User = Table('users')
query = (User
         .select(User.c.id, User.c.username)
         .order_by(User.c.username))
Equivalent example when columns are specified:
User = Table('users', ('id', 'username'))
query = (User
         .select(User.id, User.username)
         .order_by(User.username))
Bind this table to the given database (or unbind by leaving empty).
When a table is bound to a database, queries may be executed against it without the need to specify the database in the query's execute method.
Return a context manager that will bind the table to the given database for the duration of the wrapped block.
Create a Select query on the table. If the table explicitly declares columns and no columns are provided, then by default all the table's defined columns will be selected.
Example:
User = Table('users', ('id', 'username'))

# Because columns were defined on the Table, we will default to
# selecting both of the User table's columns.
# Evaluates to SELECT id, username FROM users
query = User.select()

Note = Table('notes')
query = (Note
         .select(Note.c.content, Note.c.timestamp, User.username)
         .join(User, on=(Note.c.user_id == User.id))
         .where(Note.c.is_published == True)
         .order_by(Note.c.timestamp.desc()))

# Using a function to select users and the number of notes they
# have authored.
query = (User
         .select(
             User.username,
             fn.COUNT(Note.c.id).alias('n_notes'))
         .join(
             Note,
             JOIN.LEFT_OUTER,
             on=(User.id == Note.c.user_id))
         .order_by(fn.COUNT(Note.c.id).desc()))
Create an Insert query into the table.
Create an Insert query into the table whose conflict resolution method is to replace.
Create an Update query for the table.
Specify the predicate expression used for this join.
Example:
data = [(1, 'first'), (2, 'second')]
vl = ValuesList(data, columns=('idx', 'name'))

query = (vl
         .select(vl.c.idx, vl.c.name)
         .order_by(vl.c.idx))

# Yields:
# SELECT t1.idx, t1.name
# FROM (VALUES (1, 'first'), (2, 'second')) AS t1(idx, name)
# ORDER BY t1.idx
Example:
vl = ValuesList([(1, 'first'), (2, 'second')])
vl = vl.columns('idx', 'name').alias('v')

query = vl.select(vl.c.idx, vl.c.name)

# Yields:
# SELECT v.idx, v.name
# FROM (VALUES (1, 'first'), (2, 'second')) AS v(idx, name)
Column-like objects can be composed using various operators and special methods.
Indicate the alias that should be given to the specified column-like object.
Column on a table or a column returned by a sub-query.
Create a named alias for the given column-like object.
Create a new Alias for the aliased column-like object. If the new alias is None, then the original column-like object is returned.
Value to be used in a parameterized query. It is the responsibility of the caller to ensure that the value passed in can be adapted to a type the database driver understands.
Represents a CAST(<node> AS <cast>) expression.
Represent ordering by a column-like object.
Postgresql supports a non-standard clause ("NULLS FIRST/LAST"). Peewee will automatically use an equivalent CASE statement for databases that do not support this (Sqlite / MySQL).
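For example, a sketch ordering by a nullable column (last_login here is a hypothetical nullable column on the User model):

# Users who never logged in (NULL last_login) sort after the rest.
query = User.select().order_by(User.last_login.desc(nulls='LAST'))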
Represent a binary expression of the form (lhs op rhs), e.g. (foo + 1).
Represent a quoted entity in a query, such as a table, column, alias. The name may consist of multiple components, e.g. "a_table"."column_name".
Represent a parameterized SQL query or query-fragment.
Represent an arbitrary SQL function call.
Example of using fn to call an arbitrary SQL function:
# Query users and count of tweets authored.
query = (User
         .select(User.username, fn.COUNT(Tweet.id).alias('ct'))
         .join(Tweet, JOIN.LEFT_OUTER, on=(User.id == Tweet.user_id))
         .group_by(User.username)
         .order_by(fn.COUNT(Tweet.id).desc()))
Examples:
# Using a simple partition on a single column.
query = (Sample
         .select(
             Sample.counter,
             Sample.value,
             fn.AVG(Sample.value).over([Sample.counter]))
         .order_by(Sample.counter))

# Equivalent example using a Window() instance instead.
window = Window(partition_by=[Sample.counter])
query = (Sample
         .select(
             Sample.counter,
             Sample.value,
             fn.AVG(Sample.value).over(window))
         .window(window)  # Note call to ".window()"
         .order_by(Sample.counter))

# Example using bounded window.
query = (Sample
         .select(Sample.value,
                 fn.SUM(Sample.value).over(
                     partition_by=[Sample.counter],
                     start=Window.CURRENT_ROW,   # current row
                     end=Window.following()))    # unbounded following
         .order_by(Sample.id))
Add a FILTER (WHERE...) clause to an aggregate function. The where expression is evaluated to determine which rows are fed to the aggregate function. This SQL feature is supported for Postgres and SQLite.
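A sketch of filtering an aggregate, re-using the User and Tweet models (the week_ago cutoff is illustrative):

week_ago = datetime.date.today() - datetime.timedelta(days=7)

# Count only each user's tweets from the past week, while still
# listing every user thanks to the LEFT OUTER join.
query = (User
         .select(
             User.username,
             fn.COUNT(Tweet.id).filter(Tweet.timestamp >= week_ago).alias('ct'))
         .join(Tweet, JOIN.LEFT_OUTER)
         .group_by(User.username))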
When coerce is True, the target data-type is inferred using several heuristics. Read the source for BaseModelCursorWrapper._initialize_columns method to see how this works.
Specify a particular function to use when converting values returned by the database cursor. For example:
# Get user and a list of their tweet IDs. The tweet IDs are
# returned as a comma-separated string by the db, so we'll split
# the result string and convert the values to python ints.
tweet_ids = (fn
             .GROUP_CONCAT(Tweet.id)
             .python_value(lambda idlist: [int(i) for i in (idlist or '').split(',') if i]))

query = (User
         .select(User.username, tweet_ids.alias('tweet_ids'))
         .join(Tweet, JOIN.LEFT_OUTER)
         .group_by(User.username))

for user in query:
    print(user.username, user.tweet_ids)

# e.g.,
# huey [1, 4, 5, 7]
# mickey [2, 3, 6]
# zaizee []
To create a node representative of a SQL function call, use the function name as an attribute on fn and then provide the arguments as you would if calling a Python function:
# List users and the number of tweets they have authored,
# from highest-to-lowest:
sql_count = fn.COUNT(Tweet.id)
query = (User
         .select(User, sql_count.alias('count'))
         .join(Tweet, JOIN.LEFT_OUTER)
         .group_by(User)
         .order_by(sql_count.desc()))

# Get the timestamp of the most recent tweet:
query = Tweet.select(fn.MAX(Tweet.timestamp))
max_timestamp = query.scalar()  # Retrieve scalar result from query.
Function calls can, like anything else, be composed and nested:
# Get users whose username begins with "A" or "a":
a_users = User.select().where(fn.LOWER(fn.SUBSTR(User.username, 1, 1)) == 'a')
Represent a WINDOW clause.
Convenience method for generating SQL suitable for passing in as the start parameter for a window range.
Convenience method for generating SQL suitable for passing in as the end parameter for a window range.
Examples:
Number = Table('numbers', ('val',))

num_as_str = Case(Number.val, (
    (1, 'one'),
    (2, 'two'),
    (3, 'three')), 'a lot')

query = Number.select(Number.val, num_as_str.alias('num_str'))

# The above is equivalent to:
# SELECT "val",
#   CASE "val"
#     WHEN 1 THEN 'one'
#     WHEN 2 THEN 'two'
#     WHEN 3 THEN 'three'
#     ELSE 'a lot' END AS "num_str"
# FROM "numbers"

num_as_str = Case(None, (
    (Number.val == 1, 'one'),
    (Number.val == 2, 'two'),
    (Number.val == 3, 'three')), 'a lot')

query = Number.select(Number.val, num_as_str.alias('num_str'))

# The above is equivalent to:
# SELECT "val",
#   CASE
#     WHEN "val" = 1 THEN 'one'
#     WHEN "val" = 2 THEN 'two'
#     WHEN "val" = 3 THEN 'three'
#     ELSE 'a lot' END AS "num_str"
# FROM "numbers"
Represent a list of nodes, a multi-part clause, a list of parameters, etc.
Represent a list of nodes joined by commas.
Represent a list of nodes joined by commas and wrapped in parentheses.
Represent a composable Django-style filter expression suitable for use with the Model.filter() or ModelSelect.filter() methods.
Represent a conflict resolution clause for a data-modification query.
Depending on the database-driver being used, one or more of the above parameters may be required.
The update() method supports being called with either a dictionary of column-to-value, or keyword arguments representing the same.
Example:
class KV(Model):
    key = CharField(unique=True)
    value = IntegerField()

# Create one row.
KV.create(key='k1', value=1)

# Demonstrate usage of EXCLUDED.
# Here we will attempt to insert a new value for a given key. If that
# key already exists, then we will update its value with the *sum* of its
# original value and the value we attempted to insert -- provided that
# the new value is larger than the original value.
query = (KV.insert(key='k1', value=10)
         .on_conflict(conflict_target=[KV.key],
                      update={KV.value: KV.value + EXCLUDED.value},
                      where=(EXCLUDED.value > KV.value)))

# Executing the above query will result in the following data being
# present in the "kv" table:
# (key='k1', value=11)
query.execute()

# If we attempted to execute the query *again*, then nothing would be
# updated, as the new value (10) is now less than the value in the
# original row (11).
Bind the query to the given database for execution.
Return rows as dictionaries.
Return rows as named tuples.
Return rows as arbitrary objects using the given constructor.
Execute the query and return result (depends on type of query being executed). For example, select queries the return result will be an iterator over the query results.
Execute the query and return an iterator over the result-set. For large result-sets this method is preferable as rows are not cached in-memory during iteration.
Example:
query = StatTbl.select().order_by(StatTbl.timestamp).tuples()
for row in query.iterator(db):
process_row(row)
Unlike iterator(), this method will cause rows to be cached in order to allow efficient iteration, indexing and slicing.
Retrieve a row or range of rows from the result-set.
Create a query by directly specifying the SQL to execute.
Base-class for queries that support method-chaining APIs.
Include the given common-table expressions in the query. Any previously specified CTEs will be overwritten. For examples of common-table expressions, see cte.
Include the given expressions in the WHERE clause of the query. The expressions will be AND-ed together with any previously-specified WHERE expressions.
Example selecting users where the username is equal to 'somebody':
sq = User.select().where(User.username == 'somebody')
Example selecting tweets made by users who are either editors or administrators:
sq = Tweet.select().join(User).where(
(User.is_editor == True) |
(User.is_admin == True))
Example of deleting tweets by users who are no longer active:
inactive_users = User.select().where(User.active == False)

dq = (Tweet
      .delete()
      .where(Tweet.user.in_(inactive_users)))

dq.execute()  # Return number of tweets deleted.
Include the given expressions in the WHERE clause of the query. This method is the same as the Query.where() method, except that the expressions will be OR-ed together with any previously-specified WHERE expressions.
Define the ORDER BY clause. Any previously-specified values will be overwritten.
Extend any previously-specified ORDER BY clause with the given values.
Convenience method for specifying the LIMIT and OFFSET in a more intuitive way.
This feature is designed with web-site pagination in mind, so the first page starts with page=1.
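For example, a small sketch of fetching the second page of 20 users:

# Page 1 is rows 1-20, page 2 is rows 21-40, and so on.
page_two = User.select().order_by(User.username).paginate(2, 20)

# Equivalent to:
page_two = User.select().order_by(User.username).limit(20).offset(20)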
Indicate that a query will be used as a common table expression. For example, if we are modelling a category tree and are using a parent-link foreign key, we can retrieve all categories and their absolute depths using a recursive CTE:
class Category(Model):
    name = TextField()
    parent = ForeignKeyField('self', backref='children', null=True)

# The base case of our recursive CTE will be categories that are at
# the root level -- in other words, categories without parents.
roots = (Category
         .select(Category.name, Value(0).alias('level'))
         .where(Category.parent.is_null())
         .cte(name='roots', recursive=True))

# The recursive term will select the category name and increment
# the depth, joining on the base term so that the recursive term
# consists of all children of the base category.
RTerm = Category.alias()
recursive = (RTerm
             .select(RTerm.name, (roots.c.level + 1).alias('level'))
             .join(roots, on=(RTerm.parent == roots.c.id)))

# Express <base term> UNION ALL <recursive term>.
cte = roots.union_all(recursive)

# Select name and level from the recursive CTE.
query = (cte
         .select_from(cte.c.name, cte.c.level)
         .order_by(cte.c.name))

for category in query:
    print(category.name, category.level)
For more examples of CTEs, see cte.
Create a new query that wraps the current (calling) query. For example, suppose you have a simple UNION query, and need to apply an aggregation on the union result-set. To do this, you need to write something like:
SELECT "u"."owner", COUNT("u"."id") AS "ct" FROM (
SELECT "id", "owner", ... FROM "cars"
UNION
SELECT "id", "owner", ... FROM "motorcycles"
UNION
SELECT "id", "owner", ... FROM "boats") AS "u" GROUP BY "u"."owner"
The select_from() method is designed to simplify constructing this type of query.
Example peewee code:
class Car(Model):
    owner = ForeignKeyField(Owner, backref='cars')
    # ... car-specific fields, etc ...

class Motorcycle(Model):
    owner = ForeignKeyField(Owner, backref='motorcycles')
    # ... motorcycle-specific fields, etc ...

class Boat(Model):
    owner = ForeignKeyField(Owner, backref='boats')
    # ... boat-specific fields, etc ...

cars = Car.select(Car.owner)
motorcycles = Motorcycle.select(Motorcycle.owner)
boats = Boat.select(Boat.owner)

union = cars | motorcycles | boats

query = (union
         .select_from(union.c.owner, fn.COUNT(union.c.id))
         .group_by(union.c.owner))
Execute the query and return the given number of rows from the start of the cursor. This function may be called multiple times safely, and will always return the first N rows of results.
Like the peek() method, except a LIMIT is applied to the query to ensure that only n rows are returned. Multiple calls for the same value of n will not result in multiple executions.
Return a scalar value from the first row of results. If multiple scalar values are anticipated (e.g. multiple aggregations in a single query) then you may specify as_tuple=True to get the row tuple.
Example:
query = Note.select(fn.MAX(Note.timestamp))
max_ts = query.scalar(db)

query = Note.select(fn.MAX(Note.timestamp), fn.COUNT(Note.id))
max_ts, n_notes = query.scalar(db, as_tuple=True)
Return number of rows in the query result-set.
Implemented by running SELECT COUNT(1) FROM (<current query>).
Return a boolean indicating whether the current query has any results.
Execute the query and return the first row, if it exists. Multiple calls will result in multiple queries being executed.
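A small sketch tying these methods together, assuming the User model from earlier examples:

query = User.select().where(User.active == True)

n_active = query.count()  # SELECT COUNT(1) FROM (<query>)
has_any = query.exists()  # True if at least one row matches.
user = query.first()      # First matching row, or None.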
Class representing a compound SELECT query.
Class representing a SELECT query.
NOTE:
Methods on the select query can be chained together.
Example selecting some user instances from the database. Only the id and username columns are selected. When iterated, will return instances of the User model:
query = User.select(User.id, User.username)
for user in query:
print(user.username)
Example selecting users and additionally the number of tweets made by the user. The User instances returned will have an additional attribute, 'count', that corresponds to the number of tweets made:
query = (User
         .select(User, fn.COUNT(Tweet.id).alias('count'))
         .join(Tweet, JOIN.LEFT_OUTER)
         .group_by(User))

for user in query:
    print(user.username, 'has tweeted', user.count, 'times')
Specify which columns or column-like values to SELECT.
Same as Select.columns(), provided for backwards-compatibility.
Extend the current selection with the given columns.
Example:
def get_users(with_count=False):
query = User.select()
if with_count:
query = (query
.select_extend(fn.COUNT(Tweet.id).alias('count'))
.join(Tweet, JOIN.LEFT_OUTER)
.group_by(User))
return query
Specify which table-like objects should be used in the FROM clause.
User = Table('users')
Tweet = Table('tweets')

query = (User
         .select(User.c.username, Tweet.c.content)
         .from_(User, Tweet)
         .where(User.c.id == Tweet.c.user_id))

for row in query.execute(db):
    print(row['username'], '->', row['content'])
Join type may be one of JOIN.INNER, JOIN.LEFT_OUTER, JOIN.RIGHT_OUTER, JOIN.FULL, JOIN.FULL_OUTER, or JOIN.CROSS.
Express a JOIN:
User = Table('users', ('id', 'username'))
Note = Table('notes', ('id', 'user_id', 'content'))

query = (Note
         .select(Note.content, User.username)
         .join(User, on=(Note.user_id == User.id)))
Define the GROUP BY clause. Any previously-specified values will be overwritten.
Additionally, to specify all columns on a given table, you can pass the table/model object in place of the individual columns.
Example:
query = (User
.select(User, fn.Count(Tweet.id).alias('count'))
.join(Tweet)
.group_by(User))
Extend the GROUP BY clause with the given columns.
Include the given expressions in the HAVING clause of the query. The expressions will be AND-ed together with any previously-specified HAVING expressions.
Indicate whether this query should use a DISTINCT clause. By specifying a single value of True the query will use a simple SELECT DISTINCT. Specifying one or more columns will result in a SELECT DISTINCT ON.
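For example, a sketch of both forms (SELECT DISTINCT ON requires Postgres):

# Simple SELECT DISTINCT -- unique usernames.
query = User.select(User.username).distinct()

# SELECT DISTINCT ON -- each user's most-recent tweet (Postgres only).
query = (Tweet
         .select(Tweet.user, Tweet.content)
         .order_by(Tweet.user, Tweet.timestamp.desc())
         .distinct(Tweet.user))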
Define the WINDOW clause. Any previously-specified values will be overwritten.
Example:
window = Window(partition_by=[Sample.counter])

query = (Sample
         .select(
             Sample.counter,
             Sample.value,
             fn.AVG(Sample.value).over(window))
         .window(window)  # Note call to ".window()"
         .order_by(Sample.counter))
Base-class for write queries.
Specify the RETURNING clause of query (if supported by your database).
query = (User
         .insert_many([{'username': 'foo'},
                       {'username': 'bar'},
                       {'username': 'baz'}])
         .returning(User.id, User.username)
         .namedtuples())

data = query.execute()
for row in data:
    print('added:', row.username, 'with id=', row.id)
Class representing an UPDATE query.
Example:
PageView = Table('page_views')

query = (PageView
         .update({PageView.c.page_views: PageView.c.page_views + 1})
         .where(PageView.c.url == url))

query.execute(database)
Specify additional tables to join with using the UPDATE ... FROM syntax, which is supported by Postgres. The Postgres documentation provides additional detail; in short, the FROM clause makes columns from other tables available in the UPDATE's SET and WHERE clauses.
Example:
# Update multiple users in a single query.
data = [('huey', True),
        ('mickey', False),
        ('zaizee', True)]

vl = ValuesList(data, columns=('username', 'is_admin'), alias='vl')

# Here we'll update the "is_admin" status of the above users,
# "joining" the VALUES() on the "username" column.
query = (User
         .update(is_admin=vl.c.is_admin)
         .from_(vl)
         .where(User.username == vl.c.username))
The above query produces the following SQL:
UPDATE "users" SET "is_admin" = "vl"."is_admin" FROM (
VALUES ('huey', t), ('mickey', f), ('zaizee', t))
AS "vl"("username", "is_admin") WHERE ("users"."username" = "vl"."username")
Class representing an INSERT query.
Specify IGNORE conflict resolution strategy.
Specify REPLACE conflict resolution strategy.
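For example, sketches of each strategy (assuming User.username carries a unique constraint, as in the example below):

# If the username is already taken, the insert is silently ignored.
User.insert(username='huey').on_conflict_ignore().execute()

# If the username is already taken, the existing row is replaced.
User.insert(username='huey').on_conflict_replace().execute()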
Specify the parameters for an OnConflict clause to use for conflict resolution.
Examples:
class User(Model):
    username = TextField(unique=True)
    last_login = DateTimeField(null=True)
    login_count = IntegerField()

def log_user_in(username):
    now = datetime.datetime.now()

    # INSERT a new row for the user with the current timestamp and
    # login count set to 1. If the user already exists, then we
    # will preserve the last_login value from the "insert()" clause
    # and atomically increment the login-count.
    userid = (User
              .insert(username=username, last_login=now, login_count=1)
              .on_conflict(
                  conflict_target=[User.username],
                  preserve=[User.last_login],
                  update={User.login_count: User.login_count + 1})
              .execute())

    return userid
Example using the special EXCLUDED namespace:
class KV(Model):
    key = CharField(unique=True)
    value = IntegerField()

# Create one row.
KV.create(key='k1', value=1)

# Demonstrate usage of EXCLUDED.
# Here we will attempt to insert a new value for a given key. If that
# key already exists, then we will update its value with the *sum* of its
# original value and the value we attempted to insert -- provided that
# the new value is larger than the original value.
query = (KV.insert(key='k1', value=10)
         .on_conflict(conflict_target=[KV.key],
                      update={KV.value: KV.value + EXCLUDED.value},
                      where=(EXCLUDED.value > KV.value)))

# Executing the above query will result in the following data being
# present in the "kv" table:
# (key='k1', value=11)
query.execute()

# If we attempted to execute the query *again*, then nothing would be
# updated, as the new value (10) is now less than the value in the
# original row (11).
Include the given expressions in the WHERE clause of the index. The expressions will be AND-ed together with any previously-specified WHERE expressions.
Expressive method for declaring an index on a model.
Examples:
class Article(Model):
    name = TextField()
    timestamp = TimestampField()
    status = IntegerField()
    flags = BitField()

    is_sticky = flags.flag(1)
    is_favorite = flags.flag(2)

# CREATE INDEX ... ON "article" ("name", "timestamp")
idx = ModelIndex(Article, (Article.name, Article.timestamp))

# CREATE INDEX ... ON "article" ("name", "timestamp") WHERE "status" = 1
idx = idx.where(Article.status == 1)

# CREATE UNIQUE INDEX ... ON "article" ("timestamp" DESC, "flags" & 2) WHERE "status" = 1
idx = ModelIndex(
    Article,
    (Article.timestamp.desc(), Article.flags.bin_and(2)),
    unique=True).where(Article.status == 1)
You can also use Model.index():
idx = Article.index(Article.name, Article.timestamp).where(Article.status == 1)
To add an index to a model definition use Model.add_index():
idx = Article.index(Article.name, Article.timestamp).where(Article.status == 1)

# Add above index definition to the model definition. When you call
# Article.create_table() (or database.create_tables([Article])), the
# index will be created.
Article.add_index(idx)
Fields on a Model are analogous to columns on a table.
Field class for storing auto-incrementing primary keys using the new Postgres 10 IDENTITY column type. The column definition ends up looking like this:
id = IdentityField() # "id" INT GENERATED BY DEFAULT AS IDENTITY NOT NULL PRIMARY KEY
Defaults to decimal.DefaultContext.rounding.
Field class for storing decimal numbers. Values are represented as decimal.Decimal objects.
Usage:
class Post(Model):
    content = TextField()
    flags = BitField()

    is_favorite = flags.flag(1)
    is_sticky = flags.flag(2)
    is_minimized = flags.flag(4)
    is_deleted = flags.flag(8)

>>> p = Post()
>>> p.is_sticky = True
>>> p.is_minimized = True
>>> print(p.flags)  # Prints 4 | 2 --> "6"
6
>>> p.is_favorite
False
>>> p.is_sticky
True
We can use the flags on the Post class to build expressions in queries as well:
# Generates a WHERE clause that looks like:
# WHERE (post.flags & 1 != 0)
query = Post.select().where(Post.is_favorite)

# Query for sticky + favorite posts:
query = Post.select().where(Post.is_sticky & Post.is_favorite)
When bulk-updating one or more bits in a BitField, you can use bitwise operators to set or clear one or more bits:
# Set the 4th bit on all Post objects.
Post.update(flags=Post.flags | 8).execute()

# Clear the 1st and 3rd bits on all Post objects.
Post.update(flags=Post.flags & ~(1 | 4)).execute()
For simple operations, the flags provide handy set() and clear() methods for setting or clearing an individual bit:
# Set the "is_deleted" bit on all posts. Post.update(flags=Post.is_deleted.set()).execute() # Clear the "is_deleted" bit on all posts. Post.update(flags=Post.is_deleted.clear()).execute()
Returns a descriptor that can get or set specific bits in the overall value. When accessed on the class itself, it returns an Expression object suitable for use in a query.
If the value is not provided, it is assumed that each flag will be an increasing power of 2, so if you had four flags, they would have the values 1, 2, 4, 8.
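For example, a sketch relying on the implicit values (the flag names here are illustrative):

class Post(Model):
    flags = BitField()

    # No explicit values -- each flag gets the next power of 2.
    is_favorite = flags.flag()   # 1
    is_sticky = flags.flag()     # 2
    is_minimized = flags.flag()  # 4
    is_deleted = flags.flag()    # 8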
Example usage:
class Bitmap(Model):
    data = BigBitField()

bitmap = Bitmap()

# Sets the ith bit, e.g. the 1st bit, the 11th bit, the 63rd, etc.
bits_to_set = (1, 11, 63, 31, 55, 48, 100, 99)
for bit_idx in bits_to_set:
    bitmap.data.set_bit(bit_idx)

# We can test whether a bit is set using "is_set":
assert bitmap.data.is_set(11)
assert not bitmap.data.is_set(12)

# We can clear a bit:
bitmap.data.clear_bit(11)
assert not bitmap.data.is_set(11)

# We can also "toggle" a bit. Recall that the 63rd bit was set earlier.
assert bitmap.data.toggle_bit(63) is False
assert bitmap.data.toggle_bit(63) is True
assert bitmap.data.is_set(63)
Clears the idx-th bit in the bitmap.
Toggles the idx-th bit in the bitmap and returns whether the bit is set or not.
Example:
>>> bitmap = Bitmap()
>>> bitmap.data.toggle_bit(10)  # Toggle the 10th bit.
True
>>> bitmap.data.toggle_bit(10)  # This will clear the 10th bit.
False
Returns boolean indicating whether the idx-th bit is set or not.
Field class for storing datetime.datetime objects.
Accepts a special parameter formats, which contains a list of formats the datetime can be encoded with (for databases that do not have support for a native datetime data-type). The default supported formats are:
'%Y-%m-%d %H:%M:%S.%f'  # year-month-day hour-minute-second.microsecond
'%Y-%m-%d %H:%M:%S'     # year-month-day hour-minute-second
'%Y-%m-%d'              # year-month-day
NOTE:
Blog.select().where(Blog.pub_date.year == 2018)
Example:
# Find all events that are exactly 1 hour long. query = (Event
.select()
.where((Event.start.to_timestamp() + 3600) ==
Event.stop.to_timestamp())
.order_by(Event.start))
Truncates the value in the column to the given part. This method is useful for finding all rows within a given month, for instance.
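For example, a sketch that counts tweets per month (the exact value produced by truncate() depends on the database back-end):

# Group tweets by the month in which they were created.
month = Tweet.timestamp.truncate('month')
query = (Tweet
         .select(month.alias('month'), fn.COUNT(Tweet.id).alias('ct'))
         .group_by(month))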
Field class for storing datetime.date objects.
Accepts a special parameter formats, which contains a list of formats the datetime can be encoded with (for databases that do not have support for a native date data-type). The default supported formats are:
'%Y-%m-%d'              # year-month-day
'%Y-%m-%d %H:%M:%S'     # year-month-day hour-minute-second
'%Y-%m-%d %H:%M:%S.%f'  # year-month-day hour-minute-second.microsecond
NOTE:
Person.select().where(Person.dob.year == 1983)
Field class for storing datetime.time objects (not timedelta).
Accepts a special parameter formats, which contains a list of formats the datetime can be encoded with (for databases that do not have support for a native time data-type). The default supported formats are:
'%H:%M:%S.%f'           # hour:minute:second.microsecond
'%H:%M:%S'              # hour:minute:second
'%H:%M'                 # hour:minute
'%Y-%m-%d %H:%M:%S.%f'  # year-month-day hour-minute-second.microsecond
'%Y-%m-%d %H:%M:%S'     # year-month-day hour-minute-second
NOTE:
evening_events = Event.select().where(Event.time.hour > 17)
Field class for storing date-times as integer timestamps. Sub-second resolution is supported by multiplying by a power of 10 to get an integer.
If the resolution parameter is 0 or 1, then the timestamp is stored using second resolution. A resolution between 2 and 6 is treated as the number of decimal places, e.g. resolution=3 corresponds to milliseconds. Alternatively, the decimal can be provided as a multiple of 10, such that resolution=10 will store 1/10th of a second resolution.
The resolution parameter can be either 0-6 or 10, 100, etc up to 1000000 (for microsecond resolution). This allows sub-second precision while still using an IntegerField for storage. The default is second resolution.
Also accepts a boolean parameter utc, used to indicate whether the timestamps should be UTC. Default is False.
Finally, the field default is the current timestamp. If you do not want this behavior, then explicitly pass in default=None.
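For example, a sketch of a model using these parameters (the Event model is hypothetical):

class Event(Model):
    # Millisecond resolution, stored as an integer, in UTC.
    created = TimestampField(resolution=3, utc=True)

    # Opt out of defaulting to the current timestamp.
    finished = TimestampField(null=True, default=None)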
Field class that does not specify a data-type (SQLite-only).
Since data-types are not enforced, you can declare fields without any data-type. It is also common for SQLite virtual tables to use meta-columns or untyped columns, so for those cases as well you may wish to use an untyped field.
Accepts a special coerce parameter, a function that takes a value coming from the database and converts it into the appropriate Python type.
Field class for storing a foreign key.
class User(Model):
    name = TextField()

class Tweet(Model):
    user = ForeignKeyField(User, backref='tweets')
    content = TextField()

# "user" attribute
>>> some_tweet.user
<User: charlie>

# "tweets" backref attribute
>>> for tweet in charlie.tweets:
...     print(tweet.content)
Some tweet
Another tweet
Yet another tweet
For an in-depth discussion of foreign-keys, joins and relationships between models, refer to relationships.
Field class for representing a deferred foreign key. Useful for circular foreign-key references, for example:
class Husband(Model):
    name = TextField()
    wife = DeferredForeignKey('Wife', deferrable='INITIALLY DEFERRED')

class Wife(Model):
    name = TextField()
    husband = ForeignKeyField(Husband, deferrable='INITIALLY DEFERRED')
In the above example, when the Wife model is declared, the foreign-key Husband.wife is automatically resolved and turned into a regular ForeignKeyField.
WARNING:
class User(Model):
    username = TextField()

class Tweet(Model):
    # This will never actually be resolved, because the User
    # model has already been declared.
    user = DeferredForeignKey('user', backref='tweets')
    content = TextField()
In cases like these you should use the regular ForeignKeyField or you can manually resolve deferred foreign keys like so:
# Tweet.user will be resolved into a ForeignKeyField:
DeferredForeignKey.resolve(User)
The ManyToManyField provides a simple interface for working with many-to-many relationships, inspired by Django. A many-to-many relationship is typically implemented by creating a junction table with foreign keys to the two models being related. For instance, if you were building a syllabus manager for college students, the relationship between students and courses would be many-to-many. Here is the schema using standard APIs:
Standard way of declaring a many-to-many relationship (without the use of the ManyToManyField):
class Student(Model):
    name = CharField()

class Course(Model):
    name = CharField()

class StudentCourse(Model):
    student = ForeignKeyField(Student)
    course = ForeignKeyField(Course)
To query the courses for a particular student, you would join through the junction table:
# List the courses that "Huey" is enrolled in:
courses = (Course
           .select()
           .join(StudentCourse)
           .join(Student)
           .where(Student.name == 'Huey'))

for course in courses:
    print(course.name)
The ManyToManyField is designed to simplify this use-case by providing a field-like API for querying and modifying data in the junction table. Here is how our code looks using ManyToManyField:
class Student(Model):
    name = CharField()

class Course(Model):
    name = CharField()
    students = ManyToManyField(Student, backref='courses')
NOTE:
We still need a junction table to store the relationships between students and courses. This model can be accessed by calling the get_through_model() method. This is useful when creating tables.
# Create tables for the students, courses, and relationships between
# the two.
db.create_tables([
    Student,
    Course,
    Course.students.get_through_model()])
When accessed from a model instance, the ManyToManyField exposes a ModelSelect representing the set of related objects. Let's use the interactive shell to see how all this works:
>>> huey = Student.get(Student.name == 'huey')
>>> [course.name for course in huey.courses]
['English 101', 'CS 101']

>>> engl_101 = Course.get(Course.name == 'English 101')
>>> [student.name for student in engl_101.students]
['Huey', 'Mickey', 'Zaizee']
To add new relationships between objects, you can either assign the objects directly to the ManyToManyField attribute, or call the add() method. The difference between the two is that simply assigning will clear out any existing relationships, whereas add() can preserve existing relationships.
>>> huey.courses = Course.select().where(Course.name.contains('english'))
>>> for course in huey.courses.order_by(Course.name):
...     print(course.name)
English 101
English 151
English 201
English 221

>>> cs_101 = Course.get(Course.name == 'CS 101')
>>> cs_151 = Course.get(Course.name == 'CS 151')
>>> huey.courses.add([cs_101, cs_151])
>>> [course.name for course in huey.courses.order_by(Course.name)]
['CS 101', 'CS 151', 'English 101', 'English 151', 'English 201',
 'English 221']
This is quite a few courses, so let's remove the 200-level english courses. To remove objects, use the remove() method.
>>> huey.courses.remove(Course.select().where(Course.name.contains('2')))
2
>>> [course.name for course in huey.courses.order_by(Course.name)]
['CS 101', 'CS 151', 'English 101', 'English 151']
To remove all relationships from a collection, you can use the clear() method. Let's say that English 101 is canceled, so we need to remove all the students from it:
>>> engl_101 = Course.get(Course.name == 'English 101')
>>> engl_101.students.clear()
Associate value with the current instance. You can pass in a single model instance, a list of model instances, or even a ModelSelect.
Example code:
# Huey needs to enroll in a bunch of courses, including all
# the English classes, and a couple Comp-Sci classes.
huey = Student.get(Student.name == 'Huey')

# We can add all the objects represented by a query.
english_courses = Course.select().where(
    Course.name.contains('english'))
huey.courses.add(english_courses)

# We can also add lists of individual objects.
cs101 = Course.get(Course.name == 'CS 101')
cs151 = Course.get(Course.name == 'CS 151')
huey.courses.add([cs101, cs151])
Disassociate value from the current instance. Like add(), you can pass in a model instance, a list of model instances, or even a ModelSelect.
Example code:
# Huey is currently enrolled in a lot of english classes
# as well as some Comp-Sci. He is changing majors, so we
# will remove all his courses.
english_courses = Course.select().where(
    Course.name.contains('english'))
huey.courses.remove(english_courses)

# Remove the two Comp-Sci classes Huey is enrolled in.
cs101 = Course.get(Course.name == 'CS 101')
cs151 = Course.get(Course.name == 'CS 151')
huey.courses.remove([cs101, cs151])
Example code:
# English 101 is canceled this semester, so remove all
# the enrollments.
english_101 = Course.get(Course.name == 'English 101')
english_101.students.clear()
When creating tables for an application that uses ManyToManyField, you must create the through table explicitly.
# Get a reference to the automatically-created through table.
StudentCourseThrough = Course.students.get_through_model()

# Create tables for our two models as well as the through model.
db.create_tables([
    Student,
    Course,
    StudentCourseThrough])
Example:
class Note(BaseModel):
    content = TextField()

NoteThroughDeferred = DeferredThroughModel()

class User(BaseModel):
    username = TextField()
    notes = ManyToManyField(Note, through_model=NoteThroughDeferred)

# Cannot declare this before "User" since it has a foreign-key to
# the User model.
class NoteThrough(BaseModel):
    note = ForeignKeyField(Note)
    user = ForeignKeyField(User)

# Resolve dependencies.
NoteThroughDeferred.set_model(NoteThrough)
A primary key composed of multiple columns. Unlike the other fields, a composite key is defined in the model's Meta class after the fields have been defined. It takes as parameters the string names of the fields to use as the primary key:
class BlogTagThrough(Model):
blog = ForeignKeyField(Blog, backref='tags')
tag = ForeignKeyField(Tag, backref='blogs')
class Meta:
primary_key = CompositeKey('blog', 'tag')
Provides methods for managing the creation and deletion of tables and indexes for the given model.
Execute CREATE TABLE query for the given model.
Execute DROP TABLE query for the given model.
Execute TRUNCATE TABLE for the given model. If the database is Sqlite, which does not support TRUNCATE, then an equivalent DELETE query will be executed.
Execute CREATE INDEX queries for the indexes defined for the model.
Execute DROP INDEX queries for the indexes defined for the model.
Create sequence for the given Field.
Drop sequence for the given Field.
Add a foreign-key constraint for the given field. This method should not be necessary in most cases, as foreign-key constraints are created as part of table creation. The exception is when you are creating a circular foreign-key relationship using DeferredForeignKey. In those cases, it is necessary to first create the tables, then add the constraint for the deferred foreign-key:
class Language(Model):
    name = TextField()
    selected_snippet = DeferredForeignKey('Snippet')

class Snippet(Model):
    code = TextField()
    language = ForeignKeyField(Language, backref='snippets')

# Creates both tables but does not create the constraint for the
# Language.selected_snippet foreign key (because of the circular
# dependency).
db.create_tables([Language, Snippet])

# Explicitly create the constraint:
Language._schema.create_foreign_key(Language.selected_snippet)
For more information, see documentation on circular-fks.
WARNING:
Create sequence(s), index(es) and table for the model.
Drop table for the model and associated indexes.
Store metadata for a Model.
This class should not be instantiated directly, but is instantiated using the attributes of a Model class' inner Meta class. Metadata attributes are then available on Model._meta.
Traverse the model graph and return a list of 3-tuples, consisting of (foreign key field, model class, is_backref).
Bind the model class to the given Database instance.
WARNING:
Bind the model class to the given table name at run-time.
Model class provides a high-level abstraction for working with database tables. Models are a one-to-one mapping with a database table (or a table-like object, such as a view). Subclasses of Model declare any number of Field instances as class attributes. These fields correspond to columns on the table.
Table-level operations, such as select(), update(), insert() and delete(), are implemented as classmethods. Row-level operations, such as save() and delete_instance(), are implemented as instance methods.
Example:
db = SqliteDatabase(':memory:')

class User(Model):
    username = TextField()
    join_date = DateTimeField(default=datetime.datetime.now)
    is_admin = BooleanField(default=False)

admin = User(username='admin', is_admin=True)
admin.save()
Create an alias to the model-class. Model aliases allow you to reference the same Model multiple times in a query, for example when doing a self-join or sub-query.
Example:
Parent = Category.alias()
sq = (Category
      .select(Category, Parent)
      .join(Parent, on=(Category.parent == Parent.id))
      .where(Parent.name == 'parent category'))
Create a SELECT query. If no fields are explicitly provided, the query will by default SELECT all the fields defined on the model, unless you are using the query as a sub-query, in which case only the primary key will be selected by default.
Example of selecting all columns:
query = User.select().where(User.active == True).order_by(User.username)
Example of selecting all columns on Tweet and the parent model, User. When the user foreign key is accessed on a Tweet instance no additional query will be needed (see N+1 for more details):
query = (Tweet
         .select(Tweet, User)
         .join(User)
         .order_by(Tweet.created_date.desc()))

for tweet in query:
    print(tweet.user.username, '->', tweet.content)
Example of subquery only selecting the primary key:
inactive_users = User.select().where(User.active == False)

# Here, instead of defaulting to all columns, Peewee will default
# to only selecting the primary key.
Tweet.delete().where(Tweet.user.in_(inactive_users)).execute()
Create an UPDATE query.
Example showing users being marked inactive if their registration has expired:
q = (User
     .update({User.active: False})
     .where(User.registration_expired == True))
q.execute()  # Execute the query, returning number of rows updated.
Example showing an atomic update:
q = (PageView
     .update({PageView.count: PageView.count + 1})
     .where(PageView.url == url))
q.execute()  # Execute the query.
NOTE:
Create an INSERT query.
Insert a new row into the database. If any fields on the model have default values, these values will be used if the fields are not explicitly set in the insert dictionary.
Example showing creation of a new user:
q = User.insert(username='admin', active=True, registration_expired=False)
q.execute()  # Perform the insert.
You can also use Field objects as the keys:
new_id = User.insert({User.username: 'admin'}).execute()
If you have a model with a default value on one of the fields, and that field is not specified in the insert parameter, the default will be used:
class User(Model):
    username = CharField()
    active = BooleanField(default=True)

# This INSERT query will automatically specify `active=True`:
User.insert(username='charlie')
NOTE:
INSERT multiple rows of data.
The rows parameter must be an iterable that yields dictionaries or tuples, where the ordering of the tuple values corresponds to the fields specified in the fields argument. As with insert(), fields that are not specified in the dictionary will use their default value, if one exists.
NOTE:
Person.insert_many([
    {'first_name': 'Peewee', 'last_name': 'Herman'},
    {'first_name': 'Huey'},  # Missing "last_name"!
]).execute()
Example of inserting multiple Users:
data = [
    ('charlie', True),
    ('huey', False),
    ('zaizee', False)]
query = User.insert_many(data, fields=[User.username, User.is_admin])
query.execute()
Equivalent example using dictionaries:
data = [
    {'username': 'charlie', 'is_admin': True},
    {'username': 'huey', 'is_admin': False},
    {'username': 'zaizee', 'is_admin': False}]

# Insert new rows.
User.insert_many(data).execute()
Because the rows parameter can be an arbitrary iterable, you can also use a generator:
def get_usernames():
    for username in ['charlie', 'huey', 'peewee']:
        yield {'username': username}

User.insert_many(get_usernames()).execute()
WARNING:
NOTE:
For more information, check out the following SQLite documents:
NOTE:
INSERT data using a SELECT query as the source. This API should be used for queries of the form INSERT INTO ... SELECT FROM ....
Example of inserting data across tables for denormalization purposes:
source = (User
          .select(User.username, fn.COUNT(Tweet.id))
          .join(Tweet, JOIN.LEFT_OUTER)
          .group_by(User.username))

UserTweetDenorm.insert_from(
    source,
    [UserTweetDenorm.username, UserTweetDenorm.num_tweets]).execute()
NOTE:
Create an INSERT query that uses REPLACE for conflict-resolution.
See Model.insert() for examples.
INSERT multiple rows of data using REPLACE for conflict-resolution.
See Model.insert_many() for examples.
Execute a SQL query directly.
Example selecting rows from the User table:
q = User.raw('select id, username from users')
for user in q:
    print(user.id, user.username)
NOTE:
Example showing the deletion of all inactive users:
q = User.delete().where(User.active == False)
q.execute()  # Remove the rows, return number of rows removed.
WARNING:
INSERT new row into table and return corresponding model instance.
Example showing the creation of a user (a row will be added to the database):
user = User.create(username='admin', password='test')
NOTE:
Efficiently INSERT multiple unsaved model instances into the database. Unlike insert_many(), which accepts row data as a list of either dictionaries or lists, this method accepts a list of unsaved model instances.
Example:
# List of 10 unsaved users.
user_list = [User(username='u%s' % i) for i in range(10)]

# All 10 users are inserted in a single query.
User.bulk_create(user_list)
Batches:
user_list = [User(username='u%s' % i) for i in range(10)]

with database.atomic():
    # Will execute 4 INSERT queries (3 batches of 3, 1 batch of 1).
    User.bulk_create(user_list, batch_size=3)
WARNING:
Efficiently UPDATE multiple model instances.
Example:
# First, create 3 users.
u1, u2, u3 = [User.create(username='u%s' % i) for i in (1, 2, 3)]

# Now let's modify their usernames.
u1.username = 'u1-x'
u2.username = 'u2-y'
u3.username = 'u3-z'

# Update all three rows using a single UPDATE query.
User.bulk_update([u1, u2, u3], fields=[User.username])
If you have a large number of objects to update, it is strongly recommended that you specify a batch_size and wrap the operation in a transaction:
with database.atomic():
User.bulk_update(user_list, fields=['username'], batch_size=50)
WARNING:
Retrieve a single model instance matching the given filters. If no model is returned, a DoesNotExist is raised.
user = User.get(User.username == username, User.active == True)
This method is also exposed via the SelectQuery, though it takes no parameters:
active = User.select().where(User.active == True)
try:
    user = active.where(
        (User.username == username) &
        (User.active == True)
    ).get()
except User.DoesNotExist:
    user = None
NOTE:
Short-hand for calling Model.get() specifying a lookup by primary key. Raises a DoesNotExist if instance with the given primary key value does not exist.
Example:
user = User.get_by_id(1) # Returns user with id = 1.
Short-hand for updating the data with the given primary-key. If no row exists with the given primary key, no exception will be raised.
Example:
# Set "is_admin" to True on user with id=3. User.set_by_id(3, {'is_admin': True})
Short-hand for deleting the row with the given primary-key. If no row exists with the given primary key, no exception will be raised.
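For example, a minimal sketch mirroring the set_by_id() example above:

# Delete the row with the given primary key; returns the number
# of rows deleted (zero if no such row exists).
User.delete_by_id(3)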
Attempt to get the row matching the given filters. If no matching row is found, create a new row.
WARNING:
Example without get_or_create:
# Without `get_or_create`, we might write:
try:
    person = Person.get(
        (Person.first_name == 'John') &
        (Person.last_name == 'Lennon'))
except Person.DoesNotExist:
    person = Person.create(
        first_name='John',
        last_name='Lennon',
        birthday=datetime.date(1940, 10, 9))
Equivalent code using get_or_create:
person, created = Person.get_or_create(
first_name='John',
last_name='Lennon',
defaults={'birthday': datetime.date(1940, 10, 9)})
Save the data in the model instance. By default, the presence of a primary-key value will cause an UPDATE query to be executed.
Example showing saving a model instance:
user = User()
user.username = 'some-user'  # Does not touch the database.
user.save()  # Change is persisted to the db.
NOTE:
If you always want to only save a model's dirty fields, you can use the Meta option only_save_dirty = True. Then, any time you call Model.save(), by default only the dirty fields will be saved, e.g.
class Person(Model):
first_name = CharField()
last_name = CharField()
dob = DateField()
class Meta:
database = db
only_save_dirty = True
WARNING:
Delete the given instance. Any foreign keys set to cascade on delete will be deleted automatically. For more programmatic control, you can specify recursive=True, which will delete any non-nullable related models (those that are nullable will be set to NULL). If you wish to delete all dependencies regardless of whether they are nullable, set delete_nullable=True.
Example:
some_obj.delete_instance()  # it is gone forever
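A minimal sketch of the recursive options described above, reusing the same hypothetical some_obj:

# Delete dependent rows as well; related rows whose foreign-keys
# are nullable are set to NULL instead of being deleted.
some_obj.delete_instance(recursive=True)

# Delete all dependent rows, even those that are nullable.
some_obj.delete_instance(recursive=True, delete_nullable=True)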
Bind the model (and specified relations) to the given database.
See also: Database.bind().
See also: Database.bind_ctx().
Create the model table, indexes, constraints and sequences.
Example:
with database:
SomeModel.create_table() # Execute the create table query.
Drop the model table.
Truncate (delete all rows) for the model.
Expressive method for declaring an index on a model. Wraps the declaration of a ModelIndex instance.
Examples:
class Article(Model):
    name = TextField()
    timestamp = TimestampField()
    status = IntegerField()
    flags = BitField()

    is_sticky = flags.flag(1)
    is_favorite = flags.flag(2)

# CREATE INDEX ... ON "article" ("name", "timestamp" DESC)
idx = Article.index(Article.name, Article.timestamp.desc())

# Be sure to add the index to the model:
Article.add_index(idx)

# CREATE UNIQUE INDEX ... ON "article" ("timestamp" DESC, "flags" & 2)
# WHERE ("status" = 1)
idx = (Article
       .index(Article.timestamp.desc(),
              Article.flags.bin_and(2),
              unique=True)
       .where(Article.status == 1))

# Add index to model:
Article.add_index(idx)
Add an index to the model's definition.
NOTE:
Examples:
class Article(Model):
    name = TextField()
    timestamp = TimestampField()
    status = IntegerField()
    flags = BitField()

    is_sticky = flags.flag(1)
    is_favorite = flags.flag(2)

# CREATE INDEX ... ON "article" ("name", "timestamp") WHERE "status" = 1
idx = Article.index(Article.name, Article.timestamp).where(Article.status == 1)
Article.add_index(idx)

# CREATE UNIQUE INDEX ... ON "article" ("timestamp" DESC, "flags" & 2)
ts_flags_idx = Article.index(
    Article.timestamp.desc(),
    Article.flags.bin_and(2),
    unique=True)
Article.add_index(ts_flags_idx)

# You can also specify a list of fields and use the same keyword
# arguments that the ModelIndex constructor accepts:
Article.add_index(
    Article.name,
    Article.timestamp.desc(),
    where=(Article.status == 1))

# Or even specify a SQL query directly:
Article.add_index(SQL('CREATE INDEX ...'))
Generate a list of queries of dependent models. Yields a 2-tuple containing the query and corresponding foreign key field. Useful for searching dependencies of a model, i.e. things that would be orphaned in the event of a delete.
Convenience function for iterating over all instances of a model.
Example:
Setting.insert_many([
    {'key': 'host', 'value': '192.168.1.2'},
    {'key': 'port', 'value': '1337'},
    {'key': 'user', 'value': 'nuggie'}]).execute()

# Load settings from db into dict.
settings = {setting.key: setting.value for setting in Setting}
Example:
n_accounts = len(Account)

# Is equivalent to:
n_accounts = Account.select().count()
Provide a separate reference to a model in a query.
Model-specific implementation of SELECT query.
Switch the join context - the source which subsequent calls to join() will be joined against. Used for specifying multiple joins against a single table.
If the ctx is not given, then the query's model will be used.
The following example selects from tweet and joins on both user and tweet-flag:
sq = Tweet.select().join(User).switch(Tweet).join(TweetFlag)

# Equivalent (since Tweet is the query's model):
sq = Tweet.select().join(User).switch().join(TweetFlag)
Return result rows as objects created using the given constructor. The default behavior is to create model instances.
NOTE:
Similarly, you can use dicts(), tuples() or namedtuples() to achieve even more performance.
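For instance, a minimal sketch using dicts() to skip model instantiation entirely:

# Each row is returned as a plain dictionary rather than a User
# instance, which is cheaper for large result-sets.
query = User.select().dicts()
for row in query:
    print(row['id'], row['username'])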
Join with another table-like object.
Join type may be one of:
Example selecting tweets and joining on user in order to restrict to only those tweets made by "admin" users:
sq = Tweet.select().join(User).where(User.is_admin == True)
Example selecting users and joining on a particular foreign key field. See the example app for a real-life usage:
sq = User.select().join(Relationship, on=Relationship.to_user)
For an in-depth discussion of foreign-keys, joins and relationships between models, refer to relationships.
Uses the same parameter order as the non-model-specific join(). Bypasses the join context by requiring the join source to be specified.
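A minimal sketch of the explicit form, reusing the Tweet and User models from earlier examples:

# Equivalent to Tweet.select(...).join(User), except the join
# source (Tweet) is spelled out rather than taken from the
# current join context.
query = (Tweet
         .select(Tweet, User)
         .join_from(Tweet, User))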
Use Django-style filters to express a WHERE clause.
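A minimal sketch of the keyword syntax; one_week_ago is a hypothetical placeholder value:

# Keyword arguments map to equality tests by default; a
# double-underscore suffix expresses an operation such as
# "__in" or "__gte".
active_users = User.filter(active=True)
recent_tweets = Tweet.filter(created_date__gte=one_week_ago)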
Execute the query, prefetching the given additional resources.
See also prefetch() standalone function.
Example:
# Fetch all Users and prefetch their associated tweets.
query = User.select().prefetch(Tweet)
for user in query:
    print(user.username)
    for tweet in user.tweets:
        print(' *', tweet.content)
NOTE:
Eagerly fetch related objects, allowing efficient querying of multiple tables when a 1-to-many relationship exists.
For example, it is simple to query a many-to-1 relationship efficiently:
query = (Tweet
         .select(Tweet, User)
         .join(User))

for tweet in query:
    # Looking up tweet.user.username does not require a query since
    # the related user's columns were selected.
    print(tweet.user.username, '->', tweet.content)
To efficiently do the inverse, query users and their tweets, you can use prefetch:
query = User.select()
for user in prefetch(query, Tweet):
    print(user.username)
    for tweet in user.tweets:  # Does not require additional query.
        print(' ', tweet.content)
NOTE:
Peewee structures should all implement a __sql__ method, which will be called by the Context class during SQL generation. The __sql__ method accepts a single parameter, the Context instance, which allows for recursive descent and introspection of scope and state.
Convert the given node to a SQL AST and return a 2-tuple consisting of the SQL query and the parameters.
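For example, a minimal sketch of obtaining this 2-tuple for a query via its sql() method, which delegates to the database's context:

query = User.select().where(User.username == 'charlie')
sql, params = query.sql()
# sql is the generated SELECT statement with "?" placeholders;
# params is the list of bound values, e.g. ['charlie'].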
Bind the proxy to the given object. Afterwards all attribute lookups and method calls on the proxy will be sent to the given object.
Any callbacks that have been registered will be called.
Add a callback to be executed when the proxy is initialized.
See dynamic_db for details on usage.
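A minimal sketch of the typical deferred-initialization pattern:

from peewee import Model, Proxy, SqliteDatabase

database_proxy = Proxy()  # Placeholder until configuration is read.

class BaseModel(Model):
    class Meta:
        database = database_proxy

# At run-time, once the real database is known:
database_proxy.initialize(SqliteDatabase('app.db'))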
Efficient implementation for breaking up large lists of data into smaller-sized chunks.
Usage:
it = range(10)  # An iterable that yields 0...9.

# Break the iterable into chunks of length 4.
for chunk in chunked(it, 4):
    print(', '.join(str(num) for num in chunk))

# PRINTS:
# 0, 1, 2, 3
# 4, 5, 6, 7
# 8, 9
The default SqliteDatabase already includes many SQLite-specific features:
The playhouse.sqlite_ext includes even more SQLite features, including:
To get started with the features described in this document, you will want to use the SqliteExtDatabase class from the playhouse.sqlite_ext module. Furthermore, some features require the playhouse._sqlite_ext C extension -- these features will be noted in the documentation.
Instantiating a SqliteExtDatabase:
from playhouse.sqlite_ext import SqliteExtDatabase

db = SqliteExtDatabase('my_app.db', pragmas=(
    ('cache_size', -1024 * 64),  # 64MB page-cache.
    ('journal_mode', 'wal'),  # Use WAL-mode (you should always use this!).
    ('foreign_keys', 1)))  # Enforce foreign-key constraints.
Extends SqliteDatabase and inherits methods for declaring user-defined functions, pragmas, etc.
Extends SqliteExtDatabase and requires that the playhouse._sqlite_ext extension module be available.
However, if the callback raises a ValueError, the transaction will be aborted and rolled-back.
Example:
db = CSqliteExtDatabase(':memory:')

@db.on_commit
def on_commit():
    logger.info('COMMITing changes')
Example:
@db.on_rollback
def on_rollback():
    logger.info('Rolling back changes')
The callback's return value is ignored.
Example:
db = CSqliteExtDatabase(':memory:')

@db.on_update
def on_update(query_type, db, table, rowid):
    # e.g. INSERT row 3 into table users.
    logger.info('%s row %s into table %s', query_type, rowid, table)
Example:
>>> db = CSqliteExtDatabase(':memory:')
>>> db.autocommit
True
>>> with db.atomic():
...     print(db.autocommit)
...
False
>>> db.autocommit
True
Example:
master = CSqliteExtDatabase('master.db')
replica = CSqliteExtDatabase('replica.db')

# Backup the contents of master to replica.
master.backup(replica)
Backup the current database to a file. The backed-up data is not a database dump, but an actual SQLite database file.
Example:
db = CSqliteExtDatabase('app.db')

def nightly_backup():
    filename = 'backup-%s.db' % (datetime.date.today())
    db.backup_to_file(filename)
See Blob and ZeroBlob for more information.
Example:
class Image(Model):
    filename = TextField()
    data = BlobField()

buf_size = 1024 * 1024 * 8  # Allocate 8MB for storing file.
rowid = Image.insert({Image.filename: 'thefile.jpg',
                      Image.data: ZeroBlob(buf_size)}).execute()

# Open the blob, returning a file-like object.
blob = db.blob_open('image', 'data', rowid)

# Write some data to the blob.
blob.write(image_data)
img_size = blob.tell()

# Read the data back out of the blob.
blob.seek(0)
image_data = blob.read(img_size)
Example:
class Note(Model):
rowid = RowIDField() # Will be primary key.
content = TextField()
timestamp = TimestampField()
ATTENTION:
class NoteIndex(FTSModel):
docid = DocIDField() # "docid" is used as an alias for "rowid".
content = SearchField()
class Meta:
database = db
SQLite 3.9.0 added JSON support in the form of an extension library. The SQLite json1 extension provides a number of helper functions for working with JSON data. These APIs are exposed as methods of a special field-type, JSONField.
To access or modify specific object keys or array indexes in a JSON structure, you can treat the JSONField as if it were a dictionary/list.
NOTE:
# Do not escape unicode code-points.
my_json_dumps = functools.partial(json.dumps, ensure_ascii=False)

class SomeModel(Model):
    # Specify our custom serialization function.
    json_data = JSONField(json_dumps=my_json_dumps)
Let's look at some examples of using the SQLite json1 extension with Peewee. Here we'll prepare a database and a simple model for testing the json1 extension:
>>> from playhouse.sqlite_ext import *
>>> db = SqliteExtDatabase(':memory:')
>>> class KV(Model):
...     key = TextField()
...     value = JSONField()
...     class Meta:
...         database = db
...
>>> KV.create_table()
Storing data works as you might expect. There's no need to serialize dictionaries or lists as JSON, as this is done automatically by Peewee:
>>> KV.create(key='a', value={'k1': 'v1'})
<KV: 1>
>>> KV.get(KV.key == 'a').value
{'k1': 'v1'}
We can access specific parts of the JSON data using dictionary lookups:
>>> KV.get(KV.value['k1'] == 'v1').key
'a'
It's possible to update a JSON value in-place using the update() method. Note that "k1=v1" is preserved:
>>> KV.update(value=KV.value.update({'k2': 'v2', 'k3': 'v3'})).execute()
1
>>> KV.get(KV.key == 'a').value
{'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
We can also update existing data atomically, or remove keys by setting their value to None. In the following example, we'll update the value of "k1" and remove "k3" ("k2" will not be modified):
>>> KV.update(value=KV.value.update({'k1': 'v1-x', 'k3': None})).execute()
1
>>> KV.get(KV.key == 'a').value
{'k1': 'v1-x', 'k2': 'v2'}
We can also set individual parts of the JSON data using the set() method:
>>> KV.update(value=KV.value['k1'].set('v1')).execute()
1
>>> KV.get(KV.key == 'a').value
{'k1': 'v1', 'k2': 'v2'}
The set() method can also be used with objects, in addition to scalar values:
>>> KV.update(value=KV.value['k2'].set({'x2': 'y2'})).execute()
1
>>> KV.get(KV.key == 'a').value
{'k1': 'v1', 'k2': {'x2': 'y2'}}
Individual parts of the JSON data can be removed atomically as well, using remove():
>>> KV.update(value=KV.value['k2'].remove()).execute()
1
>>> KV.get(KV.key == 'a').value
{'k1': 'v1'}
We can also get the type of value stored at a specific location in the JSON data using the json_type() method:
>>> KV.select(KV.value.json_type(), KV.value['k1'].json_type()).tuples()[:]
[('object', 'text')]
Let's add a nested value and then see how to iterate through its contents recursively using the tree() method:
>>> KV.create(key='b', value={'x1': {'y1': 'z1', 'y2': 'z2'}, 'x2': [1, 2]})
<KV: 2>
>>> tree = KV.value.tree().alias('tree')
>>> query = KV.select(KV.key, tree.c.fullkey, tree.c.value).from_(KV, tree)
>>> query.tuples()[:]
[('a', '$', {'k1': 'v1'}),
 ('a', '$.k1', 'v1'),
 ('b', '$', {'x1': {'y1': 'z1', 'y2': 'z2'}, 'x2': [1, 2]}),
 ('b', '$.x2', [1, 2]),
 ('b', '$.x2[0]', 1),
 ('b', '$.x2[1]', 2),
 ('b', '$.x1', {'y1': 'z1', 'y2': 'z2'}),
 ('b', '$.x1.y1', 'z1'),
 ('b', '$.x1.y2', 'z2')]
The tree() and children() methods are powerful. For more information on how to utilize them, see the json1 extension documentation.
Also note that JSONField lookups can be chained:
>>> query = KV.select().where(KV.value['x1']['y1'] == 'z1')
>>> for obj in query:
...     print(obj.key, obj.value)
...
'b', {'x1': {'y1': 'z1', 'y2': 'z2'}, 'x2': [1, 2]}
For more information, refer to the sqlite json1 documentation.
Access a specific key or array index in the JSON data. Returns a JSONPath object, which exposes convenient methods for reading or modifying a particular part of a JSON object.
Example:
# If metadata contains {"tags": ["list", "of", "tags"]}, we can
# extract the first tag in this way:
Post.select(Post, Post.metadata['tags'][0].alias('first_tag'))
For more examples see the JSONPath API documentation.
Set the value stored in a JSONField.
Uses the json_set() function from the json1 extension.
Merge new data into the JSON value using the RFC-7396 MergePatch algorithm to apply a patch (data parameter) against the column data. MergePatch can add, modify, or delete elements of a JSON object, which means update() is a generalized replacement for both set() and remove(). MergePatch treats JSON array objects as atomic, so update() cannot append to an array, nor modify individual elements of an array.
For more information as well as examples, see the SQLite json_patch() function documentation.
Uses the json_remove function from the json1 extension.
The type returned will be one of:
Uses the json_type function from the json1 extension.
Uses the json_array_length function from the json1 extension.
The rows returned by calls to children() have the following attributes:
Internally this method uses the json_each (documentation link) function from the json1 extension.
Example usage (compare to tree() method):
class KeyData(Model):
    key = TextField()
    data = JSONField()

KeyData.create(key='a', data={'k1': 'v1', 'x1': {'y1': 'z1'}})
KeyData.create(key='b', data={'x1': {'y1': 'z1', 'y2': 'z2'}})

# We will query the KeyData model for the key and all the
# top-level keys and values in its data field.
kd = KeyData.data.children().alias('children')
query = (KeyData
         .select(KeyData.key, kd.c.key, kd.c.value, kd.c.fullkey)
         .from_(KeyData, kd)
         .order_by(kd.c.key)
         .tuples())
print(query[:])

# PRINTS:
[('a', 'k1', 'v1', '$.k1'),
 ('a', 'x1', '{"y1":"z1"}', '$.x1'),
 ('b', 'x1', '{"y1":"z1","y2":"z2"}', '$.x1')]
The rows returned by calls to tree() have the same attributes as rows returned by calls to children():
Internally this method uses the json_tree (documentation link) function from the json1 extension.
Example usage:
class KeyData(Model):
    key = TextField()
    data = JSONField()

KeyData.create(key='a', data={'k1': 'v1', 'x1': {'y1': 'z1'}})
KeyData.create(key='b', data={'x1': {'y1': 'z1', 'y2': 'z2'}})

# We will query the KeyData model for the key and all the
# keys and values in its data field, recursively.
kd = KeyData.data.tree().alias('tree')
query = (KeyData
         .select(KeyData.key, kd.c.key, kd.c.value, kd.c.fullkey)
         .from_(KeyData, kd)
         .order_by(kd.c.key)
         .tuples())
print(query[:])

# PRINTS:
[('a', None, '{"k1":"v1","x1":{"y1":"z1"}}', '$'),
 ('b', None, '{"x1":{"y1":"z1","y2":"z2"}}', '$'),
 ('a', 'k1', 'v1', '$.k1'),
 ('a', 'x1', '{"y1":"z1"}', '$.x1'),
 ('b', 'x1', '{"y1":"z1","y2":"z2"}', '$.x1'),
 ('a', 'y1', 'z1', '$.x1.y1'),
 ('b', 'y1', 'z1', '$.x1.y1'),
 ('b', 'y2', 'z2', '$.x1.y2')]
A convenient, Pythonic way of representing JSON paths for use with JSONField.
The JSONPath object implements __getitem__, accumulating path components, which it can turn into the corresponding json-path expression.
Access a sub-key or array index in the JSON data. Returns a JSONPath object, which exposes convenient methods for reading or modifying a particular part of a JSON object.
Example:
# If metadata contains {"tags": ["list", "of", "tags"]}, we can
# extract the first tag in this way:
first_tag = Post.metadata['tags'][0]
query = (Post
         .select(Post, first_tag.alias('first_tag'))
         .order_by(first_tag))
Set the value at the given location in the JSON data.
Uses the json_set() function from the json1 extension.
Merge new data into the JSON value using the RFC-7396 MergePatch algorithm to apply a patch (data parameter) against the column data. MergePatch can add, modify, or delete elements of a JSON object, which means update() is a generalized replacement for both set() and remove(). MergePatch treats JSON array objects as atomic, so update() cannot append to an array, nor modify individual elements of an array.
For more information as well as examples, see the SQLite json_patch() function documentation.
Uses the json_type function from the json1 extension.
The type returned will be one of:
Uses the json_type function from the json1 extension.
Uses the json_array_length function from the json1 extension.
Example model for document search index (timestamp is stored in the table but its data is not searchable):
class DocumentIndex(FTSModel):
title = SearchField()
content = SearchField()
tags = SearchField()
timestamp = SearchField(unindexed=True)
SQLite's full-text search supports searching either the full table, including all indexed columns, or searching individual columns. The match() method can be used to restrict search to a single column:
class SearchIndex(FTSModel):
    title = SearchField()
    body = SearchField()

# Search *only* the title field and return results ordered by
# relevance, using bm25.
query = (SearchIndex
         .select(SearchIndex, SearchIndex.bm25().alias('score'))
         .where(SearchIndex.title.match('python'))
         .order_by(SearchIndex.bm25()))
To instead search all indexed columns, use the FTSModel.match() method:
# Searches *both* the title and body and return results ordered by
# relevance, using bm25.
query = (SearchIndex
         .select(SearchIndex, SearchIndex.bm25().alias('score'))
         .where(SearchIndex.match('python'))
         .order_by(SearchIndex.bm25()))
Metadata options:
These are all combined in the following way:
CREATE VIRTUAL TABLE <table_name> USING <extension_module> ([prefix_arguments, ...] fields, ... [arguments, ...], [options...])
FTSModel subclasses should be defined normally, however there are a couple caveats:
Given these constraints, it is strongly recommended that all fields declared on an FTSModel subclass be instances of SearchField (though an exception is made for explicitly declaring a RowIDField). Using SearchField will help prevent you accidentally creating invalid column constraints. If you wish to store metadata in the index but would not like it to be included in the full-text index, then specify unindexed=True when instantiating the SearchField.
The only exception to the above is for the rowid primary key, which can be declared using RowIDField. Lookups on the rowid are very efficient. If you are using FTS4 you can also use DocIDField, which is an alias for the rowid (though there is no benefit to doing so).
Because of the lack of secondary indexes, it usually makes sense to use the rowid primary key as a pointer to a row in a regular table. For example:
class Document(Model):
    # Canonical source of data, stored in a regular table.
    author = ForeignKeyField(User, backref='documents')
    title = TextField(null=False, unique=True)
    content = TextField(null=False)
    timestamp = DateTimeField()

    class Meta:
        database = db

class DocumentIndex(FTSModel):
    # Full-text search index.
    rowid = RowIDField()
    title = SearchField()
    content = SearchField()

    class Meta:
        database = db
        # Use the porter stemming algorithm to tokenize content.
        options = {'tokenize': 'porter'}
To store a document in the document index, we will INSERT a row into the DocumentIndex table, manually setting the rowid so that it matches the primary-key of the corresponding Document:
def store_document(document):
DocumentIndex.insert({
DocumentIndex.rowid: document.id,
DocumentIndex.title: document.title,
DocumentIndex.content: document.content}).execute()
To perform a search and return ranked results, we can query the Document table and join on the DocumentIndex. This join will be efficient because lookups on an FTSModel's rowid field are fast:
def search(phrase):
# Query the search index and join the corresponding Document
# object on each search result.
return (Document
.select()
.join(
DocumentIndex,
on=(Document.id == DocumentIndex.rowid))
.where(DocumentIndex.match(phrase))
.order_by(DocumentIndex.bm25()))
WARNING:
If the primary source of the content you are indexing exists in a separate table, you can save some disk space by instructing SQLite to not store an additional copy of the search index content. SQLite will still create the metadata and data-structures needed to perform searches on the content, but the content itself will not be stored in the search index.
To accomplish this, you can specify a table or column using the content option. The FTS4 documentation has more information.
Here is a short example illustrating how to implement this with peewee:
class Blog(Model):
    title = TextField()
    pub_date = DateTimeField(default=datetime.datetime.now)
    content = TextField()  # We want to search this.

    class Meta:
        database = db

class BlogIndex(FTSModel):
    content = SearchField()

    class Meta:
        database = db
        options = {'content': Blog.content}  # <-- specify data source.

db.create_tables([Blog, BlogIndex])

# Now, we can manage content in the BlogIndex. To populate the
# search index:
BlogIndex.rebuild()

# Optimize the index.
BlogIndex.optimize()
The content option accepts either a single Field or a Model and can reduce the amount of storage used by the database file. However, content will need to be manually moved to/from the associated FTSModel.
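For instance, a sketch of one possible approach (not the only one): write the index row yourself whenever a Blog is saved, mirroring the store_document() example above:

def save_blog(blog):
    blog.save()
    # Manually mirror the content into the search index, keying
    # the index row by the blog's primary key.
    BlogIndex.insert({
        BlogIndex.rowid: blog.id,
        BlogIndex.content: blog.content}).execute()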
Generate a SQL expression representing a search for the given term or expression in the table. SQLite uses the MATCH operator to indicate a full-text search.
Example:
# Search index for "search phrase" and return results ranked
# by relevancy using the BM25 algorithm.
query = (DocumentIndex
         .select()
         .where(DocumentIndex.match('search phrase'))
         .order_by(DocumentIndex.bm25()))

for result in query:
    print('Result: %s' % result.title)
Shorthand way of searching for a term and sorting results by the quality of the match.
NOTE:
# Simple search.
docs = DocumentIndex.search('search term')
for result in docs:
    print(result.title)

# More complete example.
docs = DocumentIndex.search(
    'search term',
    weights={'title': 2.0, 'content': 1.0},
    with_score=True,
    score_alias='search_score')
for result in docs:
    print(result.title, result.search_score)
Shorthand way of searching for a term and sorting results by the quality of the match using the BM25 algorithm.
ATTENTION:
Generate an expression that will calculate and return the quality of the search match. This rank can be used to sort the search results. A higher rank score indicates a better match.
The rank function accepts optional parameters that allow you to specify weights for the various columns. If no weights are specified, all columns are considered of equal importance.
NOTE:
query = (DocumentIndex
         .select(
             DocumentIndex,
             DocumentIndex.rank().alias('score'))
         .where(DocumentIndex.match('search phrase'))
         .order_by(DocumentIndex.rank()))

for search_result in query:
    print(search_result.title, search_result.score)
Generate an expression that will calculate and return the quality of the search match using the BM25 algorithm. This value can be used to sort the search results, with higher scores corresponding to better matches.
Like rank(), the bm25 function accepts optional parameters that allow you to specify weights for the various columns. If no weights are specified, all columns are considered of equal importance.
ATTENTION:
query = (DocumentIndex
         .select(
             DocumentIndex,
             DocumentIndex.bm25().alias('score'))
         .where(DocumentIndex.match('search phrase'))
         .order_by(DocumentIndex.bm25()))

for search_result in query:
    print(search_result.title, search_result.score)
NOTE:
query = DocumentIndex.search_bm25('search phrase', with_score=True)
for search_result in query:
    print(search_result.title, search_result.score)
FTS5Model subclasses should be defined normally, however there are a couple caveats:
The FTS5 extension comes with a built-in implementation of the BM25 ranking function. Therefore, the search and search_bm25 methods have been overridden to use the builtin ranking functions rather than user-defined functions.
Shorthand way of searching for a term and sorting results by the quality of the match. The FTS5 extension provides a built-in implementation of the BM25 algorithm, which is used to rank the results by relevance.
Higher scores correspond to better matches.
# Simple search.
docs = DocumentIndex.search('search term')
for result in docs:
    print(result.title)

# More complete example.
docs = DocumentIndex.search(
    'search term',
    weights={'title': 2.0, 'content': 1.0},
    with_score=True,
    score_alias='search_score')
for result in docs:
    print(result.title, result.search_score)
Generate an expression that will calculate and return the quality of the search match using the BM25 algorithm. This value can be used to sort the search results, with higher scores corresponding to better matches.
The rank() function accepts optional parameters that allow you to specify weights for the various columns. If no weights are specified, all columns are considered of equal importance.
query = (DocumentIndex
         .select(
             DocumentIndex,
             DocumentIndex.rank().alias('score'))
         .where(DocumentIndex.match('search phrase'))
         .order_by(DocumentIndex.rank()))

for search_result in query:
    print(search_result.title, search_result.score)
NOTE:
query = DocumentIndex.search('search phrase', with_score=True)
for search_result in query:
    print(search_result.title, search_result.score)
Generate a model class suitable for accessing the vocab table corresponding to FTS5 search index.
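A hedged sketch, assuming the DocumentIndex model from the earlier examples; the vocab table's columns (such as term) are defined by SQLite's fts5vocab extension, not by Peewee:

# Generate and create the vocab model for the search index.
DocumentVocab = DocumentIndex.VocabModel()
DocumentVocab.create_table()

# Inspect term statistics for the index.
query = DocumentVocab.select().where(DocumentVocab.term == 'python')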
Simple example:
from playhouse.sqlite_ext import TableFunction

class Series(TableFunction):
    # Name of columns in each row of generated data.
    columns = ['value']

    # Name of parameters the function may be called with.
    params = ['start', 'stop', 'step']

    def initialize(self, start=0, stop=None, step=1):
        """
        Table-functions declare an initialize() method, which is
        called with whatever arguments the user has called the
        function with.
        """
        self.start = self.current = start
        self.stop = stop or float('Inf')
        self.step = step

    def iterate(self, idx):
        """
        Iterate is called repeatedly by the SQLite database engine
        until the required number of rows has been read **or** the
        function raises a `StopIteration` signalling no more rows
        are available.
        """
        if self.current > self.stop:
            raise StopIteration

        ret, self.current = self.current, self.current + self.step
        return (ret,)

# Register the table-function with our database, which ensures it
# is declared whenever a connection is opened.
db.table_function('series')(Series)

# Usage:
cursor = db.execute_sql('SELECT * FROM series(?, ?, ?)', (0, 5, 2))
for value, in cursor:
    print(value)
NOTE:
TableFunction implementations must provide two attributes and implement two methods, described below.
The initialize method is called to initialize the table function with the parameters the user specified when calling the function.
This function is called repeatedly and returns successive rows of data. The function may terminate before all rows are consumed (especially if the user specified a LIMIT on the results). Alternatively, the function can signal that no more data is available by raising a StopIteration exception.
Register the table function with a DB-API 2.0 sqlite3.Connection object. Table-valued functions must be registered before they can be used in a query.
Example:
class MyTableFunction(TableFunction):
    name = 'my_func'
    # ... other attributes and methods ...

db = SqliteDatabase(':memory:')
db.connect()

MyTableFunction.register(db.connection())
To ensure the TableFunction is registered every time a connection is opened, use the table_function() decorator.
Factory function for creating a model class suitable for working with a transitive closure table. Closure tables are VirtualModel subclasses that work with the transitive closure SQLite extension. These special tables are designed to make it easy to efficiently query hierarchical data. The SQLite extension manages an AVL tree behind-the-scenes, transparently updating the tree when your table changes and making it easy to perform common queries on hierarchical data.
To use the closure table extension in your project, you need:
$ git clone https://gist.github.com/coleifer/7f3593c5c2a645913b92 closure
$ cd closure/
$ gcc -g -fPIC -shared closure.c -o closure.so
class Category(Model):
    name = CharField()
    metadata = TextField()
    parent = ForeignKeyField('self', index=True, null=True)  # Required.

# Generate a model for the closure virtual table.
CategoryClosure = ClosureTable(Category)
The self-referentiality can also be achieved via an intermediate table (for a many-to-many relation).
class User(Model):
    name = CharField()

class UserRelations(Model):
    user = ForeignKeyField(User)
    knows = ForeignKeyField(User, backref='_known_by')

    class Meta:
        primary_key = CompositeKey('user', 'knows')
        # Alternatively, a unique index on both columns.

# Generate a model for the closure virtual table, specifying the
# UserRelations as the referencing table.
UserClosure = ClosureTable(
    User,
    referencing_class=UserRelations,
    foreign_key=UserRelations.knows,
    referencing_key=UserRelations.user)
db = SqliteExtDatabase('my_database.db')
db.load_extension('/path/to/closure')
WARNING:
Example:
class Category(Model):
    name = CharField()
    metadata = TextField()
    parent = ForeignKeyField('self', index=True, null=True)  # Required.

# Generate a model for the closure virtual table.
CategoryClosure = ClosureTable(Category)

# Create the tables if they do not exist.
db.create_tables([Category, CategoryClosure], True)
It is now possible to perform interesting queries using the data from the closure table:
# Get all ancestors for a particular node.
laptops = Category.get(Category.name == 'Laptops')
for parent in CategoryClosure.ancestors(laptops):
    print(parent.name)

# PRINTS:
# Computer Hardware
# Computers
# Electronics
# All products

# Get all descendants for a particular node.
hardware = Category.get(Category.name == 'Computer Hardware')
for node in CategoryClosure.descendants(hardware):
    print(node.name)

# PRINTS:
# Laptops
# Desktops
# Hard-drives
# Monitors
# LCD Monitors
# LED Monitors
API of the VirtualModel returned by ClosureTable().
node = Category.get(Category.name == 'Electronics')

# Direct child categories.
children = CategoryClosure.descendants(node, depth=1)

# Grand-child categories.
children = CategoryClosure.descendants(node, depth=2)

# Descendants at all depths.
all_descendants = CategoryClosure.descendants(node)
node = Category.get(Category.name == 'Laptops')

# All ancestors.
all_ancestors = CategoryClosure.ancestors(node)

# Grand-parent category.
grandparent = CategoryClosure.ancestors(node, depth=2)
NOTE:
NOTE:
LSM tables define one primary key column and an arbitrary number of additional value columns (which are serialized and stored in a single value field in the storage engine). The primary key must be of a single type and use one of the following field types:
Since the LSM storage engine is a key/value store, primary keys (including integers) must be specified by the application.
ATTENTION:
Example model declaration:
db = SqliteExtDatabase('my_app.db')
db.load_extension('lsm.so')  # Load shared library.

class EventLog(LSMTable):
    timestamp = IntegerField(primary_key=True)
    action = TextField()
    sender = TextField()
    target = TextField()

    class Meta:
        database = db
        filename = 'eventlog.ldb'  # LSM data is stored in separate db.

# Declare virtual table.
EventLog.create_table()
Example queries:
# Use dictionary operators to get, set and delete rows from the LSM
# table. Slices may be passed to represent a range of key values.
def get_timestamp():
    # Return time as integer expressing time in microseconds.
    return int(time.time() * 1000000)

# Create a new row, at current timestamp.
ts = get_timestamp()
EventLog[ts] = ('pageview', 'search', '/blog/some-post/')

# Retrieve row from event log.
log = EventLog[ts]
print(log.action, log.sender, log.target)
# Prints ("pageview", "search", "/blog/some-post/")

# Delete the row.
del EventLog[ts]

# We can also use the "create()" method.
EventLog.create(
    timestamp=get_timestamp(),
    action='signup',
    sender='newsletter',
    target='sqlite-news')
Simple key/value model declaration:
class KV(LSMTable):
    key = TextField(primary_key=True)
    value = TextField()

    class Meta:
        database = db
        filename = 'kv.ldb'

db.create_tables([KV])
For tables consisting of a single value field, Peewee will return the value directly when getting a single item. You can also request slices of rows, in which case Peewee returns a corresponding Select query, which can be iterated over. Below are some examples:
>>> KV['k0'] = 'v0'
>>> print(KV['k0'])
'v0'
>>> data = [{'key': 'k%d' % i, 'value': 'v%d' % i} for i in range(20)]
>>> KV.insert_many(data).execute()
>>> KV.select().count()
20
>>> KV['k8']
'v8'
>>> list(KV['k4.1':'k7.x'])
[Row(key='k5', value='v5'),
 Row(key='k6', value='v6'),
 Row(key='k7', value='v7')]
>>> list(KV['k6xxx':])
[Row(key='k7', value='v7'),
 Row(key='k8', value='v8'),
 Row(key='k9', value='v9')]
You can also index the LSMTable using expressions:
>>> list(KV[KV.key > 'k6'])
[Row(key='k7', value='v7'),
 Row(key='k8', value='v8'),
 Row(key='k9', value='v9')]
>>> list(KV[(KV.key > 'k6') & (KV.value != 'v8')])
[Row(key='k7', value='v7'),
 Row(key='k9', value='v9')]
You can delete single rows using del or multiple rows using slices or expressions:
>>> del KV['k1']
>>> del KV['k3x':'k8']
>>> del KV[KV.key.between('k10', 'k18')]
>>> list(KV[:])
[Row(key='k0', value='v0'),
 Row(key='k19', value='v19'),
 Row(key='k2', value='v2'),
 Row(key='k3', value='v3'),
 Row(key='k9', value='v9')]
Attempting to get a single non-existent key will result in a KeyError, but slices will not raise an exception:
>>> KV['k1']
...
KeyError: 'k1'
>>> list(KV['k1':'k1'])
[]
ZeroBlob is used solely to reserve space for storing a BLOB that supports incremental I/O. To use the SQLite BLOB-store it is necessary to first insert a ZeroBlob of the desired size into the row you wish to use with incremental I/O.
For example, see Blob.
Open a blob, stored in the given table/column/row, for incremental I/O. To allocate storage for new data, you can use the ZeroBlob, which is very efficient.
class RawData(Model):
    data = BlobField()

# Allocate 100MB of space for writing a large file incrementally:
query = RawData.insert({'data': ZeroBlob(1024 * 1024 * 100)})
rowid = query.execute()

# Now we can open the row for incremental I/O:
blob = Blob(db, 'rawdata', 'data', rowid)

# Read from the file and write to the blob in chunks of 4096 bytes.
while True:
    data = file_handle.read(4096)
    if not data:
        break
    blob.write(data)

bytes_written = blob.tell()
blob.close()
Read up to n bytes from the current position in the blob file. If n is not specified, the entire blob will be read.
Values for whence:
Writes the given data, starting at the current position in the file.
If a blob has already been opened for a given table/column, you can use the reopen() method to re-use the same Blob object for accessing multiple rows in the table.
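For example, a minimal sketch (another_rowid is a hypothetical row in the same table):

# Re-point the open blob handle at a different row, avoiding the
# cost of opening a brand-new handle.
blob.reopen(another_rowid)
data = blob.read()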
The SqliteExtDatabase accepts an initialization option to register support for a simple bloom filter. The bloom filter, once initialized, can then be used for efficient membership queries on large sets of data.
Here's an example:
db = CSqliteExtDatabase(':memory:', bloomfilter=True)

# Create and define a table to store some data.
db.execute_sql('CREATE TABLE "register" ("data" TEXT)')
Register = Table('register', ('data',)).bind(db)

# Populate the database with a bunch of text.
with db.atomic():
    for i in 'abcdefghijklmnopqrstuvwxyz':
        keys = [i * j for j in range(1, 10)]  # a, aa, aaa, ... aaaaaaaaa
        Register.insert([{'data': key} for key in keys]).execute()

# Collect data into a 16KB bloomfilter.
query = Register.select(fn.bloomfilter(Register.data, 16 * 1024).alias('buf'))
row = query.get()
buf = row['buf']

# Use bloomfilter buf to test whether other keys are members.
test_keys = (
    ('aaaa', True),
    ('abc', False),
    ('zzzzzzz', True),
    ('zyxwvut', False))
for key, is_present in test_keys:
    query = Register.select(fn.bloomfilter_contains(key, buf).alias('is_member'))
    answer = query.get()['is_member']
    assert answer == is_present
The SqliteExtDatabase can also register other useful functions:
Examples:
def create_new_user(username, password):
# DO NOT DO THIS IN REAL LIFE. PLEASE.
query = User.insert({'username': username, 'password': fn.sha1(password)})
new_user_id = query.execute()
You can use the murmurhash function to hash bytes to an integer for compact storage:
>>> db = SqliteExtDatabase(':memory:', hash_functions=True)
>>> db.execute_sql('SELECT murmurhash(?)', ('abcdefg',)).fetchone()
(4188131059,)
Peewee comes with numerous extension modules which are collected under the playhouse namespace. Despite the silly name, there are some very useful extensions, particularly those that expose vendor-specific database features like the sqlite_ext and Postgresql Extensions extensions.
Below you will find a loosely organized listing of the various modules that make up the playhouse.
Database drivers / vendor-specific database functionality
High-level features
Database management and framework integration
The Sqlite extensions have been moved to their own page.
The playhouse.sqliteq module provides a subclass of SqliteExtDatabase that will serialize concurrent writes to a SQLite database. SqliteQueueDatabase can be used as a drop-in replacement for the regular SqliteDatabase if you want simple read and write access to a SQLite database from multiple threads.
SQLite only allows one connection to write to the database at any given time. As a result, if you have a multi-threaded application (like a web-server, for example) that needs to write to the database, you may see occasional errors when one or more of the threads attempting to write cannot acquire the lock.
SqliteQueueDatabase is designed to simplify things by sending all write queries through a single, long-lived connection. The benefit is that you get the appearance of multiple threads writing to the database without conflicts or timeouts. The downside, however, is that you cannot issue write transactions that encompass multiple queries -- all writes run in autocommit mode, essentially.
NOTE:
Because all queries are serialized and executed by a single worker thread, it is possible for transactional SQL from separate threads to be executed out-of-order. In the example below, the transaction started by thread "B" is rolled back by thread "A" (with bad consequences!):
Since there is a potential for queries from separate transactions to be interleaved, the transaction() and atomic() methods are disabled on SqliteQueueDatabase.
For cases when you wish to temporarily write to the database from a different thread, you can use the pause() and unpause() methods. These methods block the caller until the writer thread is finished with its current workload. The writer then disconnects and the caller takes over until unpause is called.
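A minimal sketch of this pattern:

# Block until the writer thread finishes its current workload and
# disconnects; the calling thread may then write directly.
db.pause()
try:
    User.create(username='migration-user')
finally:
    db.unpause()  # Hand control back to the writer thread.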
The stop(), start(), and is_stopped() methods can also be used to control the writer thread.
NOTE:
Creating a database instance does not require any special handling. The SqliteQueueDatabase accepts some special parameters which you should be aware of, though. If you are using gevent, you must specify use_gevent=True when instantiating your database -- this way Peewee will know to use the appropriate objects for handling queueing, thread creation, and locking.
from playhouse.sqliteq import SqliteQueueDatabase

db = SqliteQueueDatabase(
    'my_app.db',
    use_gevent=False,  # Use the standard library "threading" module.
    autostart=False,  # The worker thread now must be started manually.
    queue_max_size=64,  # Max. # of pending writes that can accumulate.
    results_timeout=5.0)  # Max. time to wait for query to be executed.
If autostart=False, as in the above example, you will need to call start() to bring up the worker threads that will do the actual write query execution.
@app.before_first_request
def _start_worker_threads():
    db.start()
If you plan on performing SELECT queries or generally wanting to access the database, you will need to call connect() and close() as you would with any other database instance.
When your application is ready to terminate, use the stop() method to shut down the worker thread. If there was a backlog of work, then this method will block until all pending work is finished (though no new work is allowed).
import atexit

@atexit.register
def _stop_worker_threads():
    db.stop()
Lastly, the is_stopped() method can be used to determine whether the database writer is up and running.
The sqlite_udf playhouse module contains a number of user-defined functions, aggregates, and table-valued functions, which you may find useful. The functions are grouped in collections and you can register these user-defined extensions individually, by collection, or register everything.
Scalar functions are functions which take a number of parameters and return a single value. For example, converting a string to upper-case, or calculating the MD5 hex digest.
Aggregate functions are like scalar functions that operate on multiple rows of data, producing a single result. For example, calculating the sum of a list of integers, or finding the smallest value in a particular column.
Table-valued functions are simply functions that can return multiple rows of data. For example, a regular-expression search function that returns all the matches in a given string, or a function that accepts two dates and generates all the intervening days.
NOTE:
Registering user-defined functions:
db = SqliteDatabase('my_app.db')

# Register *all* functions.
register_all(db)

# Alternatively, you can register individual groups. This will just
# register the DATE and MATH groups of functions.
register_groups(db, 'DATE', 'MATH')

# If you only wish to register, say, the aggregate functions for a
# particular group or groups, you can:
register_aggregate_groups(db, 'DATE')

# If you only wish to register a single function, then you can:
from playhouse.sqlite_udf import gzip, gunzip
db.register_function(gzip, 'gzip')
db.register_function(gunzip, 'gunzip')
Using a library function ("hostname"):
# Assume we have a model, Link, that contains lots of arbitrary URLs.
# We want to discover the most common hosts that have been linked.
query = (Link
         .select(fn.hostname(Link.url).alias('host'), fn.COUNT(Link.id))
         .group_by(fn.hostname(Link.url))
         .order_by(fn.COUNT(Link.id).desc())
         .tuples())

# Print the hostname along with number of links associated with it.
for host, count in query:
    print('%s: %s' % (host, count))
Scalar functions are indicated by (f), aggregate functions by (a), and table-valued functions by (t).
CONTROL_FLOW
DATE
The time is not adjusted in any way; the timezone is simply removed.
Example, 86471 -> "1 day, 1 minute, 11 seconds"
Aggregate function that computes the minimum difference between any two datetimes.
Aggregate function that computes the average difference between consecutive values in the list.
Aggregate function that computes the duration from the smallest to the largest value in the list, returned in seconds.
Table-valued function that returns rows consisting of the date/time values encountered iterating from start to stop, step_seconds at a time.
Additionally, if start does not have a time component and step_seconds is greater-than-or-equal-to one day (86400 seconds), the values returned will be dates. Conversely, if start does not have a date component, values will be returned as times. Otherwise values are returned as datetimes.
Example:
SELECT * FROM date_series('2017-01-28', '2017-02-02');

value
-----
2017-01-28
2017-01-29
2017-01-30
2017-01-31
2017-02-01
2017-02-02
FILE
HELPER
Toggle a key between True/False state. Example:
>>> toggle('my-key')
True
>>> toggle('my-key')
False
>>> toggle('my-key')
True
Store/retrieve a setting in memory, persisting it for the lifetime of the application. To get the current value, specify only the key. To set a new value, call with the key and the new value.
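For example, a sketch mirroring the toggle() example above (the return values shown are illustrative assumptions):

>>> setting('timeout', 60)  # Set a new value.
60
>>> setting('timeout')  # Get the current value.
60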
MATH
Return a random integer in the range [start, end).
Aggregate function which calculates mode of values.
Aggregate function which calculates the minimal distance between two numbers in the sequence.
Aggregate function which calculates the average distance between two consecutive numbers in the sequence.
Aggregate function which returns range of values observed.
Aggregate function which calculates the middle value in a sequence.
NOTE:
STRING
NOTE:
NOTE:
NOTE:
Table-valued function that searches a string for substrings that match the provided regex, returning a row for each match found.
Example:
SELECT * FROM regex_search('\w+', 'extract words, ignore! symbols');

value
-----
extract
words
ignore
symbols
The apsw_ext module contains a database class suitable for use with the apsw sqlite driver.
APSW Project page: https://github.com/rogerbinns/apsw
APSW is a really neat library that provides a thin wrapper on top of SQLite's C interface, making it possible to use all of SQLite's advanced features.
Here are just a few reasons to use APSW, taken from the documentation:
For more information on the differences between apsw and pysqlite, check the apsw docs.
from apsw_ext import *

db = APSWDatabase(':memory:')

class BaseModel(Model):
    class Meta:
        database = db

class SomeModel(BaseModel):
    col1 = CharField()
    col2 = DateTimeField()
APSWDatabase extends the SqliteExtDatabase and inherits its advanced features.
NOTE:
For example, instead of using peewee.DateTimeField, be sure you are importing and using playhouse.apsw_ext.DateTimeField.
Also note that this code relies on pysqlcipher and sqlcipher, and the code there might have vulnerabilities as well, but since these are widely used crypto modules, we can expect "short zero days" there.
NOTE:
For example to specify the number of PBKDF2 iterations for the key derivation (64K in SQLCipher 3.x, 256K in SQLCipher 4.x by default):
# Use 1,000,000 iterations.
db = SqlCipherDatabase('my_app.db', pragmas={'kdf_iter': 1000000})
To use a cipher page-size of 16KB and a cache-size of 10,000 pages:
db = SqlCipherDatabase('my_app.db', passphrase='secret!!!', pragmas={
    'cipher_page_size': 1024 * 16,
    'cache_size': 10000})  # 10,000 16KB pages, or 160MB.
Example of prompting the user for a passphrase:
db = SqlCipherDatabase(None)

class BaseModel(Model):
    """Parent for all app's models"""
    class Meta:
        # We won't have a valid db until user enters passphrase.
        database = db

# Derive our model subclasses.
class Person(BaseModel):
    name = TextField(primary_key=True)

right_passphrase = False
while not right_passphrase:
    db.init(
        'testsqlcipher.db',
        passphrase=get_passphrase_from_user())
    try:
        # Actually execute a query against the db to test passphrase.
        db.get_tables()
    except DatabaseError as exc:
        # This error indicates the password was wrong.
        if exc.args[0] == 'file is encrypted or is not a database':
            tell_user_the_passphrase_was_wrong()
            db.init(None)  # Reset the db.
        else:
            raise exc
    else:
        # The password was correct.
        right_passphrase = True
See also: a slightly more elaborate example.
The postgresql extensions module provides a number of "postgres-only" functions, currently:
In the future I would like to add support for more of postgresql's features. If there is a particular feature you would like to see added, please open a Github issue.
WARNING:
The code below will assume you are using the following database and base model:
from playhouse.postgres_ext import *

ext_db = PostgresqlExtDatabase('peewee_test', user='postgres')

class BaseExtModel(Model):
    class Meta:
        database = ext_db
peewee has basic support for Postgres' native JSON data type, in the form of JSONField. As of version 2.4.7, peewee also supports the Postgres 9.4 binary JSON (jsonb) type, via BinaryJSONField.
WARNING:
To use BinaryJSONField, which has many performance and querying advantages, you must have Postgres 9.4 or later.
NOTE:
Here is an example of how you might declare a model with a JSON field:
import json
import urllib2

from playhouse.postgres_ext import *

db = PostgresqlExtDatabase('my_database')

class APIResponse(Model):
    url = CharField()
    response = JSONField()

    class Meta:
        database = db

    @classmethod
    def request(cls, url):
        fh = urllib2.urlopen(url)
        return cls.create(url=url, response=json.loads(fh.read()))

APIResponse.create_table()

# Store a JSON response.
offense = APIResponse.request('http://crime-api.com/api/offense/')
booking = APIResponse.request('http://crime-api.com/api/booking/')

# Query a JSON data structure using a nested key lookup:
offense_responses = APIResponse.select().where(
    APIResponse.response['meta']['model'] == 'offense')

# Retrieve a sub-key for each APIResponse. By calling .as_json(), the
# data at the sub-key will be returned as Python objects (dicts, lists,
# etc) instead of serialized JSON.
q = (APIResponse
     .select(
         APIResponse.response['booking']['person'].as_json().alias('person'))
     .where(APIResponse.response['meta']['model'] == 'booking'))

for result in q:
    print(result.person['name'], result.person['dob'])
The BinaryJSONField works the same and supports the same operations as the regular JSONField, but provides several additional operations for testing containment. Using the binary json field, you can test whether your JSON data contains other partial JSON structures (contains(), contains_any(), contains_all()), or whether it is a subset of a larger JSON document (contained_by()).
For more examples, see the JSONField and BinaryJSONField API documents below.
Postgresql hstore is an embedded key/value store. With hstore, you can store arbitrary key/value pairs in your database alongside structured relational data.
To use hstore, you need to specify an additional parameter when instantiating your PostgresqlExtDatabase:
# Specify "register_hstore=True": db = PostgresqlExtDatabase('my_db', register_hstore=True)
Currently the postgres_ext module supports the following operations:
To start with, you will need to import the custom database class and the hstore functions from playhouse.postgres_ext (see above code snippet). Then, it is as simple as adding a HStoreField to your model:
class House(BaseExtModel):
address = CharField()
features = HStoreField()
You can now store arbitrary key/value pairs on House instances:
>>> h = House.create(
...     address='123 Main St',
...     features={'garage': '2 cars', 'bath': '2 bath'})
>>> h_from_db = House.get(House.id == h.id)
>>> h_from_db.features
{'bath': '2 bath', 'garage': '2 cars'}
You can filter by individual key, multiple keys or partial dictionary:
>>> query = House.select()
>>> garage = query.where(House.features.contains('garage'))
>>> garage_and_bath = query.where(House.features.contains(['garage', 'bath']))
>>> twocar = query.where(House.features.contains({'garage': '2 cars'}))
Suppose you want to do an atomic update to the house:
>>> new_features = House.features.update({'bath': '2.5 bath', 'sqft': '1100'})
>>> query = House.update(features=new_features)
>>> query.where(House.id == h.id).execute()
1
>>> h = House.get(House.id == h.id)
>>> h.features
{'bath': '2.5 bath', 'garage': '2 cars', 'sqft': '1100'}
Or, alternatively an atomic delete:
>>> query = House.update(features=House.features.delete('bath'))
>>> query.where(House.id == h.id).execute()
1
>>> h = House.get(House.id == h.id)
>>> h.features
{'garage': '2 cars', 'sqft': '1100'}
Multiple keys can be deleted at the same time:
>>> query = House.update(features=House.features.delete('garage', 'sqft'))
You can select just keys, just values, or zip the two:
>>> for h in House.select(House.address, House.features.keys().alias('keys')):
...     print(h.address, h.keys)
123 Main St [u'bath', u'garage']

>>> for h in House.select(House.address, House.features.values().alias('vals')):
...     print(h.address, h.vals)
123 Main St [u'2 bath', u'2 cars']

>>> for h in House.select(House.address, House.features.items().alias('mtx')):
...     print(h.address, h.mtx)
123 Main St [[u'bath', u'2 bath'], [u'garage', u'2 cars']]
You can retrieve a slice of data, for example, all the garage data:
>>> query = House.select(House.address,
...                      House.features.slice('garage').alias('garage_data'))
>>> for house in query:
...     print(house.address, house.garage_data)
123 Main St {'garage': '2 cars'}
You can check for the existence of a key and filter rows accordingly:
>>> has_garage = House.features.exists('garage')
>>> for house in House.select(House.address, has_garage.alias('has_garage')):
...     print(house.address, house.has_garage)
123 Main St True

>>> for house in House.select().where(House.features.exists('garage')):
...     print(house.address, house.features['garage'])  # <-- just houses w/garage data
123 Main St 2 cars
Postgres supports durations through the INTERVAL data-type (docs).
Example:
from datetime import timedelta

from playhouse.postgres_ext import *

db = PostgresqlExtDatabase('my_db')

class Event(Model):
    location = CharField()
    duration = IntervalField()
    start_time = DateTimeField()

    class Meta:
        database = db

    @classmethod
    def get_long_meetings(cls):
        return cls.select().where(cls.duration > timedelta(hours=1))
When psycopg2 executes a query, normally all results are fetched and returned to the client by the backend. This can cause your application to use a lot of memory when making large queries. Using server-side cursors, results are returned a little at a time (by default 2000 records). For the definitive reference, please see the psycopg2 documentation.
NOTE:
To execute a query using a server-side cursor, simply wrap your select query using the ServerSide() helper:
large_query = PageView.select()  # Build query normally.

# Iterate over large query inside a transaction.
for page_view in ServerSide(large_query):
    # do some interesting analysis here.
    pass

# Server-side resources are released.
If you would like all SELECT queries to automatically use a server-side cursor, you can specify this when creating your PostgresqlExtDatabase:
from playhouse.postgres_ext import PostgresqlExtDatabase

ss_db = PostgresqlExtDatabase('my_db', server_side_cursors=True)
NOTE:
If you are using the ServerSide() helper, the transaction and call to iterator() will be handled transparently.
Postgresql provides sophisticated full-text search using special data-types (tsvector and tsquery). Documents should be stored or converted to the tsvector type, and search queries should be converted to tsquery.
For simple cases, you can simply use the Match() function, which will automatically perform the appropriate conversions, and requires no schema changes:
def blog_search(search_term):
return Blog.select().where(
(Blog.status == Blog.STATUS_PUBLISHED) &
Match(Blog.content, search_term))
The Match() function will automatically convert the left-hand operand to a tsvector, and the right-hand operand to a tsquery. For better performance, it is recommended you create a GIN index on the column you plan to search:
CREATE INDEX blog_full_text_search ON blog USING gin(to_tsvector(content));
Alternatively, you can use the TSVectorField to maintain a dedicated column for storing tsvector data:
class Blog(Model):
content = TextField()
search_content = TSVectorField()
NOTE:
You will need to explicitly convert the incoming text data to tsvector when inserting or updating the search_content field:
content = 'Excellent blog post about peewee ORM.'
blog_entry = Blog.create(
    content=content,
    search_content=fn.to_tsvector(content))
To perform a full-text search, use TSVectorField.match():
terms = 'python & (sqlite | postgres)'
results = Blog.select().where(Blog.search_content.match(terms))
For more information, see the Postgres full-text search docs.
If you wish to use the HStore extension, you must specify register_hstore=True.
If using server_side_cursors, also be sure to wrap your queries with ServerSide().
Wrap the given select query in a transaction, and call its iterator() method to avoid caching row instances. In order for the server-side resources to be released, be sure to exhaust the generator (iterate over all the rows).
Usage:
large_query = PageView.select()
for page_view in ServerSide(large_query):
    # Do something interesting.
    pass

# At this point server side resources are released.
Field capable of storing arrays of the provided field_class.
NOTE:
You can store and retrieve lists (or lists-of-lists):
class BlogPost(BaseModel):
    content = TextField()
    tags = ArrayField(CharField)

post = BlogPost(content='awesome', tags=['foo', 'bar', 'baz'])
Additionally, you can use the __getitem__ API to query values or slices in the database:
# Get the first tag on a given blog post.
first_tag = (BlogPost
             .select(BlogPost.tags[0].alias('first_tag'))
             .where(BlogPost.id == 1)
             .dicts()
             .get())

# first_tag = {'first_tag': 'foo'}
Get a slice of values:
# Get the first two tags.
two_tags = (BlogPost
            .select(BlogPost.tags[:2].alias('two'))
            .dicts()
            .get())

# two_tags = {'two': ['foo', 'bar']}
# Get all blog posts that are tagged with both "python" and "django".
Blog.select().where(Blog.tags.contains('python', 'django'))
Like contains(), except will match rows where the array contains any of the given items.
# Get all blog posts that are tagged with "flask" and/or "django". Blog.select().where(Blog.tags.contains_any('flask', 'django'))
>>> for h in House.select(House.address, House.features.keys().alias('keys')):
...     print(h.address, h.keys)
123 Main St [u'bath', u'garage']

>>> for h in House.select(House.address, House.features.values().alias('vals')):
...     print(h.address, h.vals)
123 Main St [u'2 bath', u'2 cars']

>>> for h in House.select(House.address, House.features.items().alias('mtx')):
...     print(h.address, h.mtx)
123 Main St [[u'bath', u'2 bath'], [u'garage', u'2 cars']]

>>> for h in House.select(House.address, House.features.slice('garage').alias('garage_data')):
...     print(h.address, h.garage_data)
123 Main St {'garage': '2 cars'}

>>> for h in House.select(House.address, House.features.exists('garage').alias('has_garage')):
...     print(h.address, h.has_garage)
123 Main St True

>>> for h in House.select().where(House.features.exists('garage')):
...     print(h.address, h.features['garage'])  # <-- just houses w/garage data
123 Main St 2 cars
>>> query = House.update(features=House.features.update(
...     sqft=2000,
...     year_built=2012))
>>> query.where(House.id == 1).execute()
>>> query = House.update(features=House.features.delete(
...     'sqft', 'year_built'))
>>> query.where(House.id == 1).execute()
Query rows for the existence of either: a single key, a list of keys, or a partial dictionary of key/value pairs.
>>> query = House.select()
>>> has_garage = query.where(House.features.contains('garage'))
>>> garage_bath = query.where(House.features.contains(['garage', 'bath']))
>>> twocar = query.where(House.features.contains({'garage': '2 cars'}))
Field class suitable for storing and querying arbitrary JSON. When using this on a model, set the field's value to a Python object (either a dict or a list). When you retrieve your value from the database it will be returned as a Python data structure.
Example model declaration:
db = PostgresqlExtDatabase('my_db')

class APIResponse(Model):
    url = CharField()
    response = JSONField()

    class Meta:
        database = db
Example of storing JSON data:
url = 'http://foo.com/api/resource/'
resp = json.loads(urllib2.urlopen(url).read())
APIResponse.create(url=url, response=resp)

APIResponse.create(url='http://foo.com/baz/', response={'key': 'value'})
To query, use Python's [] operators to specify nested key or array lookups:
APIResponse.select().where(
APIResponse.response['key1']['nested-key'] == 'some-value')
To illustrate the use of the [] operators, imagine we have the following data stored in an APIResponse:
{
  "foo": {
    "bar": ["i1", "i2", "i3"],
    "baz": {
      "huey": "mickey",
      "peewee": "nugget"
    }
  }
}
Here are the results of a few queries:
def get_data(expression):
    # Helper function to just retrieve the results of a
    # particular expression.
    query = (APIResponse
             .select(expression.alias('my_data'))
             .dicts()
             .get())
    return query['my_data']

# Accessing the foo -> bar subkey will return a JSON
# representation of the list.
get_data(APIResponse.response['foo']['bar'])
# '["i1", "i2", "i3"]'

# In order to retrieve this list as a Python list,
# we will call .as_json() on the expression.
get_data(APIResponse.response['foo']['bar'].as_json())
# ['i1', 'i2', 'i3']

# Similarly, accessing the foo -> baz subkey will
# return a JSON representation of the dictionary.
get_data(APIResponse.response['foo']['baz'])
# '{"huey": "mickey", "peewee": "nugget"}'

# Again, calling .as_json() will return an actual
# python dictionary.
get_data(APIResponse.response['foo']['baz'].as_json())
# {'huey': 'mickey', 'peewee': 'nugget'}

# When dealing with simple values, either way works as
# you expect.
get_data(APIResponse.response['foo']['bar'][0])
# 'i1'

# Calling .as_json() when the result is a simple value
# will return the same thing as the previous example.
get_data(APIResponse.response['foo']['bar'][0].as_json())
# 'i1'
Store and query arbitrary JSON documents. Data should be stored using normal Python dict and list objects, and when data is returned from the database, it will be returned using dict and list as well.
For examples of basic query operations, see the above code samples for JSONField. The example queries below will use the same APIResponse model described above.
Example:
search_fragment = {
    'foo': {'bar': ['i2']}
}
query = (APIResponse
         .select()
         .where(APIResponse.response.contains(search_fragment)))

# If we're searching for a list, the list items do not need to
# be ordered in a particular way:
query = (APIResponse
         .select()
         .where(APIResponse.response.contains({
             'foo': {'bar': ['i2', 'i1']}})))
We can pass in simple keys as well. To find APIResponses that contain the key foo at the top-level:
APIResponse.select().where(APIResponse.response.contains('foo'))
We can also search sub-keys using square-brackets:
APIResponse.select().where(
    APIResponse.response['foo']['bar'].contains(['i2', 'i1']))
APIResponse.select().where(
    APIResponse.response.contains_any('foo', 'baz', 'nugget'))
Like contains(), we can also search sub-keys:
APIResponse.select().where(
    APIResponse.response['foo']['bar'].contains_any('i2', 'ix'))
APIResponse.select().where(
    APIResponse.response.contains_all('foo'))
Like contains_any(), we can also search sub-keys:
APIResponse.select().where(
    APIResponse.response['foo']['bar'].contains_all('i1', 'i2', 'i3'))
big_doc = {
    'foo': {
        'bar': ['i1', 'i2', 'i3'],
        'baz': {
            'huey': 'mickey',
            'peewee': 'nugget',
        }
    },
    'other_key': ['nugget', 'bear', 'kitten'],
}

APIResponse.select().where(
    APIResponse.response.contained_by(big_doc))
Example:
def blog_search(search_term):
return Blog.select().where(
(Blog.status == Blog.STATUS_PUBLISHED) &
Match(Blog.content, search_term))
Example usage:
class Blog(Model):
    content = TextField()
    search_content = TSVectorField()

content = 'this is a sample blog entry.'
blog_entry = Blog.create(
    content=content,
    search_content=fn.to_tsvector(content))  # Note `to_tsvector()`.
Example:
# Perform a search using the "match" method.
terms = 'python & (sqlite | postgres)'
results = Blog.select().where(Blog.search_content.match(terms))
CockroachDB (CRDB) is well supported by peewee.
from playhouse.cockroachdb import CockroachDatabase

db = CockroachDatabase('my_app', user='root', host='10.1.0.8')
The playhouse.cockroachdb extension module provides the following classes and helpers:
Special field-types that may be useful when using CRDB:
CRDB is compatible with Postgres' wire protocol and exposes a very similar SQL interface, so it is possible (though not recommended) to use PostgresqlDatabase with CRDB.
CRDB does not support nested transactions (savepoints), so the atomic() method on the CockroachDatabase has been modified to raise an exception if an invalid nesting is encountered. If you would like to be able to nest transactional code, you can use the transaction() method, which will ensure that the outer-most block will manage the transaction (e.g., exiting a nested-block will not cause an early commit).
Example:
@db.transaction()
def create_user(username):
    return User.create(username=username)

def some_other_function():
    with db.transaction() as txn:
        # do some stuff...

        # This function is wrapped in a transaction, but the nested
        # transaction will be ignored and folded into the outer
        # transaction, as we are already in a wrapped-block (via the
        # context manager).
        create_user('some_user@example.com')

        # do other stuff.

    # At this point we have exited the outer-most block and the transaction
    # will be committed.
    return
CRDB provides client-side transaction retries, which are available using a special run_transaction() helper. This helper method accepts a callable, which is responsible for executing any transactional statements that may need to be retried.
Simplest possible example of run_transaction():
def create_user(email):
    # Callable that accepts a single argument (the database instance) and
    # which is responsible for executing the transactional SQL.
    def callback(db_ref):
        return User.create(email=email)

    return db.run_transaction(callback, max_attempts=10)

huey = create_user('huey@example.com')
NOTE:
Example of using run_transaction() to implement client-side retries for a transaction that transfers an amount from one account to another:
from playhouse.cockroachdb import CockroachDatabase

db = CockroachDatabase('my_app')

def transfer_funds(from_id, to_id, amt):
"""
Returns a 3-tuple of (success?, from balance, to balance). If there are
not sufficient funds, then the original balances are returned.
"""
def thunk(db_ref):
src, dest = (Account
.select()
.where(Account.id.in_([from_id, to_id])))
if src.id != from_id:
src, dest = dest, src # Swap order.
# Cannot perform transfer, insufficient funds!
if src.balance < amt:
return False, src.balance, dest.balance
# Update each account, returning the new balance.
src, = (Account
.update(balance=Account.balance - amt)
.where(Account.id == from_id)
.returning(Account.balance)
.execute())
dest, = (Account
.update(balance=Account.balance + amt)
.where(Account.id == to_id)
.returning(Account.balance)
.execute())
return True, src.balance, dest.balance
# Perform the queries that comprise a logical transaction. In the
# event the transaction fails due to contention, it will be auto-
# matically retried (up to 10 times).
return db.run_transaction(thunk, max_attempts=10)
Additional keyword arguments are passed to the psycopg2 connection constructor, and may be used to specify the database user, port, etc.
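For example (a minimal sketch; the host and credentials are placeholders), extra keyword arguments flow straight through to the psycopg2 connection:

from playhouse.cockroachdb import CockroachDatabase

# CockroachDB listens on port 26257 by default; user/host are examples.
db = CockroachDatabase(
    'my_app',
    user='root',
    host='10.1.0.8',
    port=26257)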
Run SQL in a transaction with automatic client-side retries.
The user-provided callback must accept a single parameter (the database instance the transaction is running under), must not attempt to commit, roll back, or otherwise manage the transaction itself, and may be called more than once -- so ideally it should contain only the SQL operations being retried.
Additionally, the database must not have any open transactions at the time this function is called, as CRDB does not support nested transactions. Attempting to do so will raise a NotImplementedError.
Simplest possible example:
def create_user(email):
    def callback(db_ref):
        return User.create(email=email)

    return db.run_transaction(callback, max_attempts=10)

user = create_user('huey@example.com')
Peewee provides an alternate database implementation for using the mysql-connector driver. The implementation can be found in playhouse.mysql_ext.
Example usage:
from playhouse.mysql_ext import MySQLConnectorDatabase

# MySQL database implementation that utilizes the mysql-connector driver.
db = MySQLConnectorDatabase('my_database', host='1.2.3.4', user='mysql')
Additional MySQL-specific helpers:
Helper class for constructing MySQL full-text search queries of the form:
MATCH (columns, ...) AGAINST (expr[ modifier])
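A minimal sketch of using the Match helper (the Document model and search phrase are illustrative assumptions; a MySQL FULLTEXT index over the searched columns is assumed to exist):

from playhouse.mysql_ext import MySQLConnectorDatabase, Match

db = MySQLConnectorDatabase('my_database', host='1.2.3.4', user='mysql')

class Document(Model):
    title = TextField()
    content = TextField()

    class Meta:
        database = db

# Search both columns; an optional modifier such as 'in boolean mode'
# may be passed as the third argument.
query = (Document
         .select()
         .where(Match((Document.title, Document.content), 'python orm')))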
The dataset module contains a high-level API for working with databases modeled after the popular project of the same name. The aims of the dataset module are to provide:
A minimal data-loading script might look like this:
from playhouse.dataset import DataSet

db = DataSet('sqlite:///:memory:')

table = db['sometable']
table.insert(name='Huey', age=3)
table.insert(name='Mickey', age=5, gender='male')

huey = table.find_one(name='Huey')
print(huey)
# {'age': 3, 'gender': None, 'id': 1, 'name': 'Huey'}

for obj in table:
    print(obj)
# {'age': 3, 'gender': None, 'id': 1, 'name': 'Huey'}
# {'age': 5, 'gender': 'male', 'id': 2, 'name': 'Mickey'}
You can insert, update or delete using the dictionary APIs as well:
huey = table.find_one(name='Huey')
# {'age': 3, 'gender': None, 'id': 1, 'name': 'Huey'}

# Perform an update by supplying a partial record of changes.
table[1] = {'gender': 'male', 'age': 4}
print(table[1])
# {'age': 4, 'gender': 'male', 'id': 1, 'name': 'Huey'}

# Or insert a new record:
table[3] = {'name': 'Zaizee', 'age': 2}
print(table[3])
# {'age': 2, 'gender': None, 'id': 3, 'name': 'Zaizee'}

# Or delete a record:
del table[3]  # Remove the row we just added.
You can export or import data using freeze() and thaw():
# Export table content to the `users.json` file.
db.freeze(table.all(), format='json', filename='users.json')

# Import data from a CSV file into a new table. Columns will be automatically
# created for each field in the CSV file.
new_table = db['stats']
new_table.thaw(format='csv', filename='monthly_stats.csv')
DataSet objects are initialized by passing in a database URL of the format dialect://user:password@host/dbname. See the Database URL section for examples of connecting to various databases.
# Create an in-memory SQLite database.
db = DataSet('sqlite:///:memory:')
To store data, we must first obtain a reference to a table. If the table does not exist, it will be created automatically:
# Get a table reference, creating the table if it does not exist.
table = db['users']
We can now insert() new rows into the table. If the columns do not exist, they will be created automatically:
table.insert(name='Huey', age=3, color='white')
table.insert(name='Mickey', age=5, gender='male')
To update existing entries in the table, pass in a dictionary containing the new values and filter conditions. The list of columns to use as filters is specified in the columns argument. If no filter columns are specified, then all rows will be updated.
# Update the gender for "Huey".
table.update(name='Huey', gender='male', columns=['name'])

# Update all records. If the column does not exist, it will be created.
table.update(favorite_orm='peewee')
To import data from an external source, such as a JSON or CSV file, you can use the thaw() method. By default, new columns will be created for any attributes encountered. If you wish to only populate columns that are already defined on a table, you can pass in strict=True.
# Load data from a JSON file containing a list of objects.
table = dataset['stock_prices']
table.thaw(filename='stocks.json', format='json')
table.all()[:3]

# Might print...
[{'id': 1, 'ticker': 'GOOG', 'price': 703},
 {'id': 2, 'ticker': 'AAPL', 'price': 109},
 {'id': 3, 'ticker': 'AMZN', 'price': 300}]
DataSet supports nesting transactions using a simple context manager.
table = db['users']
with db.transaction() as txn:
    table.insert(name='Charlie')

    with db.transaction() as nested_txn:
        # Set Charlie's favorite ORM to Django.
        table.update(name='Charlie', favorite_orm='django', columns=['name'])

        # jk/lol
        nested_txn.rollback()
You can use the tables attribute to list the tables in the current database:
>>> print(db.tables)
['sometable', 'user']
And for a given table, you can print the columns:
>>> table = db['user']
>>> print(table.columns)
['id', 'age', 'name', 'gender', 'favorite_orm']
We can also find out how many rows are in a table:
>>> print(len(db['user']))
3
To retrieve all rows, you can use the all() method:
# Retrieve all the users.
users = db['user'].all()

# We can iterate over all rows without calling `.all()`:
for user in db['user']:
    print(user['name'])
Specific objects can be retrieved using find() and find_one().
# Find all the users who like peewee.
peewee_users = db['user'].find(favorite_orm='peewee')

# Find Huey.
huey = db['user'].find_one(name='Huey')
To export data, use the freeze() method, passing in the query you wish to export:
peewee_users = db['user'].find(favorite_orm='peewee')
db.freeze(peewee_users, format='json', filename='peewee_users.json')
The DataSet class provides a high-level API for working with relational databases.
Execute the provided query against the database.
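As a sketch (the table and SQL here are illustrative), query() can be used to run raw SQL against the underlying database; it returns a database cursor:

db = DataSet('sqlite:///:memory:')
db['users'].insert(name='Huey', age=3)

curs = db.query('SELECT COUNT(*) FROM users')
print(curs.fetchone())  # (1,)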
Provides a high-level API for working with rows in a given table.
# Create a unique index on the `username` column.
db['users'].create_index(['username'], unique=True)
# Update all rows.
db['users'].update(favorite_orm='peewee')

# Only update Huey's record, setting his age to 3.
db['users'].update(name='Huey', age=3, columns=['name'])
peewee_users = db['users'].find(favorite_orm='peewee')
huey = db['users'].find_one(name='Huey')
# Adios, Django!
db['users'].delete(favorite_orm='Django')

# Delete all the secret messages.
db['secret_messages'].delete()
These fields can be found in the playhouse.fields module.
Stores compressed data using the specified algorithm. This field extends BlobField, transparently storing a compressed representation of the data in the database.
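A minimal sketch of declaring a CompressedField (the model and database object are illustrative assumptions):

from playhouse.fields import CompressedField

class CacheEntry(Model):
    key = CharField(unique=True)
    # Transparently zlib-compressed on write, decompressed on read.
    data = CompressedField(compression_level=6, algorithm='zlib')

    class Meta:
        database = db  # Assumes a database object defined elsewhere.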
Hybrid attributes encapsulate functionality that operates at both the Python and SQL levels. The idea for hybrid attributes comes from a feature of the same name in SQLAlchemy. Consider the following example:
class Interval(Model):
start = IntegerField()
end = IntegerField()
@hybrid_property
def length(self):
return self.end - self.start
@hybrid_method
def contains(self, point):
return (self.start <= point) & (point < self.end)
The hybrid attribute gets its name from the fact that the length attribute will behave differently depending on whether it is accessed via the Interval class or an Interval instance.
If accessed via an instance, then it behaves just as you would expect.
If accessed via the Interval.length class attribute, however, the length calculation will be expressed as a SQL expression. For example:
query = Interval.select().where(Interval.length > 5)
This query will be equivalent to the following SQL:
SELECT "t1"."id", "t1"."start", "t1"."end" FROM "interval" AS t1 WHERE (("t1"."end" - "t1"."start") > 5)
The playhouse.hybrid module also contains a decorator for implementing hybrid methods which can accept parameters. As with hybrid properties, when accessed via a model instance, then the function executes normally as-written. When the hybrid method is called on the class, however, it will generate a SQL expression.
Example:
query = Interval.select().where(Interval.contains(2))
This query is equivalent to the following SQL:
SELECT "t1"."id", "t1"."start", "t1"."end" FROM "interval" AS t1 WHERE (("t1"."start" <= 2) AND (2 < "t1"."end"))
There is an additional API for situations where the python implementation differs slightly from the SQL implementation. Let's add a radius method to the Interval model. Because this method calculates an absolute value, we will use the Python abs() function for the instance portion and the fn.ABS() SQL function for the class portion.
class Interval(Model):
start = IntegerField()
end = IntegerField()
@hybrid_property
def length(self):
return self.end - self.start
@hybrid_property
def radius(self):
return abs(self.length) / 2
@radius.expression
def radius(cls):
return fn.ABS(cls.length) / 2
What is neat is that both the radius implementations refer to the length hybrid attribute! When accessed via an Interval instance, the radius calculation will be executed in Python. When invoked via an Interval class, we will get the appropriate SQL.
Example:
query = Interval.select().where(Interval.radius < 3)
This query is equivalent to the following SQL:
SELECT "t1"."id", "t1"."start", "t1"."end" FROM "interval" AS t1 WHERE ((abs("t1"."end" - "t1"."start") / 2) < 3)
Pretty neat, right? Thanks for the cool idea, SQLAlchemy!
Example:
class Interval(Model):
start = IntegerField()
end = IntegerField()
@hybrid_method
def contains(self, point):
return (self.start <= point) & (point < self.end)
When called with an Interval instance, the contains method will behave as you would expect. When called as a classmethod, though, a SQL expression will be generated:
query = Interval.select().where(Interval.contains(2))
Would generate the following SQL:
SELECT "t1"."id", "t1"."start", "t1"."end" FROM "interval" AS t1 WHERE (("t1"."start" <= 2) AND (2 < "t1"."end"))
Examples:
class Interval(Model):
start = IntegerField()
end = IntegerField()
@hybrid_property
def length(self):
return self.end - self.start
@hybrid_property
def radius(self):
return abs(self.length) / 2
@radius.expression
def radius(cls):
return fn.ABS(cls.length) / 2
When accessed on an Interval instance, the length and radius properties will behave as you would expect. When accessed as class attributes, though, a SQL expression will be generated instead:
query = (Interval
.select()
.where(
(Interval.length > 6) &
(Interval.radius >= 3)))
Would generate the following SQL:
SELECT "t1"."id", "t1"."start", "t1"."end" FROM "interval" AS t1 WHERE (
(("t1"."end" - "t1"."start") > 6) AND
((abs("t1"."end" - "t1"."start") / 2) >= 3) )
The playhouse.kv module contains the implementation of a persistent dictionary.
Dictionary-like API for storing key/value data. Like dictionaries, supports the expected APIs, but also has the added capability of accepting expressions for getting, setting and deleting items.
Table is created automatically (if it doesn't exist) when the KeyValue is instantiated.
Uses efficient upsert implementation for setting and updating/overwriting key/value pairs.
Basic examples:
# Create a key/value store, which uses an in-memory SQLite database
# for data storage.
KV = KeyValue()

# Set (or overwrite) the value for "k1".
KV['k1'] = 'v1'

# Set (or update) multiple keys at once (uses an efficient upsert).
KV.update(k2='v2', k3='v3')

# Getting values works as you'd expect.
assert KV['k2'] == 'v2'

# We can also do this:
for value in KV[KV.key > 'k1']:
    print(value)
# 'v2'
# 'v3'

# Update multiple values at once using expression:
KV[KV.key > 'k1'] = 'vx'

# What's stored in the KV?
print(dict(KV))
# {'k1': 'v1', 'k2': 'vx', 'k3': 'vx'}

# Delete a single item.
del KV['k2']

# How many items are stored in the KV?
print(len(KV))
# 2

# Delete items that match the given condition.
del KV[KV.key > 'k1']
Example:
>>> kv = KeyValue()
>>> kv.update(k1='v1', k2='v2')

>>> 'k1' in kv
True
>>> 'kx' in kv
False

>>> (kv.key < 'k2') in kv
True
>>> (kv.key > 'k2') in kv
False
Examples:
>>> KV = KeyValue()
>>> KV.update(k1='v1', k2='v2', k3='v3')

>>> KV['k1']
'v1'
>>> KV['kx']
KeyError: "kx" not found

>>> KV[KV.key > 'k1']
['v2', 'v3']
>>> KV[KV.key < 'k1']
[]
Set value for the given key. If expr is an expression, then any keys matching the expression will have their value updated.
Example:
>>> KV = KeyValue()
>>> KV.update(k1='v1', k2='v2', k3='v3')

>>> KV['k1'] = 'v1-x'
>>> print(KV['k1'])
'v1-x'

>>> KV[KV.key >= 'k2'] = 'v99'
>>> dict(KV)
{'k1': 'v1-x', 'k2': 'v99', 'k3': 'v99'}
Delete the given key. If an expression is given, delete all keys that match the expression.
Example:
>>> KV = KeyValue()
>>> KV.update(k1=1, k2=2, k3=3)

>>> del KV['k1']  # Deletes "k1".
>>> del KV['k1']
KeyError: "k1" does not exist

>>> del KV[KV.key > 'k2']   # Deletes "k3".
>>> del KV[KV.key > 'k99']  # Nothing deleted, no keys match.
Example:
>>> KV = KeyValue()
>>> KV.update(k1=1, k2=2)  # Sets 'k1'=1, 'k2'=2.
>>> dict(KV)
{'k1': 1, 'k2': 2}

>>> KV.update(k2=22, k3=3)  # Updates 'k2'->22, sets 'k3'=3.
>>> dict(KV)
{'k1': 1, 'k2': 22, 'k3': 3}

>>> KV.update({'k2': -2, 'k4': 4})  # Also can pass a dictionary.
>>> dict(KV)
{'k1': 1, 'k2': -2, 'k3': 3, 'k4': 4}
Get the value at the given key. If the key does not exist, the default value is returned, unless the key is an expression in which case an empty list will be returned.
Get value and delete the given key. If the key does not exist, the default value is returned, unless the key is an expression in which case an empty list is returned.
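A doctest-style sketch of get() and pop(), based on the behavior described above:

>>> KV = KeyValue()
>>> KV.update(k1='v1', k2='v2')
>>> KV.get('k1')
'v1'
>>> KV.get('kx', 'missing')
'missing'
>>> KV.pop('k2')
'v2'
>>> 'k2' in KV
False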
This module contains helper functions for expressing things that would otherwise be somewhat verbose or cumbersome using peewee's APIs. There are also helpers for serializing models to dictionaries and vice-versa.
Convert a model instance (and optionally any related instances) to a dictionary.
Examples:
>>> user = User.create(username='charlie')
>>> model_to_dict(user)
{'id': 1, 'username': 'charlie'}

>>> model_to_dict(user, backrefs=True)
{'id': 1, 'tweets': [], 'username': 'charlie'}

>>> t1 = Tweet.create(user=user, message='tweet-1')
>>> t2 = Tweet.create(user=user, message='tweet-2')
>>> model_to_dict(user, backrefs=True)
{
  'id': 1,
  'tweets': [
    {'id': 1, 'message': 'tweet-1'},
    {'id': 2, 'message': 'tweet-2'},
  ],
  'username': 'charlie'
}

>>> model_to_dict(t1)
{
  'id': 1,
  'message': 'tweet-1',
  'user': {
    'id': 1,
    'username': 'charlie'
  }
}

>>> model_to_dict(t2, recurse=False)
{'id': 1, 'message': 'tweet-2', 'user': 1}
The implementation of model_to_dict is fairly complex, owing to the various usages it attempts to support. If you have a special usage, I strongly advise that you do not attempt to shoe-horn some crazy combination of parameters into this function. Just write a simple function that accomplishes exactly what you're attempting to do.
Convert a dictionary of data to a model instance, creating related instances where appropriate.
Examples:
>>> user_data = {'id': 1, 'username': 'charlie'}
>>> user = dict_to_model(User, user_data)
>>> user
<__main__.User at 0x7fea8fa4d490>
>>> user.username
'charlie'

>>> note_data = {'id': 2, 'text': 'note text', 'user': user_data}
>>> note = dict_to_model(Note, note_data)
>>> note.text
'note text'
>>> note.user.username
'charlie'

>>> user_with_notes = {
...     'id': 1,
...     'username': 'charlie',
...     'notes': [{'id': 1, 'text': 'note-1'}, {'id': 2, 'text': 'note-2'}]}
>>> user = dict_to_model(User, user_with_notes)
>>> user.notes[0].text
'note-1'
>>> user.notes[0].user.username
'charlie'
Update a model instance with the given data dictionary.
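A short sketch of update_model_from_dict() (note that the instance is modified in place and, as this sketch assumes, is not persisted until save() is called):

>>> user = User.create(username='charlie')
>>> user = update_model_from_dict(user, {'username': 'charles'})
>>> user.username
'charles'
>>> user.save()  # Persist the change.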
Helper for resolving rows returned in a compound select query to the correct model instance type. For example, if you have a union of two different tables, this helper will resolve each row to the proper model when iterating over the query results.
Models with hooks for signals (a-la django) are provided in playhouse.signals. To use the signals, you will need all of your project's models to be a subclass of playhouse.signals.Model, which overrides the necessary methods to provide support for the various signals.
from playhouse.signals import Model, post_save

class MyModel(Model):
    data = IntegerField()

@post_save(sender=MyModel)
def on_save_handler(model_class, instance, created):
    put_data_in_cache(instance.data)
WARNING:
Signals work by hooking into the higher-level peewee APIs like Model.save() and Model.delete_instance(), where the affected model instance is known ahead of time.
The following signals are provided: pre_save, post_save, pre_delete, post_delete, and pre_init.
Whenever a signal is dispatched, it will call any handlers that have been registered. This allows totally separate code to respond to events like model save and delete.
The Signal class provides a connect() method, which takes a callback function and two optional parameters for "sender" and "name". If specified, the "sender" parameter should be a single model class and allows your callback to only receive signals from that one model class. The "name" parameter is used as a convenient alias in the event you wish to unregister your signal handler.
Example usage:
from playhouse.signals import *

def post_save_handler(sender, instance, created):
    print('%s was just saved' % instance)

# our handler will only be called when we save instances of SomeModel
post_save.connect(post_save_handler, sender=SomeModel)
All signal handlers accept as their first two arguments sender and instance, where sender is the model class and instance is the actual model being acted upon.
If you'd like, you can also use a decorator to connect signal handlers. This is functionally equivalent to the above example:
@post_save(sender=SomeModel)
def post_save_handler(sender, instance, created):
    print('%s was just saved' % instance)
Add the receiver to the internal list of receivers, which will be called whenever the signal is sent.
from playhouse.signals import post_save
from project.handlers import cache_buster

post_save.connect(cache_buster, name='project.cache_buster')
Disconnect the given receiver (or the receiver with the given name alias) so that it no longer is called. Either the receiver or the name must be provided.
post_save.disconnect(name='project.cache_buster')
Iterates over the receivers and will call them in the order in which they were connected. If the receiver specified a sender, it will only be called if the instance is an instance of the sender.
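Custom signals can also be created and dispatched with send(). A minimal sketch (the low_stock signal, Product model, and handler are illustrative assumptions):

from playhouse.signals import Signal

low_stock = Signal()

def notify_purchasing(sender, instance):
    # sender is the model class, instance is the model being acted upon.
    print('Re-order needed for: %s' % instance.name)

# Only receive the signal for Product instances.
low_stock.connect(notify_purchasing, sender=Product)

# Elsewhere in application code, dispatch the signal:
low_stock.send(product)  # Calls notify_purchasing(Product, product).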
pwiz is a little script that ships with peewee and is capable of introspecting an existing database and generating model code suitable for interacting with the underlying data. If you have a database already, pwiz can give you a nice boost by generating skeleton code with correct column affinities and foreign keys.
If you install peewee using setup.py install, pwiz will be installed as a "script" and you can just run:
python -m pwiz -e postgresql -u postgres my_postgres_db
This will print a bunch of models to standard output. So you can do this:
python -m pwiz -e postgresql my_postgres_db > mymodels.py

python  # <-- fire up an interactive shell
>>> from mymodels import Blog, Entry, Tag, Whatever
>>> print([blog.name for blog in Blog.select()])
pwiz accepts the following command-line options:
Option | Meaning | Example |
-h | show help | |
-e | database backend | -e mysql |
-H | host to connect to | -H remote.db.server |
-p | port to connect on | -p 9001 |
-u | database user | -u postgres |
-P | database password | -P (will be prompted for password) |
-s | schema | -s public |
-t | tables to generate | -t tweet,users,relationships |
-v | generate models for VIEWs | (no argument) |
-i | add info metadata to generated file | (no argument) |
-o | table column order is preserved | (no argument) |
The following are valid parameters for the engine (-e): sqlite, mysql, and postgresql (postgres is also accepted, as in the examples below).
WARNING:
The password will be included in the output. Specifically, at the top of the file a Database will be defined along with any required parameters -- including the password.
Examples of introspecting various databases:
# Introspect a Sqlite database.
python -m pwiz -e sqlite path/to/sqlite_database.db

# Introspect a MySQL database, logging in as root. You will be prompted
# for a password ("-P").
python -m pwiz -e mysql -u root -P mysql_db_name

# Introspect a Postgresql database on a remote server.
python -m pwiz -e postgres -u postgres -H 10.1.0.3 pg_db_name
Full example:
$ sqlite3 example.db << EOM
CREATE TABLE "user" ("id" INTEGER NOT NULL PRIMARY KEY, "username" TEXT NOT NULL);
CREATE TABLE "tweet" (
    "id" INTEGER NOT NULL PRIMARY KEY,
    "content" TEXT NOT NULL,
    "timestamp" DATETIME NOT NULL,
    "user_id" INTEGER NOT NULL,
    FOREIGN KEY ("user_id") REFERENCES "user" ("id"));
CREATE UNIQUE INDEX "user_username" ON "user" ("username");
EOM

$ python -m pwiz -e sqlite example.db
Produces the following output:
from peewee import *

database = SqliteDatabase('example.db', **{})

class UnknownField(object):
    def __init__(self, *_, **__): pass

class BaseModel(Model):
    class Meta:
        database = database

class User(BaseModel):
    username = TextField(unique=True)

    class Meta:
        table_name = 'user'

class Tweet(BaseModel):
    content = TextField()
    timestamp = DateTimeField()
    user = ForeignKeyField(column_name='user_id', field='id', model=User)

    class Meta:
        table_name = 'tweet'
Observations:
Peewee now supports schema migrations, with well-tested support for Postgresql, SQLite and MySQL. Unlike other schema migration tools, peewee's migrations do not handle introspection and database "versioning". Rather, peewee provides a number of helper functions for generating and running schema-altering statements. This engine provides the basis on which a more sophisticated tool could some day be built.
Migrations can be written as simple python scripts and executed from the command-line. Since the migrations only depend on your application's Database object, it should be easy to manage changing your model definitions and maintaining a set of migration scripts without introducing dependencies.
Begin by importing the helpers from the migrate module:
from playhouse.migrate import *
Instantiate a migrator. The SchemaMigrator class is responsible for generating schema altering operations, which can then be run sequentially by the migrate() helper.
# Postgres example:
my_db = PostgresqlDatabase(...)
migrator = PostgresqlMigrator(my_db)

# SQLite example:
my_db = SqliteDatabase('my_database.db')
migrator = SqliteMigrator(my_db)
Use migrate() to execute one or more operations:
title_field = CharField(default='')
status_field = IntegerField(null=True)

migrate(
    migrator.add_column('some_table', 'title', title_field),
    migrator.add_column('some_table', 'status', status_field),
    migrator.drop_column('some_table', 'old_column'),
)
WARNING:
Migrations are not run inside a transaction automatically. If you wish a migration to run in a transaction, wrap the call to migrate() in an atomic() context manager:
with my_db.atomic():
migrate(...)
Add new field(s) to an existing model:
# Create your field instances. For non-null fields you must specify a
# default value.
pubdate_field = DateTimeField(null=True)
comment_field = TextField(default='')

# Run the migration, specifying the database table, field name and field.
migrate(
    migrator.add_column('comment_tbl', 'pub_date', pubdate_field),
    migrator.add_column('comment_tbl', 'comment', comment_field),
)
Renaming a field:
# Specify the table, original name of the column, and its new name.
migrate(
    migrator.rename_column('story', 'pub_date', 'publish_date'),
    migrator.rename_column('story', 'mod_date', 'modified_date'),
)
Dropping a field:
migrate(
    migrator.drop_column('story', 'some_old_field'),
)
Making a field nullable or not nullable:
# Note that when making a field not null that field must not have any
# NULL values present.
migrate(
    # Make `pub_date` allow NULL values.
    migrator.drop_not_null('story', 'pub_date'),

    # Prevent `modified_date` from containing NULL values.
    migrator.add_not_null('story', 'modified_date'),
)
Altering a field's data-type:
# Change a VARCHAR(50) field to a TEXT field.
migrate(
    migrator.alter_column_type('person', 'email', TextField()),
)
Renaming a table:
migrate(
    migrator.rename_table('story', 'stories_tbl'),
)
Adding an index:
# Specify the table, column names, and whether the index should be
# UNIQUE or not.
migrate(
    # Create an index on the `pub_date` column.
    migrator.add_index('story', ('pub_date',), False),

    # Create a multi-column index on the `pub_date` and `status` fields.
    migrator.add_index('story', ('pub_date', 'status'), False),

    # Create a unique index on the category and title fields.
    migrator.add_index('story', ('category_id', 'title'), True),
)
Dropping an index:
# Specify the index name.
migrate(migrator.drop_index('story', 'story_pub_date_status'))
Adding or dropping table constraints:
# Add a CHECK() constraint to enforce the price cannot be negative.
migrate(migrator.add_constraint(
    'products',
    'price_check',
    Check('price >= 0')))

# Remove the price check constraint.
migrate(migrator.drop_constraint('products', 'price_check'))

# Add a UNIQUE constraint on the first and last names.
migrate(migrator.add_unique('person', 'first_name', 'last_name'))
NOTE:
new_field = TextField(default='', null=False)
migrator = PostgresqlMigrator(db)

migrate(migrator.set_search_path('my_schema_name'),
migrator.add_column('table', 'field_name', new_field))
Usage:
migrate(
    migrator.add_column('some_table', 'new_column', CharField(default='')),
    migrator.create_index('some_table', ('new_column',)),
)
The SchemaMigrator is responsible for generating schema-altering statements.
Add a new column to the provided table. The field provided will be used to generate the appropriate column definition.
Alter the data-type of a column. This method should be used with care, as using incompatible types may not be well-supported by your database.
Set the search path (schema) for the subsequent operations.
SQLite has limited support for ALTER TABLE queries, so the following operations are currently not supported for SQLite: add_constraint(), drop_constraint(), and add_unique().
The reflection module contains helpers for introspecting existing databases. This module is used internally by several other modules in the playhouse, including DataSet and pwiz, a model generator.
Generate models for the tables in the given database. For an example of how to use this function, see the section interactive.
Example:
>>> from peewee import *
>>> from playhouse.reflection import generate_models
>>> db = PostgresqlDatabase('my_app')
>>> models = generate_models(db)
>>> list(models.keys())
['account', 'customer', 'order', 'orderitem', 'product']

>>> globals().update(models)  # Inject models into namespace.
>>> for cust in customer.select():  # Query using generated model.
...     print(cust.name)
...
Huey Kitty
Mickey Dog
Print a user-friendly description of a model class, useful for debugging or interactive use. Currently this prints the table name, and all fields along with their data-types. The interactive section contains an example.
Example output:
>>> from playhouse.reflection import print_model
>>> print_model(User)
user
  id AUTO PK
  email TEXT
  name TEXT
  dob DATE

index(es)
  email UNIQUE

>>> print_model(Tweet)
tweet
  id AUTO PK
  user INT FK: User.id
  title TEXT
  content TEXT
  timestamp DATETIME
  is_published BOOL

index(es)
  user_id
  is_published, timestamp
Prints the SQL CREATE TABLE for the given model class, which may be useful for debugging or interactive use. See the interactive section for example usage. Note that indexes and constraints are not included in the output of this function.
Example output:
>>> from playhouse.reflection import print_table_sql
>>> print_table_sql(User)
CREATE TABLE IF NOT EXISTS "user" (
    "id" INTEGER NOT NULL PRIMARY KEY,
    "email" TEXT NOT NULL,
    "name" TEXT NOT NULL,
    "dob" DATE NOT NULL
)

>>> print_table_sql(Tweet)
CREATE TABLE IF NOT EXISTS "tweet" (
    "id" INTEGER NOT NULL PRIMARY KEY,
    "user_id" INTEGER NOT NULL,
    "title" TEXT NOT NULL,
    "content" TEXT NOT NULL,
    "timestamp" DATETIME NOT NULL,
    "is_published" INTEGER NOT NULL,
    FOREIGN KEY ("user_id") REFERENCES "user" ("id")
)
Creates an Introspector instance suitable for use with the given database.
Usage:
db = SqliteDatabase('my_app.db')
introspector = Introspector.from_database(db)
models = introspector.generate_models()

# User and Tweet (assumed to exist in the database) are
# peewee Model classes generated from the database schema.
User = models['user']
Tweet = models['tweet']
Introspect the database, reading in the tables, columns, and foreign key constraints, then generate a dictionary mapping each database table to a dynamically-generated Model class.
This module contains a helper function to generate a database connection from a URL connection string.
Examples:
Supported schemes:
Usage:
import os

from playhouse.db_url import connect

# Connect to the database URL defined in the environment, falling
# back to a local Sqlite database if no database URL is specified.
db = connect(os.environ.get('DATABASE') or 'sqlite:///default.db')
If you are using a custom database class, you can use the parse() function to extract information from a URL which can then be passed in to your database object.
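A sketch of parse() (the URL and the custom database class are placeholders):

from playhouse.db_url import parse

params = parse('postgresql://postgres:secret@db-host:5432/analytics')
# params -> {'database': 'analytics', 'host': 'db-host', 'port': 5432,
#            'user': 'postgres', 'password': 'secret'}
db = MyCustomDatabase(**params)  # Hypothetical custom database class.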
Register additional database class under the specified names. This function can be used to extend the connect() function to support additional schemes. Suppose you have a custom database class for Firebird named FirebirdDatabase.
from playhouse.db_url import connect, register_database

register_database(FirebirdDatabase, 'firebird')
db = connect('firebird://my-firebird-db')
The pool module contains a number of Database classes that provide connection pooling for PostgreSQL, MySQL and SQLite databases. The pool works by overriding the methods on the Database class that open and close connections to the backend. The pool can specify a timeout after which connections are recycled, as well as an upper bound on the number of open connections.
In a multi-threaded application, up to max_connections will be opened. Each thread (or, if using gevent, greenlet) will have its own connection.
In a single-threaded application, only one connection will be created. It will be continually recycled until either it exceeds the stale timeout or is closed explicitly (using .manual_close()).
By default, all your application needs to do is ensure that connections are closed when you are finished with them, and they will be returned to the pool. For web applications, this typically means that at the beginning of a request, you will open a connection, and when you return a response, you will close the connection.
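A sketch of that open/close pattern (the handler function is illustrative; with a pooled database, close() returns the connection to the pool rather than disconnecting):

from playhouse.pool import PooledPostgresqlDatabase

db = PooledPostgresqlDatabase(
    'my_app',
    max_connections=8,
    stale_timeout=300,
    user='postgres')

def handle_request():
    db.connect()
    try:
        pass  # ... run queries here ...
    finally:
        db.close()  # Connection is returned to the pool.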
Simple Postgres pool example code:
# Use the special postgresql extensions.
from playhouse.pool import PooledPostgresqlExtDatabase

db = PooledPostgresqlExtDatabase(
    'my_app',
    max_connections=32,
    stale_timeout=300,  # 5 minutes.
    user='postgres')

class BaseModel(Model):
    class Meta:
        database = db
That's it! If you would like finer-grained control over the pool of connections, check out the advanced_connection_management section.
Mixin class intended to be used with a subclass of Database.
Close connections which are in-use but exceed the given age. Use caution when calling this method!
Contains utilities helpful when testing peewee projects.
with count_queries() as counter:
    huey = User.get(User.username == 'huey')
    huey_tweets = [tweet.message for tweet in huey.tweets]

assert counter.count == 2
class TestMyApp(unittest.TestCase):
@assert_query_count(1)
def test_get_popular_blogs(self):
popular_blogs = Blog.get_popular()
self.assertEqual(
[blog.title for blog in popular_blogs],
["Peewee's Playhouse!", "All About Huey", "Mickey's Adventures"])
This function can also be used as a context manager:
class TestMyApp(unittest.TestCase):
def test_expensive_operation(self):
with assert_query_count(1):
perform_expensive_operation()
The playhouse.flask_utils module contains several helpers for integrating peewee with the Flask web framework.
The FlaskDB class is a wrapper for configuring and referencing a Peewee database from within a Flask application. Don't let its name fool you: it is not the same thing as a peewee database. FlaskDB is designed to remove the following boilerplate from your flask app:
Basic usage:
import datetime

from flask import Flask
from peewee import *
from playhouse.flask_utils import FlaskDB

DATABASE = 'postgresql://postgres:password@localhost:5432/my_database'

app = Flask(__name__)
app.config.from_object(__name__)

db_wrapper = FlaskDB(app)

class User(db_wrapper.Model):
    username = CharField(unique=True)

class Tweet(db_wrapper.Model):
    user = ForeignKeyField(User, backref='tweets')
    content = TextField()
    timestamp = DateTimeField(default=datetime.datetime.now)
The above code example will create and instantiate a peewee PostgresqlDatabase specified by the given database URL. Request hooks will be configured to establish a connection when a request is received, and automatically close the connection when the response is sent. Lastly, the FlaskDB class exposes a FlaskDB.Model property which can be used as a base for your application's models.
Here is how you can access the wrapped Peewee database instance that is configured for you by the FlaskDB wrapper:
# Obtain a reference to the Peewee database instance.
peewee_db = db_wrapper.database

@app.route('/transfer-funds/', methods=['POST'])
def transfer_funds():
    with peewee_db.atomic():
        # ...
        return jsonify({'transfer-id': xid})
NOTE:
Here is another way to configure a Peewee database using FlaskDB:
app = Flask(__name__)
db_wrapper = FlaskDB(app, 'sqlite:///my_app.db')
While the above examples show using a database URL, for more advanced usages you can specify a dictionary of configuration options, or simply pass in a peewee Database instance:
DATABASE = {
    'name': 'my_app_db',
    'engine': 'playhouse.pool.PooledPostgresqlDatabase',
    'user': 'postgres',
    'max_connections': 32,
    'stale_timeout': 600,
}

app = Flask(__name__)
app.config.from_object(__name__)

wrapper = FlaskDB(app)
pooled_postgres_db = wrapper.database
Using a peewee Database object:
peewee_db = PostgresqlExtDatabase('my_app')

app = Flask(__name__)
db_wrapper = FlaskDB(app, peewee_db)
If you prefer to use the application factory pattern, the FlaskDB class implements an init_app() method.
Using as a factory:
db_wrapper = FlaskDB()

# Even though the database is not yet initialized, you can still use the
# `Model` property to create model classes.
class User(db_wrapper.Model):
    username = CharField(unique=True)

def create_app():
    app = Flask(__name__)
    app.config['DATABASE'] = 'sqlite:////home/code/apps/my-database.db'
    db_wrapper.init_app(app)
    return app
The flask_utils module provides several helpers for managing queries in your web app. Some common patterns include:
Retrieve the object matching the given query, or return a 404 not found response. A common use-case might be a detail page for a weblog. You want to either retrieve the post matching the given URL, or return a 404.
Example:
@app.route('/blog/<slug>/')
def post_detail(slug):
public_posts = Post.select().where(Post.published == True)
post = get_object_or_404(public_posts, (Post.slug == slug))
return render_template('post_detail.html', post=post)
Retrieve a paginated list of objects specified by the given query. The paginated object list will be dropped into the context using the given context_variable, as well as metadata about the current page and total number of pages, and finally any arbitrary context data passed as keyword-arguments.
The page is specified using the page GET argument, e.g. /my-object-list/?page=3 would return the third page of objects.
Example:
@app.route('/blog/')
def post_index():
public_posts = (Post
.select()
.where(Post.published == True)
.order_by(Post.timestamp.desc()))
return object_list(
'post_index.html',
query=public_posts,
context_variable='post_list',
paginate_by=10)
The template will have the following context:
Helper class to perform pagination based on GET arguments.
If check_bounds was set to True and the requested page contains no objects, then a 404 will be raised.
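A sketch of using PaginatedQuery directly (the Post query and page size are illustrative):

from playhouse.flask_utils import PaginatedQuery

query = (Post
         .select()
         .where(Post.published == True)
         .order_by(Post.timestamp.desc()))

# Reads the page number from the "page" GET argument; raises a 404 for
# out-of-bounds pages because check_bounds=True.
paginated = PaginatedQuery(query, paginate_by=10, check_bounds=True)
posts = paginated.get_object_list()
num_pages = paginated.get_page_count()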
These query examples are taken from the site PostgreSQL Exercises. A sample data-set can be found on the getting started page.
Here is a visual representation of the schema used in these examples: [image]
To begin working with the data, we'll define the model classes that correspond to the tables in the diagram.
NOTE:
from functools import partial

from peewee import *

db = PostgresqlDatabase('peewee_test')

class BaseModel(Model):
    class Meta:
        database = db

class Member(BaseModel):
    memid = AutoField()  # Auto-incrementing primary key.
    surname = CharField()
    firstname = CharField()
    address = CharField(max_length=300)
    zipcode = IntegerField()
    telephone = CharField()
    recommendedby = ForeignKeyField('self', backref='recommended',
                                    column_name='recommendedby', null=True)
    joindate = DateTimeField()

    class Meta:
        table_name = 'members'

# Conveniently declare decimal fields suitable for storing currency.
MoneyField = partial(DecimalField, decimal_places=2)

class Facility(BaseModel):
    facid = AutoField()
    name = CharField()
    membercost = MoneyField()
    guestcost = MoneyField()
    initialoutlay = MoneyField()
    monthlymaintenance = MoneyField()

    class Meta:
        table_name = 'facilities'

class Booking(BaseModel):
    bookid = AutoField()
    facility = ForeignKeyField(Facility, column_name='facid')
    member = ForeignKeyField(Member, column_name='memid')
    starttime = DateTimeField()
    slots = IntegerField()

    class Meta:
        table_name = 'bookings'
If you downloaded the SQL file from the PostgreSQL Exercises site, then you can load the data into a PostgreSQL database using the following commands:
createdb peewee_test
psql -U postgres -f clubdata.sql -d peewee_test -x -q
To create the schema using Peewee, without loading the sample data, you can run the following:
# Assumes you have created the database "peewee_test" already.
db.create_tables([Member, Facility, Booking])
This category deals with the basics of SQL. It covers select and where clauses, case expressions, unions, and a few other odds and ends.
Retrieve all information from the facilities table.
SELECT * FROM facilities
# By default, when no fields are explicitly passed to select(), all fields
# will be selected.
query = Facility.select()
Retrieve names of facilities and cost to members.
SELECT name, membercost FROM facilities;
query = Facility.select(Facility.name, Facility.membercost)

# To iterate:
for facility in query:
    print(facility.name)
Retrieve list of facilities that have a cost to members.
SELECT * FROM facilities WHERE membercost > 0
query = Facility.select().where(Facility.membercost > 0)
Retrieve list of facilities that have a cost to members, and that fee is less than 1/50th of the monthly maintenance cost. Return id, name, cost and monthly-maintenance.
SELECT facid, name, membercost, monthlymaintenance FROM facilities WHERE membercost > 0 AND membercost < (monthlymaintenance / 50)
query = (Facility
.select(Facility.facid, Facility.name, Facility.membercost,
Facility.monthlymaintenance)
.where(
(Facility.membercost > 0) &
(Facility.membercost < (Facility.monthlymaintenance / 50))))
How can you produce a list of all facilities with the word 'Tennis' in their name?
SELECT * FROM facilities WHERE name ILIKE '%tennis%';
query = Facility.select().where(Facility.name.contains('tennis'))

# OR use the exponent operator. Note: you must include wildcards here:
query = Facility.select().where(Facility.name ** '%tennis%')
How can you retrieve the details of facilities with ID 1 and 5? Try to do it without using the OR operator.
SELECT * FROM facilities WHERE facid IN (1, 5);
query = Facility.select().where(Facility.facid.in_([1, 5]))

# OR:
query = Facility.select().where(
    (Facility.facid == 1) | (Facility.facid == 5))
How can you produce a list of facilities, with each labelled as 'cheap' or 'expensive' depending on if their monthly maintenance cost is more than $100? Return the name and monthly maintenance of the facilities in question.
SELECT name, CASE WHEN monthlymaintenance > 100 THEN 'expensive' ELSE 'cheap' END FROM facilities;
cost = Case(None, [(Facility.monthlymaintenance > 100, 'expensive')], 'cheap')
query = Facility.select(Facility.name, cost.alias('cost'))
NOTE:
How can you produce a list of members who joined after the start of September 2012? Return the memid, surname, firstname, and joindate of the members in question.
SELECT memid, surname, firstname, joindate FROM members WHERE joindate >= '2012-09-01';
query = (Member
.select(Member.memid, Member.surname, Member.firstname, Member.joindate)
.where(Member.joindate >= datetime.date(2012, 9, 1)))
How can you produce an ordered list of the first 10 surnames in the members table? The list must not contain duplicates.
SELECT DISTINCT surname FROM members ORDER BY surname LIMIT 10;
query = (Member
.select(Member.surname)
.order_by(Member.surname)
.limit(10)
.distinct())
You, for some reason, want a combined list of all surnames and all facility names.
SELECT surname FROM members UNION SELECT name FROM facilities;
lhs = Member.select(Member.surname)
rhs = Facility.select(Facility.name)
query = lhs | rhs
Queries can be composed using the following operators: | (UNION), + (UNION ALL), & (INTERSECT), and - (EXCEPT).
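For example, a small sketch (not one of the original exercises) that uses the intersection operator to find surnames that also appear as facility names:

# Surnames that also appear as facility names (INTERSECT).
query = Member.select(Member.surname) & Facility.select(Facility.name)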
You'd like to get the signup date of your last member. How can you retrieve this information?
SELECT MAX(joindate) FROM members;
query = Member.select(fn.MAX(Member.joindate))

# To conveniently obtain a single scalar value, use "scalar()":
# max_join_date = query.scalar()
You'd like to get the first and last name of the last member(s) who signed up - not just the date.
SELECT firstname, surname, joindate FROM members WHERE joindate = (SELECT MAX(joindate) FROM members);
# Use "alias()" to reference the same table multiple times in a query. MemberAlias = Member.alias() subq = MemberAlias.select(fn.MAX(MemberAlias.joindate)) query = (Member
.select(Member.firstname, Member.surname, Member.joindate)
.where(Member.joindate == subq))
This category deals primarily with a foundational concept in relational database systems: joining. Joining allows you to combine related information from multiple tables to answer a question. This isn't just beneficial for ease of querying: a lack of join capability encourages denormalisation of data, which increases the complexity of keeping your data internally consistent.
This topic covers inner, outer, and self joins, as well as spending a little time on subqueries (queries within queries).
How can you produce a list of the start times for bookings by members named 'David Farrell'?
SELECT starttime FROM bookings INNER JOIN members ON (bookings.memid = members.memid) WHERE surname = 'Farrell' AND firstname = 'David';
query = (Booking
.select(Booking.starttime)
.join(Member)
.where((Member.surname == 'Farrell') &
(Member.firstname == 'David')))
How can you produce a list of the start times for bookings for tennis courts, for the date '2012-09-21'? Return a list of start time and facility name pairings, ordered by the time.
SELECT starttime, name FROM bookings
INNER JOIN facilities ON (bookings.facid = facilities.facid)
WHERE date_trunc('day', starttime) = '2012-09-21'::date
  AND name ILIKE 'tennis%'
ORDER BY starttime, name;
query = (Booking
.select(Booking.starttime, Facility.name)
.join(Facility)
.where(
(fn.date_trunc('day', Booking.starttime) == datetime.date(2012, 9, 21)) &
Facility.name.startswith('Tennis'))
         .order_by(Booking.starttime, Facility.name))

# To retrieve the joined facility's name when iterating:
for booking in query:
    print(booking.starttime, booking.facility.name)
How can you output a list of all members who have recommended another member? Ensure that there are no duplicates in the list, and that results are ordered by (surname, firstname).
SELECT DISTINCT m.firstname, m.surname FROM members AS m2 INNER JOIN members AS m ON (m.memid = m2.recommendedby) ORDER BY m.surname, m.firstname;
MA = Member.alias()
query = (Member
         .select(Member.firstname, Member.surname)
         .join(MA, on=(MA.recommendedby == Member.memid))
         .order_by(Member.surname, Member.firstname)
         .distinct())
How can you output a list of all members, including the individual who recommended them (if any)? Ensure that results are ordered by (surname, firstname).
SELECT m.firstname, m.surname, r.firstname, r.surname FROM members AS m LEFT OUTER JOIN members AS r ON (m.recommendedby = r.memid) ORDER BY m.surname, m.firstname
MA = Member.alias()
query = (Member
         .select(Member.firstname, Member.surname, MA.firstname, MA.surname)
         .join(MA, JOIN.LEFT_OUTER, on=(Member.recommendedby == MA.memid))
         .order_by(Member.surname, Member.firstname))

# To display the recommender's name when iterating:
for m in query:
    print(m.firstname, m.surname)
    if m.recommendedby:
        print(' ', m.recommendedby.firstname, m.recommendedby.surname)
How can you produce a list of all members who have used a tennis court? Include in your output the name of the court, and the name of the member formatted as a single column. Ensure no duplicate data, and order by the member name.
SELECT DISTINCT m.firstname || ' ' || m.surname AS member, f.name AS facility FROM members AS m INNER JOIN bookings AS b ON (m.memid = b.memid) INNER JOIN facilities AS f ON (b.facid = f.facid) WHERE f.name LIKE 'Tennis%' ORDER BY member, facility;
fullname = Member.firstname + ' ' + Member.surname

query = (Member
.select(fullname.alias('member'), Facility.name.alias('facility'))
.join(Booking)
.join(Facility)
.where(Facility.name.startswith('Tennis'))
.order_by(fullname, Facility.name)
.distinct())
How can you produce a list of bookings on the day of 2012-09-14 which will cost the member (or guest) more than $30? Remember that guests have different costs to members (the listed costs are per half-hour 'slot'), and the guest user is always ID 0. Include in your output the name of the facility, the name of the member formatted as a single column, and the cost. Order by descending cost, and do not use any subqueries.
SELECT m.firstname || ' ' || m.surname AS member,
       f.name AS facility,
       (CASE WHEN m.memid = 0 THEN f.guestcost * b.slots
             ELSE f.membercost * b.slots END) AS cost
FROM members AS m
INNER JOIN bookings AS b ON (m.memid = b.memid)
INNER JOIN facilities AS f ON (b.facid = f.facid)
WHERE (date_trunc('day', b.starttime) = '2012-09-14') AND
      ((m.memid = 0 AND b.slots * f.guestcost > 30) OR
       (m.memid > 0 AND b.slots * f.membercost > 30))
ORDER BY cost DESC;
cost = Case(Member.memid, (
    (0, Booking.slots * Facility.guestcost),),
    (Booking.slots * Facility.membercost))
fullname = Member.firstname + ' ' + Member.surname

query = (Member
.select(fullname.alias('member'), Facility.name.alias('facility'),
cost.alias('cost'))
.join(Booking)
.join(Facility)
.where(
(fn.date_trunc('day', Booking.starttime) == datetime.date(2012, 9, 14)) &
(cost > 30))
         .order_by(SQL('cost').desc()))

# To iterate over the results, it might be easiest to use namedtuples:
for row in query.namedtuples():
    print(row.member, row.facility, row.cost)
How can you output a list of all members, including the individual who recommended them (if any), without using any joins? Ensure that there are no duplicates in the list, and that each firstname + surname pairing is formatted as a column and ordered.
SELECT DISTINCT m.firstname || ' ' || m.surname AS member,
    (SELECT r.firstname || ' ' || r.surname
     FROM members AS r
     WHERE m.recommendedby = r.memid) AS recommended
FROM members AS m
ORDER BY member;
MA = Member.alias()
fullname = Member.firstname + ' ' + Member.surname
subq = (MA
        .select(MA.firstname + ' ' + MA.surname)
        .where(Member.recommendedby == MA.memid))

query = (Member
         .select(fullname.alias('member'), subq.alias('recommended'))
         .order_by(fullname)
         .distinct())
The "Produce a list of costly bookings" exercise contained some messy logic: we had to calculate the booking cost in both the WHERE clause and the CASE statement. Try to simplify this calculation using subqueries.
SELECT member, facility, cost FROM (
    SELECT m.firstname || ' ' || m.surname AS member,
           f.name AS facility,
           CASE WHEN m.memid = 0 THEN b.slots * f.guestcost
                ELSE b.slots * f.membercost END AS cost
    FROM members AS m
    INNER JOIN bookings AS b ON m.memid = b.memid
    INNER JOIN facilities AS f ON b.facid = f.facid
    WHERE date_trunc('day', b.starttime) = '2012-09-14'
) AS bookings
WHERE cost > 30
ORDER BY cost DESC;
cost = Case(Member.memid, (
    (0, Booking.slots * Facility.guestcost),),
    (Booking.slots * Facility.membercost))
fullname = Member.firstname + ' ' + Member.surname

iq = (Member
      .select(fullname.alias('member'), Facility.name.alias('facility'),
              cost.alias('cost'))
      .join(Booking)
      .join(Facility)
      .where(fn.date_trunc('day', Booking.starttime) == datetime.date(2012, 9, 14)))

query = (Member
         .select(iq.c.member, iq.c.facility, iq.c.cost)
         .from_(iq)
         .where(iq.c.cost > 30)
         .order_by(SQL('cost').desc()))

# To iterate, try using dicts:
for row in query.dicts():
    print(row['member'], row['facility'], row['cost'])
Querying data is all well and good, but at some point you're probably going to want to put data into your database! This section deals with inserting, updating, and deleting information. Operations that alter your data like this are collectively known as Data Manipulation Language, or DML.
In previous sections we showed you the results of the queries you performed. Since the modifications we make in this section don't return any query results, we instead show the updated contents of the table you're working on.
The club is adding a new facility - a spa. We need to add it into the facilities table. Use the following values: facid: 9, Name: 'Spa', membercost: 20, guestcost: 30, initialoutlay: 100000, monthlymaintenance: 800
INSERT INTO "facilities" ("facid", "name", "membercost", "guestcost", "initialoutlay", "monthlymaintenance") VALUES (9, 'Spa', 20, 30, 100000, 800)
res = Facility.insert({
    Facility.facid: 9,
    Facility.name: 'Spa',
    Facility.membercost: 20,
    Facility.guestcost: 30,
    Facility.initialoutlay: 100000,
    Facility.monthlymaintenance: 800}).execute()

# OR:
res = (Facility
       .insert(facid=9, name='Spa', membercost=20, guestcost=30,
               initialoutlay=100000, monthlymaintenance=800)
       .execute())
In the previous exercise, you learned how to add a facility. Now you're going to add multiple facilities in one command. Use the following values:
facid: 9, Name: 'Spa', membercost: 20, guestcost: 30, initialoutlay: 100000, monthlymaintenance: 800.
facid: 10, Name: 'Squash Court 2', membercost: 3.5, guestcost: 17.5, initialoutlay: 5000, monthlymaintenance: 80.
-- see above --
data = [
    {'facid': 9, 'name': 'Spa', 'membercost': 20, 'guestcost': 30,
     'initialoutlay': 100000, 'monthlymaintenance': 800},
    {'facid': 10, 'name': 'Squash Court 2', 'membercost': 3.5,
     'guestcost': 17.5, 'initialoutlay': 5000, 'monthlymaintenance': 80}]

res = Facility.insert_many(data).execute()
Let's try adding the spa to the facilities table again. This time, though, we want to automatically generate the value for the next facid, rather than specifying it as a constant. Use the following values for everything else: Name: 'Spa', membercost: 20, guestcost: 30, initialoutlay: 100000, monthlymaintenance: 800.
INSERT INTO "facilities" ("facid", "name", "membercost", "guestcost",
"initialoutlay", "monthlymaintenance") SELECT (SELECT (MAX("facid") + 1) FROM "facilities") AS _,
'Spa', 20, 30, 100000, 800;
maxq = Facility.select(fn.MAX(Facility.facid) + 1) subq = Select(columns=(maxq, 'Spa', 20, 30, 100000, 800)) res = Facility.insert_from(subq, Facility._meta.sorted_fields).execute()
We made a mistake when entering the data for the second tennis court. The initial outlay was 10000 rather than 8000: you need to alter the data to fix the error.
UPDATE facilities SET initialoutlay = 10000 WHERE name = 'Tennis Court 2';
res = (Facility
       .update({Facility.initialoutlay: 10000})
       .where(Facility.name == 'Tennis Court 2')
       .execute())

# OR:
res = (Facility
       .update(initialoutlay=10000)
       .where(Facility.name == 'Tennis Court 2')
       .execute())
We want to increase the price of the tennis courts for both members and guests. Update the costs to be 6 for members, and 30 for guests.
UPDATE facilities SET membercost=6, guestcost=30 WHERE name ILIKE 'Tennis%';
nrows = (Facility
.update(membercost=6, guestcost=30)
.where(Facility.name.startswith('Tennis'))
.execute())
We want to alter the price of the second tennis court so that it costs 10% more than the first one. Try to do this without using constant values for the prices, so that we can reuse the statement if we want to.
UPDATE facilities
SET membercost = (SELECT membercost * 1.1 FROM facilities WHERE facid = 0),
    guestcost = (SELECT guestcost * 1.1 FROM facilities WHERE facid = 0)
WHERE facid = 1;

-- OR --

WITH new_prices (nmc, ngc) AS (
    SELECT membercost * 1.1, guestcost * 1.1
    FROM facilities WHERE name = 'Tennis Court 1')
UPDATE facilities
SET membercost = new_prices.nmc, guestcost = new_prices.ngc
FROM new_prices
WHERE name = 'Tennis Court 2';
sq1 = Facility.select(Facility.membercost * 1.1).where(Facility.facid == 0)
sq2 = Facility.select(Facility.guestcost * 1.1).where(Facility.facid == 0)

res = (Facility
       .update(membercost=sq1, guestcost=sq2)
       .where(Facility.facid == 1)
       .execute())

# OR:
cte = (Facility
       .select(Facility.membercost * 1.1, Facility.guestcost * 1.1)
       .where(Facility.name == 'Tennis Court 1')
       .cte('new_prices', columns=('nmc', 'ngc')))
res = (Facility
       .update(membercost=SQL('new_prices.nmc'), guestcost=SQL('new_prices.ngc'))
       .with_cte(cte)
       .from_(cte)
       .where(Facility.name == 'Tennis Court 2')
       .execute())
As part of a clearout of our database, we want to delete all bookings from the bookings table.
DELETE FROM bookings;
nrows = Booking.delete().execute()
We want to remove member 37, who has never made a booking, from our database.
DELETE FROM members WHERE memid = 37;
nrows = Member.delete().where(Member.memid == 37).execute()
How can we make that more general, to delete all members who have never made a booking?
DELETE FROM members WHERE NOT EXISTS (
SELECT * FROM bookings WHERE bookings.memid = members.memid);
subq = Booking.select().where(Booking.member == Member.memid)
nrows = Member.delete().where(~fn.EXISTS(subq)).execute()
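An equivalent formulation, sketched here with NOT IN in place of NOT EXISTS:

# Delete members whose id does not appear anywhere in the bookings table.
nrows = (Member
         .delete()
         .where(Member.memid.not_in(Booking.select(Booking.member)))
         .execute())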
Aggregation is one of those capabilities that really make you appreciate the power of relational database systems. It allows you to move beyond merely persisting your data, into the realm of asking truly interesting questions that can be used to inform decision making. This category covers aggregation at length, making use of standard grouping as well as more recent window functions.
For our first foray into aggregates, we're going to stick to something simple. We want to know how many facilities exist - simply produce a total count.
SELECT COUNT(facid) FROM facilities;
query = Facility.select(fn.COUNT(Facility.facid))
count = query.scalar()

# OR:
count = Facility.select().count()
Produce a count of the number of facilities that have a cost to guests of 10 or more.
SELECT COUNT(facid) FROM facilities WHERE guestcost >= 10
query = Facility.select(fn.COUNT(Facility.facid)).where(Facility.guestcost >= 10)
count = query.scalar()

# OR:
# count = Facility.select().where(Facility.guestcost >= 10).count()
Produce a count of the number of recommendations each member has made. Order by member ID.
SELECT recommendedby, COUNT(memid) FROM members WHERE recommendedby IS NOT NULL GROUP BY recommendedby ORDER BY recommendedby
query = (Member
.select(Member.recommendedby, fn.COUNT(Member.memid))
.where(Member.recommendedby.is_null(False))
.group_by(Member.recommendedby)
.order_by(Member.recommendedby))
Produce a list of the total number of slots booked per facility. For now, just produce an output table consisting of facility id and slots, sorted by facility id.
SELECT facid, SUM(slots) FROM bookings GROUP BY facid ORDER BY facid;
query = (Booking
         .select(Booking.facility, fn.SUM(Booking.slots))
         .group_by(Booking.facility)
         .order_by(Booking.facility))
Produce a list of the total number of slots booked per facility in the month of September 2012. Produce an output table consisting of facility id and slots, sorted by the number of slots.
SELECT facid, SUM(slots) FROM bookings
WHERE (date_trunc('month', starttime) = '2012-09-01'::date)
GROUP BY facid
ORDER BY SUM(slots);
query = (Booking
.select(Booking.facility, fn.SUM(Booking.slots))
.where(fn.date_trunc('month', Booking.starttime) == datetime.date(2012, 9, 1))
.group_by(Booking.facility)
.order_by(fn.SUM(Booking.slots)))
Produce a list of the total number of slots booked per facility per month in the year of 2012. Produce an output table consisting of facility id and slots, sorted by the id and month.
SELECT facid, date_part('month', starttime), SUM(slots) FROM bookings WHERE date_part('year', starttime) = 2012 GROUP BY facid, date_part('month', starttime) ORDER BY facid, date_part('month', starttime)
month = fn.date_part('month', Booking.starttime)

query = (Booking
.select(Booking.facility, month, fn.SUM(Booking.slots))
.where(fn.date_part('year', Booking.starttime) == 2012)
.group_by(Booking.facility, month)
.order_by(Booking.facility, month))
Find the total number of members who have made at least one booking.
SELECT COUNT(DISTINCT memid) FROM bookings;

-- OR --

SELECT COUNT(1) FROM (SELECT DISTINCT memid FROM bookings) AS _;
query = Booking.select(fn.COUNT(Booking.member.distinct()))

# OR:
query = Booking.select(Booking.member).distinct()
count = query.count()  # count() wraps in SELECT COUNT(1) FROM (...)
Produce a list of facilities with more than 1000 slots booked. Produce an output table consisting of facility id and hours, sorted by facility id.
SELECT facid, SUM(slots) FROM bookings GROUP BY facid HAVING SUM(slots) > 1000 ORDER BY facid;
query = (Booking
.select(Booking.facility, fn.SUM(Booking.slots))
.group_by(Booking.facility)
.having(fn.SUM(Booking.slots) > 1000)
.order_by(Booking.facility))
Produce a list of facilities along with their total revenue. The output table should consist of facility name and revenue, sorted by revenue. Remember that there's a different cost for guests and members!
SELECT f.name, SUM(b.slots * (
    CASE WHEN b.memid = 0 THEN f.guestcost ELSE f.membercost END)) AS revenue
FROM bookings AS b
INNER JOIN facilities AS f ON b.facid = f.facid
GROUP BY f.name
ORDER BY revenue;
revenue = fn.SUM(Booking.slots * Case(None, (
    (Booking.member == 0, Facility.guestcost),),
    Facility.membercost))

query = (Facility
.select(Facility.name, revenue.alias('revenue'))
.join(Booking)
.group_by(Facility.name)
.order_by(SQL('revenue')))
Produce a list of facilities with a total revenue less than 1000. Produce an output table consisting of facility name and revenue, sorted by revenue. Remember that there's a different cost for guests and members!
SELECT f.name, SUM(b.slots * (
    CASE WHEN b.memid = 0 THEN f.guestcost ELSE f.membercost END)) AS revenue
FROM bookings AS b
INNER JOIN facilities AS f ON b.facid = f.facid
GROUP BY f.name
HAVING SUM(b.slots * ...) < 1000
ORDER BY revenue;
# Same definition as previous example.
revenue = fn.SUM(Booking.slots * Case(None, (
    (Booking.member == 0, Facility.guestcost),),
    Facility.membercost))

query = (Facility
.select(Facility.name, revenue.alias('revenue'))
.join(Booking)
.group_by(Facility.name)
.having(revenue < 1000)
.order_by(SQL('revenue')))
Output the facility id that has the highest number of slots booked.
SELECT facid, SUM(slots) FROM bookings GROUP BY facid ORDER BY SUM(slots) DESC LIMIT 1
query = (Booking
.select(Booking.facility, fn.SUM(Booking.slots))
.group_by(Booking.facility)
.order_by(fn.SUM(Booking.slots).desc())
         .limit(1))

# Retrieve multiple scalar values by calling scalar() with as_tuple=True.
facid, nslots = query.scalar(as_tuple=True)
Produce a list of the total number of slots booked per facility per month in the year of 2012. In this version, include output rows containing totals for all months per facility, and a total for all months for all facilities. The output table should consist of facility id, month and slots, sorted by the id and month. When calculating the aggregated values for all months and all facids, return null values in the month and facid columns.
Postgres ONLY.
SELECT facid, date_part('month', starttime), SUM(slots)
FROM bookings
WHERE date_part('year', starttime) = 2012
GROUP BY ROLLUP(facid, date_part('month', starttime))
ORDER BY facid, date_part('month', starttime);
month = fn.date_part('month', Booking.starttime)

query = (Booking
.select(Booking.facility,
month.alias('month'),
fn.SUM(Booking.slots))
.where(fn.date_part('year', Booking.starttime) == 2012)
.group_by(fn.ROLLUP(Booking.facility, month))
.order_by(Booking.facility, month))
Produce a list of the total number of hours booked per facility, remembering that a slot lasts half an hour. The output table should consist of the facility id, name, and hours booked, sorted by facility id.
SELECT f.facid, f.name, SUM(b.slots) * .5 FROM facilities AS f INNER JOIN bookings AS b ON (f.facid = b.facid) GROUP BY f.facid, f.name ORDER BY f.facid
query = (Facility
.select(Facility.facid, Facility.name, fn.SUM(Booking.slots) * .5)
.join(Booking)
.group_by(Facility.facid, Facility.name)
.order_by(Facility.facid))
Produce a list of each member name, id, and their first booking after September 1st 2012. Order by member ID.
SELECT m.surname, m.firstname, m.memid, min(b.starttime) as starttime FROM members AS m INNER JOIN bookings AS b ON b.memid = m.memid WHERE starttime >= '2012-09-01' GROUP BY m.surname, m.firstname, m.memid ORDER BY m.memid;
query = (Member
.select(Member.surname, Member.firstname, Member.memid,
fn.MIN(Booking.starttime).alias('starttime'))
.join(Booking)
.where(Booking.starttime >= datetime.date(2012, 9, 1))
.group_by(Member.surname, Member.firstname, Member.memid)
.order_by(Member.memid))
Produce a list of member names, with each row containing the total member count. Order by join date.
Postgres ONLY (as written).
SELECT COUNT(*) OVER(), firstname, surname FROM members ORDER BY joindate
query = (Member
.select(fn.COUNT(Member.memid).over(), Member.firstname,
Member.surname)
.order_by(Member.joindate))
Produce a monotonically increasing numbered list of members, ordered by their date of joining. Remember that member IDs are not guaranteed to be sequential.
Postgres ONLY (as written).
SELECT row_number() OVER (ORDER BY joindate), firstname, surname FROM members ORDER BY joindate;
query = (Member
.select(fn.row_number().over(order_by=[Member.joindate]),
Member.firstname, Member.surname)
.order_by(Member.joindate))
Output the facility id that has the highest number of slots booked. Ensure that in the event of a tie, all tying results are output.
Postgres ONLY (as written).
SELECT facid, total FROM (
    SELECT facid, SUM(slots) AS total,
           rank() OVER (ORDER BY SUM(slots) DESC) AS rank
    FROM bookings
    GROUP BY facid
) AS ranked
WHERE rank = 1;
rank = fn.rank().over(order_by=[fn.SUM(Booking.slots).desc()])

subq = (Booking
        .select(Booking.facility, fn.SUM(Booking.slots).alias('total'),
                rank.alias('rank'))
        .group_by(Booking.facility))

# Here we use a plain Select() to create our query.
query = (Select(columns=[subq.c.facid, subq.c.total])
         .from_(subq)
         .where(subq.c.rank == 1)
         .bind(db))  # We must bind() it to the database.

# To iterate over the query results:
for facid, total in query.tuples():
    print(facid, total)
Produce a list of members, along with the number of hours they've booked in facilities, rounded to the nearest ten hours. Rank them by this rounded figure, producing output of first name, surname, rounded hours, rank. Sort by rank, surname, and first name.
Postgres ONLY (as written).
SELECT firstname, surname,
       ((SUM(bks.slots) + 10) / 20) * 10 AS hours,
       rank() OVER (ORDER BY ((SUM(bks.slots) + 10) / 20) * 10 DESC) AS rank
FROM members AS mems
INNER JOIN bookings AS bks ON mems.memid = bks.memid
GROUP BY mems.memid
ORDER BY rank, surname, firstname;
hours = ((fn.SUM(Booking.slots) + 10) / 20) * 10

query = (Member
.select(Member.firstname, Member.surname, hours.alias('hours'),
fn.rank().over(order_by=[hours.desc()]).alias('rank'))
.join(Booking)
.group_by(Member.memid)
.order_by(SQL('rank'), Member.surname, Member.firstname))
Produce a list of the top three revenue generating facilities (including ties). Output facility name and rank, sorted by rank and facility name.
Postgres ONLY (as written).
SELECT name, rank FROM (
    SELECT f.name, RANK() OVER (ORDER BY SUM(
        CASE WHEN memid = 0 THEN slots * f.guestcost
             ELSE slots * f.membercost END) DESC) AS rank
    FROM bookings
    INNER JOIN facilities AS f ON bookings.facid = f.facid
    GROUP BY f.name
) AS subq
WHERE rank <= 3
ORDER BY rank;
total_cost = fn.SUM(Case(None, (
    (Booking.member == 0, Booking.slots * Facility.guestcost),),
    (Booking.slots * Facility.membercost)))

subq = (Facility
        .select(Facility.name,
                fn.RANK().over(order_by=[total_cost.desc()]).alias('rank'))
        .join(Booking)
        .group_by(Facility.name))

query = (Select(columns=[subq.c.name, subq.c.rank])
         .from_(subq)
         .where(subq.c.rank <= 3)
         .order_by(subq.c.rank)
         .bind(db))  # Here again we use a plain Select, and call bind().
Classify facilities into equally sized groups of high, average, and low based on their revenue. Order by classification and facility name.
Postgres ONLY (as written).
SELECT name,
       CASE class WHEN 1 THEN 'high'
                  WHEN 2 THEN 'average'
                  ELSE 'low' END
FROM (
    SELECT f.name, ntile(3) OVER (ORDER BY SUM(
        CASE WHEN memid = 0 THEN slots * f.guestcost
             ELSE slots * f.membercost END) DESC) AS class
    FROM bookings
    INNER JOIN facilities AS f ON bookings.facid = f.facid
    GROUP BY f.name
) AS subq
ORDER BY class, name;
cost = fn.SUM(Case(None, (
    (Booking.member == 0, Booking.slots * Facility.guestcost),),
    (Booking.slots * Facility.membercost)))

subq = (Facility
        .select(Facility.name,
                fn.NTILE(3).over(order_by=[cost.desc()]).alias('klass'))
        .join(Booking)
        .group_by(Facility.name))

klass_case = Case(subq.c.klass, [(1, 'high'), (2, 'average')], 'low')
query = (Select(columns=[subq.c.name, klass_case])
         .from_(subq)
         .order_by(subq.c.klass, subq.c.name)
         .bind(db))
Common Table Expressions allow us to, effectively, create our own temporary tables for the duration of a query - they're largely a convenience to help us make more readable SQL. Using the WITH RECURSIVE modifier, however, it's possible for us to create recursive queries. This is enormously advantageous for working with tree and graph-structured data - imagine retrieving all of the relations of a graph node to a given depth, for example.
Find the upward recommendation chain for member ID 27: that is, the member who recommended them, and the member who recommended that member, and so on. Return member ID, first name, and surname. Order by descending member id.
WITH RECURSIVE recommenders(recommender) AS (
    SELECT recommendedby FROM members WHERE memid = 27
    UNION ALL
    SELECT mems.recommendedby
    FROM recommenders AS recs
    INNER JOIN members AS mems ON mems.memid = recs.recommender
)
SELECT recs.recommender, mems.firstname, mems.surname
FROM recommenders AS recs
INNER JOIN members AS mems ON recs.recommender = mems.memid
ORDER BY memid DESC;
# Base-case of recursive CTE. Get member recommender where memid=27.
base = (Member
        .select(Member.recommendedby)
        .where(Member.memid == 27)
        .cte('recommenders', recursive=True, columns=('recommender',)))

# Recursive term of CTE. Get recommender of previous recommender.
MA = Member.alias()
recursive = (MA
             .select(MA.recommendedby)
             .join(base, on=(MA.memid == base.c.recommender)))

# Combine the base-case with the recursive term.
cte = base.union_all(recursive)

# Select from the recursive CTE, joining on member to get name info.
query = (cte
         .select_from(cte.c.recommender, Member.firstname, Member.surname)
         .join(Member, on=(cte.c.recommender == Member.memid))
         .order_by(Member.memid.desc()))
Peewee's high-level Model and Field APIs are built upon lower-level Table and Column counterparts. While these lower-level APIs are not documented in as much detail as their high-level counterparts, this document will present an overview with examples that should hopefully allow you to experiment.
We'll use the following schema:
CREATE TABLE "person" (
"id" INTEGER NOT NULL PRIMARY KEY,
"first" TEXT NOT NULL,
"last" TEXT NOT NULL); CREATE TABLE "note" (
"id" INTEGER NOT NULL PRIMARY KEY,
"person_id" INTEGER NOT NULL,
"content" TEXT NOT NULL,
"timestamp" DATETIME NOT NULL,
FOREIGN KEY ("person_id") REFERENCES "person" ("id")); CREATE TABLE "reminder" (
"id" INTEGER NOT NULL PRIMARY KEY,
"note_id" INTEGER NOT NULL,
"alarm" DATETIME NOT NULL,
FOREIGN KEY ("note_id") REFERENCES "note" ("id"));
There are two ways we can declare Table objects for working with these tables:
# Explicitly declare columns.
Person = Table('person', ('id', 'first', 'last'))
Note = Table('note', ('id', 'person_id', 'content', 'timestamp'))

# Do not declare columns; they will be accessed using the magic ".c" attribute.
Reminder = Table('reminder')
Typically we will want to bind() our tables to a database. This saves us having to pass the database explicitly every time we wish to execute a query on the table:
db = SqliteDatabase('my_app.db')

Person = Person.bind(db)
Note = Note.bind(db)
Reminder = Reminder.bind(db)
To select the first three notes and print their content, we can write:
query = Note.select().order_by(Note.timestamp).limit(3)
for note_dict in query:
    print(note_dict['content'])
NOTE:
Because we didn't specify any columns, all the columns we defined in the note's Table constructor will be selected. This won't work for Reminder, as we didn't specify any columns at all.
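Had we not bound the tables up front, a hedged alternative is to bind an individual query at execution time, using the same bind() shown with the plain-Select examples elsewhere in this document:

# Bind just this query to the database.
query = Note.select().order_by(Note.timestamp).limit(3).bind(db)
for note_dict in query:
    print(note_dict['content'])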
To select all notes published in 2018 along with the name of the creator, we will use join(). We'll also request that rows be returned as namedtuple objects:
query = (Note
         .select(Note.content, Note.timestamp, Person.first, Person.last)
         .join(Person, on=(Note.person_id == Person.id))
         .where(Note.timestamp >= datetime.date(2018, 1, 1))
         .order_by(Note.timestamp)
         .namedtuples())

for row in query:
    print(row.timestamp, '-', row.content, '-', row.first, row.last)
Let's query for the most prolific people, that is, get the people who have created the most notes. This introduces calling a SQL function (COUNT), which is accomplished using the fn object:
name = Person.first.concat(' ').concat(Person.last)

query = (Person
         .select(name.alias('name'), fn.COUNT(Note.id).alias('count'))
         .join(Note, JOIN.LEFT_OUTER, on=(Note.person_id == Person.id))
         .group_by(name)
         .order_by(fn.COUNT(Note.id).desc()))

for row in query:
    print(row['name'], row['count'])
There are a couple things to note in the above query: we group by the computed name expression rather than a simple column, and we use a LEFT OUTER join so that people who have no notes at all are still included.
As a more complex example, we'll generate a list of all people and the contents and timestamp of their most recently-published note. To do this, we will end up using the Note table twice in different contexts within the same query, which will require us to use a table alias.
# Start with the query that calculates the timestamp of the most recent
# note for each person.
NA = Note.alias('na')
max_note = (NA
            .select(NA.person_id, fn.MAX(NA.timestamp).alias('max_ts'))
            .group_by(NA.person_id)
            .alias('max_note'))

# Now we'll select from the note table, joining on both the subquery and
# on the person table to construct the result set.
query = (Note
         .select(Note.content, Note.timestamp, Person.first, Person.last)
         .join(max_note, on=((max_note.c.person_id == Note.person_id) &
                             (max_note.c.max_ts == Note.timestamp)))
         .join(Person, on=(Note.person_id == Person.id))
         .order_by(Person.first, Person.last))

for row in query.namedtuples():
    print(row.first, row.last, ':', row.timestamp, '-', row.content)
In the join predicate for the join on the max_note subquery, we can reference columns in the subquery using the magical ".c" attribute. So, max_note.c.max_ts is translated into "the max_ts column value from the max_note subquery".
We can also use the ".c" magic attribute to access columns on tables that do not explicitly define their columns, like we did with the Reminder table. Here's a simple query to get all reminders for today, along with their associated note content:
today = datetime.date.today()
tomorrow = today + datetime.timedelta(days=1)

query = (Reminder
         .select(Reminder.c.alarm, Note.content)
         .join(Note, on=(Reminder.c.note_id == Note.id))
         .where(Reminder.c.alarm.between(today, tomorrow))
         .order_by(Reminder.c.alarm))

for row in query:
    print(row['alarm'], row['content'])
NOTE:
Inserting data is straightforward. We can specify data to insert() in two different ways (in both cases, the ID of the new row is returned):
# Using keyword arguments:
zaizee_id = Person.insert(first='zaizee', last='cat').execute()

# Using column: value mappings:
Note.insert({
    Note.person_id: zaizee_id,
    Note.content: 'meeeeowwww',
    Note.timestamp: datetime.datetime.now()}).execute()
It is easy to bulk-insert data: just pass in either a list of dictionaries (as above), or a list of row tuples along with the columns they correspond to.
Examples:
people = [
    {'first': 'Bob', 'last': 'Foo'},
    {'first': 'Herb', 'last': 'Bar'},
    {'first': 'Nuggie', 'last': 'Bar'}]

# Inserting multiple rows returns the ID of the last-inserted row.
last_id = Person.insert(people).execute()

# We can also specify row tuples, so long as we tell Peewee which
# columns the tuple values correspond to:
people = [
    ('Bob', 'Foo'),
    ('Herb', 'Bar'),
    ('Nuggie', 'Bar')]
Person.insert(people, columns=[Person.first, Person.last]).execute()
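For very large datasets it may help to insert rows in batches inside a transaction. A minimal sketch, using the chunked() helper exported by peewee and the db handle bound above:

from peewee import chunked

# Insert the rows 100 at a time, within a single transaction.
with db.atomic():
    for batch in chunked(people, 100):
        Person.insert(batch, columns=[Person.first, Person.last]).execute()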
update() queries accept either keyword arguments or a dictionary mapping column to value, just like insert().
Examples:
# "Bob" changed his last name from "Foo" to "Baze". nrows = (Person
.update(last='Baze')
.where((Person.first == 'Bob') &
(Person.last == 'Foo'))
.execute()) # Use dictionary mapping column to value. nrows = (Person
.update({Person.last: 'Baze'})
.where((Person.first == 'Bob') &
(Person.last == 'Foo'))
.execute())
You can also use expressions as the value to perform an atomic update. Imagine we have a PageView table and we need to atomically increment the page-view count for some URL:
# Do an atomic update:
(PageView
 .update({PageView.count: PageView.count + 1})
 .where(PageView.url == some_url)
 .execute())
delete() queries are the simplest of all, as they do not accept any arguments:
# Delete all notes created before 2018, returning number deleted.
n = Note.delete().where(Note.timestamp < datetime.date(2018, 1, 1)).execute()
Because DELETE (and UPDATE) queries do not support joins, we can use subqueries to delete rows based on values in related tables. For example, here is how you would delete all notes by anyone whose last name is "Foo":
# Get the id of all people whose last name is "Foo".
foo_people = Person.select(Person.id).where(Person.last == 'Foo')

# Delete all notes by any person whose ID is in the previous query.
Note.delete().where(Note.person_id.in_(foo_people)).execute()
One of the fundamental limitations of the abstractions provided by Peewee 2.x was the absence of a class that represented a structured query with no relation to a given model class.
An example of this might be computing aggregate values over a subquery. For example, the count() method, which returns the count of rows in an arbitrary query, is implemented by wrapping the query:
SELECT COUNT(1) FROM (...)
To accomplish this with Peewee, the implementation is written in this way:
def count(query):
# Select([source1, ... sourcen], [column1, ...columnn])
wrapped = Select(from_list=[query], columns=[fn.COUNT(SQL('1'))])
curs = wrapped.tuples().execute(db)
return curs[0][0] # Return first column from first row of result.
We can actually express this more concisely using the scalar() method, which is suitable for returning values from aggregate queries:
def count(query):
wrapped = Select(from_list=[query], columns=[fn.COUNT(SQL('1'))])
return wrapped.scalar(db)
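A brief usage sketch with this helper and the tables bound earlier (the query argument can be any select):

# Count people whose last name is "Foo".
n_foos = count(Person.select().where(Person.last == 'Foo'))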
The query_examples document has a more complex example, in which we write a query for a facility with the highest number of available slots booked:
The SQL we wish to express is:
SELECT facid, total FROM (
    SELECT facid, SUM(slots) AS total,
           rank() OVER (ORDER BY SUM(slots) DESC) AS rank
    FROM bookings
    GROUP BY facid
) AS ranked
WHERE rank = 1;
We can express this fairly elegantly by using a plain Select for the outer query:
# Store rank expression in variable for readability.
rank_expr = fn.rank().over(order_by=[fn.SUM(Booking.slots).desc()])

subq = (Booking
        .select(Booking.facility, fn.SUM(Booking.slots).alias('total'),
                rank_expr.alias('rank'))
        .group_by(Booking.facility))

# Use a plain "Select" to create the outer query.
query = (Select(columns=[subq.c.facid, subq.c.total])
         .from_(subq)
         .where(subq.c.rank == 1)
         .tuples())

# Iterate over the resulting facility ID(s) and total(s):
for facid, total in query.execute(db):
    print(facid, total)
For another example, let's create a recursive common table expression to calculate the first 10 fibonacci numbers:
base = Select(columns=(
    Value(1).alias('n'),
    Value(0).alias('fib_n'),
    Value(1).alias('next_fib_n'))).cte('fibonacci', recursive=True)

n = (base.c.n + 1).alias('n')
recursive_term = Select(columns=(
    n,
    base.c.next_fib_n,
    base.c.fib_n + base.c.next_fib_n)).from_(base).where(n < 10)

fibonacci = base.union_all(recursive_term)
query = fibonacci.select_from(fibonacci.c.n, fibonacci.c.fib_n)
results = list(query.execute(db))

# Generates the following result list:
# [{'fib_n': 0, 'n': 1},
#  {'fib_n': 1, 'n': 2},
#  {'fib_n': 1, 'n': 3},
#  {'fib_n': 2, 'n': 4},
#  {'fib_n': 3, 'n': 5},
#  {'fib_n': 5, 'n': 6},
#  {'fib_n': 8, 'n': 7},
#  {'fib_n': 13, 'n': 8},
#  {'fib_n': 21, 'n': 9},
#  {'fib_n': 34, 'n': 10}]
For a description of the various classes used to describe a SQL AST, see the query builder API documentation.
If you're interested in learning more, you can also check out the project source code.
Collected hacks using peewee. Have a cool hack you'd like to share? Open an issue on GitHub or contact me.
Optimistic locking is useful in situations where you might ordinarily use a SELECT FOR UPDATE (or in SQLite, BEGIN IMMEDIATE). For example, you might fetch a user record from the database, make some modifications, then save the modified user record. Typically this scenario would require us to lock the user record for the duration of the transaction, from the moment we select it, to the moment we save our changes.
In optimistic locking, on the other hand, we do not acquire any lock and instead rely on an internal version column in the row we're modifying. At read time, we see what version the row is currently at, and on save, we ensure that the update takes place only if the version is the same as the one we initially read. If the version is higher, then some other process must have snuck in and changed the row -- saving our modified version could result in the loss of important changes.
It's quite simple to implement optimistic locking in Peewee. Here is a base class that you can use as a starting point:
from peewee import *

class ConflictDetectedException(Exception):
    pass

class BaseVersionedModel(Model):
version = IntegerField(default=1, index=True)
def save_optimistic(self):
if not self.id:
# This is a new record, so the default logic is to perform an
# INSERT. Ideally your model would also have a unique
# constraint that made it impossible for two INSERTs to happen
# at the same time.
return self.save()
# Update any data that has changed and bump the version counter.
field_data = dict(self.__data__)
current_version = field_data.pop('version', 1)
self._populate_unsaved_relations(field_data)
field_data = self._prune_fields(field_data, self.dirty_fields)
if not field_data:
raise ValueError('No changes have been made.')
ModelClass = type(self)
field_data['version'] = ModelClass.version + 1 # Atomic increment.
query = ModelClass.update(**field_data).where(
(ModelClass.version == current_version) &
(ModelClass.id == self.id))
if query.execute() == 0:
# No rows were updated, indicating another process has saved
# a new version. How you handle this situation is up to you,
# but for simplicity I'm just raising an exception.
raise ConflictDetectedException()
else:
# Increment local version to match what is now in the db.
self.version += 1
return True
Here's an example of how this works. Let's assume we have the following model definition. Note that there's a unique constraint on the username -- this is important as it provides a way to prevent double-inserts.
class User(BaseVersionedModel):
username = CharField(unique=True)
favorite_animal = CharField()
Example:
>>> u = User(username='charlie', favorite_animal='cat')
>>> u.save_optimistic()
True
>>> u.version
1
>>> u.save_optimistic()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "x.py", line 18, in save_optimistic
    raise ValueError('No changes have been made.')
ValueError: No changes have been made.

>>> u.favorite_animal = 'kitten'
>>> u.save_optimistic()
True

# Simulate a separate thread coming in and updating the model.
>>> u2 = User.get(User.username == 'charlie')
>>> u2.favorite_animal = 'macaw'
>>> u2.save_optimistic()
True

# Now, attempt to change and re-save the original instance:
>>> u.favorite_animal = 'little parrot'
>>> u.save_optimistic()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "x.py", line 30, in save_optimistic
    raise ConflictDetectedException()
ConflictDetectedException: current version is out of sync
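If a conflict should be retried rather than surfaced, one possible approach is to re-fetch the row and re-apply the change. A minimal sketch; save_with_retry and apply_changes are hypothetical names, not part of the base class above:

def save_with_retry(instance, apply_changes, max_attempts=3):
    # apply_changes is a hypothetical callback that re-applies your
    # modification to a freshly-loaded model instance.
    for _ in range(max_attempts):
        try:
            instance.save_optimistic()
            return instance
        except ConflictDetectedException:
            # Another process saved first: load the latest version and
            # re-apply our change before trying again.
            instance = type(instance).get_by_id(instance.id)
            apply_changes(instance)
    raise ConflictDetectedException('Gave up after %s attempts.' % max_attempts)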
These examples describe several ways to query the single top item per group. For a thorough discussion of various techniques, check out my blog post Querying the top item by group with Peewee ORM. If you are interested in the more general problem of querying the top N items, see the section below, Top N objects per group.
In these examples we will use the User and Tweet models to find each user and their most-recent tweet.
The most efficient method I found in my testing uses the MAX() aggregate function.
We will perform the aggregation in a non-correlated subquery, so we can be confident this method will be performant. The idea is that we will select the posts, grouped by their author, whose timestamp is equal to the max observed timestamp for that user.
# When referencing a table multiple times, we'll call Model.alias() to create
# a secondary reference to the table.
TweetAlias = Tweet.alias()

# Create a subquery that will calculate the maximum Tweet created_date for
# each user.
subquery = (TweetAlias
            .select(
                TweetAlias.user,
                fn.MAX(TweetAlias.created_date).alias('max_ts'))
            .group_by(TweetAlias.user)
            .alias('tweet_max_subquery'))

# Query for tweets and join using the subquery to match the tweet's user
# and created_date.
query = (Tweet
         .select(Tweet, User)
         .join(User)
         .switch(Tweet)
         .join(subquery, on=(
             (Tweet.created_date == subquery.c.max_ts) &
             (Tweet.user == subquery.c.user_id))))
SQLite and MySQL are a bit more lax and permit grouping by a subset of the columns that are selected. This means we can do away with the subquery and express it quite concisely:
query = (Tweet
.select(Tweet, User)
.join(User)
.group_by(Tweet.user)
.having(Tweet.created_date == fn.MAX(Tweet.created_date)))
These examples describe several ways to query the top N items per group reasonably efficiently. For a thorough discussion of various techniques, check out my blog post Querying the top N objects per group with Peewee ORM.
In these examples we will use the User and Tweet models to find each user and their three most-recent tweets.
Lateral joins are a neat Postgres feature that allows reasonably efficient correlated subqueries. They are often described as SQL "for each" loops.
The desired SQL is:
SELECT * FROM
(SELECT id, username FROM user) AS uq
LEFT JOIN LATERAL
(SELECT message, created_date
FROM tweet
WHERE (user_id = uq.id)
ORDER BY created_date DESC LIMIT 3)
AS pq ON true
To accomplish this with peewee is quite straightforward:
subq = (Tweet
        .select(Tweet.message, Tweet.created_date)
        .where(Tweet.user == User.id)
        .order_by(Tweet.created_date.desc())
        .limit(3))

query = (User
         .select(User, subq.c.message, subq.c.created_date)
         .join(subq, JOIN.LEFT_LATERAL)
         .order_by(User.username, subq.c.created_date.desc()))

# We queried from the "perspective" of user, so the rows are User instances
# with the addition of a "message" and "created_date" attribute for each of
# the (up-to) 3 most-recent tweets for each user.
for row in query:
    print(row.username, row.message, row.created_date)
To implement an equivalent query from the "perspective" of the Tweet model, we can instead write:
# subq is the same as the above example.
subq = (Tweet
        .select(Tweet.message, Tweet.created_date)
        .where(Tweet.user == User.id)
        .order_by(Tweet.created_date.desc())
        .limit(3))

query = (Tweet
         .select(User.username, subq.c.message, subq.c.created_date)
         .from_(User)
         .join(subq, JOIN.LEFT_LATERAL)
         .order_by(User.username, subq.c.created_date.desc()))

# Each row is a "tweet" instance with an additional "username" attribute.
# This will print the (up-to) 3 most-recent tweets from each user.
for tweet in query:
    print(tweet.username, tweet.message, tweet.created_date)
Window functions, which are supported by peewee, provide scalable, efficient performance.
The desired SQL is:
SELECT subq.message, subq.username FROM (
    SELECT
        t2.message,
        t3.username,
        RANK() OVER (
            PARTITION BY t2.user_id
            ORDER BY t2.created_date DESC
        ) AS rnk
    FROM tweet AS t2
    INNER JOIN user AS t3 ON (t2.user_id = t3.id)
) AS subq
WHERE (subq.rnk <= 3)
To accomplish this with peewee, we will wrap the ranked Tweets in an outer query that performs the filtering.
TweetAlias = Tweet.alias()

# The subquery will select the relevant data from the Tweet and
# User table, as well as ranking the tweets by user from newest
# to oldest.
subquery = (TweetAlias
            .select(
                TweetAlias.message,
                User.username,
                fn.RANK().over(
                    partition_by=[TweetAlias.user],
                    order_by=[TweetAlias.created_date.desc()]).alias('rnk'))
            .join(User, on=(TweetAlias.user == User.id))
            .alias('subq'))

# Since we can't filter on the rank, we are wrapping it in a query
# and performing the filtering in the outer query.
query = (Tweet
         .select(subquery.c.message, subquery.c.username)
         .from_(subquery)
         .where(subquery.c.rnk <= 3))
If you're not using Postgres, then unfortunately you're left with options that exhibit less-than-ideal performance. For a more complete overview of common methods, check out this blog post. Below I will summarize the approaches and the corresponding SQL.
Using COUNT, we can get all tweets where there exist less than N tweets with more recent timestamps:
TweetAlias = Tweet.alias()

# Create a correlated subquery that calculates the number of
# tweets with a higher (newer) timestamp than the tweet we're
# looking at in the outer query.
subquery = (TweetAlias
            .select(fn.COUNT(TweetAlias.id))
            .where(
                (TweetAlias.created_date >= Tweet.created_date) &
                (TweetAlias.user == Tweet.user)))

# Wrap the subquery and filter on the count.
query = (Tweet
         .select(Tweet, User)
         .join(User)
         .where(subquery <= 3))
We can achieve similar results by doing a self-join and performing the filtering in the HAVING clause:
TweetAlias = Tweet.alias()

# Use a self-join and join predicates to count the number of
# newer tweets.
query = (Tweet
         .select(Tweet.id, Tweet.message, Tweet.user, User.username)
         .join(User)
         .switch(Tweet)
         .join(TweetAlias, on=(
             (TweetAlias.user == Tweet.user) &
             (TweetAlias.created_date >= Tweet.created_date)))
         .group_by(Tweet.id, Tweet.message, Tweet.user, User.username)
         .having(fn.COUNT(Tweet.id) <= 3))
The last example uses a LIMIT clause in a correlated subquery.
TweetAlias = Tweet.alias()

# The subquery here will calculate, for the user who created the
# tweet in the outer loop, the three newest tweets. The expression
# will evaluate to `True` if the outer-loop tweet is in the set of
# tweets represented by the inner query.
query = (Tweet
         .select(Tweet, User)
         .join(User)
         .where(Tweet.id << (
             TweetAlias
             .select(TweetAlias.id)
             .where(TweetAlias.user == Tweet.user)
             .order_by(TweetAlias.created_date.desc())
             .limit(3))))
SQLite is very easy to extend with custom functions written in Python, which are then callable from your SQL statements. By using the SqliteExtDatabase and the func() decorator, you can very easily define your own functions.
Here is an example function that generates a hashed version of a user-supplied password. We can also use this to implement login functionality for matching a user and password.
from hashlib import sha1
from random import random

from playhouse.sqlite_ext import SqliteExtDatabase

db = SqliteExtDatabase('my-blog.db')

def get_hexdigest(salt, raw_password):
    data = salt + raw_password
    return sha1(data.encode('utf8')).hexdigest()

@db.func()
def make_password(raw_password):
    salt = get_hexdigest(str(random()), str(random()))[:5]
    hsh = get_hexdigest(salt, raw_password)
    return '%s$%s' % (salt, hsh)

@db.func()
def check_password(raw_password, enc_password):
    salt, hsh = enc_password.split('$', 1)
    return hsh == get_hexdigest(salt, raw_password)
Here is how you can use the function to add a new user, storing a hashed password:
query = User.insert(
username='charlie',
password=fn.make_password('testing')).execute()
If we retrieve the user from the database, the password that's stored is hashed and salted:
>>> user = User.get(User.username == 'charlie')
>>> print(user.password)
b76fa$88be1adcde66a1ac16054bc17c8a297523170949
To implement login-type functionality, you could write something like this:
def login(username, password):
try:
return (User
.select()
.where(
(User.username == username) &
(fn.check_password(password, User.password) == True))
.get())
except User.DoesNotExist:
# Incorrect username and/or password.
return False
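A brief usage sketch of the login helper, assuming the user created above:

user = login('charlie', 'testing')
if user:
    print('Logged in as %s.' % user.username)
else:
    print('Incorrect username or password.')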
Each of the databases supported by Peewee implements its own set of functions and semantics for date/time arithmetic.
This section will provide a short scenario and example code demonstrating how you might utilize Peewee to do dynamic date manipulation in SQL.
Scenario: we need to run certain tasks every X seconds, and both the task intervals and the task themselves are defined in the database. We need to write some code that will tell us which tasks we should run at a given time:
class Schedule(Model):
    interval = IntegerField()  # Run this schedule every X seconds.

class Task(Model):
    schedule = ForeignKeyField(Schedule, backref='tasks')
    command = TextField()  # Run this command.
    last_run = DateTimeField()  # When was this run last?
Our logic will essentially boil down to:
# e.g., if the task was last run at 12:00:05, and the associated interval
# is 10 seconds, the next occurrence should be 12:00:15. So we check
# whether the current time (now) is 12:00:15 or later.
now >= task.last_run + schedule.interval
So we can write the following code:
next_occurrence = something  # ??? how do we define this ???

# We can express the current time as a Python datetime value, or we could
# alternatively use the appropriate SQL function/name.
now = Value(datetime.datetime.now())  # Or SQL('current_timestamp'), e.g.

query = (Task
         .select(Task, Schedule)
         .join(Schedule)
         .where(now >= next_occurrence))
For Postgresql we will multiply the schedule's interval by a static 1-second SQL INTERVAL to calculate the offset dynamically:
second = SQL("INTERVAL '1 second'")
next_occurrence = Task.last_run + (Schedule.interval * second)
For MySQL we can reference the schedule's interval directly:
from peewee import NodeList  # Needed to construct the SQL entity.

interval = NodeList((SQL('INTERVAL'), Schedule.interval, SQL('SECOND')))
next_occurrence = fn.date_add(Task.last_run, interval)
For SQLite, things are slightly tricky because SQLite does not have a dedicated datetime type. So for SQLite, we convert to a unix timestamp, add the schedule seconds, then convert back to a comparable datetime representation:
next_ts = fn.strftime('%s', Task.last_run) + Schedule.interval
next_occurrence = fn.datetime(next_ts, 'unixepoch')
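Putting the pieces together, the SQLite version of the full query might look like the following sketch (the other databases differ only in how next_occurrence is defined; tasks_due is just an illustrative variable name):

next_ts = fn.strftime('%s', Task.last_run) + Schedule.interval
next_occurrence = fn.datetime(next_ts, 'unixepoch')

now = Value(datetime.datetime.now())
tasks_due = (Task
             .select(Task, Schedule)
             .join(Schedule)
             .where(now >= next_occurrence))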
This document describes changes to be aware of when switching from 2.x to 3.x.
I tried to keep changes backwards-compatible as much as possible. In some places, APIs that have changed will trigger a DeprecationWarning.
JOIN_INNER, JOIN_LEFT_OUTER, etc are now JOIN.INNER, JOIN.LEFT_OUTER, etc.
The C extension that contained implementations of the query result wrappers has been removed.
Additionally, Select.aggregate_rows() has been removed. This helper was used to de-duplicate left-join queries to give the appearance of efficiency when iterating a model and its relations. In practice, the complexity of the code and its somewhat limited usefulness convinced me to scrap it. You can instead use prefetch() to achieve the same result.
The Case() helper has moved from the playhouse.shortcuts module into the main peewee module.
The cast() method is no longer a function, but instead is a method on all column-like objects.
The InsertQuery.return_id_list() method has been replaced by a more general pattern of using _WriteQuery.returning().
The InsertQuery.upsert() method has been replaced by the more general and flexible Insert.on_conflict() method.
When using prefetch(), the collected instances will be stored in the same attribute as the foreign-key's backref. Previously, you would access joined instances using (backref)_prefetch.
The SQL object, used to create a composable SQL string, now expects the second parameter to be a list/tuple of parameters.
The following extensions are no longer included in the playhouse:
The SQLite extension module's VirtualModel class accepts slightly different Meta options:
So, when declaring a model for a virtual table, it will be constructed roughly like this:
CREATE VIRTUAL TABLE "table name" USING extension_module (
prefix arguments,
field definitions,
arguments,
options)
The PostgresqlExtDatabase no longer registers the hstore extension by default. To use the hstore extension in 3.0 and onwards, pass register_hstore=True when initializing the database object.
The post_init signal has been removed.
The query-builder has been rewritten from the ground-up to be more flexible and powerful. There is now a generic, lower-level API for constructing queries.
Many SQLite-specific features have been moved from the playhouse.sqlite_ext module into peewee, such as:
See the "Using SQLite" section and "SQLite extensions" documents for more details.
The virtual-table implementation from sqlite-vtfunc has been folded into the peewee codebase.
If you find any bugs, odd behavior, or have an idea for a new feature please don't hesitate to open an issue on GitHub or contact me.
charles leifer
March 1, 2021 | 3.14.1 |