Using COPY TO and COPY FROM#
Psycopg allows to operate with PostgreSQL COPY protocol. COPY
is
one of the most efficient ways to load data into the database (and to modify
it, with some SQL creativity).
Copy is supported using the Cursor.copy()
method, passing it a query of the
form COPY ... FROM STDIN
or COPY ... TO STDOUT
, and managing the
resulting Copy
object in a with
block:
with cursor.copy("COPY table_name (col1, col2) FROM STDIN") as copy:
# pass data to the 'copy' object using write()/write_row()
You can compose a COPY statement dynamically by using objects from the
psycopg.sql
module:
with cursor.copy(
sql.SQL("COPY {} TO STDOUT").format(sql.Identifier("table_name"))
) as copy:
# read data from the 'copy' object using read()/read_row()
The connection is subject to the usual transaction behaviour, so, unless the connection is in autocommit, at the end of the COPY operation you will still have to commit the pending changes and you can still roll them back. See Transactions management for details.
Writing data row-by-row#
Using a copy operation you can load data into the database from any Python
iterable (a list of tuples, or any iterable of sequences): the Python values
are adapted as they would be in normal querying. To perform such operation use
a COPY ... FROM STDIN
with Cursor.copy()
and use write_row()
on the resulting object in a with
block. On exiting the block the
operation will be concluded:
records = [(10, 20, "hello"), (40, None, "world")]
with cursor.copy("COPY sample (col1, col2, col3) FROM STDIN") as copy:
for record in records:
copy.write_row(record)
If an exception is raised inside the block, the operation is interrupted and the records inserted so far are discarded.
In order to read or write from Copy
row-by-row you must not specify
COPY
options such as FORMAT CSV
, DELIMITER
, NULL
:
please leave these details alone, thank you :)
Binary copy#
Binary copy is supported by specifying FORMAT BINARY
in the COPY
statement. In order to load binary data, all the types passed to the database
must have a binary dumper registered (see Binary parameters and results).
Note that PostgreSQL is particularly finicky when loading data in binary mode
and will apply no cast rules. This means that e.g. passing the value 100 to
an integer
column will fail because Psycopg will pass it as a smallint
value. You can work around the problem using the set_types()
method of
the Copy
object and specify carefully the types to dump.
Reading data row-by-row#
You can also do the opposite, reading rows out of a COPY ... TO STDOUT
operation, by iterating on rows()
. However this is not something you
may want to do normally: usually the normal query process will be easier to
use.
PostgreSQL, currently, doesn’t give complete type information on COPY
TO
, so the rows returned will have unparsed data, as strings or bytes,
according to the format.
with cur.copy("COPY (VALUES (10::int, current_date)) TO STDOUT") as copy:
for row in copy.rows():
print(row) # return unparsed data: ('10', '2046-12-24')
You can improve the results by using set_types()
before reading, but
you have to specify them yourselves.
with cur.copy("COPY (VALUES (10::int, current_date)) TO STDOUT") as copy:
copy.set_types(["int4", "date"])
for row in copy.rows():
print(row) # (10, datetime.date(2046, 12, 24))
Copying block-by-block#
If data is already formatted in a way suitable for copy (for instance because
it is coming from a file resulting from a previous COPY TO
operation) it can
be loaded into the database using Copy.write()
instead.
with open("data", "r") as f:
with cursor.copy("COPY data FROM STDIN") as copy:
while data := f.read(BLOCK_SIZE):
copy.write(data)
In this case you can use any COPY
option and format, as long as the
input data is compatible. Data can be passed as str
, if the copy is in
FORMAT TEXT
, or as bytes
, which works with both FORMAT TEXT
and FORMAT BINARY
.
In order to produce data in COPY
format you can use a COPY ... TO
STDOUT
statement and iterate over the resulting Copy
object, which will
produce a stream of bytes
:
with open("data.out", "wb") as f:
with cursor.copy("COPY table_name TO STDOUT") as copy:
for data in copy:
f.write(data)
Asynchronous copy support#
Asynchronous operations are supported using the same patterns as above, using
the objects obtained by an AsyncConnection
. For instance, if f
is an
object supporting an asynchronous read()
method returning COPY
data,
a fully-async copy operation could be:
async with cursor.copy("COPY data FROM STDIN") as copy:
while data := await f.read():
await copy.write(data)
The AsyncCopy
object documentation describes the signature of the
asynchronous methods and the differences from its sync Copy
counterpart.