PGCOPYDB FOLLOW(1)
pgcopydb follow - Replay changes from the source database to the target database
The command pgcopydb follow replays the database changes registered at the source database with the logical decoding plugin wal2json into the target database.
IMPORTANT:
When using Logical Decoding with pgcopydb or another tool, make sure you are familiar with the Logical Replication Restrictions that apply. In particular:
When using DDL for partition scheme maintenance, such as when using the pg_partman extension, then consider creating a week or a month of partitions in advance, so that creating new partitions does not happen during the migration window.
When using pgcopydb clone --follow (starting with pgcopydb version 0.9) then the sequence data is synced at the end of the operation, after the cutover point implemented via the pgcopydb stream sentinel set endpos.
Updating the sequences manually is also possible by running the command pgcopydb copy sequences.
See the Postgres documentation page for Logical Replication Restrictions to read the exhaustive list of restrictions.
pgcopydb follow: Replay changes from the source database to the target database
usage: pgcopydb follow  --source ... --target ...
--source Postgres URI to the source database
--target Postgres URI to the target database
--dir Work directory to use
--filters <filename> Use the filters defined in <filename>
--restart Allow restarting when temp files exist already
--resume Allow resuming operations after a failure
--not-consistent Allow taking a new snapshot on the source database
--snapshot Use snapshot obtained with pg_export_snapshot
--slot-name Use this Postgres replication slot name
--create-slot Create the replication slot
--origin Use this Postgres replication origin node name
--endpos Stop replaying changes when reaching this LSN
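As a sketch, a typical invocation could look like the following; the connection strings, the work directory, and the endpos LSN are hypothetical values to adapt to your environment:

```shell
# Hypothetical connection strings: adapt host, database, and user names.
export PGCOPYDB_SOURCE_PGURI="postgres://user@source.host:5432/dbname"
export PGCOPYDB_TARGET_PGURI="postgres://user@target.host:5432/dbname"

# Replay changes from the source to the target, stopping at the given LSN.
pgcopydb follow --dir /var/lib/pgcopydb --endpos 1/8D173AF8
```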
This command runs two concurrent subprocesses.
The logical decoding plugin wal2json must be available on the source database system.
Each time a JSON file is closed, an auxiliary process is started to transform the JSON file into a matching SQL file. This processing is done in the background, and the main receiver process only waits for the transformation process to finish when there is a new JSON file to transform.
In other words, only one such transform process can run in the background, and the main process blocks when a second one would otherwise get started.
The design model here is based on the assumption that receiving the next set of JSON messages that fills up a whole JSON file takes more time than transforming the JSON file into an SQL file. When that assumption proves wrong, consider opening an issue on the GitHub project for pgcopydb.
The Postgres API for Replication Progress Tracking is used in that process so that we can skip already applied transactions at restart or resume.
It is possible to start the pgcopydb follow command and then later, while it's still running, set the LSN for the end position with the same effect as using the command line option --endpos, or switch from prefetch mode only to prefetch and catchup mode. For that, see the commands pgcopydb stream sentinel set endpos, pgcopydb stream sentinel set apply, and pgcopydb stream sentinel set prefetch.
Note that in many cases the --endpos LSN position is not known at the start of this command. Also, before entering the prefetch and apply mode, it is important to make sure that the initial base copy is finished.
Finally, it is also possible to setup the streaming replication options before using the pgcopydb follow command: see the pgcopydb stream setup and pgcopydb stream cleanup commands.
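A hedged sketch of that workflow, combining the setup, sentinel, and cleanup commands documented here; it assumes the PGCOPYDB_SOURCE_PGURI and PGCOPYDB_TARGET_PGURI environment variables are set:

```shell
# Create the replication slot and the sentinel table on the source
# database before the streaming starts.
pgcopydb stream setup

# Stream and pre-fetch the changes in the background.
pgcopydb follow &

# Once the initial base copy is known to be finished, enable the catchup.
pgcopydb stream sentinel set apply

# At cutover time, stop replaying at the current source LSN.
pgcopydb stream sentinel set endpos --current

# When done, remove the replication slot and the sentinel table.
pgcopydb stream cleanup
```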
Postgres Logical Decoding works with replaying changes using SQL statements, and for that exposes the concept of Replica Identity as described in the documentation for the ALTER TABLE ... REPLICA IDENTITY command.
To support Change Data Capture with Postgres Logical Decoding for tables that do not have a primary key, it is necessary to use the ALTER TABLE ... REPLICA IDENTITY command for those tables.
In practice, the two following options are to be considered:
REPLICA IDENTITY USING INDEX index_name, when the table has a unique, non-partial index on columns marked NOT NULL;
REPLICA IDENTITY FULL, which records the old values of all of the columns in the row.
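For illustration, a replica identity could be set with psql as follows; the table and index names here are hypothetical:

```shell
# A table with a unique index over NOT NULL columns can use that index.
psql "$PGCOPYDB_SOURCE_PGURI" \
     -c "ALTER TABLE public.orders REPLICA IDENTITY USING INDEX orders_ref_key;"

# Without such an index, log the old values of all the columns instead.
psql "$PGCOPYDB_SOURCE_PGURI" \
     -c "ALTER TABLE public.audit_log REPLICA IDENTITY FULL;"
```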
When using pgcopydb clone --follow, a logical replication slot is created on the source database before the initial COPY, using the same Postgres snapshot. This ensures data consistency.
Within the pgcopydb clone --follow approach, it is only possible to start applying the changes from the source database after the initial COPY has finished on the target database.
Also, from the Postgres documentation we read that Postgres replication slots provide an automated way to ensure that the primary does not remove WAL segments until they have been received by all standbys.
Accumulating WAL segments on the primary during the whole duration of the initial COPY involves capacity hazards, which translate into potential File System is Full errors on the WAL disk of the source database. It is crucial to avoid such a situation.
This is why pgcopydb implements CDC pre-fetching. In parallel to the initial COPY the command pgcopydb clone --follow pre-fetches the changes in local JSON and SQL files. Those files are placed in the XDG_DATA_HOME location, which could be a mount point for an infinite Blob Storage area.
The pgcopydb follow command is a convenience command that's available as a logical decoding client for the wal2json plugin, and it shares the same implementation as the pgcopydb clone --follow command. As a result, the pre-fetching strategy is also relevant to the pgcopydb follow command.
To track progress and allow resuming of operations, pgcopydb uses a sentinel table on the source database. The sentinel table consists of a single row with the following fields:
$ pgcopydb stream sentinel get
startpos   1/8D173AF8
endpos     0/0
apply      disabled
write_lsn  0/0
flush_lsn  0/0
replay_lsn 0/0
Note that you can use the command pgcopydb stream sentinel get --json to fetch a JSON formatted output, such as the following:
{
  "startpos": "1/8D173AF8",
  "endpos": "1/8D173AF8",
  "apply": false,
  "write_lsn": "0/0",
  "flush_lsn": "0/0",
  "replay_lsn": "0/0"
}
The first three fields (startpos, endpos, apply) are specific to pgcopydb; the following three fields (write_lsn, flush_lsn, replay_lsn) follow the Postgres replication protocol, as visible in the docs for the pg_stat_replication view.
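An LSN such as 1/8D173AF8 is a 64-bit position written as two hexadecimal numbers: the high 32 bits before the slash, and the low 32 bits after it. As a small sketch, two LSN values can be compared in shell by converting them to byte offsets; the helper function name is made up for this example:

```shell
# Convert a Postgres LSN text value (for example "1/8D173AF8") into a
# plain number of bytes, so that two positions compare numerically.
lsn_to_bytes() {
    hi=${1%/*}    # part before the slash: high 32 bits
    lo=${1#*/}    # part after the slash: low 32 bits
    echo $(( (0x$hi << 32) + 0x$lo ))
}

lsn_to_bytes "1/8D173AF8"    # prints 6662077176
```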
The startpos field is the current LSN on the source database at the time when the Change Data Capture is set up in pgcopydb, such as when using the pgcopydb stream setup command.
Note that both the pgcopydb follow and the pgcopydb clone --follow commands implement the setup parts if the pgcopydb stream setup command has not been used already.
The endpos field is the last LSN position from the source database that pgcopydb replays. The pgcopydb follow command (or pgcopydb clone --follow) stops when reaching this LSN position.
The endpos can be set at the start of the process, which is useful for unit testing, or while the command is running, which is useful in production to define a cutover point.
To define the endpos while the command is running, use pgcopydb stream sentinel set endpos.
The apply field is a boolean (enabled/disabled) that controls the catchup process. The pgcopydb catchup process replays the changes only when the apply boolean is set to true.
The pgcopydb clone --follow command automatically enables the apply field of the sentinel table as soon as the initial COPY is done.
To manually control the apply field, use the pgcopydb stream sentinel set apply command.
The Postgres documentation for pg_stat_replication.write_lsn is: Last write-ahead log location written to disk by this standby server.
In the pgcopydb case, the sentinel field write_lsn is the position that has been written to disk (as JSON) by the streaming process.
The Postgres documentation for pg_stat_replication.flush_lsn is: Last write-ahead log location flushed to disk by this standby server
In the pgcopydb case, the sentinel field flush_lsn is the position that has been written and then fsync'ed to disk (as JSON) by the streaming process.
The Postgres documentation for pg_stat_replication.replay_lsn is: Last write-ahead log location replayed into the database on this standby server
In the pgcopydb case, the sentinel field replay_lsn is the position that has been applied to the target database, as tracked through the WAL.json and then the WAL.sql files, and using the Postgres API for Replication Progress Tracking.
The replay_lsn is also shared by the pgcopydb streaming process that uses the Postgres logical replication protocol, so the pg_stat_replication entry associated with the replication slot used by pgcopydb can be used to monitor replication lag.
As the pgcopydb streaming processes maintain the sentinel table on the source database, it is also possible to use it to keep track of the logical replication progress.
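For illustration only, here is one way to watch that pg_stat_replication entry with psql; this sketch assumes Postgres 14 or later (for the pg_lsn subtraction operator) and the default "pgcopydb" slot name:

```shell
# Show the replication lag, in bytes, for the slot used by pgcopydb.
psql "$PGCOPYDB_SOURCE_PGURI" -c "
  select r.application_name,
         pg_current_wal_lsn() - r.replay_lsn as lag_bytes
    from pg_stat_replication r
    join pg_replication_slots s on s.active_pid = r.pid
   where s.slot_name = 'pgcopydb';"
```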
The following options are available to pgcopydb follow:
When temp files from a previous run exist already, the --restart option can be used to allow pgcopydb to delete the traces of that previous run.
When resuming activity from a previous run, table data that was fully copied over to the target server is not sent again. Table data that was interrupted during the COPY has to be started from scratch even when using --resume: the COPY command in Postgres is transactional and was rolled back.
The same reasoning applies to the CREATE INDEX and ALTER TABLE commands that pgcopydb issues; those commands are skipped on a --resume run only when known to have run through to completion on the previous one.
Finally, using --resume requires the use of --not-consistent.
Per the Postgres documentation about pg_export_snapshot, a snapshot is available for import only until the end of the transaction that exported it.
Now, when the pgcopydb process was interrupted (or crashed) on a previous run, it is possible to resume operations, but the snapshot that was exported does not exist anymore. The pgcopydb command can only resume operations with a new snapshot, and thus cannot ensure consistency of the whole data set, because each run now uses its own snapshot.
The replication slot must be using the wal2json output plugin, available with format-version 2.
The --endpos option is not aware of transaction boundaries and may truncate output partway through a transaction. Any partially output transaction will not be consumed and will be replayed again when the slot is next read from. Individual messages are never truncated.
See also documentation for pg_recvlogical.
Postgres uses a notion of an origin node name, as documented in Replication Progress Tracking. This option allows picking your own node name, and defaults to "pgcopydb". Picking a different name is useful in some advanced scenarios, like migrating several sources into the same target, where each source should have its own unique origin node name.
PGCOPYDB_SOURCE_PGURI
  Connection string to the source Postgres instance. When --source is omitted from the command line, this environment variable is used.
PGCOPYDB_TARGET_PGURI
  Connection string to the target Postgres instance. When --target is omitted from the command line, this environment variable is used.
PGCOPYDB_SNAPSHOT
  Postgres snapshot identifier to re-use, see also --snapshot.
TMPDIR
  The pgcopydb command creates its work files and directories under ${TMPDIR}/pgcopydb, and defaults to /tmp/pgcopydb when the environment variable is not set.
XDG_DATA_HOME
  The pgcopydb command creates its Change Data Capture files in the standard XDG_DATA_HOME location, which defaults to ~/.local/share.
When using Change Data Capture (through the --follow option and Postgres logical decoding with wal2json), pgcopydb pre-fetches changes in JSON files and transforms them into SQL files to apply to the target database.
These files are stored at the following location, tried in this order: the directory given with the --dir option when used, then the ${XDG_DATA_HOME} environment variable when set, and finally ${HOME}/.local/share otherwise.
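A minimal sketch of the XDG fallback part of that resolution in shell; the cdc_dir variable name is made up for this example:

```shell
# Resolve where the Change Data Capture files would live: prefer
# XDG_DATA_HOME when it is set, fall back to ~/.local/share otherwise.
cdc_dir="${XDG_DATA_HOME:-$HOME/.local/share}/pgcopydb"
echo "$cdc_dir"
```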
Dimitri Fontaine
2022, Dimitri Fontaine
November 3, 2022, version 0.10