slon(1) | Slony-I 2.2.11 Documentation | slon(1) |
slon - Slony-I daemon
slon
[option]... [clustername] [conninfo]
slon is the daemon application that ‘runs’ Slony-I replication. A slon instance must be run for each node in a Slony-I cluster.
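As an illustration of the synopsis above, the following minimal sketch assembles a slon command line for one node. The flag letters used (-d for log_level, -s for sync_interval, -t for sync_interval_timeout, -g for the group size) are assumed from the slon option list, which is not reproduced in this text; check them against your installed version. The cluster name and conninfo string are hypothetical.

```python
import shlex


def build_slon_argv(cluster, conninfo, log_level=0, sync_interval=None,
                    sync_interval_timeout=None, group_size=None):
    """Assemble an argv list of the form: slon [option]... clustername conninfo."""
    argv = ["slon", "-d", str(log_level)]
    # Optional flags are only added when a value was supplied.
    # Flag letters are assumptions; verify them against your slon version.
    for flag, value in (("-s", sync_interval),
                        ("-t", sync_interval_timeout),
                        ("-g", group_size)):
        if value is not None:
            argv += [flag, str(value)]
    # Positional arguments come last: cluster name, then connection info.
    argv += [cluster, conninfo]
    return argv


# Hypothetical cluster name and connection string, one slon per node.
argv = build_slon_argv("mycluster", "dbname=mydb host=node1 user=slony",
                       sync_interval=2000, group_size=20)
print(shlex.join(argv))
```

The resulting list could be handed to a process supervisor or to subprocess.Popen; one such command is needed for each node in the cluster.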
The nine levels of logging are:
The first five non-debugging log levels (from Fatal to Info) are always displayed in the logs. In early versions of Slony-I, the ‘suggested’ log_level value was 2, which would list output at all levels down to debugging level 2. In Slony-I version 2, it is recommended to set log_level to 0; most of the consistently interesting log information is generated at levels higher than that.
Short sync check intervals keep the origin on a ‘short leash’, updating its subscribers more frequently. If you have replicated sequences that are frequently updated without any replicated tables being affected, this prevents stretches of time in which only sequences are updated and therefore no syncs take place.
If the node is not an origin for any replication set, so no updates are coming in, it is somewhat wasteful for this value to be much less than the sync_interval_timeout value.
If application activity ceases, whether because the application is shut down or because human users have gone home and stopped introducing updates, slon(1) will iterate away, waking up every sync_interval milliseconds and, as no updates are being made, generating no SYNC events. Without this timeout parameter, that would continue indefinitely, and it would appear that replication was falling behind.
The sync_interval_timeout value will lead to eventually generating a SYNC, even though there was no real replication work to be done. The lower that this parameter is set, the more frequently slon(1) will generate SYNC events when the application is not generating replicable activity; this will have two effects:
The system will do more replication work. (Of course, since there is no application load on the database, and no data to replicate, this load will be very easy to handle.)
Replication will appear to be running a little closer to ‘up to date.’ (Of course, since there is no replicable activity going on, being ‘more up to date’ is something of a mirage.)
Default is 10000 ms and maximum is 120000 ms. By default, you can expect each node to ‘report in’ with a SYNC every 10 seconds.
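The interaction between sync_interval and sync_interval_timeout can be pictured with a toy model. The sketch below is not slon's implementation; it only mimics the behaviour described above, assuming a sync_interval of 2000 ms (an assumed value) and the default timeout of 10000 ms. The function and variable names are hypothetical.

```python
def simulate_syncs(update_ticks, total_ms, sync_interval=2000,
                   sync_interval_timeout=10000):
    """Return the wake-up times (ms) at which this toy model emits a SYNC.

    update_ticks: set of wake-up times at which replicable updates were seen.
    """
    sync_times = []
    last_sync = 0
    # The loop wakes up once every sync_interval milliseconds.
    for now in range(sync_interval, total_ms + 1, sync_interval):
        had_updates = now in update_ticks
        # With no updates, a SYNC is still forced once the timeout elapses.
        timed_out = now - last_sync >= sync_interval_timeout
        if had_updates or timed_out:
            sync_times.append(now)
            last_sync = now
    return sync_times


# Updates at 2 s and 4 s, then the application goes quiet for the rest of 30 s.
print(simulate_syncs({2000, 4000}, 30000))
```

In this model the quiet period still yields a SYNC every 10 seconds after the last one, which is the ‘report in’ behaviour described above.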
Note that SYNC events are also generated on subscriber nodes. Since they are not actually generating any data to replicate to other nodes, these SYNC events are not of much value.
The default of 20 is probably suitable for small systems that can devote only very limited bits of memory to slon. If you have plenty of memory, it would be reasonable to increase this, as it will increase the amount of work done in each transaction, and will allow a subscriber that is behind by a lot to catch up more quickly.
Slon processes usually stay pretty small; even with a large value for this option, slon would be expected to only grow to a few MB in size.
The big advantage in increasing this parameter comes from cutting down on the number of transaction COMMITs; moving from 1 to 2 will provide considerable benefit, but the benefits will progressively fall off once the transactions being processed get to be reasonably large. There isn't likely to be a material difference in performance between 80 and 90; at that point, whether ‘bigger is better’ will depend on whether the bigger set of SYNCs makes the LOG cursor behave badly due to consuming more memory and requiring more time to sort.
In Slony-I version 1.0, slon will always attempt to group SYNCs together to this maximum, which won't be ideal if replication has been somewhat destabilized by there being very large updates (e.g., a single transaction that updates hundreds of thousands of rows) or by SYNCs being disrupted on an origin node with the result that there are a few SYNCs that are very large. You might run into the problem that grouping together some very large SYNCs knocks over a slon process. When it picks up again, it will try to process the same large grouped set of SYNCs, and run into the same problem over and over until an administrator interrupts this and changes the -g value to break this ‘deadlock.’
In Slony-I version 1.1 and later versions, the slon instead adaptively ‘ramps up’ from doing 1 SYNC at a time towards the maximum group size. As a result, if there are a couple of SYNCs that cause problems, the slon will (with any relevant watchdog assistance) always be able to get to the point where it processes the troublesome SYNCs one by one, hopefully making operator assistance unnecessary.
If replication is running behind, slon will gradually increase the number of SYNCs grouped together, aiming (based on the time taken for the last group of SYNCs) for each group to take no more than the specified desired_sync_time value.
The default value for desired_sync_time is 60000ms, equal to one minute.
That way, you can expect (or at least hope!) that you'll get a COMMIT roughly once per minute.
It isn't totally predictable, as it is entirely possible for someone to request a very large update, all as one transaction, that can ‘blow up’ the length of the resulting SYNC to be nearly arbitrarily long. In such a case, the heuristic will back off for the next group.
The overall effect is to improve Slony-I's ability to cope with variations in traffic. By starting with 1 SYNC, and gradually moving to more, even if there turn out to be variations large enough to cause PostgreSQL backends to crash, Slony-I will back off down to start with one sync at a time, if need be, so that if it is at all possible for replication to progress, it will.
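The ramp-up and back-off behaviour can be sketched as a small heuristic. This is an illustrative model only, not the code slon uses: it scales the previous group size by the ratio of desired_sync_time to the time the last group took, allows at most a doubling per step, and clamps the result between 1 and the maximum group size. The function name and the doubling rule are assumptions made for the example.

```python
def next_group_size(last_size, last_duration_ms, max_group_size=20,
                    desired_sync_time_ms=60000):
    """Pick the next SYNC group size from the timing of the last group."""
    if last_duration_ms <= 0:
        # No usable timing yet: just ramp up gently.
        projected = last_size * 2
    else:
        # Size that would have taken roughly desired_sync_time_ms,
        # judging by how long the last group took.
        projected = last_size * desired_sync_time_ms // last_duration_ms
        # Ramp up gradually: at most double the previous group (assumption).
        projected = min(projected, last_size * 2)
    # Never fewer than 1 SYNC, never more than the configured maximum.
    return max(1, min(projected, max_group_size))


# A restarted slon begins again at 1 SYNC and works its way up.
size = 1
for duration_ms in (500, 1000, 2000, 4000, 8000):
    size = next_group_size(size, duration_ms)
    print(size)

# A group that took far longer than a minute makes the heuristic back off.
print(next_group_size(20, 130000))
```

With cheap SYNCs the group size in this model climbs towards the maximum; one very slow group pulls it back down, and a group of a single huge SYNC leaves it at 1.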
Set this to zero to disable slon-initiated vacuuming. If you are using something like pg_autovacuum to initiate vacuums, you may not need for slon to initiate vacuums itself. If you are not, there are some tables Slony-I uses that collect a lot of dead tuples that should be vacuumed frequently, notably pg_listener.
In Slony-I version 1.1, this changes a little; the cleanup thread tracks, from iteration to iteration, the earliest transaction ID still active in the system. If this doesn't change from one iteration to the next, then an old transaction is still active, and therefore a VACUUM will do no good. The cleanup thread instead merely does an ANALYZE on these tables to update the statistics in pg_statistic.
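The decision made by the cleanup thread can be sketched as follows. This is a simplified model of the rule described above, with hypothetical function names and made-up transaction IDs; it is not taken from the slon source.

```python
def choose_maintenance(prev_earliest_xid, earliest_xid):
    """Return the maintenance command this model would pick for one iteration."""
    # An unchanged earliest active transaction ID means an old transaction is
    # still open, so VACUUM could not reclaim anything; ANALYZE instead.
    if prev_earliest_xid is not None and earliest_xid == prev_earliest_xid:
        return "ANALYZE"
    return "VACUUM"


def plan_cleanup(observed_xids):
    """Apply choose_maintenance across successive cleanup iterations."""
    plan = []
    prev = None
    for xid in observed_xids:
        plan.append(choose_maintenance(prev, xid))
        prev = xid
    return plan


# Earliest active xid seen on four successive iterations (illustrative values).
print(plan_cleanup([100, 100, 100, 250]))
```

Here the first iteration has nothing to compare against and vacuums; the next two see the same old transaction and only analyze; the last sees that it has finished and vacuums again.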
This may make it easier to construct scripts to monitor multiple slon processes running on a single host.
This configuration is discussed further in Slon Run-time Configuration (not available as a man page). If there is to be a complex set of configuration parameters, or if there are parameters you do not wish to be visible in the process environment variables (such as passwords), it may be convenient to draw many or all parameters from a configuration file. You might put common parameters for all slon processes in a commonly-used configuration file, allowing the command line to specify little other than the connection info. Alternatively, you might create a configuration file for each node.
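A per-node configuration file could be generated along these lines. The sketch assumes the usual slon.conf layout of one name=value pair per line with string values in single quotes, and assumes the parameter names cluster_name and conn_info; only log_level, sync_interval, sync_interval_timeout and desired_sync_time are named in this text. All values shown are hypothetical.

```python
def render_slon_conf(params):
    """Render a dict of parameters as name=value lines (assumed slon.conf style)."""
    lines = []
    for key, value in params.items():
        if isinstance(value, str):
            # Quoting rules for embedded quotes are not covered here, so refuse them.
            if "'" in value:
                raise ValueError(f"single quote not supported in {key}")
            lines.append(f"{key}='{value}'")
        else:
            lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Parameters shared by every slon process in the cluster.
common = {
    "log_level": 0,
    "sync_interval": 2000,
    "sync_interval_timeout": 10000,
    "desired_sync_time": 60000,
}

# Node-specific settings layered on top of the common ones.
node1 = dict(common, cluster_name="mycluster",
             conn_info="dbname=mydb host=node1 user=slony")

conf_text = render_slon_conf(node1)
print(conf_text)
```

Writing conf_text to a file readable only by the account that runs slon keeps the connection details off the command line.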
See more details on “slon_conf_command_on_log_archive” [not available as a man page].
This allows you to have a slon stop replicating after a certain point.
There is a concomitant downside to this lag; events that require all nodes to synchronize, as typically happens with SLONIK FAILOVER(7) and SLONIK MOVE SET(7), will have to wait for this lagging node.
That might not be ideal behaviour at failover time, or at the time when you want to run SLONIK EXECUTE SCRIPT(7).
slon returns 0 to the shell if it finished normally. It returns via exit(-1) (which will likely provide a return value of either 127 or 255, depending on your system) if it encounters any fatal error.
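The value 255 comes from the fact that, on POSIX systems, only the low 8 bits of the value passed to exit() reach the parent process. A minimal sketch of how a monitoring script might interpret this, with hypothetical function names:

```python
def status_seen_by_shell(exit_arg):
    """Map the argument given to exit() to the status a POSIX shell reports."""
    # Only the low 8 bits survive, so exit(-1) is reported as 255.
    return exit_arg & 0xFF


def describe_slon_exit(exit_arg):
    """Summarise a slon exit value for a log line (illustrative wording)."""
    status = status_seen_by_shell(exit_arg)
    if status == 0:
        return "finished normally"
    return f"fatal error (exit status {status})"


print(describe_slon_exit(0))
print(describe_slon_exit(-1))
```

Note that exit_arg here is the value handed to exit(), not the returncode attribute of Python's subprocess module, where negative numbers indicate termination by a signal.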
4 November 2022 | Application |