squatter - Cyrus IMAP documentation
Create SQUAT and Xapian indexes for mailboxes
general:
squatter [ -C config-file ] [mode] [options] [source]
i.e.:
squatter [ -C config-file ] [ -v ] [ -a ] [ -S seconds ] [ -Z ]
squatter [ -C config-file ] [ -v ] [ -a ] [ -i ] [ -N name ] [ -S seconds ] [ -r ] [ -Z ] mailbox...
squatter [ -C config-file ] [ -v ] [ -a ] [ -i ] [ -N name ] [ -S seconds ] [ -r ] [ -Z ] -u user...
squatter [ -C config-file ] [ -v ] [ -a ] -R [ -n channel ] [ -d ] [ -S seconds ] [ -Z ]
squatter [ -C config-file ] [ -v ] [ -a ] -f synclogfile [ -S seconds ] [ -Z ]
squatter [ -C config-file ] [ -v ] -t srctier(s)... -z desttier [ -B ] [ -F ] [ -U ] [ -T reindextiers ] [ -X ] [ -o ] [ -S seconds ] [ -u user... ]
NOTE:
The name “squatter” once referred
both to the SQUAT indexing engine and to the command used to create indexes.
Now that Cyrus supports more than one index type – SQUAT and Xapian, as
of this writing – the name “squatter” refers to
the command used to control index creation. The terms “SQUAT” and
“SQUAT index(es)” refer to the indexes used by the older SQUAT
indexing engine. Post v3, the search_engine setting in imapd.conf
determines which search engine is used.
squatter creates a new text index for one or more IMAP
mailboxes. The index is a unified index of all of the header and body text
of each message in a given mailbox. This index is used to significantly
reduce IMAP SEARCH times on a mailbox.
mode is one of indexer, search, rolling, synclog, compact
or audit.
By default, squatter creates an index of ALL messages in
the mailbox, not just those added since the last time it was run. The
-i option selects incremental updates instead. Messages
appended to the mailbox after squatter has run will NOT be included
in the index. To include new messages in the index, squatter must be
run again, either on a regular basis (via crontab or an entry in the EVENTS
section of cyrus.conf(5)) or continuously in rolling mode (-R).
In the first synopsis, squatter indexes all mailboxes.
In the second synopsis, squatter indexes the specified
mailbox(es). The mailboxes are space-separated.
In the third synopsis, squatter indexes the mailbox(es) of the
specified user(s).
For the latter two index modes (mailbox, user) one may optionally
specify -r to recurse from the specified start, or -a to limit
action only to mailboxes which have the shared
/vendor/cmu/cyrus-imapd/squat annotation set to
“true”.
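The first three synopses can be sketched as the invocations below. This is only an illustration: the install path is borrowed from the sample cyrus.conf entries in this page, the mailbox name "user.jane" and the user "jane@example.com" are hypothetical, and the run helper is not part of Cyrus. It prints each command rather than executing it unless RUN=1 is set.

```shell
#!/bin/sh
# Assumed install path; adjust for your system.
SQUATTER="${SQUATTER:-/usr/lib/cyrus/bin/squatter}"

# Hypothetical helper: print the command unless RUN=1 is set, then execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# First synopsis: full index of every mailbox.
run "$SQUATTER"

# Second synopsis: incremental update (-i) of one mailbox and,
# with -r, everything below it. "user.jane" is a made-up name.
run "$SQUATTER" -i -r user.jane

# Third synopsis: incremental update of one user's mailboxes, limited (-a)
# to those whose squat annotation is "true". The user is made up.
run "$SQUATTER" -i -a -u jane@example.com
```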
In the fourth synopsis, squatter runs in rolling mode. In
this mode squatter backgrounds itself and runs as a daemon (unless
-d is set), listening to a sync log channel chosen using the
-n option, and set up using the sync_log_channels setting in
imapd.conf(5). Very soon after messages are delivered or uploaded to
mailboxes squatter will incrementally index the affected mailbox (see
notes, below).
In the fifth synopsis, squatter reads a single sync log
file and performs incremental indexing on the mailbox(es) listed therein.
This is sometimes useful for cleaning up after problems with rolling
mode.
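A cleanup run of this kind might look like the following sketch. The index_synclog function and the log file path are hypothetical (where leftover sync logs live depends on your configuration); only the -v and -f options come from this page. The command is printed rather than executed unless RUN=1 is set.

```shell
#!/bin/sh
# Assumed install path; adjust for your system.
SQUATTER="${SQUATTER:-/usr/lib/cyrus/bin/squatter}"

# Hypothetical helper: index the mailboxes listed in one sync log file,
# but only if the file exists and is non-empty.
index_synclog() {
    logfile=$1
    if [ ! -s "$logfile" ]; then
        echo "skip: $logfile is missing or empty" >&2
        return 1
    fi
    if [ "${RUN:-0}" = "1" ]; then
        "$SQUATTER" -v -f "$logfile"
    else
        echo "$SQUATTER" -v -f "$logfile"
    fi
}

# The path below is an assumption, not a documented location.
index_synclog /var/lib/imap/sync/squatter/log-saved || true
```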
In the sixth synopsis, squatter will compact indices from
srctier(s) to desttier, optionally reindexing (-X) or
filtering expunged records (-F) in the process. The optional
-T flag may be used to specify members of srctiers which must be
reindexed. These files are eventually copied with rsync -a and then
removed by rm. rsync can increase the load average of the
system, especially when the temporary directory is on tmpfs. To
throttle rsync, it is possible to modify the call in
imap/search_xapian.c and pass --bwlimit=<number> as
a further parameter. The -o flag may be used to direct that a single
index be copied, rather than compacted, from srctier to
desttier. The -u flag may be used to restrict operation to the
specified user(s).
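As a rough sketch of how the compact-mode options combine, the helper below assembles a command line from source tiers, a destination tier and optional flags. The tier names "temp", "meta" and "data" are hypothetical; use the tiers defined in your own imapd.conf. The compact function is not part of Cyrus, and it prints the command rather than running it unless RUN=1 is set.

```shell
#!/bin/sh
# Assumed install path; adjust for your system.
SQUATTER="${SQUATTER:-/usr/lib/cyrus/bin/squatter}"

# Hypothetical helper: compact SRCTIERS into DESTTIER with optional extra flags.
# Both -t and -z are required in compact mode, so refuse to go on without them.
compact() {
    if [ "$#" -lt 2 ] || [ -z "$1" ] || [ -z "$2" ]; then
        echo "compact: need source tier(s) and a destination tier" >&2
        return 1
    fi
    srctiers=$1
    desttier=$2
    shift 2
    set -- "$SQUATTER" -t "$srctiers" -z "$desttier" "$@"
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

compact temp,meta data -F       # compact two tiers, filtering expunged records
compact temp,meta data -X       # reindex everything before compacting
compact temp,meta data -T temp  # reindex only the "temp" tier
compact temp data -o            # single source tier: copy instead of compacting
```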
For all modes, the -S option may be specified, causing
squatter to pause seconds seconds after each mailbox, to
smooth loads.
When using the Xapian engine, the -Z option may be
specified for the indexing modes. This tells squatter to consult the
GUIDs indexed internally by Xapian, rather than relying on what’s stored
in cyrus.indexed.db, allowing for recovery from a broken
cyrus.indexed.db at the cost of efficiency.
NOTE:
Incremental updates are very inefficient with the SQUAT
search engine. If using SQUAT for large and active mailboxes, you should run
squatter periodically as an EVENT in
cyrus.conf(5).
NOTE:
Messages and mailboxes that have not been indexed CAN
still be SEARCHed, just not as quickly as those with an index.
squatter reads its configuration options out of the
imapd.conf(5) file unless specified otherwise by -C.
- -a, --squat-annot
- Only create indexes for mailboxes which have the shared
/vendor/cmu/cyrus-imapd/squat annotation set to
“true”.
The value of the /vendor/cmu/cyrus-imapd/squat
annotation is inherited by all children of the given mailbox, so an
entire mailbox tree can be indexed (or not indexed) by setting a single
annotation on the root of that tree with a value of “true”
(or “false”). If a mailbox does not have a
/vendor/cmu/cyrus-imapd/squat annotation set on it (or does not
inherit one), then the mailbox is not indexed. In other words, the
implicit value of /vendor/cmu/cyrus-imapd/squat is
“false”.
- -A, --audit
- Audits the specified mailboxes (or all) and reports any unindexed messages.
This feature is only available on the master branch.
- -d, --nodaemon
- In rolling mode, don’t background and do emit log messages on
standard error. Useful for debugging. This feature was introduced in
version 3.0.
- -B, --skip-locked
- In compact mode, use a non-blocking lock to start, and skip any users who
have their xapianactive file locked at the time (i.e. by another reindex task).
This feature is only available on the master branch.
- -F, --filter
- In compact mode, filter the resulting database to only include messages
which are not expunged in mailboxes with existing name/uidvalidity. This
feature was introduced in version 3.0.
- -N name, --name=name
- Only index mailboxes beginning with name while iterating through
the mailbox list derived from other options.
- -n channel,
--channel=channel
- In rolling mode, specify the name of the sync log channel that
squatter will listen to. The default is “squatter”.
This channel must be defined in imapd.conf(5) before being used.
This feature was introduced in version 3.0.
- -o, --copydb
- In compact mode, if only one source database is selected, just copy it to
the destination rather than compacting. This feature was introduced in
version 3.0.
- -p,
--allow-partials
- When indexing, allow messages to be partially indexed.
This may occur if attachment indexing is enabled but indexing failed for one
or more attachment body parts. If this flag is set, the message is partially
indexed and squatter continues. Otherwise squatter aborts with an error. Also
see -P. Xapian only. This feature is only available on the master
branch.
- -P,
--reindex-partials
- When reindexing, attempt to reindex any partially indexed messages
(see -p). Setting this flag implies -Z. Xapian only. This
feature is only available on the master branch.
- -L,
--reindex-minlevel=level
- When reindexing, index all messages that have an index level less than
level. Currently, Cyrus supports only two index levels: a message for
which attachment indexing was never attempted has index level 1; a message
that has indexed attachments, or does not contain attachments, has index
level 3. Consequently, running squatter with minlevel set to 3 will cause
it to attempt reindexing all messages for which attachment indexing was
never attempted. Future Cyrus versions may introduce additional levels.
Setting this flag implies -Z. Xapian only. This feature is only
available on the master branch.
- -R, --rolling
- Run in rolling mode; squatter runs as a daemon listening to a sync
log channel and continuously incrementally indexing mailboxes. See also
-d and -n. This feature was introduced in version 3.0.
- -r, --recursive
- Recursively create indexes for all sub-mailboxes of the user, mailboxes or
mailbox prefixes given as arguments.
- -s delta,
--squat-skip=delta
- Skip mailboxes that have not been modified since last index. This is
achieved by comparing the last modification time of a mailbox to the last
time the squat index of this mailbox got updated. If the mailbox
modification time plus delta is less than the squat index modification
time, then the mailbox is skipped. The argument value delta is defined in
seconds and must be greater than or equal to zero. The historical default
delta was 60, and this remains a good general choice, but for technical
reasons it must now be specified explicitly. Squat only.
- -S seconds,
--sleep=seconds
- After processing each mailbox, sleep for “seconds” before
continuing. Can be used to provide some load balancing. Accepts fractional
amounts. This feature was introduced in version 3.0.
- -T reindextiers,
--reindex-tier=reindextiers
- In compact mode, a comma-separated subset of the source tiers (see
-t) to be reindexed. Similar to -X but allows limiting the
tiers that will be reindexed. This feature was introduced in version
3.0.
- -t srctiers,
--srctier=srctiers
- In compact mode, the comma-separated source tier(s) for the compacted
indices. At least one source tier must be specified in compact mode.
Xapian only. This feature was introduced in version 3.0.
- -u name, --user=name
- Extra options refer to usernames (e.g. foo@bar.com) rather than
mailbox names. Usernames are space-separated. This feature was introduced
in version 3.0.
- -U, --only-upgrade
- In compact mode, only compact if re-indexing. Xapian only. This feature is
only available on the master branch.
- -v, --verbose
- Increase the verbosity of progress/status messages. Some messages are
always sent to syslog and, with this option, are additionally emitted on
the terminal. Others are sent to syslog only if -v is provided. In
rolling and synclog modes, -vv sends even more messages to syslog.
- -X, --reindex
- Reindex all the messages before compacting. This mode reads all the lists
of messages indexed by the listed tiers, and re-indexes them into a
temporary database before compacting that into place. Xapian only. This
feature was introduced in version 3.0.
- -z desttier,
--compact=desttier
- In compact mode, the destination tier for the compacted indices. This must
be specified in compact mode. Xapian only. This feature was introduced in
version 3.0.
- -Z,
--internalindex
- When indexing messages, use the Xapian internal cyrusid rather than
referencing the ranges of already indexed messages to know if a particular
message is indexed. Useful if the ranges get out of sync with the actual
messages (e.g. if files on a tier are lost). Xapian only. This feature is
only available on the master branch.
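The skip rule described for -s above (mailbox modification time plus delta is less than the index modification time) can be illustrated with plain shell arithmetic. This is a sketch of the comparison only, not squatter's actual code; the should_skip function and the timestamps are made up for the example.

```shell
#!/bin/sh
# Illustration of the -s comparison only; not taken from the squatter sources.
# Arguments: mailbox mtime and index mtime (epoch seconds), then delta (seconds).
# Returns success (0) when the mailbox would be skipped.
should_skip() {
    mailbox_mtime=$1
    index_mtime=$2
    delta=$3
    [ $((mailbox_mtime + delta)) -lt "$index_mtime" ]
}

# Index written 300 s after the mailbox last changed: skipped with delta=60.
if should_skip 1000 1300 60; then echo skip; else echo index; fi

# Index written only 30 s after the mailbox last changed: indexed with delta=60.
if should_skip 1000 1030 60; then echo skip; else echo index; fi
```

A larger delta makes squatter more cautious: the index has to be newer than the mailbox by more than delta seconds before the mailbox is skipped.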
squatter is typically deployed via entries in
cyrus.conf(5), in either the DAEMON or EVENTS sections.
For the older SQUAT search engine, which offers poor performance
in rolling mode (-R), we recommend triggering periodic runs via entries in
the EVENTS section.
Sample entries from the EVENTS section of cyrus.conf(5) for
periodic squatter runs:
EVENTS {
# reindex changed mailboxes (fulltext) approximately every three hours
squatter1 cmd="/usr/bin/ionice -c idle /usr/lib/cyrus/bin/squatter -i" period=180
# reindex all mailboxes (fulltext) daily
squattera cmd="/usr/lib/cyrus/bin/squatter" at=0117
}
For the newer Xapian search engine, and with sufficiently fast
storage, the rolling mode (-R) offers advantages. Use of rolling mode
requires that squatter be invoked in the DAEMON section.
Sample entries for the DAEMON section of cyrus.conf(5) for rolling
squatter operation:
DAEMON {
# run a rolling squatter using the default sync_log channel "squatter"
squatter cmd="squatter -R"
# run a rolling squatter using a specific sync_log channel
squatter cmd="squatter -R -n indexer"
}
NOTE:
When using the
-R rolling mode, you MUST enable
sync_log operation in
imapd.conf(5) via the
sync_log: on setting, and
MUST define a sync_log channel via the
sync_log_channels: setting. If
also using replication, you must either explicitly specify your replication
sync_log channel via the
sync_log_channels directive with a name, or
specify the default empty name with “” (the two-character string
U+22 U+22). [Please see
imapd.conf(5) for details].
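Before starting a rolling squatter, the two requirements above can be checked with a small script such as the sketch below. The check_rolling_prereqs function is hypothetical and deliberately naive: it assumes one setting per line, that sync_log_channels is a space-separated list, and that "on", "yes", "true" or "1" enable sync_log. Consult imapd.conf(5) for the authoritative syntax.

```shell
#!/bin/sh
# Hypothetical helper: check that an imapd.conf enables sync_log and
# lists the given channel (default "squatter") in sync_log_channels.
check_rolling_prereqs() {
    conf=$1
    channel=${2:-squatter}

    # sync_log must be switched on (accepted spellings are an assumption).
    grep -Eqs '^[[:space:]]*sync_log:[[:space:]]*(on|yes|true|1)[[:space:]]*$' "$conf" || {
        echo "sync_log is not enabled in $conf" >&2
        return 1
    }

    # The channel must appear in sync_log_channels (assumed space-separated).
    channels=$(sed -n 's/^[[:space:]]*sync_log_channels:[[:space:]]*//p' "$conf")
    for c in $channels; do
        [ "$c" = "$channel" ] && return 0
    done
    echo "channel \"$channel\" not found in sync_log_channels" >&2
    return 1
}

if check_rolling_prereqs /etc/imapd.conf squatter; then
    echo "rolling mode prerequisites look fine"
fi
```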
NOTE:
When configuring rolling search indexing on a
replica, one must consider whether sync_logs will be written at all. In
this case, please consider the setting
sync_log_unsuppressable_channels
to ensure that the sync_log channel upon which one’s squatter instance
depends will continue to be written. See
imapd.conf(5) for details.
NOTE:
When using the Xapian search engine, you must define
various settings in
imapd.conf(5). Please read all relevant Xapian
documentation in this release before using Xapian.
[NB: More examples needed]
Support for additional search engines was added in version
3.0.
The following command-line switches were added in version 3.0:
The following command-line settings were added in version 3.0:
-S <seconds>, -T <directory>, -f <synclogfile>, -n <channel>, -t srctier..., -z desttier
/etc/imapd.conf, /etc/cyrus.conf
The Cyrus Team, Nic Bernstein (Onlight)
1993–2023, The Cyrus Team