MAIRIX(1) | General Commands Manual | MAIRIX(1) |
mairix - index and search mail folders
mairix [ -v|--verbose ] [ -p|--purge ] [ -f|--rcfile mairixrc ] [ -F|--fast-index ] [ --force-hash-key-new-database hash ]
mairix [ -v|--verbose ] [ -f|--rcfile mairixrc ] [ -r|--raw-output ] [ -x|--excerpt-output ] [ -H|--force-hardlinks ] [ -o|--mfolder mfolder ] [ -a|--augment ] [ -t|--threads ] search-patterns
mairix [ -h|--help ]
mairix [ -V|--version ]
mairix [ -d|--dump ]
mairix indexes and searches a collection of email messages. The folders containing the messages for indexing are defined in the configuration file. The indexing stage produces a database file. The database file provides rapid access to details of the indexed messages during searching operations. A search normally produces a folder (so-called mfolder) containing the matched messages. However, a raw mode (-r) exists which just lists the matched messages instead.
It can operate with the following folder types
If maildir or MH source folders are used, and a search outputs its matches to an mfolder in maildir or MH format, symbolic links are used to reference the original messages inside the mfolder. However, if mbox folders are involved, copies of messages are made instead. If IMAP folders are used for both the source and the results, IMAP server-side copies of messages are made. With IMAP source folders and any other type of results folder, messages are downloaded from the IMAP server to be written to the results folder. With an IMAP results folder and any other type of source folders, messages are uploaded to the IMAP server to be appended to the results folder.
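The rules in the preceding paragraph can be summarised as a small decision function. This is only an illustrative sketch: the function name and the returned labels are hypothetical, it is not taken from the mairix source, and it ignores options such as -H.

```python
def delivery_method(source, results):
    """Return how matched messages reach the mfolder.

    source and results are folder types: "maildir", "mh", "mbox" or "imap".
    The returned labels are made up for this sketch.
    """
    if source == "imap" and results == "imap":
        return "imap-server-side-copy"
    if source == "imap":
        return "download"   # fetched from the server, written to the mfolder
    if results == "imap":
        return "upload"     # appended to the results folder on the server
    if source == "mbox" or results == "mbox":
        return "copy"       # mbox involved: messages are copied
    return "symlink"        # maildir/MH on both sides


for pair in [("maildir", "mh"), ("mbox", "maildir"), ("imap", "imap")]:
    print(pair, "->", delivery_method(*pair))
```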
mairix decides whether indexing or searching is required by looking for the presence of any search-patterns on the command line.
The nochecks directive in the rc file has the same effect.
This option tells mairix to assume that a message currently on disc is unchanged when its name matches one already in the database.
A later indexing run without using this option will fix up any rescans that were missed due to its use.
mairix will refuse to output search results into any folder that appears to be amongst those that are indexed. This is to prevent accidental deletion of emails.
Message body is taken to mean any body part of type text/plain or text/html. For text/html, text within meta tags is ignored. In particular, the URLs inside <A HREF="..."> tags are not currently indexed. Non-text attachments are ignored. If there's an attachment of type message/rfc822, this is parsed and the match is performed on this sub-message too. If a hit occurs, the enclosing message is treated as having a hit.
For example, to match messages between 10 kilobytes and 20 kilobytes in size, the following search term can be used:
mairix z:10k-20k
The suffix 'k' on a number means multiply by 1024, and the suffix 'M' on a number means multiply by 1024*1024.
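As a worked illustration of the suffix arithmetic, the following sketch converts a size term into byte counts. The function names are hypothetical, and the sketch handles only ranges with both ends given; it is not mairix's own parser.

```python
MULTIPLIERS = {"k": 1024, "M": 1024 * 1024}


def parse_size(text):
    """Convert a size such as '10k' or '2M' into a number of bytes."""
    if text and text[-1] in MULTIPLIERS:
        return int(text[:-1]) * MULTIPLIERS[text[-1]]
    return int(text)


def parse_size_range(term):
    """Split 'low-high' and convert both ends (both assumed present)."""
    low, _, high = term.partition("-")
    return parse_size(low), parse_size(high)


print(parse_size_range("10k-20k"))  # (10240, 20480)
```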
mairix n:mairix=
would match all messages which have attachments whose names contain the substring mairix.
The attachment name is determined from the name=xxx or filename=xxx qualifiers on the Content-Type: and Content-Disposition: headers respectively.
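To see where such names come from in a message, the sketch below uses Python's standard email package, whose get_filename() method reads the filename= parameter of Content-Disposition: and falls back to the name= parameter of Content-Type:. The sample message and the function name are invented for illustration; mairix itself does not use this code.

```python
from email import message_from_string

RAW = """\
From: richard@doesnt.exist
Subject: test
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/plain

body text
--XX
Content-Type: application/octet-stream; name="mairix-notes.bin"

data
--XX
Content-Type: application/pdf
Content-Disposition: attachment; filename="report.pdf"

data
--XX--
"""


def attachment_names(raw):
    """Collect the name of every body part that declares one."""
    names = []
    for part in message_from_string(raw).walk():
        name = part.get_filename()
        if name:
            names.append(name)
    return names


names = attachment_names(RAW)
print(names)
# A search such as n:mairix= corresponds to a substring test on these names.
print([n for n in names if "mairix" in n])
```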
mairix F:-s d:1w-
would match any unread message less than a week old, and
mairix F:f-r d:-1m
would match any flagged message older than a month which you haven't replied to yet.
Note that the flag characters and their meanings agree with those used as the suffix letters on message filenames in maildir folders.
Multiple body parts may be grouped together, if a match in any of them is sought. Common examples follow.
The a: search pattern is an abbreviation for tcf:; i.e. match the word in the To:, Cc: or From: headers. ("a" stands for "address" in this case.)
The word argument to the search strings can take various forms.
The binding order of the constructions is:
This section describes the syntax used for specifying dates when searching using the `d:' option.
Dates are specified as a range. The start and end of the range can both be specified. Alternatively, if the start is omitted, it is treated as being the beginning of time. If the end is omitted, it is treated as the current time.
There are 4 basic formats:
The start and end can be specified either absolute or relative. A relative endpoint is given as a number followed by a single letter defining the scaling:
letter | short for | example | meaning |
d | days | 3d | 3 days |
w | weeks | 2w | 2 weeks (14 days) |
m | months | 5m | 5 months (150 days) |
y | years | 4y | 4 years (4*365 days) |
Months are always treated as 30 days, and years as 365 days, for this purpose.
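The scaling rules can be checked with a few lines of date arithmetic. The sketch below is an assumption-laden illustration (the function name is hypothetical, and only whole-day resolution is modelled); it uses Sunday May 18th, 2003 as the current date.

```python
from datetime import date, timedelta

# Fixed scaling: months are always 30 days and years 365 days.
DAYS_PER_UNIT = {"d": 1, "w": 7, "m": 30, "y": 365}


def relative_date(spec, today):
    """Return the date that lies 'spec' (e.g. '6w') before today."""
    count, unit = int(spec[:-1]), spec[-1]
    return today - timedelta(days=count * DAYS_PER_UNIT[unit])


today = date(2003, 5, 18)
print(relative_date("6w", today))  # 2003-04-06
print(relative_date("2w", today))  # 2003-05-04
print(relative_date("2y", today))  # 2001-05-18
```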
Absolute times can be specified in many forms. Some forms have different meanings when they define a start date from that when they define an end date. Where a single expression specifies both the start and end (i.e. where the argument to d: doesn't contain a `-'), it will usually have different interpretations in the two cases.
In the examples below, suppose the current date is Sunday May 18th, 2003 (when I started to write this material.)
Example | Start date | End date | Notes |
d:20030301-20030425 | March 1st, 2003 | April 25th, 2003 | |
d:030301-030425 | March 1st, 2003 | April 25th, 2003 | century assumed |
d:mar1-apr25 | March 1st, 2003 | April 25th, 2003 | |
d:Mar1-Apr25 | March 1st, 2003 | April 25th, 2003 | case insensitive |
d:MAR1-APR25 | March 1st, 2003 | April 25th, 2003 | case insensitive |
d:1mar-25apr | March 1st, 2003 | April 25th, 2003 | date and month in either order |
d:2002 | January 1st, 2002 | December 31st, 2002 | whole year |
d:mar | March 1st, 2003 | March 31st, 2003 | most recent March |
d:oct | October 1st, 2002 | October 31st, 2002 | most recent October |
d:21oct-mar | October 21st, 2002 | March 31st, 2003 | start before end |
d:21apr-mar | April 21st, 2002 | March 31st, 2003 | start before end |
d:21apr- | April 21st, 2003 | May 18th, 2003 | end omitted |
d:-21apr | January 1st, 1900 | April 21st, 2003 | start omitted |
d:6w-2w | April 6th, 2003 | May 4th, 2003 | both dates relative |
d:21apr-1w | April 21st, 2003 | May 11th, 2003 | one date relative |
d:21apr-2y | April 21st, 2001 | May 18th, 2001 | start before end |
d:99-11 | January 1st, 1999 | May 11th, 2003 | 2 digits are a day of the month if possible, otherwise a year |
d:99oct-1oct | October 1st, 1999 | October 1st, 2002 | end before now, single digit is a day of the month |
d:99oct-01oct | October 1st, 1999 | October 31st, 2001 | 2 digits starting with zero treated as a year |
d:oct99-oct1 | October 1st, 1999 | October 1st, 2002 | day and month in either order |
d:oct99-oct01 | October 1st, 1999 | October 31st, 2001 | year and month in either order |
The principles in the table work as follows.
If the match folder does not exist when running in search mode, it is automatically created. For 'mformat=maildir' (the default), this should be all you need to do. If you use 'mformat=mh', you may have to run some commands before your mailer will recognize the folder. e.g. for mutt, you could do
mkdir -p /home/richard/Mail/mfolder
touch /home/richard/Mail/mfolder/.mh_sequences
which seems to work. Alternatively, within mutt, you could set MBOX_TYPE to 'mh' and save a message to '+mfolder' to have mutt set up the structure for you in advance.
If you use Sylpheed, the best way seems to be to create the new folder from within Sylpheed before letting mairix write into it.
Suppose my email address is <richard@doesnt.exist>.
Either of the following will match all messages newer than 3 months from me with the word 'chrony' in the subject line:
mairix d:3m- f:richard+doesnt+exist s:chrony
mairix d:3m- f:richard@doesnt.exist s:chrony
Suppose I don't mind a few spurious matches on the address, I want a wider date range, and I suspect that some messages I replied to might have had the subject keyword spelt wrongly (let's allow up to 2 errors):
mairix d:6m- f:richard s:chrony=2
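To make "up to 2 errors" concrete, here is a sketch of approximate substring matching. It assumes that an error is a single-character insertion, deletion or substitution, and it uses a textbook dynamic-programming method; this is not necessarily the algorithm mairix implements, and the function name is hypothetical.

```python
def approx_substring_match(pattern, word, max_errors):
    """True if pattern matches some substring of word within max_errors edits."""
    # col[i] = fewest edits to match pattern[:i] against a substring of
    # word ending at the current position (initially: the empty prefix).
    col = list(range(len(pattern) + 1))
    if col[-1] <= max_errors:
        return True
    for ch in word:
        new = [0]  # a match may start at any position in word at no cost
        for i, p in enumerate(pattern, start=1):
            new.append(min(col[i - 1] + (p != ch),  # match or substitution
                           col[i] + 1,              # extra character in word
                           new[i - 1] + 1))         # pattern character missing
        col = new
        if col[-1] <= max_errors:
            return True
    return False


print(approx_substring_match("chrony", "chorny", 2))  # True
print(approx_substring_match("chrony", "chorny", 1))  # False
```

Under these assumptions, the transposed spelling "chorny" counts as two errors, so s:chrony=2 would accept it while s:chrony=1 would not.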
mairix works exclusively in terms of words. The index that's built in indexing mode contains a table of which words occur in which messages. Hence, the search capability is based on finding messages that contain particular words. mairix defines a word as any string of alphanumeric characters + underscore. Any whitespace, punctuation, hyphens etc are treated as word boundaries.
mairix has special handling for the To:, Cc: and From: headers. Besides the normal word scan, these headers are scanned a second time, where the characters '@', '-' and '.' are also treated as word characters. This allows most (if not all) email addresses to appear in the database as single words. So if you have a mail from wibble@foobar.zzz, it will match on both these searches
mairix f:foobar
mairix f:wibble@foobar.zzz
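The two scans can be sketched with a pair of regular expressions. This is an illustration of the word definition given above, not mairix's actual tokenizer; the names are hypothetical, and the sketch assumes ASCII input and leaves case untouched.

```python
import re

# Normal scan: alphanumeric characters plus underscore.
WORD = re.compile(r"[A-Za-z0-9_]+")
# Second scan for To:, Cc: and From:, where '@', '-' and '.' also count.
ADDRESS_WORD = re.compile(r"[A-Za-z0-9_@.\-]+")


def header_words(text):
    """Return the set of words both scans would record for an address header."""
    return set(WORD.findall(text)) | set(ADDRESS_WORD.findall(text))


found = header_words("wibble@foobar.zzz")
print(sorted(found))
print("foobar" in found, "wibble@foobar.zzz" in found)  # True True
```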
It should be clear by now that the searching cannot be used to find messages matching general regular expressions. This has never been much of a limitation. Most searches are for particular keywords that were in the messages, or details of the recipients, or the approximate date.
It's also worth pointing out that there is no 'locality' information stored, so you can't search for messages that have one word 'close' to some other word. For every message and every word, there is a simple yes/no condition stored - whether the message contains the word in a particular header or in the body. So far this has proved to be adequate. mairix has a similar feel to using an Internet search engine.
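The yes/no table described above can be pictured as a mapping from (field, word) pairs to sets of messages. The sketch below is a toy model only: the class, its methods and the lower-casing of words are assumptions made for illustration and do not describe the real database format.

```python
import re
from collections import defaultdict


class WordIndex:
    """Toy word table: records only whether a word occurs, never where."""

    def __init__(self):
        self._table = defaultdict(set)  # (field, word) -> message ids

    def add(self, msg_id, field, text):
        for word in re.findall(r"[A-Za-z0-9_]+", text):
            self._table[(field, word.lower())].add(msg_id)

    def search(self, field, word):
        return set(self._table.get((field, word.lower()), ()))


index = WordIndex()
index.add(1, "subject", "chrony clock drift")
index.add(2, "subject", "drift and, much later in the line, chrony")

# Both messages contain both words; how far apart they are is not recorded.
print(index.search("subject", "chrony") & index.search("subject", "drift"))
```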
~/.mairixrc
Copyright (C) 2002-2006 Richard P. Curnow <rc@rc0.org.uk>
We need a plugin scheme to allow more types of attachment to be scanned and indexed.
January 2006 |