GREPMAIL(1p) | User Contributed Perl Documentation | GREPMAIL(1p) |
grepmail - search mailboxes for mail matching a regular expression
grepmail [--help|--version] [-abBDFhHilLmrRuvVw] [-C <cache-file>] [-j <status>] [-s <sizespec>] [-d <date-specification>] [-X <signature-pattern>] [-Y <header-pattern>] [[-e] <pattern>|-E <expr>|-f <pattern-file>] <files...>
By default grepmail looks in both header and body for the specified pattern.
When redirected to a file, the result is another mailbox, which can in turn be handled by standard mail user agents, such as elm, or used as input for another instance of grepmail.
At least one of -E, -e, -d, -s, or -u must be specified. The pattern is optional if -d, -s, and/or -u is used. The -e flag is optional if there is no file whose name is the pattern. The -E option can be used to specify complex search expressions involving logical operators. (See below.)
If a mailbox cannot be found, grepmail first searches the directory specified by the MAILDIR environment variable (if one is defined), then searches the $HOME/mail, $HOME/Mail, and $HOME/Mailbox directories.
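The lookup order described above can be sketched in Python. This is only an illustration of the documented order, not grepmail's actual Perl code; the function names candidate_mailbox_paths and find_mailbox are hypothetical, and the sketch assumes the name as given is tried first.

```python
import os

def candidate_mailbox_paths(name, environ):
    """List the paths to try for a mailbox name, in the documented order."""
    paths = [name]  # assumption: the name as given is tried first
    maildir = environ.get("MAILDIR")
    if maildir:  # MAILDIR is consulted only if it is defined
        paths.append(os.path.join(maildir, name))
    home = environ.get("HOME")
    if home:
        for sub in ("mail", "Mail", "Mailbox"):
            paths.append(os.path.join(home, sub, name))
    return paths

def find_mailbox(name, environ=os.environ):
    """Return the first candidate path that exists, or None."""
    for path in candidate_mailbox_paths(name, environ):
        if os.path.exists(path):
            return path
    return None
```

For example, with MAILDIR set to /var/mail/me and HOME set to /home/me, the name "inbox" would be tried as given, then under /var/mail/me, then under the three $HOME directories.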
Many of the options and arguments are analogous to those of grep.
Note that complex pattern features such as "(?>...)" require that you use a version of perl which supports them. You can use the pattern "()" to indicate that you do not want to match anything. This is useful if you want to initialize the cache without printing any output.
If no mailbox is specified, grepmail takes input from stdin, which may be compressed or uncompressed. grepmail's behavior is undefined when ASCII and binary data are piped together as input.
Simple date expressions will first be parsed by Date::Parse. If this fails, grepmail will attempt to parse the date with Date::Manip, if the module is installed on the system. Use an empty pattern (i.e. -d "") to find emails without a "Date: ..." line in the header.
Date specifications without times are interpreted as having a time of midnight of that day (which is the morning), except for "after" and "since" specifications, which are interpreted as midnight of the following day. For example, "between today and tomorrow" is the same as simply "today", and returns emails dated the current day. ("now" is interpreted as "today".) The date specification "after July 5th" will return emails whose date is midnight July 6th or later.
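The midnight rule can be illustrated with a small Python sketch. It only models the rule as stated here (a date without a time means midnight of that day, or midnight of the following day for "after" and "since"); the function name date_only_bound is hypothetical, and grepmail's real parsing is done by Date::Parse and Date::Manip.

```python
from datetime import date, datetime, time, timedelta

def date_only_bound(day, keyword=None):
    """Return the instant a time-less date stands for.

    keyword is the leading word of the specification, if any
    (for example "before", "after", or "since").
    """
    midnight = datetime.combine(day, time.min)  # 00:00 on that day
    if keyword in ("after", "since"):
        # "after"/"since" are read as midnight of the following day
        return midnight + timedelta(days=1)
    return midnight

# "after July 5th" (year chosen arbitrarily for the example)
bound = date_only_bound(date(1998, 7, 5), "after")
```

Here bound is midnight on July 6th, so an email qualifies if its date is at or after that instant.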
For example, the expression
$email_header =~ /^From: .*\@coppit.org/ && $email =~ /grepmail/i
will find all emails which originate from coppit.org (you must escape the "@" sign with a backslash), and which contain the keyword "grepmail" anywhere in the message, in any capitalization.
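For readers more at home in Python, the same test can be written roughly as follows. This is a loose translation for illustration only: the function name matches is hypothetical, and the use of re.MULTILINE (so that "^" matches at the start of any header line) is an assumption about how the header text is searched, not something this page states.

```python
import re

def matches(email_header, email):
    """Approximate the example -E expression for one message."""
    # Header test: a From: line mentioning coppit.org
    # (assumption: "^" may match at the start of any header line).
    from_coppit = re.search(r"^From: .*@coppit.org", email_header, re.MULTILINE)
    # Whole-message test: "grepmail" in any capitalization.
    mentions_grepmail = re.search(r"grepmail", email, re.IGNORECASE)
    return bool(from_coppit and mentions_grepmail)
```

Note that Python needs no backslash before "@"; the escape in the original expression is a Perl requirement.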
-E is incompatible with -b, -h, and -e. The -i, -M, -S, and -Y options have not yet been implemented for -E.
NOTE: The syntax of search expressions may change in the future. In particular, support for size, date, and other constraints may be added. The syntax may also be simplified in order to make expression formation easier to use (and perhaps at the expense of reduced functionality).
Size constraints must take one of the following forms:
- 12345: match size of exactly 12345
- <12345, <=12345, >12345, >=12345: match size less than, less than or equal to, greater than, or greater than or equal to 12345
- 10000-12345: match size between 10000 and 12345 inclusive
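The three forms can be made concrete with a short Python sketch that turns a size constraint into a test on a message size. It is a minimal illustration of the forms listed above, not grepmail's own parser; the name parse_sizespec is hypothetical.

```python
import re

def parse_sizespec(spec):
    """Return a function size -> bool for a size constraint string."""
    # Range form: 10000-12345 (inclusive at both ends)
    m = re.fullmatch(r"(\d+)-(\d+)", spec)
    if m:
        low, high = int(m.group(1)), int(m.group(2))
        return lambda size: low <= size <= high

    # Exact form (12345) or comparison form (<, <=, >, >=)
    m = re.fullmatch(r"(<=|>=|<|>)?(\d+)", spec)
    if not m:
        raise ValueError(f"unrecognized size constraint: {spec!r}")
    op, n = m.group(1), int(m.group(2))
    tests = {
        None: lambda size: size == n,
        "<": lambda size: size < n,
        "<=": lambda size: size <= n,
        ">": lambda size: size > n,
        ">=": lambda size: size >= n,
    }
    return tests[op]
```

For instance, parse_sizespec("2000-3000") accepts a 2500-byte message and rejects a 3001-byte one.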
If you are familiar with Perl regular expressions, this flag simply puts a "\b" before and after the search pattern.
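The effect of the "\b" anchors can be seen in a few lines of Python, whose "\b" behaves like Perl's for plain ASCII words. The helper name as_whole_word is hypothetical and simply mirrors the description above.

```python
import re

def as_whole_word(pattern):
    """Wrap a pattern in word-boundary anchors, as described above."""
    return r"\b" + pattern + r"\b"

word = as_whole_word("mail")
standalone = re.search(word, "send mail now")  # "mail" stands alone: match
embedded = re.search(word, "grepmail")         # "mail" is inside a word: no match
```

Because the anchors are added only at the two ends, a pattern containing top-level alternation (such as "foo|bar") would have each anchor apply to just one alternative unless it is parenthesized.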
In the style of procmail, special strings in the pattern will be expanded as follows:
If the regular expression contains "^TO:" it will be substituted by
^((Original-)?(Resent-)?(To|Cc|Bcc)|(X-Envelope|Apparently(-Resent)?)-To):
which should match all headers with destination addresses.
If the regular expression contains "^FROM_DAEMON:" it will be substituted by
(^(Mailing-List:|Precedence:.*(junk|bulk|list)|To: Multiple recipients of |(((Resent-)?(From|Sender)|X-Envelope-From):|>?From )([^>]*[^(.%@a-z0-9])?(Post(ma?(st(e?r)?|n)|office)|(send)?Mail(er)?|daemon|m(mdf|ajordomo)|n?uucp|LIST(SERV|proc)|NETSERV|o(wner|ps)|r(e(quest|sponse)|oot)|b(ounce|bs\.smtp)|echo|mirror|s(erv(ices?|er)|mtp(error)?|ystem)|A(dmin(istrator)?|MMGR|utoanswer))(([^).!:a-z0-9][-_a-z0-9]*)?[%@>\t ][^<)]*(\(.*\).*)?)?
which should catch mails coming from most daemons.
If the regular expression contains "^FROM_MAILER:" it will be substituted by
(^(((Resent-)?(From|Sender)|X-Envelope-From):|>?From)([^>]*[^(.%@a-z0-9])?(Post(ma(st(er)?|n)|office)|(send)?Mail(er)?|daemon|mmdf|n?uucp|ops|r(esponse|oot)|(bbs\.)?smtp(error)?|s(erv(ices?|er)|ystem)|A(dmin(istrator)?|MMGR))(([^).!:a-z0-9][-_a-z0-9]*)?[%@>\t][^<)]*(\(.*\).*)?)?$([^>]|$))
(a stripped down version of "^FROM_DAEMON:"), which should catch mails coming from most mailer-daemons.
So, to search for all emails to or from "Andy":
grepmail -Y '(^TO:|^From:)' Andy mailbox
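To see what the "^TO:" expansion buys in the command above, the substitution can be imitated in Python. This is a sketch under assumptions: the helper expand_special is hypothetical, it handles only "^TO:", and it uses a plain string replacement, which may differ from how grepmail performs the expansion.

```python
import re

# Destination-header expression quoted earlier in this page.
TO_EXPANSION = (
    r"^((Original-)?(Resent-)?(To|Cc|Bcc)"
    r"|(X-Envelope|Apparently(-Resent)?)-To):"
)

def expand_special(pattern):
    """Replace the special string ^TO: with its longer expression."""
    return pattern.replace("^TO:", TO_EXPANSION)

header_pattern = re.compile(expand_special("(^TO:|^From:)"))

def is_address_header(line):
    """True if a single header line is a destination header or From:."""
    return header_pattern.search(line) is not None
```

With this expansion, lines beginning with "Cc:", "Resent-To:", or "X-Envelope-To:" are selected as well as "To:" and "From:", while a line such as "Subject: To: do" is not.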
Count the number of emails ("." matches every email):
grepmail -r . sent-mail
Get all email between 2000 and 3000 bytes in size about books:
grepmail books -s 2000-3000 sent-mail
Get all email that you mailed yesterday:
grepmail -d yesterday sent-mail
Get all email that you mailed before the first Thursday in June 1998 that pertains to research (requires Date::Manip):
grepmail research -d "before 1st thursday in June 1998" sent-mail
Get all email that you mailed before the first of June 1998 that pertains to research:
grepmail research -d "before 6/1/98" sent-mail
Get all email you received since 8/20/98 that wasn't about research or your job, ignoring case:
grepmail -iv "(research|job)" -d "since 8/20/98" saved-mail
Get all email about mime but not about Netscape. Constrain the search to match the body, since most headers contain the text "mime":
grepmail -b mime saved-mail | grepmail Netscape -v
Print a list of all mailboxes containing a message from Rodney. Constrain the search to the headers, since quoted emails may match the pattern:
grepmail -hl "^From.*Rodney" saved-mail*
Find all emails with the text "Pilot" in both the header and the body:
grepmail -hb "Pilot" saved-mail*
Print a count of the number of messages about grepmail in all saved-mail mailboxes:
grepmail -br grepmail saved-mail*
Remove any duplicates from a mailbox:
grepmail -u saved-mail
Convert a Gnus mailbox to mbox format:
grepmail . gnus-mailbox-dir/* > mbox
Search for all emails to or from an address (taking into account wrapped headers and different header names):
grepmail -Y '(^TO:|^From:)' my@email.address saved-mail
Find all emails from postmasters:
grepmail -Y '^FROM_MAILER:' . saved-mail
grepmail will not create temporary files while decompressing compressed archives. The last version to do this was 3.5. While the new design uses more memory, the code is much simpler, and there is less chance that email can be read by malicious third parties. Memory usage is determined by the size of the largest email message in the mailbox.
The MAILDIR environment variable can be used to specify the default mail directory. This directory will be searched if the specified mailbox cannot be found directly.
The HOME environment variable is also used to find mailboxes if they cannot be found directly. grepmail also stores state information, such as its cache file, under this directory.
The reason for this problem is that Date::Manip, as of version 5.42, forces default values for parsed dates and times. This means that grepmail has a hard time determining whether the user supplied certain time/date fields. (For example, did Date::Manip provide a default time of 0:00, or did the user specify it?) grepmail tries to work around this problem, but the workaround is inherently incomplete in some rare cases.
This code is distributed under the GNU General Public License (GPL) Version 2. See the file LICENSE in the distribution for details.
David Coppit <david@coppit.org>
elm(1), mail(1), grep(1), perl(1), printmail(1), Mail::Internet(3), procmailrc(5). Crocker, D. H., Standard for the Format of ARPA Internet Text Messages, RFC 822.
2022-08-01 | perl v5.34.0 |