MU-INDEX(1) | General Commands Manual | MU-INDEX(1) |
mu_index - index e-mail messages stored in Maildirs
mu index [options]
mu index is the mu command for scanning the contents of Maildir directories and storing the results in a Xapian database. The data can then be queried using mu-find(1).
Note that before the first time you run mu index, you must run mu init to initialize the database.
mu index understands Maildirs as defined by Daniel Bernstein for qmail(7). In addition, it understands recursive Maildirs (Maildirs within Maildirs) and Maildir++. It can also deal with VFAT-based Maildirs, which use '!' as the separator instead of ':'.
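To make the separator difference concrete, the following is a minimal shell sketch that extracts the flags from a Maildir file name, whether it uses ':' or '!'. It assumes the usual "2," info marker of the Maildir convention; the function name maildir_flags and the sample file names are hypothetical, and this is not how mu itself parses names.

```shell
# maildir_flags FILE: print the flags part of a Maildir file name.
# Hypothetical helper for illustration; not part of mu.
maildir_flags() {
    name=${1##*/}                                 # strip leading directories
    case $name in
        *:2,*) printf '%s\n' "${name##*:2,}" ;;   # standard ':' separator
        *!2,*) printf '%s\n' "${name##*!2,}" ;;   # VFAT-friendly '!' separator
        *)     printf '\n' ;;                     # no info part present
    esac
}

# Made-up file names, one per separator style.
maildir_flags "cur/1312345678.M1P2.host:2,RS"
maildir_flags "cur/1312345678.M1P2.host!2,S"
```

Both calls report the flags after the "2," marker, so the two naming styles can be handled by the same code path.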
E-mail messages which are not stored in something resembling a maildir leaf-directory (cur and new) are ignored, as are the cache directories for notmuch and gnus, and any dot-directory.
The maildir must be on a single file-system; symlinks are not followed.
If there is a file called .noindex in a directory, the contents of that directory and all of its subdirectories will be ignored. This can be useful to exclude certain directories from the indexing process, for example directories with spam-messages.
If there is a file called .noupdate in a directory, the contents of that directory and all of its subdirectories will be ignored, unless we do a full rebuild (with mu init). This can be useful to speed things up if you have some maildirs that never change. Note that you can still search for these messages; this only affects updating the database.
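The effect of the two marker files can be sketched in plain shell. The example below builds a throwaway directory tree and lists which cur and new directories would be scanned in a normal update and in a full rebuild. It only imitates the rules described above and does not call mu; the helper names is_skipped and list_scanned and the folder names are hypothetical.

```shell
# Build a small demo tree (hypothetical folder names).
root=$(mktemp -d)
for box in inbox spam archive; do
    mkdir -p "$root/$box/cur" "$root/$box/new"
done
touch "$root/spam/.noindex"       # never index this subtree
touch "$root/archive/.noupdate"   # skip this subtree except on a full rebuild

# is_skipped DIR ROOT MODE: succeed if DIR or one of its parents (up to ROOT)
# carries a marker that applies in MODE ("update" or "rebuild").
is_skipped() {
    dir=$1
    while :; do
        [ -e "$dir/.noindex" ] && return 0
        [ "$3" = update ] && [ -e "$dir/.noupdate" ] && return 0
        [ "$dir" = "$2" ] && return 1
        [ "$dir" = / ] && return 1
        dir=$(dirname "$dir")
    done
}

# list_scanned ROOT MODE: print the leaf directories that would be scanned.
list_scanned() {
    find "$1" -type d \( -name cur -o -name new \) | sort |
    while IFS= read -r leaf; do
        is_skipped "$leaf" "$1" "$2" || printf '%s\n' "${leaf#"$1"/}"
    done
}

echo "update:";  list_scanned "$root" update
echo "rebuild:"; list_scanned "$root" rebuild
```

In this sketch the spam folder never appears, while the archive folder appears only in the rebuild listing.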
There is also the --lazy-check option, which can greatly speed up indexing; see below for details.
The first run of mu index may take a few minutes if you have a lot of mail (tens of thousands of messages). Fortunately, such a full scan needs to be done only once; after that it suffices to index the changes, which goes much faster. See the 'Note on performance (i,ii,iii)' below for more information.
The optional 'phase two' of the indexing-process is the removal of messages from the database for which there is no longer a corresponding file in the Maildir. If you do not want this, you can use -n, --nocleanup.
When mu index catches one of the signals SIGINT, SIGHUP or SIGTERM (e.g., when you press Ctrl-C during the indexing process), it tries to shut down gracefully: it tries to save and commit data, close the database, and so on. If it receives another signal (e.g., when you press Ctrl-C once more), mu index will terminate immediately.
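The two-stage behaviour described above is a common pattern, and a rough shell analogue may make it clearer. This is only an illustration with placeholder work items, not mu's actual implementation; the names stop_requested and on_signal are hypothetical.

```shell
stop_requested=0

# First signal: ask the main loop to stop. Second signal: quit right away.
on_signal() {
    if [ "$stop_requested" -eq 1 ]; then
        echo "second signal, terminating immediately" >&2
        exit 1
    fi
    stop_requested=1
    echo "signal caught, finishing the current item" >&2
}
trap on_signal INT HUP TERM

for item in msg1 msg2 msg3; do            # placeholder work items
    [ "$stop_requested" -eq 1 ] && break  # leave the loop after a signal
    echo "processing $item"
done

echo "committing data and closing"        # stands in for the graceful cleanup
```

The point of the design is that the first interruption still reaches the cleanup step, while a second one skips it.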
Note, some of the general options are described in the mu(1) man-page and not here, as they apply to multiple mu commands.
As a non-scientific benchmark, a simple test on the author's machine (a Thinkpad X61s laptop using Linux 2.6.35 and an ext3 file system) with no existing database, and a maildir with 27273 messages:
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
66,65s user 6,05s system 27% cpu 4:24,20 total
(about 103 messages per second)
A second run, which is the more typical use case when there is a database already, goes much faster:
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,48s user 0,76s system 10% cpu 11,796 total
(more than 56818 messages per second)
Note that each test flushes the caches first; a more common use case might be to run mu index when new mail has arrived; the cache may stay quite 'warm' in that case:
$ time mu index --quiet
0,33s user 0,40s system 80% cpu 0,905 total
which is more than 30000 messages per second.
As of June 2012, we did the same non-scientific benchmark, this time with an Intel i5-2500 CPU @ 3.30GHz, an ext4 file system and a maildir with 22589 messages. We start without an existing database.
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
27,79s user 2,17s system 48% cpu 1:01,47 total
(about 813 messages per second)
A second run, which is the more typical use case when there is a database already, goes much faster:
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,13s user 0,30s system 19% cpu 2,162 total
(more than 173000 messages per second)
As of July 2016, we did the same non-scientific benchmark, again with the Intel i5-2500 CPU @ 3.30GHz and an ext4 file system. This time, the maildir contains 72525 messages.
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
40,34s user 2,56s system 64% cpu 1:06,17 total
(about 1099 messages per second)
As shown, mu has been getting faster with each release, even with relatively expensive new features such as text-normalization (for case-insensitive/accent-insensitive matching). The profiles are now dominated by operations in the Xapian database.
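Messages-per-second figures like those quoted in the benchmarks can be recomputed from the message count and a time taken from the time output. The sketch below does this for the wall-clock ("total") column; dividing by the user CPU time instead gives a different and usually larger figure. The helper names to_seconds and rate are hypothetical.

```shell
# to_seconds TIME: convert "M:SS,ss" or "S,sss" (decimal comma) to seconds.
to_seconds() {
    printf '%s\n' "$1" | tr ',' '.' |
        LC_ALL=C awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# rate MESSAGES SECONDS: messages per second, truncated to an integer.
rate() {
    LC_ALL=C awk -v n="$1" -v s="$2" 'BEGIN { printf "%d\n", n / s }'
}

rate 27273 "$(to_seconds 4:24,20)"   # first benchmark, no existing database
rate 72525 "$(to_seconds 1:06,17)"   # July 2016 benchmark
```

With the wall-clock totals shown in the benchmarks, this prints 103 and 1096.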
mu stores logs of its operations and queries in <muhome>/mu.log (by default, this is ~/.cache/mu/mu.log). Upon startup, mu checks the size of this log file. If it exceeds 1 MB, mu moves it to ~/.cache/mu/mu.log.old, overwriting any existing file of that name, and starts with an empty log file. This scheme allows for continued use of mu without the need for any manual maintenance of log files.
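The rotation scheme is simple enough to mimic in a few lines of shell. The sketch below is an illustration only, not mu's own code: the function name rotate_log is hypothetical, and "1 MB" is assumed here to mean 1,000,000 bytes. The demonstration uses a temporary file and a small limit so that it runs quickly.

```shell
# rotate_log LOGFILE [MAX_BYTES]: keep one old copy once the log grows too big.
rotate_log() {
    log=$1
    max=${2:-1000000}               # assumption: 1 MB taken as 1,000,000 bytes
    [ -f "$log" ] || return 0       # nothing to rotate yet
    size=$(($(wc -c < "$log")))     # arithmetic expansion strips whitespace
    if [ "$size" -gt "$max" ]; then
        mv -f "$log" "$log.old"     # overwrites any previous .old file
        : > "$log"                  # start again with an empty log
    fi
}

# Demonstration on a throwaway file with a 1000-byte limit.
tmp=$(mktemp -d)
printf '%2000s' '' > "$tmp/mu.log"  # 2000 bytes of padding
rotate_log "$tmp/mu.log" 1000
ls "$tmp"
```

After the call, the directory contains an empty mu.log and the previous contents in mu.log.old.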
mu index uses MAILDIR to find the user's Maildir if it has not been specified explicitly with --maildir=<maildir>. If MAILDIR is not set, mu index will try ~/Maildir.
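The lookup order described above (explicit option, then MAILDIR, then ~/Maildir) can be written as a small shell function. This is a sketch of the order only; resolve_maildir is a hypothetical name, and unlike the text above it also treats an empty MAILDIR as unset.

```shell
# resolve_maildir [EXPLICIT]: print the maildir that would be used.
resolve_maildir() {
    if [ -n "${1:-}" ]; then
        printf '%s\n' "$1"                          # explicitly specified
    else
        printf '%s\n' "${MAILDIR:-$HOME/Maildir}"   # environment, then default
    fi
}

resolve_maildir /srv/mail/alice   # explicit path wins
resolve_maildir                   # falls back to $MAILDIR or ~/Maildir
```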
mu index returns 0 upon successful completion; any other number greater than 0 signals an error.
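In a script, the exit status can be checked as in the following sketch. The wrapper name run_and_report is hypothetical; it accepts any command, so it is exercised here with true and false rather than with mu itself.

```shell
# run_and_report COMMAND [ARGS...]: run a command and report its exit status.
run_and_report() {
    status=0
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
        echo "completed successfully"
    else
        echo "failed with exit status $status" >&2
    fi
    return "$status"
}

run_and_report true               # stands in for a successful run
run_and_report false || :         # stands in for a failing run
# Typical use (requires mu): run_and_report mu index --quiet
```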
Please report bugs if you find them: https://github.com/djcb/mu/issues
Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>
February 2020 | User Manuals |