omindex - Index static website data via the filesystem
omindex [OPTIONS] --db DATABASE [BASEDIR] DIRECTORY
DIRECTORY is the directory to start indexing from.
BASEDIR is the directory corresponding to the base URL given by --url
(default: DIRECTORY).
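The synopsis above can be sketched as a small Python helper that assembles the argument list in the documented order. This is only an illustration: the helper name build_omindex_argv and the paths are hypothetical, and the sketch uses no options beyond --db and --url as described in this page.

```python
import shlex


def build_omindex_argv(db, directory, basedir=None, url=None):
    """Assemble argv as: omindex [OPTIONS] --db DATABASE [BASEDIR] DIRECTORY."""
    argv = ["omindex", f"--db={db}"]
    if url is not None:
        # --url names the base URL that BASEDIR corresponds to.
        argv.append(f"--url={url}")
    if basedir is not None:
        # BASEDIR is optional; when omitted it defaults to DIRECTORY.
        argv.append(basedir)
    argv.append(directory)
    return argv


# Hypothetical paths, for illustration only.
argv = build_omindex_argv("/tmp/example-db", "/srv/www/htdocs", url="/")
print(shlex.join(argv))

# To actually run the command (requires omindex to be installed):
#   import subprocess
#   subprocess.run(argv, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.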
- -d, --duplicates=ARG
- set duplicate handling: ARG can be 'ignore' or 'replace' (default:
replace)
- -p, --no-delete
- skip the deletion of documents corresponding to deleted files
(--preserve-nonduplicates is a deprecated alias for
--no-delete)
- -e, --empty-docs=ARG
- how to handle documents we extract no text from: ARG can be 'index',
'warn' (issue a diagnostic and index), or 'skip' (default: warn)
- -D, --db=DATABASE
- path to database to use
- -U, --url=URL
- base URL that BASEDIR corresponds to (default: /)
- -M, --mime-type=EXT:TYPE
- assume any file with extension EXT has MIME Content-Type TYPE, instead of
using libmagic (empty TYPE removes any existing mapping for EXT; other
special TYPE values: 'ignore' and 'skip')
- -G, --mime-type-match=GLOB:TYPE
- assume any file with leaf name matching shell wildcard pattern GLOB has
MIME Content-Type TYPE (special TYPE values: 'ignore' and 'skip')
- -F, --filter=M[,[T][,C]]:CMD
- process files with MIME Content-Type M using command CMD, which produces
output (on stdout or in a temporary file) with format T (Content-Type or
file extension; currently txt (default), html or svg) in character
encoding C (default: UTF-8). E.g.
-Fapplication/octet-stream:'strings -n8' or
-Ftext/x-foo,,utf-16:'foo2utf16 %f %t'
- --read-filters=FILE
- bulk-load --filter arguments from FILE, which should contain one
such argument per line (e.g. text/x-bar:bar2txt --utf8). Lines
starting with # are treated as comments and ignored.
- -l, --depth-limit=LIMIT
- set recursion limit (0 = unlimited)
- -f, --follow
- follow symbolic links
- -i, --ignore-exclusions
- ignore meta robots tags and similar exclusions
- -S, --spelling
- index data for spelling correction
- -m, --max-size=SIZE
- maximum size of file to index (in bytes or with a suffix of 'K'/'k',
'M'/'m', 'G'/'g') (default: unlimited)
- --sample=SOURCE
- what to use for the stored sample of text for HTML documents - SOURCE can
be 'body' or 'description' (default: 'body')
- -E, --sample-size=SIZE
- maximum size for the document text sample (supports the same formats as
--max-size). (default: 512)
- -T, --title-size=SIZE
- maximum size for the document title (supports the same formats as
--max-size). (default: 128)
- -R, --retry-failed
- retry files which omindex failed to extract text from on a previous
run
- --opendir-sleep=SECS
- sleep for SECS seconds before opening each directory - sleeping for 2
seconds seems to reliably work around problems with indexing files on
Microsoft DFS shares.
- -C, --track-ctime
- track each file's ctime so we can detect changes to ownership or
permissions.
- -v, --verbose
- show more information about what is happening
- --overwrite
- create the database anew (the default is to update if the database already
exists)
- -s, --stemmer=LANG
- set the stemming language (default: english). Possible values: arabic
armenian basque catalan danish dutch earlyenglish english finnish french
german german2 hungarian indonesian irish italian kraaij_pohlmann
lithuanian lovins nepali norwegian porter portuguese romanian russian
spanish swedish tamil turkish (pass 'none' to disable stemming)
- -h, --help
- display this help and exit
- -V, --version
- output version information and exit
Please report bugs at: https://xapian.org/bugs