makehmmerdb - build nhmmer database from a sequence file
makehmmerdb [options] seqfile binaryfile
makehmmerdb creates a binary file from a DNA sequence file. This
binary file can then be used as a target database for the DNA search
tool nhmmer. With nhmmer's default settings, this yields a roughly
10-fold acceleration, with a small loss of sensitivity on benchmarks.
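As a rough illustration of the synopsis, the Python sketch below assembles the command line and runs it only when explicitly asked. The helper names `build_makehmmerdb_cmd` and `run_makehmmerdb` and the file names are hypothetical; only the argument order (options, then seqfile, then binaryfile) and the `--informat` option come from this page.

```python
import shutil
import subprocess


def build_makehmmerdb_cmd(seqfile, binaryfile, options=()):
    """Return the argv list: makehmmerdb [options] seqfile binaryfile."""
    return ["makehmmerdb", *options, seqfile, binaryfile]


def run_makehmmerdb(seqfile, binaryfile, options=()):
    """Run makehmmerdb if it is on PATH; raise FileNotFoundError otherwise."""
    if shutil.which("makehmmerdb") is None:
        raise FileNotFoundError("makehmmerdb not found on PATH")
    cmd = build_makehmmerdb_cmd(seqfile, binaryfile, options)
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(cmd, check=True)


# Hypothetical file names, for illustration only; nothing is executed here.
cmd = build_makehmmerdb_cmd("genome.fa", "genome.fmdb", ["--informat", "fasta"])
print(" ".join(cmd))
```

Keeping command construction separate from execution makes it possible to inspect or log the command before running it.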
- -h
- Help; print a brief reminder of command line usage and all available
options.
- --informat <s>
- Assert that input seqfile is in format <s>, bypassing
format autodetection. Common choices for <s> include:
fasta, embl, genbank. Alignment formats also work;
common choices include: stockholm, a2m, afa,
psiblast, clustal, phylip. For more information, and
for codes for some less common formats, see the main documentation. The string
<s> is case-insensitive (fasta or FASTA both
work).
- --bin_length <n>
- Bin length. The binary file depends on a data structure called the FM
index, which organizes a permuted copy of the sequence in bins of length
<n>. A longer bin length yields smaller files (because some data
is stored for every bin, so fewer bins means less overhead) and possibly
slower queries. The default is 256. Values much above 512 may noticeably
reduce search speed.
- --sa_freq <n>
- Suffix array sample rate. The FM index structure also samples from the
underlying suffix array for the sequence database. More frequent sampling
(smaller value for <n>) will yield larger file size and
faster search (until file size becomes large enough to cause I/O to be a
bottleneck). The default value is 8. <n> must be a power of 2.
- --block_size <n>
- The input sequence is broken into blocks of size <n> million
letters. An FM index is built for each block, rather than building an FM
index for the entire sequence database. The default is 50. Larger blocks do
not seem to yield a substantial speed increase.
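The index-tuning options above can be sanity-checked before the tool is invoked. The following Python sketch is illustrative only: the function names are hypothetical, the positivity checks are an assumption, and the block-count estimate assumes plain ceiling division over the total number of letters, which may not match how makehmmerdb actually splits sequences. The defaults (256, 8, 50) and the power-of-2 rule for `--sa_freq` are taken from this page.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; False for zero, negatives, and other values."""
    return n > 0 and (n & (n - 1)) == 0


def index_options(bin_length=256, sa_freq=8, block_size=50):
    """Return makehmmerdb flags as an argv fragment after basic checks."""
    # Assumption: non-positive sizes are never meaningful.
    if bin_length <= 0 or block_size <= 0:
        raise ValueError("bin_length and block_size must be positive")
    # Documented constraint: --sa_freq must be a power of 2.
    if not is_power_of_two(sa_freq):
        raise ValueError("sa_freq must be a power of 2")
    return [
        "--bin_length", str(bin_length),
        "--sa_freq", str(sa_freq),
        "--block_size", str(block_size),
    ]


def estimate_block_count(total_letters, block_size=50):
    """Estimate the number of FM-index blocks; block_size is in millions of letters."""
    letters_per_block = block_size * 1_000_000
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_letters // letters_per_block)


print(index_options(sa_freq=4))
print(estimate_block_count(120_000_000))
```

Under these assumptions, a 120-million-letter database with the default block size would be split into three blocks, each with its own FM index.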
See hmmer(1) for a master man page with a list of all the
individual man pages for programs in the HMMER package.
For complete documentation, see the user guide that came with your
HMMER distribution (Userguide.pdf); or see the HMMER web page
(http://hmmer.org/).
Copyright (C) 2020 Howard Hughes Medical Institute.
Freely distributed under the BSD open source license.
For additional information on copyright and licensing, see the
file called COPYRIGHT in your HMMER source distribution, or see the HMMER
web page (http://hmmer.org/).