hmm2pfam [options] hmmfile seqfile
hmm2pfam reads a sequence file seqfile and compares
each sequence in it, one at a time, against all the HMMs in hmmfile
looking for significantly similar sequence matches.
hmm2pfam looks for hmmfile first in the current working
directory, then in the directory named by the environment variable
HMMERDB. This lets administrators install HMM libraries such as Pfam
in a common location.
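The lookup order described above can be sketched as follows. This is only an illustration of the search order, not HMMER's own code: the function name resolve_hmmfile and its parameters are hypothetical, and it assumes HMMERDB names a single directory, as stated above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_hmmfile(name: str,
                    cwd: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the first existing path for `name`, or None if it is not found.

    Hypothetical helper: checks the working directory first, then the
    directory named by HMMERDB (assumed here to be a single directory).
    """
    env = os.environ if env is None else env
    candidates = [Path(cwd or os.getcwd()) / name]
    hmmerdb = env.get("HMMERDB")
    if hmmerdb:
        candidates.append(Path(hmmerdb) / name)
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

A wrapper script could call such a helper before launching hmm2pfam, so that a missing HMM library is reported early rather than partway through a batch run.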
There is a separate output report for each sequence in
seqfile. This report consists of three sections: a ranked list of the
best scoring HMMs, a list of the best scoring domains in order of their
occurrence in the sequence, and alignments for all the best scoring domains.
A sequence score may be higher than a domain score for the same sequence if
there is more than one domain in the sequence; the sequence score takes into
account all the domains. All HMMs that pass the -E and
-T cutoffs are shown in the first list, then every domain
found for those HMMs is shown in the second list of domain hits. If desired,
E-value and bit score thresholds may also be applied to the domain list
using the --domE and --domT options.
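The two-stage thresholding described above can be sketched in a few lines. This is an illustration under stated assumptions, not the program's implementation: the ModelHit and DomainHit records and the select_hits function are hypothetical, and the treatment of scores exactly equal to a cutoff (inclusive here) is an assumption. The defaults mirror the documented defaults for -E, -T, --domE and --domT.

```python
from dataclasses import dataclass, field
from math import inf
from typing import List, Tuple


@dataclass
class DomainHit:
    start: int      # start position of the domain in the query sequence
    score: float    # per-domain bit score
    evalue: float   # per-domain E-value


@dataclass
class ModelHit:
    name: str       # HMM name
    score: float    # per-sequence bit score (accounts for all domains)
    evalue: float   # per-sequence E-value
    domains: List[DomainHit] = field(default_factory=list)


def select_hits(hits: List[ModelHit], E: float = 10.0, T: float = -inf,
                domE: float = inf, domT: float = -inf
                ) -> Tuple[List[ModelHit], List[Tuple[str, DomainHit]]]:
    """Apply the per-sequence cutoffs first, then the per-domain cutoffs."""
    # First list: HMMs passing -E and -T, ranked by bit score.
    models = [h for h in hits if h.evalue <= E and h.score >= T]
    models.sort(key=lambda h: h.score, reverse=True)
    # Second list: domains of the surviving HMMs, in order of occurrence.
    domains = [(h.name, d) for h in models for d in h.domains
               if d.evalue <= domE and d.score >= domT]
    domains.sort(key=lambda pair: pair[1].start)
    return models, domains
```

With the default per-domain cutoffs, every domain of every reported HMM appears in the second list, which is why the two lists stay consistent unless --domE or --domT is set.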
- -h
- Print brief help; includes version number and summary of all options,
including expert options.
- -n
- Specify that models and sequences are nucleic acid, not protein. Other
HMMER programs autodetect this, but because of the order in which
hmm2pfam accesses data, it can't reliably determine the correct
"alphabet" by itself.
- -A <n>
- Limits the alignment output to the <n> best scoring domains.
-A0 shuts off the alignment output and can be used to reduce the
size of output files.
- -E <x>
- Set the E-value cutoff for the per-sequence ranked hit list to
<x>, where <x> is a positive real number. The
default is 10.0. Hits with E-values better than (less than) this threshold
will be shown.
- -T <x>
- Set the bit score cutoff for the per-sequence ranked hit list to
<x>, where <x> is a real number. The default is
negative infinity; by default, the threshold is controlled by E-value and
not by bit score. Hits with bit scores better than (greater than) this
threshold will be shown.
- -Z <n>
- Calculate the E-value scores as if we had seen a sequence database of
<n> sequences. The default is arbitrarily set to 59021, the
size of Swissprot 34.
- --acc
- Report HMM accessions instead of names in the output reports. Useful for
high-throughput annotation, where the data are being parsed for storage in
a relational database.
- --compat
- Use the output format of HMMER 2.1.1, the 1998-2001 public release;
provided so 2.1.1 parsers don't have to be rewritten.
- --cpu <n>
- Sets the maximum number of CPUs that the program will run on. The default
is to use all CPUs in the machine. Overrides the HMMER_NCPU environment
variable. Only affects threaded versions of HMMER (the default on most
systems).
- --cut_ga
- Use Pfam GA (gathering threshold) score cutoffs. Equivalent to -T
<GA1> --domT <GA2>, but the GA1 and GA2 cutoffs are read from
each HMM in hmmfile individually. hmm2build puts these cutoffs
there if the alignment file was annotated in a Pfam-friendly alignment
format (extended SELEX or Stockholm format) and the optional GA annotation
line was present. If these cutoffs are not set in the HMM file,
--cut_ga doesn't work.
- --cut_tc
- Use Pfam TC (trusted cutoff) score cutoffs. Equivalent to -T
<TC1> --domT <TC2>, but the TC1 and TC2 cutoffs are read from
each HMM in hmmfile individually. hmm2build puts these cutoffs
there if the alignment file was annotated in a Pfam-friendly alignment
format (extended SELEX or Stockholm format) and the optional TC annotation
line was present. If these cutoffs are not set in the HMM file,
--cut_tc doesn't work.
- --cut_nc
- Use Pfam NC (noise cutoff) score cutoffs. Equivalent to -T
<NC1> --domT <NC2>, but the NC1 and NC2 cutoffs are read from
each HMM in hmmfile individually. hmm2build puts these cutoffs
there if the alignment file was annotated in a Pfam-friendly alignment
format (extended SELEX or Stockholm format) and the optional NC annotation
line was present. If these cutoffs are not set in the HMM file,
--cut_nc doesn't work.
- --domE <x>
- Set the E-value cutoff for the per-domain ranked hit list to
<x>, where <x> is a positive real number. The
default is infinity; by default, all domains in the sequences that passed
the first threshold will be reported in the second list, so that the
number of domains reported in the per-sequence list is consistent with the
number that appear in the per-domain list.
- --domT <x>
- Set the bit score cutoff for the per-domain ranked hit list to
<x>, where <x> is a real number. The default is
negative infinity; by default, all domains in the sequences that passed
the first threshold will be reported in the second list, so that the
number of domains reported in the per-sequence list is consistent with the
number that appear in the per-domain list. Important note: only one
domain in a sequence is absolutely controlled by this parameter, or by
--domE. The second and subsequent domains in a sequence have a de
facto bit score threshold of 0 because of the details of how HMMER works.
HMMER requires at least one pass through the main model per sequence; to
do more than one pass (more than one domain) the multidomain alignment
must have a better score than the single domain alignment, and hence the
extra domains must contribute positive score. See the Users' Guide for
more detail.
- --forward
- Use the Forward algorithm instead of the Viterbi algorithm to determine
the per-sequence scores. Per-domain scores are still determined by the
Viterbi algorithm. Some have argued that Forward is a more sensitive
algorithm for detecting remote sequence homologues; my experiments with
HMMER have not confirmed this, however.
- --informat <s>
- Assert that the input seqfile is in format <s>; do not
run Babelfish format autodetection. This increases the reliability of the
program somewhat, because the Babelfish can make mistakes. It is particularly
recommended for unattended, high-throughput runs of HMMER. Valid format
strings include FASTA, GENBANK, EMBL, GCG, PIR, STOCKHOLM, SELEX, MSF,
CLUSTAL, and PHYLIP. See the Users' Guide for a complete list.
- --null2
- Turn off the post hoc second null model. By default, each alignment is
rescored by a postprocessing step that takes into account possible biased
composition in either the HMM or the target sequence. This rescoring is almost
essential in database searches, especially with local alignment models.
There is a very small chance that this postprocessing might remove real
matches, and in these cases --null2 may improve sensitivity at the
expense of reducing specificity by letting biased composition hits
through.
- --pvm
- Run on a Parallel Virtual Machine (PVM). The PVM must already be running.
The client program hmm2pfam-pvm must be installed on all the PVM
nodes. The HMM database hmmfile and an associated GSI index file
hmmfile.gsi must also be installed on all the PVM nodes. (The GSI
index is produced by the program hmm2index.) Because the PVM
implementation is I/O bound, it is highly recommended that each node have
a local copy of hmmfile rather than NFS mounting a shared copy.
Optional PVM support must have been compiled into HMMER for --pvm
to function.
- --xnu
- Turn on XNU filtering of target protein sequences. Has no effect on
nucleic acid sequences. In trial experiments, --xnu appears to
perform less well than the default post hoc null2 model.
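For unattended, high-throughput runs, the options above are often assembled by a script. The following is a minimal sketch of such a wrapper, assuming the hmm2pfam executable is on the PATH. The function build_hmm2pfam_argv and its parameter names are hypothetical; only flags documented above are emitted. KNOWN_FORMATS holds just the format names listed under --informat and would need extending to cover the complete list in the Users' Guide.

```python
from typing import List, Optional

# Only the format names listed in this page; the Users' Guide has more.
KNOWN_FORMATS = {"FASTA", "GENBANK", "EMBL", "GCG", "PIR",
                 "STOCKHOLM", "SELEX", "MSF", "CLUSTAL", "PHYLIP"}


def build_hmm2pfam_argv(hmmfile: str, seqfile: str,
                        informat: Optional[str] = None,
                        evalue: Optional[float] = None,
                        max_alignments: Optional[int] = None,
                        use_accessions: bool = False,
                        cpus: Optional[int] = None) -> List[str]:
    """Build an argument list for hmm2pfam (hypothetical helper)."""
    argv = ["hmm2pfam"]
    if informat is not None:
        fmt = informat.upper()  # emit the spelling used in this page
        if fmt not in KNOWN_FORMATS:
            raise ValueError(f"format not in the known list: {informat}")
        argv += ["--informat", fmt]
    if evalue is not None:
        if evalue <= 0:
            raise ValueError("-E expects a positive real number")
        argv += ["-E", str(evalue)]
    if max_alignments is not None:
        argv += ["-A", str(max_alignments)]  # 0 shuts off alignment output
    if use_accessions:
        argv.append("--acc")
    if cpus is not None:
        argv += ["--cpu", str(cpus)]
    argv += [hmmfile, seqfile]
    return argv
```

The resulting list can be passed to subprocess.run(argv, check=True). Building a list rather than a shell string avoids quoting problems with file names that contain spaces.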
Master man page, with full list of and guide to the individual man
pages: see hmmer2(1).
For complete documentation, see the user guide
(ftp://selab.janelia.org/pub/software/hmmer/2.3.2/Userguide.pdf); or see the
HMMER web page, http://hmmer.janelia.org/.
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
Freely distributed under the GNU General Public License (GPL).
See the file COPYING in your distribution for details on
redistribution conditions.
Sean Eddy
HHMI/Dept. of Genetics
Washington Univ. School of Medicine
4566 Scott Ave.
St Louis, MO 63110 USA
http://www.genetics.wustl.edu/eddy/