hmmemit - sample sequences from a profile
hmmemit [options] hmmfile
The hmmemit program samples (emits) sequences from the
profile HMM(s) in hmmfile, and writes them to output. Sampling
sequences may be useful for a variety of purposes, including creating
synthetic true positives for benchmarks or tests.
The default is to sample one unaligned sequence from the core
probability model, which means that each sequence consists of one
full-length domain. Alternatively, with the -c option, you can emit a
simple plurality-rule consensus sequence; or with the -a option, you
can emit an alignment (in which case, you probably also want to set
-N to something other than its default of 1 sequence per model).
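As a rough illustration of what sampling from a probability model means, the
following Python sketch draws one residue per match position from a made-up
table of emission probabilities. It is a toy under stated assumptions: the
names TOY_EMISSIONS, sample_sequence, and sample_many are hypothetical, the
probabilities are invented, and the insert and delete states of a real profile
HMM are ignored. It is not how hmmemit itself is implemented.

```python
import random

# Hypothetical emission probabilities for a 4-column DNA toy model.
# Each dict maps a residue to its probability at one match position.
TOY_EMISSIONS = [
    {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]


def sample_sequence(emissions, rng):
    """Draw one residue per column according to that column's probabilities."""
    residues = []
    for column in emissions:
        symbols = list(column.keys())
        weights = list(column.values())
        residues.append(rng.choices(symbols, weights=weights, k=1)[0])
    return "".join(residues)


def sample_many(emissions, n, seed=None):
    """Sample n sequences; a fixed integer seed makes the draw repeatable."""
    rng = random.Random(seed)
    return [sample_sequence(emissions, rng) for _ in range(n)]
```

Calling sample_many(TOY_EMISSIONS, 3, seed=42) twice returns the same three
sequences, whereas seed=None will generally give different sequences on each
call.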
With the -p option, you can instead sample a
sequence from a fully configured HMMER search profile. This means sampling a
'homologous sequence' by HMMER's definition, including nonhomologous
flanking sequences, local alignments, and multiple domains per sequence,
depending on the length model and alignment mode chosen for the profile.
The hmmfile may contain a library of HMMs, in which case
each HMM will be used in turn.
hmmfile may be '-' (dash), which means reading this input
from stdin rather than a file.
- -h
- Help; print a brief reminder of command line usage and all available
options.
- -o <f>
- Direct the output sequences to file <f>, rather than to
stdout.
- -N <n>
- Sample <n> sequences per model, rather than just one.
The default is to sample N sequences from the core model.
Alternatively, you may choose one (and only one) of the following
options.
- -a
- Emit an alignment for each HMM in the hmmfile rather than sampling
unaligned sequences one at a time.
- -c
- Emit a plurality-rule consensus sequence, instead of sampling a sequence
from the profile HMM's probability distribution. The consensus sequence is
formed by selecting the maximum probability residue at each match state.
- -C
- Emit a fancier plurality-rule consensus sequence than the -c
option. If the maximum probability residue has p < minl, show it
as a lower case 'any' residue (n for nucleic acid, x for protein); if
minl <= p < minu, show it as a lower case residue; and if p >= minu,
show it as an upper case residue. The default settings of minu and
minl are both 0.0, which means -C gives the same output as
-c unless you also set --minu and --minl.
- -p
- Sample unaligned sequences from the implicit search profile, not from the
core model. The core model consists only of the homologous states (between
the begin and end states of a HMMER Plan7 model). The profile includes the
nonhomologous N, C, and J states, local/glocal and uni/multihit algorithm
configuration, and the target length model. Therefore sequences sampled
from a profile may include nonhomologous as well as homologous sequences,
and may contain more than one homologous sequence segment. By default, the
profile is in multihit local mode, and the target sequence length is
configured for L=400.
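The consensus rules described for -c and -C can be sketched in a few lines of
Python. This is only an illustration of the thresholding logic as stated
above, not HMMER's own code: the function name consensus, its any_residue
parameter, and the dict-per-column input layout are all assumptions made for
the example.

```python
def consensus(emissions, minl=0.0, minu=0.0, any_residue="x"):
    """Build a consensus string from per-position emission probabilities.

    emissions   : list of dicts mapping residue -> probability, one per match state
    minl, minu  : thresholds in [0, 1], as for --minl and --minu
    any_residue : 'x' for protein or 'n' for nucleic acid
    """
    out = []
    for column in emissions:
        # Plurality rule: take the single most probable residue.
        # Ties go to the first residue encountered (an arbitrary choice here).
        residue, p = max(column.items(), key=lambda item: item[1])
        if p < minl:
            out.append(any_residue.lower())   # too weak: lower case 'any'
        elif p < minu:
            out.append(residue.lower())       # weakly conserved: lower case
        else:
            out.append(residue.upper())       # strongly conserved: upper case
    return "".join(out)
```

With the made-up columns [{"A": 0.9, "C": 0.1}, {"A": 0.6, "C": 0.4},
{"A": 0.3, "C": 0.3, "G": 0.4}], the default thresholds give "AAG" (every
position is upper case, as with -c), while minl=0.5, minu=0.8, and
any_residue="n" give "Aan".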
These options require that you have set the -p option.
- -L <n>
- Configure the profile's target sequence length model to generate a mean
length of approximately <n> rather than the default of 400.
- --local
- Configure the profile for multihit local alignment.
- --unilocal
- Configure the profile for unihit local alignment (Smith/Waterman).
- --glocal
- Configure the profile for multihit glocal alignment.
- --uniglocal
- Configure the profile for unihit glocal alignment.
These options require that you have set the -C option.
- --minl <x>
- Sets the minl threshold for showing weakly conserved residues as
lower case. (0 <= x <= 1)
- --minu <x>
- Sets the minu threshold for showing strongly conserved residues as
upper case. (0 <= x <= 1)
The following option does not require -p or -C.
- --seed <n>
- Seed the random number generator with <n>, an integer >=
0. If <n> is nonzero, any stochastic simulations will be
reproducible; the same command will give the same results. If
<n> is 0, the random number generator is seeded arbitrarily,
and stochastic simulations will vary from run to run of the same command.
The default is 0: use an arbitrary seed, so different hmmemit runs
will generate different samples.
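The dependencies between options described above (-L and the alignment-mode
options need -p; --minl and --minu need -C; only one of -a, -c, -C, -p may be
chosen) can be checked before launching the program. The Python sketch below
builds an argument list using only the flags documented here. The helper name
build_hmmemit_args, its keyword parameters, and the file name example.hmm are
hypothetical and exist only for this illustration.

```python
EMIT_MODES = ("-a", "-c", "-C", "-p")
PROFILE_MODES = ("--local", "--unilocal", "--glocal", "--uniglocal")


def build_hmmemit_args(hmmfile, n=None, mode=None, length=None,
                       align_mode=None, minl=None, minu=None,
                       seed=None, outfile=None):
    """Return an argv list for hmmemit, or raise ValueError on a bad combination."""
    # 'mode' is a single value, so at most one of -a, -c, -C, -p can be chosen.
    if mode is not None and mode not in EMIT_MODES:
        raise ValueError("mode must be one of " + ", ".join(EMIT_MODES))
    if (length is not None or align_mode is not None) and mode != "-p":
        raise ValueError("-L and alignment-mode options require -p")
    if align_mode is not None and align_mode not in PROFILE_MODES:
        raise ValueError("unknown alignment mode: " + str(align_mode))
    if (minl is not None or minu is not None) and mode != "-C":
        raise ValueError("--minl and --minu require -C")
    for name, x in (("--minl", minl), ("--minu", minu)):
        if x is not None and not 0.0 <= x <= 1.0:
            raise ValueError(name + " must satisfy 0 <= x <= 1")
    if seed is not None and seed < 0:
        raise ValueError("--seed must be an integer >= 0")

    args = ["hmmemit"]
    if outfile is not None:
        args += ["-o", outfile]
    if n is not None:
        args += ["-N", str(n)]
    if mode is not None:
        args.append(mode)
    if length is not None:
        args += ["-L", str(length)]
    if align_mode is not None:
        args.append(align_mode)
    if minl is not None:
        args += ["--minl", str(minl)]
    if minu is not None:
        args += ["--minu", str(minu)]
    if seed is not None:
        args += ["--seed", str(seed)]
    args.append(hmmfile)  # the HMM file comes last, after all options
    return args
```

For example, build_hmmemit_args("example.hmm", n=10, mode="-p", length=200,
align_mode="--unilocal", seed=42) produces the list for
"hmmemit -N 10 -p -L 200 --unilocal --seed 42 example.hmm". On a system where
hmmemit is installed, such a list could be handed to Python's subprocess.run.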
See hmmer(1) for a master man page with a list of all the
individual man pages for programs in the HMMER package.
For complete documentation, see the user guide that came with your
HMMER distribution (Userguide.pdf); or see the HMMER web page
(http://hmmer.org/).
Copyright (C) 2020 Howard Hughes Medical Institute.
Freely distributed under the BSD open source license.
For additional information on copyright and licensing, see the
file called COPYRIGHT in your HMMER source distribution, or see the HMMER
web page (http://hmmer.org/).