cmstat - summary statistics for a covariance model file
cmstat [options] <cmfile>
The cmstat utility prints a table of summary statistics
for each covariance model (CM) in <cmfile>.
<cmfile> may be '-' (a dash character), in which case
CMs are read from standard input (stdin) instead of from a file.
By default, cmstat prints general statistics of the model
and the alignment it was built from, one line per model in a tabular format.
The columns are:
- idx
- The index of this profile, numbering each one in the file starting from 1.
- name
- The name of the profile.
- accession
- The optional accession of the profile, or "-" if there is none.
- nseq
- The number of sequences that the profile was estimated from.
- eff_nseq
- The effective number of sequences that the profile was estimated from,
after Infernal applied an effective sequence number calculation such as
the default entropy weighting.
- clen
- The length of the model in consensus residues (match states).
- W
- The expected maximum length of a hit to the model.
- bps
- The number of basepairs in the model.
- bifs
- The number of bifurcations in the model.
- model
- What type of model will be used by default in cmsearch and
cmscan for this profile, either "cm" or "hmm".
For profiles with 0 basepairs, this will be "hmm" (unless the
--nohmmonly option is used). For all other profiles, this will be
"cm".
- rel entropy, cm:
- Mean relative entropy per match state, in bits. This is the expected
(mean) score per consensus position. This is what the default
entropy-weighting method for effective sequence number estimation focuses
on, so for default Infernal, this value will often reflect the default
target for entropy-weighting. If the "model" field for this
profile is "hmm", this field will be "-".
- rel entropy, hmm:
- Mean relative entropy per match state, in bits, if the CM were transformed
into an HMM (information from structure is ignored). The larger the
difference between the CM and HMM relative entropy, the more the model
will rely on structural conservation relative to sequence conservation when
identifying homologs.
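The column list above can be turned into a small parser. The following is a minimal Python sketch, not part of Infernal. It assumes that the default output is whitespace-delimited with the twelve columns in the order listed above, and that header lines begin with "#"; check both against your own cmstat output. The names CmstatRow, parse_cmstat_line, and structure_gain are hypothetical, and the example row is made up for illustration.

```python
from typing import NamedTuple, Optional


class CmstatRow(NamedTuple):
    """One row of default cmstat output (column order assumed from the list above)."""
    idx: int
    name: str
    accession: Optional[str]     # None when the column holds "-"
    nseq: int
    eff_nseq: float
    clen: int
    w: int
    bps: int
    bifs: int
    model: str                   # "cm" or "hmm"
    relent_cm: Optional[float]   # None when the column holds "-"
    relent_hmm: float


def parse_cmstat_line(line: str) -> Optional[CmstatRow]:
    """Parse one data row; return None for blank lines and '#' lines (assumed headers)."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    f = stripped.split()
    if len(f) != 12:
        raise ValueError(f"expected 12 columns, got {len(f)}: {line!r}")
    return CmstatRow(
        idx=int(f[0]),
        name=f[1],
        accession=None if f[2] == "-" else f[2],
        nseq=int(f[3]),
        eff_nseq=float(f[4]),
        clen=int(f[5]),
        w=int(f[6]),
        bps=int(f[7]),
        bifs=int(f[8]),
        model=f[9],
        relent_cm=None if f[10] == "-" else float(f[10]),
        relent_hmm=float(f[11]),
    )


def structure_gain(row: CmstatRow) -> Optional[float]:
    """CM minus HMM relative entropy, in bits per match state; None if no CM value."""
    if row.relent_cm is None:
        return None
    return row.relent_cm - row.relent_hmm


# Made-up row for illustration only; not real cmstat output.
example = "1  exampleRNA  -  100  5.50  72  86  21  1  cm  0.590  0.371"
row = parse_cmstat_line(example)
```

A larger structure_gain value suggests, per the description of the last column, that the model leans more on structural conservation than on sequence conservation.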
If the model(s) in <cmfile> have been calibrated with
cmcalibrate, the -E, -T, and -Z <n>
options can be used to invoke an alternative output mode, reporting E-values
and corresponding bit scores for a specified database size of
<n> megabases (Mb). If the model(s) have been calibrated and
include Rfam GA, TC, and/or NC bit score thresholds, the --cut_ga,
--cut_tc, and/or --cut_nc options can be used to display
E-values that correspond to the bit score thresholds. Separate bit scores or
E-values will be displayed for each of the four possible CM search algorithm
and model configuration pairs: local Inside, local CYK, glocal Inside and
glocal CYK.
For profiles with zero basepairs (those with "hmm" in
the "model" field), any E-value and bit score statistics will
pertain to the profile HMM filter, instead of to the CM. This is also true
for all profiles if the --hmmonly option is used.
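The rule for which model the reported statistics pertain to can be written as a short decision function. This is a sketch of the rule as described in this page, not code from Infernal. The function name stats_model is hypothetical, and rejecting the combination of both flags is an assumption; this page does not say how cmstat treats that case.

```python
def stats_model(bps: int, hmmonly: bool = False, nohmmonly: bool = False) -> str:
    """Return "hmm" or "cm" for a profile with `bps` basepairs.

    Mirrors the text: zero-basepair profiles use the profile HMM filter
    unless --nohmmonly is given, and --hmmonly forces the HMM for all profiles.
    """
    if hmmonly and nohmmonly:
        # Assumption: the two flags contradict each other, so refuse the combination.
        raise ValueError("hmmonly and nohmmonly cannot both be set")
    if hmmonly:
        return "hmm"
    if bps == 0 and not nohmmonly:
        return "hmm"
    return "cm"
```

For example, stats_model(0) gives "hmm", while stats_model(0, nohmmonly=True) gives "cm".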
- -h
- Help; print a brief reminder of command line usage and all available
options.
- -E <x1>
- Report bit scores that correspond to an E-value of <x1> in a
database of <x> megabases (Mb), where <x> is 10
by default but settable with the -Z <x> option.
- -T <x1>
- Report E-values that correspond to a bit score of <x1> in a
database of <x> megabases (Mb), where <x> is 10
by default but settable with the -Z <x> option.
- -Z <x>
- With the -E, -T, --cut_ga, --cut_nc, and
--cut_tc options, calculate E-values as if the target database size
was <x> megabases (Mb). By default, <x> is 10.
- --cut_ga
- Report E-values that correspond to the GA (Rfam gathering threshold) bit
score in a database of <x> megabases (Mb), where
<x> is 10 by default but settable with the -Z
<x> option.
- --cut_tc
- Report E-values that correspond to the TC (Rfam trusted cutoff) bit score
in a database of <x> megabases (Mb), where <x>
is 10 by default but settable with the -Z <x> option.
- --cut_nc
- Report E-values that correspond to the NC (Rfam noise cutoff) bit score in
a database of <x> megabases (Mb), where <x> is
10 by default but settable with the -Z <x> option.
- --key <s>
- Only print statistics for the CM with name or accession <s>; skip
all other models in <cmfile>.
- --hmmonly
- Print statistics on the profile HMM filters for all profiles, instead of
the CMs. This can be useful if you plan to use the --hmmonly option
to cmsearch or cmscan.
- --nohmmonly
- Always print statistics on the CM for each profile, even for those with
zero basepairs.
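When cmstat is driven from a script, the options above can be assembled into an argument list. The following is a minimal Python sketch that uses only the flags documented in this page. The helper name build_cmstat_argv and the file name my.cm are hypothetical, and the helper does not check whether cmstat accepts a given combination of options.

```python
from typing import List, Optional


def build_cmstat_argv(
    cmfile: str,
    *,
    evalue: Optional[float] = None,      # -E <x1>
    bitscore: Optional[float] = None,    # -T <x1>
    db_size_mb: Optional[float] = None,  # -Z <x>
    cutoff: Optional[str] = None,        # "ga", "tc", or "nc"
    key: Optional[str] = None,           # --key <s>
    hmmonly: bool = False,
    nohmmonly: bool = False,
) -> List[str]:
    """Build a cmstat command line; use cmfile="-" to read CMs from stdin."""
    argv = ["cmstat"]
    if evalue is not None:
        argv += ["-E", str(evalue)]
    if bitscore is not None:
        argv += ["-T", str(bitscore)]
    if db_size_mb is not None:
        argv += ["-Z", str(db_size_mb)]
    if cutoff is not None:
        if cutoff not in ("ga", "tc", "nc"):
            raise ValueError(f"cutoff must be 'ga', 'tc', or 'nc', got {cutoff!r}")
        argv.append(f"--cut_{cutoff}")
    if key is not None:
        argv += ["--key", key]
    if hmmonly:
        argv.append("--hmmonly")
    if nohmmonly:
        argv.append("--nohmmonly")
    argv.append(cmfile)  # the CM file (or "-") goes last, as in the synopsis
    return argv


# Hypothetical file name; bit scores for E = 0.01 in a 100 Mb database.
argv = build_cmstat_argv("my.cm", evalue=0.01, db_size_mb=100)
```

If cmstat is installed and on the PATH, the list can be passed to subprocess.run(argv, capture_output=True, text=True, check=True) to capture the table as text.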
See infernal(1) for a master man page with a list of all
the individual man pages for programs in the Infernal package.
For complete documentation, see the user guide that came with your
Infernal distribution (Userguide.pdf); or see the Infernal web page.
Copyright (C) 2016 Howard Hughes Medical Institute.
Freely distributed under a BSD open source license.
For additional information on copyright and licensing, see the
file called COPYRIGHT in your Infernal source distribution, or see the
Infernal web page.
The Eddy/Rivas Laboratory
Janelia Farm Research Campus
19700 Helix Drive
Ashburn VA 20147 USA
http://eddylab.org