sumtrees - Phylogenetic Tree Summarization and Annotation
sumtrees [-i FORMAT] [-b BURNIN] [--force-rooted]
[--force-unrooted]
SumTrees is a program to summarize non-parametric bootstrap or
Bayesian posterior probability support for splits or clades on phylogenetic
trees.
The basis of the support assessment is typically given by a set of
non-parametric bootstrap replicate tree samples produced by programs such as
GARLI or RAxML, or by a set of MCMC tree samples produced by programs such
as MrBayes or BEAST. The proportion of trees out of the samples in which a
particular split is found is taken to be the degree of support for that
split as indicated by the samples. The samples that are the basis of the
support can be distributed across multiple files, and a burn-in option
allows for an initial number of trees in each file to be excluded from the
analysis if they are not considered to be drawn from the true support
distribution.
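The support calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SumTrees implementation: each "file" is a list of trees, each tree is a set of splits, each split is a frozenset of taxon labels, and the function name split_support is hypothetical.

```python
from collections import Counter


def split_support(tree_files, burnin=0):
    """Return {split: proportion of retained trees that contain it}.

    tree_files is a list of "files"; each file is a list of trees, and
    each tree is a set of splits (frozensets of taxon labels). This
    representation is a stand-in for parsed trees.
    """
    counts = Counter()
    total = 0
    for trees in tree_files:
        # The burn-in is applied to each file separately.
        for tree in trees[burnin:]:
            total += 1
            counts.update(tree)  # count each split once per tree
    if total == 0:
        return {}
    return {split: n / total for split, n in counts.items()}


ab = frozenset({"A", "B"})
cd = frozenset({"C", "D"})
ac = frozenset({"A", "C"})

file1 = [{ac}, {ab, cd}, {ab}]      # first tree is dropped as burn-in
file2 = [{ac}, {ab, cd}, {ab, cd}]  # first tree is dropped as burn-in

support = split_support([file1, file2], burnin=1)
print(support[ab], support[cd])  # 1.0 0.75
```

With a burn-in of 1, four trees remain across the two files; the split {A, B} occurs in all four and {C, D} in three, while {A, C} only occurred in the discarded trees and so receives no support.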
Summarizes collections of trees, e.g., MCMC samples from a
posterior distribution or non-parametric bootstrap replicates, mapping the
posterior probability, support, or frequency with which splits/clades are
found in the source set of trees onto a target tree.
- TREE-FILEPATH
- Source(s) of trees to summarize. At least one valid source of trees must
be provided. Use '-' to specify reading from standard input (note that
this requires the input file format to be explicitly set using the
'--source-format' option).
- -i FORMAT,
--input-format FORMAT, --source-format FORMAT
- Format of all input trees (defaults to handling either NEXUS or NEWICK
through inspection; it is more efficient to explicitly specify the format
if it is known).
- -b BURNIN, --burnin
BURNIN
- Number of trees to skip from the beginning of *each* tree file when
counting support (default: 0).
- --force-rooted,
--rooted
- Treat source trees as rooted.
- --force-unrooted,
--unrooted
- Treat source trees as unrooted.
- -v,
--ultrametricity-precision, --branch-length-epsilon
- Precision to use when validating ultrametricity (default: 1e-05; specify
'0' to disable validation).
- --weighted-trees
- Use weights of trees (as indicated by '[&W m/n]' comment token) to
weight contribution of splits found on each tree to overall split
frequencies.
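As a rough illustration of weighted counting, the sketch below adds each tree's weight, rather than 1, to the tally of every split it contains. It assumes that the weights have already been parsed from the '[&W m/n]' comments into fractions and that frequencies are normalized by the total weight; both the function name and the normalization are assumptions, not a description of SumTrees internals.

```python
from collections import defaultdict
from fractions import Fraction


def weighted_split_frequencies(weighted_trees):
    """weighted_trees: iterable of (weight, splits) pairs.

    Each split contributes its tree's weight; the result is divided by
    the sum of all weights (an assumed normalization).
    """
    totals = defaultdict(Fraction)  # missing keys start at Fraction(0)
    weight_sum = Fraction(0)
    for weight, splits in weighted_trees:
        weight_sum += weight
        for split in splits:
            totals[split] += weight
    return {split: w / weight_sum for split, w in totals.items()}


ab = frozenset({"A", "B"})
cd = frozenset({"C", "D"})
ac = frozenset({"A", "C"})

trees = [
    (Fraction(1, 2), {ab, cd}),  # as if annotated with [&W 1/2]
    (Fraction(1, 4), {ab}),      # [&W 1/4]
    (Fraction(1, 4), {ac}),      # [&W 1/4]
]
freqs = weighted_split_frequencies(trees)
print(freqs[ab], freqs[cd], freqs[ac])  # 3/4 1/2 1/4
```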
- --preserve-underscores
- Do not convert unprotected (unquoted) underscores to spaces when reading
NEXUS/NEWICK format trees.
- --taxon-name-filepath
FILEPATH
- Path to file listing all the taxon names or labels that will be found
across the entire set of source trees. This file should be a plain text
file with a single name listed on each line. This file is only read when
multiprocessing ('-M' or '-m') is requested. When multiprocessing using
the '-M' or '-m' options, all taxon names need to be defined in advance of
any actual tree analysis. By default this is done by reading the first
tree in the first tree source and extracting the taxon names. At best,
this is inefficient, as it involves an extraneous reading of the tree. At
worst, this can be erroneous, if the first tree does not contain all the
taxa. Explicitly providing the taxon names via this option can avoid these
issues.
- -t FILE,
--target-tree-filepath FILE
- Summarize support and other information from the source trees to topology
or topologies given by the tree(s) described in FILE. If no user-specified
target topologies are given, then a summary topology will be used as the
target. Use the '-s' or '--summary-target' option to specify the type of
summary tree to use.
- -s SUMMARY-TYPE,
--summary-target SUMMARY-TYPE
- Construct and summarize support and other information from the source
trees to one of the following summary topologies:
- - 'consensus'
- A consensus tree. The minimum frequency threshold of clades to be
included can be specified using the '-f' or '--min-clade-freq' flags. This
is the DEFAULT if a user-specified target tree is not given through the
'-t' or '--target-tree-filepath' options.
- - 'mcct'
- The maximum clade credibility tree. The tree from the source set that
maximizes the *product* of clade posterior probabilities.
- - 'msct'
- The maximum sum of clade credibilities tree. The tree from the source set
that maximizes the *sum* of clade posterior probabilities.
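The "product of clade posterior probabilities" criterion can be sketched as follows. This is a minimal illustration, not the SumTrees code: trees are assumed to be sets of clades (frozensets of taxon labels), the support table is made up for the example, and the product is computed in log space to avoid underflow. A sum-based score is computed alongside for comparison, since the two criteria can select different trees.

```python
import math


def clade_credibility_scores(trees, support):
    """Return a (log_product, sum) pair of clade supports for each tree."""
    scores = []
    for clades in trees:
        log_product = sum(math.log(support[c]) for c in clades)
        total = sum(support[c] for c in clades)
        scores.append((log_product, total))
    return scores


# Made-up clade supports for illustration only.
ab, abc = frozenset("AB"), frozenset("ABC")
cd, bcd = frozenset("CD"), frozenset("BCD")
support = {ab: 0.9, abc: 0.2, cd: 0.5, bcd: 0.5}

trees = [{ab, abc}, {cd, bcd}]
scores = clade_credibility_scores(trees, support)

best_by_product = max(range(len(trees)), key=lambda i: scores[i][0])
best_by_sum = max(range(len(trees)), key=lambda i: scores[i][1])
print(best_by_product, best_by_sum)  # 1 0
```

Here the first tree has the larger sum (0.9 + 0.2 versus 0.5 + 0.5) but the smaller product (0.18 versus 0.25), because one poorly supported clade penalizes a product more heavily than a sum.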
- -e STRATEGY,
--set-edges STRATEGY, --edges STRATEGY
- Set the edge lengths of the target or summary trees based on the specified
summarization STRATEGY:
- - 'mean-length'
- Edge lengths will be set to the mean of the lengths of the corresponding
split or clade in the source trees.
- - 'median-length'
- Edge lengths will be set to the median of the lengths of the corresponding
split or clade in the source trees.
- - 'mean-age'
- Edge lengths will be adjusted so that the age of subtended nodes will be
equal to the mean age of the corresponding split or clade in the source
trees. Source trees will need to be ultrametric for this option.
- - 'median-age'
- Edge lengths will be adjusted so that the age of subtended nodes will be
equal to the median age of the corresponding split or clade in the source
trees. Source trees will need to be ultrametric for this option.
- - 'support'
- Edge lengths will be set to the support value for the split represented by
the edge.
- - 'keep'
- Do not change the existing edge lengths. This is the DEFAULT if target
tree(s) are sourced from an external file using the '-t' or
'--target-tree-filepath' option.
- - 'clear'
- Edge lengths will be cleared from the target trees if they are
present.
- Note that the default setting varies according to the following, in
order of preference:
- (1) If target trees are specified using the '-t' or
'--target-tree-filepath' option, then the default edge summarization
strategy is: 'keep'.
- (2) If target trees are not specified, but the '--summarize-node-ages'
option is specified, then the default edge summarization strategy is:
'mean-age'.
- (3) If no target trees are specified and the node ages are NOT specified
to be summarized, then the default edge summarization strategy is:
'mean-length'.
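The order of preference for the default can be written as a small decision function. The function and parameter names are hypothetical; only the three rules and the strategy names come from the text above.

```python
def default_edge_strategy(has_target_trees, summarize_node_ages):
    """Return the default edge summarization strategy name."""
    if has_target_trees:
        # Rule (1): target trees supplied via '-t' keep their lengths.
        return "keep"
    if summarize_node_ages:
        # Rule (2): no target trees, but node ages are summarized.
        return "mean-age"
    # Rule (3): neither target trees nor node-age summarization.
    return "mean-length"


print(default_edge_strategy(True, True))    # keep
print(default_edge_strategy(False, True))   # mean-age
print(default_edge_strategy(False, False))  # mean-length
```

Note that rule (1) takes precedence: when target trees are given, the default is 'keep' even if node ages are also being summarized.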
- --force-minimum-edge-length
FORCE_MINIMUM_EDGE_LENGTH
- (If setting edge lengths) force all edges to be at least this length.
- --collapse-negative-edges
- (If setting edge lengths) force each parent node's age to be at least as
old as that of its oldest child when summarizing node ages.
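One way to picture this constraint is a post-order pass that raises any parent age that falls below the age of one of its children, so that no edge (parent age minus child age) becomes negative. The sketch below assumes a tree stored as a dictionary of child lists and a dictionary of node ages; these structures and the function name are illustrative only and do not reflect how SumTrees stores trees.

```python
def enforce_parent_ages(children, ages, root):
    """Return a copy of ages in which no parent is younger than a child."""
    fixed = dict(ages)

    def visit(node):
        for child in children.get(node, ()):
            visit(child)  # settle the child's age first (post-order)
            if fixed[node] < fixed[child]:
                fixed[node] = fixed[child]

    visit(root)
    return fixed


children = {"root": ["x", "C"], "x": ["A", "B"]}
# Summarized ages can conflict: here "x" is older than its parent.
ages = {"root": 2.0, "x": 2.5, "A": 0.0, "B": 0.0, "C": 0.0}

fixed = enforce_parent_ages(children, ages, "root")
print(fixed["root"], fixed["x"])  # 2.5 2.5
```

After the adjustment the edge from "root" to "x" has length zero instead of -0.5.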
- --summarize-node-ages,
--ultrametric, --node-ages
- Assume that source trees are ultrametric and summarize node ages (distances
from tips).
- -l {support,keep,clear},
--labels {support,keep,clear}
- Set the node labels of the summary or target tree(s):
- - 'support'
- Node labels will be set to the support value for the clade represented by
the node. This is the DEFAULT.
- - 'keep'
- Do not change the existing node labels.
- - 'clear'
- Node labels will be cleared from the target trees if they are
present.
- --suppress-annotations,
--no-annotations
- Do NOT annotate nodes and edges with any summarization information
metadata such as support values, edge length and/or node age summary
statistics, etc.
- -p,
--percentages
- Indicate branch support as percentages (otherwise, will report as
proportions by default).
- -d #, --decimals
#
- Number of decimal places in indication of support values (default:
8).
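The combined effect of the '-p' and '-d' options on a support value can be sketched with ordinary Python string formatting. The helper name is hypothetical, and fixed-point formatting with trailing zeros is an assumption about the output style rather than a statement of what SumTrees prints.

```python
def format_support(proportion, as_percentage=False, decimals=8):
    """Format a support value as a proportion or a percentage."""
    value = proportion * 100 if as_percentage else proportion
    return f"{value:.{decimals}f}"


print(format_support(0.75))                                  # 0.75000000
print(format_support(0.75, as_percentage=True, decimals=1))  # 75.0
```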
- -o FILEPATH,
--output-tree-filepath FILEPATH, --output FILEPATH
- Path to output file (if not specified, will print to standard
output).
- -F
{nexus,newick,phylip,nexml}, --output-tree-format
{nexus,newick,phylip,nexml}
- Format of the output tree file (if not specified, defaults to input
format, if this has been explicitly specified, or 'nexus' otherwise).
- -x PREFIX,
--extended-output PREFIX
- If specified, extended summarization information will be generated,
consisting of the following files:
- - '<PREFIX>.topologies.trees'
- A collection of topologies found in the sources reported with their
associated posterior probabilities as metadata annotations.
- - '<PREFIX>.bipartitions.trees'
- A collection of bipartitions, each represented as a tree, with associated
information as metadata annotations.
- - '<PREFIX>.bipartitions.tsv'
- Table listing bipartitions as a group pattern as the key column, and
information regarding each of the bipartitions as the remaining columns.
- - '<PREFIX>.edge-lengths.tsv'
- List of bipartitions and corresponding edge lengths. Only generated if
edge lengths are summarized.
- - '<PREFIX>.node-ages.tsv'
- List of bipartitions and corresponding ages. Only generated if node ages
are summarized.
- --no-taxa-block
- When writing NEXUS format output, do not include a taxa block in the
output treefile (otherwise will create taxa block by default).
- --no-analysis-metainformation,
--no-meta-comments
- Do not include meta-information describing the summarization parameters
and execution details.
- -c ADDITIONAL_COMMENTS,
--additional-comments ADDITIONAL_COMMENTS
- Additional comments to be added to the summary file.
- -r, --replace
- Replace/overwrite output file without asking if it already exists.
- -h, --help
- Show help information for program and exit.
- --citation
- Show citation information for program and exit.
- --usage-examples
- Show usage examples of program and exit.
- --describe
- Show information regarding your DendroPy and Python installations and
exit.
Jeet Sukumaran and Mark T. Holder
If any stage of your work or analyses relies on code or programs
from this library, either directly or indirectly (e.g., through usage of
your own or third-party programs, pipelines, or toolkits which use, rely on,
incorporate, or are otherwise primarily derivative of code/programs in this
library), please cite:
- Sukumaran, J and MT Holder. 2010. DendroPy: a Python library for
phylogenetic computing. Bioinformatics 26: 1569-1571.
- Sukumaran, J and MT Holder. SumTrees: Phylogenetic Tree Summarization.
4.0.0 (Jan 31 2015). Available at
https://github.com/jeetsukumaran/DendroPy.
Note that, in the interests of scientific reproducibility, you
should describe in the text of your publications not only the specific
version of the SumTrees program, but also the DendroPy library used in your
analysis. For your information, you are running DendroPy 4.0.2.