andi - estimates evolutionary distances
andi [OPTIONS...] FILES...
andi estimates the evolutionary distance between closely
related genomes. To do so, andi reads the input sequences from
FASTA files and computes the pairwise anchor distance. The underlying
idea is explained in a paper by Haubold et al. (2015).
The output is a symmetrical distance matrix in PHYLIP
format, with each entry representing divergence as a non-negative real number.
A distance of zero means that two sequences are identical, whereas other
values are estimates for the nucleotide substitution rate (Jukes-Cantor
corrected). For technical reasons the comparison might fail, so that no
estimate can be computed. In such cases nan is printed. This means that
the input sequences were either too short (<200bp) or too diverse (K>0.5)
for our method to work properly.
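The output format described above can be post-processed with a few lines of Python. The following is a minimal sketch, not part of andi: the function names parse_phylip_matrix and failed_pairs are hypothetical, the sample matrix is invented for illustration, and the parser assumes a square matrix whose first line holds the number of sequences.

```python
import math


def parse_phylip_matrix(text):
    """Parse a square PHYLIP distance matrix into (names, rows)."""
    lines = [line for line in text.splitlines() if line.strip()]
    n = int(lines[0])  # first line: number of sequences
    names, rows = [], []
    for line in lines[1:n + 1]:
        fields = line.split()
        # The last n fields are distances; whatever precedes them is the name.
        names.append(" ".join(fields[:-n]))
        rows.append([float(x) for x in fields[-n:]])  # float() accepts "nan"
    return names, rows


def failed_pairs(names, rows):
    """List the name pairs for which no estimate was printed (nan)."""
    return [
        (names[i], names[j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
        if math.isnan(rows[i][j])
    ]


# Invented sample data, not real andi output.
example = """3
S1         0.0000e+00 1.0000e-02 nan
S2         1.0000e-02 0.0000e+00 2.0000e-02
S3         nan 2.0000e-02 0.0000e+00
"""

names, rows = parse_phylip_matrix(example)
print(failed_pairs(names, rows))
```

Checking for nan entries before passing the matrix to a tree-building tool makes failed comparisons visible early.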
- -b INT,
--bootstrap=INT
- Compute INT distance matrices, with INT-1 of them bootstrapped from the
first. See the paper Klötzl & Haubold (2016) for a detailed
explanation.
- --file-of-filenames=FILE
- Usually, andi is called with the filenames as command-line
arguments. With this option the filenames may also be read from a
file, with one name per line. Use a single dash ('-') to read
from stdin.
- -j, --join
- Use this mode if each of your FASTA files represents one assembly
with numerous contigs. andi will then treat all of the contained
sequences per file as a single genome. In this mode at least one filename
must be provided via command line arguments. For the output the filename
is used to identify each sequence.
- -l,
--low-memory
- In multithreaded mode, andi requires memory linear in the number of
threads. The low memory mode changes this to a constant demand independent
of the number of threads used. Unfortunately, this comes at a
significant runtime cost.
- -m MODEL,
--model=MODEL
- Set the nucleotide evolution model to one of 'Raw', 'JC', 'Kimura', or
'LogDet'. By default the Jukes-Cantor correction is used.
- -p FLOAT
- Significance of an anchor; default: 0.025.
- --progress[=WHEN]
- Print a progress bar. WHEN can be 'auto' (default if omitted),
'always', or 'never'.
- -t INT,
--threads=INT
- The number of threads to be used; by default, all available processors are
used.
Multithreading is only available if andi was compiled with OpenMP
support.
- --truncate-names
- By default andi outputs the full names of sequences, optionally
padded with spaces, if they are shorter than ten characters. Names longer
than ten characters may lead to problems with downstream tools. With this
switch names will be truncated.
- -v, --verbose
- Prints additional information, including the amount of homology found.
Apply multiple times for extra verbosity.
- -h, --help
- Prints the synopsis and an explanation of available options.
- --version
- Outputs version information and acknowledgments.
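To illustrate what the default Jukes-Cantor correction of the --model option does, here is a minimal sketch of the textbook formula d = -3/4 ln(1 - 4/3 p), where p is the uncorrected fraction of mismatching nucleotides. This is not taken from the andi source code; the function name jukes_cantor is hypothetical, and treating p as the kind of value the 'Raw' model reports is an assumption.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance from the uncorrected mismatch fraction p.

    Returns nan when the correction is undefined (p < 0 or p >= 0.75).
    """
    if p < 0 or p >= 0.75:
        return float("nan")
    if p == 0:
        return 0.0  # identical sequences; avoids returning -0.0
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# The corrected distance is always at least as large as the raw fraction,
# because it accounts for multiple substitutions at the same site.
for p in (0.0, 0.01, 0.1, 0.3):
    print(p, jukes_cantor(p))
```

For small p the corrected and uncorrected values nearly coincide, which is why the choice of model matters mostly for more divergent genomes.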
Copyright © 2014 - 2021 Fabian Klötzl
License GPLv3+: GNU GPL version 3 or later.
This is free software: you are free to change and redistribute it. There is NO
WARRANTY, to the extent permitted by law. The full license text is available
at <http://gnu.org/licenses/gpl.html>.
1) andi: Haubold, B., Klötzl, F. and Pfaffelhuber, P.
(2015). andi: Fast and accurate estimation of evolutionary distances between
closely related genomes, Bioinformatics 31.8.
2) Algorithms: Ohlebusch, E. (2013). Bioinformatics Algorithms. Sequence
Analysis, Genome Rearrangements, and Phylogenetic Reconstruction. pp 118f.
3) SA construction: Mori, Y. (2005). libdivsufsort, unpublished.
4) Bootstrapping: Klötzl, F. and Haubold, B. (2016). Support Values for
Genome Phylogenies, Life 6.1.
Please report bugs to <kloetzl@evolbio.mpg.de> or at
<https://github.com/EvolBioInf/andi>.