run_tipp.py - an identification and phylogenetic profiling tool
usage: run_tipp.py [-h] [-v] [-A N] [-P N] [-F N] [--distance DISTANCE]
[-M DIAMETER] [-S DECOMP] [-p DIR] [-o OUTPUT] [-d OUTPUT_DIR] [-t TREE]
[-r RAXML] [-a ALIGN] [-f FRAG] [-m MOLECULE] [-x N] [-cp CHCK_FILE]
[-cpi N] [-seed N] [-R N] [-at N] [-D] [-pt N] [-PD N] [-tx TAXONOMY]
[-txm MAPPING] [-adt TREE] [-C N]
This script runs the SEPP algorithm on an input tree, alignment,
fragment file, and RAxML info file. It uses a reference dataset, which must
be downloaded from
https://github.com/tandyw/tipp-reference/releases/download/v2.0.0/tipp.zip
If the local administrator has not set the path to this reference
dataset in /etc/sepp/tipp.config, copy that configuration file to ~/.sepp/
and put the path to the dataset in its reference section; see tipp.config(5).
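The per-user configuration step above can be scripted. The following is a minimal Python sketch, not part of TIPP: the function name configure_reference is hypothetical, and it assumes the dataset path is stored under a key named "path" in the [reference] section, which should be checked against tipp.config(5).

```python
import configparser
import shutil
from pathlib import Path


def configure_reference(system_config: Path, user_config: Path, dataset_dir: str) -> Path:
    """Create a per-user tipp.config that points at the reference dataset.

    Hypothetical helper. Assumes the dataset path is stored under the key
    "path" in the [reference] section; check tipp.config(5) for the real key.
    """
    user_config.parent.mkdir(parents=True, exist_ok=True)
    if system_config.exists() and not user_config.exists():
        # Start from the administrator's file so its other sections are kept.
        shutil.copyfile(system_config, user_config)

    # interpolation=None so '%' characters in paths are stored literally.
    config = configparser.ConfigParser(interpolation=None)
    config.read(user_config)  # a missing file is silently skipped
    if not config.has_section("reference"):
        config.add_section("reference")
    config.set("reference", "path", dataset_dir)  # key name is an assumption

    with user_config.open("w") as handle:
        config.write(handle)
    return user_config
```

For example, configure_reference(Path("/etc/sepp/tipp.config"), Path.home() / ".sepp" / "tipp.config", "/data/tipp") would write ~/.sepp/tipp.config, where /data/tipp stands for wherever tipp.zip was unpacked (a hypothetical location).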
- These options determine the alignment decomposition size and the taxon
insertion (placement) size. If no value is given, the default is to
align/place at 10% of the total number of taxa. The alignment decomposition
size must be less than the taxon insertion size.
- -A N, --alignmentSize N
- max alignment subset size of N [default: 10% of the total number of taxa
or the placement subset size if given]
- -P N, --placementSize N
- max placement subset size of N [default: 10% of the total number of taxa
or the alignment length (whichever is bigger)]
- -F N, --fragmentChunkSize N
- maximum fragment chunk size of N. Helps control memory usage. [default:
20000]
- --distance DISTANCE
- minimum p-distance before stopping the decomposition [default: 1]
- -M DIAMETER, --diameter DIAMETER
- maximum tree diameter before stopping the decomposition [default: None]
- -S DECOMP, --decomp_strategy DECOMP
- decomposition strategy [default: using tree branch length]
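The default rules for -A and -P can be read as the following Python sketch. It is an interpretation of the help text, not SEPP's code: resolve_subset_sizes is a hypothetical helper, 10% is assumed to be rounded down (minimum 1), and "alignment length" in the -P default is assumed to mean the alignment subset size.

```python
from typing import Optional, Tuple


def resolve_subset_sizes(total_taxa: int,
                         alignment_size: Optional[int] = None,
                         placement_size: Optional[int] = None) -> Tuple[int, int]:
    """Fill in defaults for -A/--alignmentSize and -P/--placementSize (hypothetical)."""
    # Assumption: "10% of the total number of taxa" is rounded down, with a floor of 1.
    ten_percent = max(1, total_taxa // 10)

    if alignment_size is None:
        # -A falls back to the placement subset size if one was given.
        alignment_size = placement_size if placement_size is not None else ten_percent
    if placement_size is None:
        # Assumption: "alignment length" is read as the alignment subset size;
        # take whichever of the two values is bigger.
        placement_size = max(ten_percent, alignment_size)

    # The defaults can make both sizes equal, so this sketch only rejects an
    # alignment subset that is strictly larger than the placement subset.
    if alignment_size > placement_size:
        raise ValueError("alignment subset size must not exceed placement subset size")
    return alignment_size, placement_size
```

With 1000 taxa and no options, this reading gives subsets of 100 for both alignment and placement; passing only -A 50 keeps the placement size at 100.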
- These options control input. To run SEPP, the following are required: a
backbone tree (in newick format); a RAxML_info file (the file generated by
RAxML during estimation of the backbone tree, which pplacer uses to set
model parameters); a backbone alignment file (in fasta format); and a fasta
file containing the fragments. The input sequences are assumed to be DNA
unless specified otherwise.
- -t TREE, --tree TREE
- Input tree file (newick format) [default: None]
- -r RAXML, --raxml RAXML
- RAxML_info file including model parameters, generated by RAxML. [default:
None]
- -a ALIGN, --alignment ALIGN
- Aligned fasta file [default: None]
- -f FRAG, --fragment FRAG
- fragment file [default: None]
- -m MOLECULE, --molecule MOLECULE
- Molecule type of sequences. Can be amino, dna, or rna [default: dna]
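A minimal Python sketch that assembles a run_tipp.py command line from the four required inputs, using only the -t, -r, -a, -f and -m options listed above. The helper build_tipp_command and the file names in the usage line are hypothetical.

```python
from typing import List, Optional


def build_tipp_command(tree: str, raxml_info: str, alignment: str,
                       fragments: str, molecule: str = "dna",
                       extra: Optional[List[str]] = None) -> List[str]:
    """Assemble a run_tipp.py argument list from the required inputs (hypothetical helper)."""
    if molecule not in ("amino", "dna", "rna"):
        raise ValueError(f"unsupported molecule type: {molecule!r}")
    cmd = [
        "run_tipp.py",
        "-t", tree,          # backbone tree, newick format
        "-r", raxml_info,    # RAxML_info file with model parameters
        "-a", alignment,     # backbone alignment, fasta format
        "-f", fragments,     # fragments to classify, fasta format
        "-m", molecule,      # amino, dna, or rna
    ]
    if extra:
        # Any further options are passed through unchanged.
        cmd.extend(extra)
    return cmd
```

For example, subprocess.run(build_tipp_command("backbone.tre", "RAxML_info.backbone", "backbone.fasta", "reads.fasta"), check=True) would launch a run, assuming run_tipp.py is on PATH. Returning a list rather than a shell string avoids quoting problems with unusual file names.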
- These options set parameters specific to TIPP.
- -R N, --reference_pkg N
- Use a pre-computed reference package [default: None]
- -at N, --alignmentThreshold N
- Enough alignment subsets are selected to reach a cumulative probability
of N. This should be a number between 0 and 1 [default: 0.95]
- -D, --dist
- Treat fragments as a distribution
- -pt N, --placementThreshold N
- Enough placements are selected to reach a cumulative probability of N.
This should be a number between 0 and 1 [default: 0.95]
- -PD N, --push_down N
- Whether to classify based on children below or above the insertion
point. [default: True]
- -tx TAXONOMY, --taxonomy TAXONOMY
- A file describing the taxonomy. This is a comma-separated text file with
the following fields: taxon_id,parent_id,taxon_name,rank. Any other columns
are ignored, as is the first line.
- -txm MAPPING, --taxonomyNameMapping MAPPING
- A comma-separated text file mapping alignment sequence names to taxonomic
ids. Format (each line): sequence_name,taxon_id. Any other columns are
ignored, as is the first line.
- -adt TREE, --alignmentDecompositionTree TREE
- A newick tree file used for decomposing taxa into alignment subsets.
[default: the backbone tree]
- -C N, --cutoff N
- Placement probability requirement to count toward the distribution. This
should be a number between 0 and 1 [default: 0.0]
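The -at and -pt thresholds, and the -C cutoff, can be illustrated with a short Python sketch. This is one reading of the option descriptions, not TIPP's implementation: candidates (placements or alignment subsets) are taken in order of decreasing probability until their cumulative probability reaches the threshold, and the cutoff is assumed to keep probabilities greater than or equal to it. Both function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[str, float]  # (label, probability)


def select_by_cumulative_probability(candidates: Sequence[Candidate],
                                     threshold: float) -> List[Candidate]:
    """Take the most probable candidates until their summed probability reaches threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold should be between 0 and 1")
    selected: List[Candidate] = []
    cumulative = 0.0
    for label, prob in sorted(candidates, key=lambda c: c[1], reverse=True):
        selected.append((label, prob))
        cumulative += prob
        if cumulative >= threshold:
            break
    return selected


def apply_cutoff(candidates: Sequence[Candidate], cutoff: float) -> List[Candidate]:
    """Keep candidates whose probability meets the cutoff (>= is an assumption)."""
    return [(label, prob) for label, prob in candidates if prob >= cutoff]
```

With placements of probability 0.6, 0.3 and 0.1, a threshold of 0.85 keeps the first two, while the default 0.95 keeps all three. With the default cutoff of 0.0, every candidate counts.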
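The -tx and -txm file formats can be sketched in Python with the standard csv module. The helpers below are hypothetical and only illustrate how the documented columns relate: a sequence name maps to a taxon_id, and following parent_id links from that taxon yields its lineage. Treating an unknown or self-referencing parent as the root is an assumption.

```python
import csv
from typing import Dict, List, TextIO, Tuple

Taxonomy = Dict[str, Tuple[str, str, str]]  # taxon_id -> (parent_id, taxon_name, rank)


def read_taxonomy(handle: TextIO) -> Taxonomy:
    """Parse a -tx file: taxon_id,parent_id,taxon_name,rank (hypothetical helper)."""
    reader = csv.reader(handle)
    next(reader, None)  # the first line is ignored
    taxonomy: Taxonomy = {}
    for row in reader:
        if not row:
            continue
        taxon_id, parent_id, name, rank = row[:4]  # extra columns are ignored
        taxonomy[taxon_id] = (parent_id, name, rank)
    return taxonomy


def read_mapping(handle: TextIO) -> Dict[str, str]:
    """Parse a -txm file: sequence_name,taxon_id (hypothetical helper)."""
    reader = csv.reader(handle)
    next(reader, None)  # the first line is ignored
    return {row[0]: row[1] for row in reader if row}


def lineage(taxon_id: str, taxonomy: Taxonomy) -> List[str]:
    """Follow parent_id links upward; returns 'rank:name' entries from leaf to root."""
    names: List[str] = []
    seen = set()
    # Stop at an unknown parent or a repeated id (e.g. a root that is its own parent).
    while taxon_id in taxonomy and taxon_id not in seen:
        seen.add(taxon_id)
        parent_id, name, rank = taxonomy[taxon_id]
        names.append(f"{rank}:{name}")
        taxon_id = parent_id
    return names
```

Using csv.reader rather than splitting on commas means quoted taxon names that themselves contain commas are still parsed as single fields.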