EXONIPHY(1) | User Commands | EXONIPHY(1) |
exoniphy - Prediction of evolutionarily conserved protein-coding exons using a phylogenetic hidden Markov model (phylo-HMM)
Required argument <msa_fname> must be a multiple alignment file, in one of several possible formats (see --msa-format).
Prediction of evolutionarily conserved protein-coding exons using a phylogenetic hidden Markov model (phylo-HMM). By default, exoniphy uses a model definition and model parameters appropriate for exon prediction in human DNA, based on human/mouse/rat alignments and a 60-state HMM. Using the --hmm, --tree-models, and --catmap options, however, it is possible to define alternative phylo-HMMs, e.g., for different sets of species and different phylogenies, or for prediction of exon pairs or complete gene structures.
--hmm, -H <fname>
--tree-models, -m <fname_list> List of tree model (*.mod) files, one for each state in the HMM. The order of models must correspond to the order of states in the HMM file. By default, a set of models appropriate for human, mouse, and rat is used (estimated as described in Siepel & Haussler, 2004).
--catmap, -c <fname>|<string>
--extrapolate, -e <phylog.nh> | default Extrapolate to a larger set of species based on the given phylogeny (Newick format). The trees in the given tree models (*.mod files) must be subtrees of the larger phylogeny. For each tree model M, a copy of the larger phylogeny is created and scaled so that the total branch length of the subtree corresponding to M's tree equals the total branch length of M's tree; this scaled copy is then used in place of M's tree. (Any species name present in this tree but not in the data will be ignored.) If the string "default" is given instead of a filename, a phylogeny for 25 vertebrate species, estimated from sequence data for Target 1 (CFTR) of the NISC Comparative Sequencing Program (Thomas et al., 2003), is assumed.
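The rescaling step of --extrapolate can be illustrated with a short sketch. This is not exoniphy's implementation: Newick parsing is omitted and trees are built from a hypothetical Node class, and it assumes that "the subtree corresponding to M's tree" means the set of branches lying on paths between M's leaves in the larger phylogeny. The scale factor is the total branch length of M's tree divided by the length of that spanning subtree.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical rooted tree node; `length` is the branch length to the parent."""
    name: str = ""
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def leaf_names(node: Node) -> set[str]:
    if not node.children:
        return {node.name}
    return set().union(*(leaf_names(c) for c in node.children))


def total_length(node: Node) -> float:
    """Sum of all branch lengths below `node` (its own branch excluded)."""
    return sum(c.length + total_length(c) for c in node.children)


def spanning_length(root: Node, targets: set[str]) -> float:
    """Total length of the branches on paths between leaves in `targets`.

    The branch above a node lies on such a path iff the node's clade
    contains at least one, but not all, of the target leaves.
    """
    k = len(targets)

    def visit(n: Node) -> tuple[int, float]:
        # Returns (target leaves in this clade, spanning length inside it).
        if not n.children:
            return (1 if n.name in targets else 0), 0.0
        hits, length = 0, 0.0
        for c in n.children:
            h, sub = visit(c)
            hits += h
            length += sub + (c.length if 0 < h < k else 0.0)
        return hits, length

    return visit(root)[1]


def extrapolate(big_tree: Node, model_tree: Node) -> tuple[Node, float]:
    """Return a rescaled copy of `big_tree` and the scale factor used."""
    targets = leaf_names(model_tree)
    factor = total_length(model_tree) / spanning_length(big_tree, targets)
    scaled = copy.deepcopy(big_tree)  # the original phylogeny is left untouched
    stack = [scaled]
    while stack:
        node = stack.pop()
        for c in node.children:
            c.length *= factor
            stack.append(c)
    return scaled, factor
```

For example, if M's tree is (human:0.5,mouse:0.5) and the larger phylogeny is ((human:1,mouse:1):1,dog:2), the human/mouse subtree has length 2, so every branch of the copy is multiplied by 0.5.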
--data-path, -D <path>
--msa-format, -i FASTA|PHYLIP|MPM|MAF|SS
--score, -S Report log-odds scores for predictions, equal to their log total probability under an exon model minus their log total probability under a background model. The exon model can be altered using --cds-types and --signal-types and the background model can be altered using --backgd-types (see below).
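As a worked illustration of the --score definition, the sketch below assumes the log probabilities of the individual state paths through a region are already available under each model (a hypothetical input; a real phylo-HMM obtains these totals with the forward algorithm rather than by enumerating paths). A total probability is a sum over paths, so its logarithm is a log-sum-exp; natural logarithms are used here.

```python
import math


def log_total(path_logps):
    """log(sum(exp(v))) over per-path log probabilities, computed stably."""
    m = max(path_logps)
    return m + math.log(sum(math.exp(v - m) for v in path_logps))


def log_odds(exon_path_logps, backgd_path_logps):
    """log P(region | exon model) - log P(region | background model)."""
    return log_total(exon_path_logps) - log_total(backgd_path_logps)


# Made-up per-path probabilities, for illustration only.
exon = [math.log(0.002), math.log(0.003)]
backgd = [math.log(0.001)]
score = log_odds(exon, backgd)  # log(0.005 / 0.001) = log 5, about 1.609
```

A positive score means the region is better explained by the exon model than by the background model.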
--seqname, -s <name>
--idpref, -p <name>
--grouptag, -g <tag> Use specified string as the tag denoting groups in GFF output (default is "transcript_id").
--alias, -A <alias_def>
--no-cns, -x
--reflect-strand, -U
--bias, -b <val>
--sens-spec, -Y <fname-root> Make predictions for a range of different coding biases (see --bias), and write results to files with the given filename root. This allows the sensitivity/specificity tradeoff to be examined. The range is fixed at -20 to 10, and 10 different sets of predictions are produced.
(Feature types)
--backgd-types, -B <list>
--cds-types, -C <list>
--signal-types, -L <list> (for use with --score) Types of features to be considered "signals" during scoring (default value: "start_codon,stop_codon,5'splice,3'splice,prestart,cds5'ss,cds3'ss"). One score is produced for a CDS feature (as defined by --cds-types) and the adjacent signal features; the score is then assigned to the CDS feature.
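The grouping described for --signal-types can be sketched as below. This is an illustration under stated assumptions, not exoniphy's code: features use a hypothetical (type, start, end) representation with 1-based inclusive coordinates, "adjacent" is assumed to mean overlapping or directly abutting the CDS, and "CDS" stands in for the --cds-types list. Each CDS is paired with the span covering it and its signals; the single score computed over that span would be assigned to the CDS.

```python
from typing import NamedTuple

# Default --signal-types list, copied from the option description.
SIGNAL_TYPES = set("start_codon,stop_codon,5'splice,3'splice,prestart,cds5'ss,cds3'ss".split(","))
# Assumption: "CDS" stands in for whatever --cds-types specifies.
CDS_TYPES = {"CDS"}


class Feature(NamedTuple):
    type: str
    start: int  # 1-based, inclusive (GFF convention)
    end: int


def touches(a: Feature, b: Feature) -> bool:
    """True if the two intervals overlap or abut (assumed meaning of "adjacent")."""
    return a.start <= b.end + 1 and b.start <= a.end + 1


def scoring_units(features):
    """Pair each CDS feature with the span covering it and its adjacent signals."""
    signals = [f for f in features if f.type in SIGNAL_TYPES]
    units = []
    for cds in (f for f in features if f.type in CDS_TYPES):
        group = [cds] + [s for s in signals if touches(cds, s)]
        units.append((cds, (min(f.start for f in group), max(f.end for f in group))))
    return units


feats = [
    Feature("start_codon", 100, 102), Feature("CDS", 100, 250),
    Feature("5'splice", 251, 252), Feature("CDS", 400, 500),
    Feature("stop_codon", 501, 503),
]
units = scoring_units(feats)  # spans (100, 252) and (400, 503)
```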
--indels, -I
--no-gaps, -W <list> Prohibit gaps in sites of the specified categories (gaps result in emission probabilities of zero). If the default category map is used (see --catmap), then gaps are prohibited in start and stop codons and at the canonical GT and AG positions of splice sites (with or without --indels). In all other cases, the default behavior is to treat gaps as missing data, or to address them with the indel model (--indels).
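The two gap treatments described for --no-gaps can be contrasted in a small sketch. Here a gap is handled as missing data by giving its leaf a partial-likelihood vector of all ones, so the likelihood sums over every possible base; in a prohibited category, any gap forces the emission probability to zero. The category label "stop_codon" and the independent-leaves likelihood are hypothetical stand-ins; exoniphy computes column likelihoods on the phylogeny given by each tree model.

```python
BASES = "ACGT"


def leaf_partials(char):
    """Leaf partial-likelihood vector; a gap is treated as missing data."""
    if char == "-":
        return [1.0] * len(BASES)  # compatible with every base
    return [1.0 if b == char else 0.0 for b in BASES]


def toy_column_likelihood(partials, freqs=(0.25, 0.25, 0.25, 0.25)):
    """Stand-in for a phylogenetic likelihood: leaves treated as independent draws."""
    result = 1.0
    for p in partials:
        result *= sum(f * x for f, x in zip(freqs, p))
    return result


def emission_prob(column, category, no_gap_categories,
                  column_likelihood=toy_column_likelihood):
    """Zero if a gap falls in a prohibited category; otherwise gaps are marginalized."""
    if "-" in column and category in no_gap_categories:
        return 0.0
    return column_likelihood([leaf_partials(c) for c in column])


prohibited = {"stop_codon"}  # hypothetical category label
p_gap_ok = emission_prob("A-", "cds", prohibited)           # gap marginalized
p_gap_banned = emission_prob("A-", "stop_codon", prohibited)  # forced to zero
```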
--require-informative, -N <list>
--not-informative, -n <list>
--quiet, -q
--help, -h Print this help message.
REFERENCES: A. Siepel and D. Haussler. 2004. Computational identification of evolutionarily conserved exons. Proc. 8th Annual Int'l Conf.
May 2016 | exoniphy 1.4 |