PHASTODDS(1) | User Commands | PHASTODDS(1) |
phastOdds - compute log-odds scores based on two phylogenetic models or phylo-HMMs
Compute log-odds scores based on two phylogenetic models or phylo-HMMs, one for features of interest (e.g., coding exons, conserved regions) and one for background. The program will do one of the following: (1) compute a score for each feature in an input set and output the same set of features with scores; (2) output a separate score for each position in fixed-step WIG format (http://genome.ucsc.edu/goldenPath/help/wiggle.html); or (3) compute scores in a sliding window of designated size and output a three-column file giving the index of the center of each window, the score for that window on the positive strand, and the score for that window on the negative strand. By default, the alignment is assumed to be a reference sequence alignment, with the reference sequence appearing first; feature coordinates are assumed to be defined with respect to the reference sequence (see --refidx).
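The scoring idea can be illustrated with a minimal Python sketch. It is not the phastOdds implementation. It assumes the simplest case of one tree model per class (no HMM), independent sites, and per-site log-likelihoods that have already been computed by some other means. The function names, the 0-based half-open intervals, and the choice of window center (start + size // 2) are assumptions made for illustration only, and only a single strand is shown.

```python
from itertools import accumulate


def feature_log_odds(feat_ll, bg_ll, start, end):
    """Log-odds score for the half-open column interval [start, end).

    feat_ll and bg_ll are hypothetical per-site log-likelihoods under the
    feature model and the background model, respectively.
    """
    return sum(feat_ll[start:end]) - sum(bg_ll[start:end])


def window_log_odds(feat_ll, bg_ll, size):
    """Score every window of `size` sites, advancing one site at a time.

    Returns a list of (center_index, score) pairs. The center convention
    (start + size // 2, 0-based) is an assumption of this sketch.
    """
    diffs = [f - b for f, b in zip(feat_ll, bg_ll)]
    # Prefix sums make each window score an O(1) difference.
    prefix = [0.0] + list(accumulate(diffs))
    scores = []
    for start in range(len(diffs) - size + 1):
        center = start + size // 2
        scores.append((center, prefix[start + size] - prefix[start]))
    return scores


if __name__ == "__main__":
    feat = [-1.0, -2.0, -1.0, -3.0]
    bg = [-2.0, -2.0, -3.0, -1.0]
    print(feature_log_odds(feat, bg, 0, 3))
    for center, score in window_log_odds(feat, bg, 2):
        print(center, score)
```

A positive score means the interval is better explained by the feature model than by the background model. With phylo-HMMs the likelihood of an interval is not a plain sum over sites, so this sketch does not cover that case.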
phastOdds [OPTIONS] --background-mods <bmods> [--background-hmm <bhmm>] --feature-mods <fmods> [--feature-hmm <fhmm>] ( --features <feats> | --window <size> ) <alignment>
Arguments <bmods> and <fmods> should be comma-delimited lists of phylogenetic models in .mod format (as produced by phyloFit); <feats> may be in GFF, BED, or genepred format; and <alignment> may be in FASTA format or an alternative format specified by --msa-format. HMM files should be in the format used by exoniphy.
(See below for more details on options)
1. Compute conservation scores for features in a GFF file, based on a model for conserved sites (conserved.mod) vs. a model of neutral evolution (neutral.mod). (These models may be estimated with phyloFit or phastCons.)
Features could alternatively be specified in BED or genepred format (format will be auto-recognized). The program can be made to produce BED-formatted output with --output-bed.
2. Compute conservation scores in a sliding window of size 100.
(Window is advanced one site at a time. Window boundaries are defined in the coordinate frame of the multiple alignment, but center coordinates are converted to the frame of the reference sequence as they are output.)
3. Compute a "coding potential" score for features in a BED file, based on a phylo-HMM for coding regions versus a phylo-HMM for noncoding DNA, with states for conserved and nonconserved sequences.
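As a rough illustration, the three examples above might be invoked with command lines like those assembled by the Python sketch below. The helper phastodds_argv is hypothetical and only arranges the options in the order given in the synopsis. Apart from conserved.mod and neutral.mod, every file name (features.gff, features.bed, alignment.fa, the codon and noncoding model files, and the HMM files) is a placeholder invented for illustration. The sketch covers only the --features and --window modes.

```python
import shlex


def phastodds_argv(background_mods, feature_mods, alignment, *,
                   background_hmm=None, feature_hmm=None,
                   features=None, window=None, output_bed=False):
    """Build an argument list following the phastOdds synopsis.

    Hypothetical helper: it only arranges documented options and does
    not run the program.
    """
    if (features is None) == (window is None):
        raise ValueError("give exactly one of features or window")
    argv = ["phastOdds", "--background-mods", ",".join(background_mods)]
    if background_hmm:
        argv += ["--background-hmm", background_hmm]
    argv += ["--feature-mods", ",".join(feature_mods)]
    if feature_hmm:
        argv += ["--feature-hmm", feature_hmm]
    if features is not None:
        argv += ["--features", features]
    else:
        argv += ["--window", str(window)]
    if output_bed:
        argv.append("--output-bed")
    argv.append(alignment)
    return argv


# Example 1: score GFF features (placeholder file names).
example1 = phastodds_argv(["neutral.mod"], ["conserved.mod"],
                          "alignment.fa", features="features.gff")

# Example 2: sliding window of size 100.
example2 = phastodds_argv(["neutral.mod"], ["conserved.mod"],
                          "alignment.fa", window=100)

# Example 3: phylo-HMMs for coding vs. noncoding DNA, BED in and out.
# Model order must match the order of states in each HMM.
example3 = phastodds_argv(
    ["noncoding-neutral.mod", "noncoding-conserved.mod"],
    ["codon1.mod", "codon2.mod", "codon3.mod"],
    "alignment.fa",
    background_hmm="noncoding.hmm",
    feature_hmm="coding.hmm",
    features="features.bed",
    output_bed=True,
)

if __name__ == "__main__":
    for argv in (example1, example2, example3):
        print(shlex.join(argv))
```

Each printed line can be pasted into a shell, with standard output redirected to a file to keep the scores.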
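--background-mods, -b <backgd_mods> (Required) Comma-delimited list of tree model (*.mod) files for background. If used with --background-hmm, order of models must correspond to order of states in HMM.
--background-hmm, -B <backgd.hmm> HMM for background. If there is only one tree model for background, a trivial (single-state) HMM will be assumed.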
--feature-mods, -f <feat_mods> (Required) Comma-delimited list of tree model (*.mod) files for features. If used with --feature-hmm, order of models must correspond to order of states in HMM.
--feature-hmm, -F <feat.hmm> HMM for features. If there is only one tree model for features, a trivial (single-state) HMM will be assumed.
--features, -g <feats.gff> File defining the features to be scored (GFF, BED, or genepred format).
--window, -w <size> (Can be used instead of -g or -y) Compute scores in a sliding window of the specified size.
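--base-by-base, -y (Can be used instead of -g or -w) Output a separate score for each position, in fixed-step WIG format (http://genome.ucsc.edu/goldenPath/help/wiggle.html).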
--window-wig, -W <size>
--msa-format, -i <type> Format of the input alignment, for alignments not in FASTA format.
--refidx, -r <ref_seq> Index of reference sequence for coordinates. Use 0 to indicate the coordinate system of the alignment as a whole. Default is 1, for first sequence.
--output-bed, -d Produce BED-formatted output for scored features.
--verbose, -v Verbose mode. Print messages to stderr describing what the program is doing.
--help, -h
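The coordinate frames selected by --refidx can be illustrated with a short Python sketch. It is not taken from the phastOdds source. It assumes gaps are written as "-", that alignment columns are indexed from 0 internally, and that reported coordinates are 1-based. The function name column_to_frame and the choice to return None when the chosen sequence has a gap at the column are assumptions of this sketch.

```python
def column_to_frame(rows, col, refidx=1):
    """Convert a 0-based alignment column to a 1-based coordinate.

    rows   : aligned sequences of equal length, gaps as '-'
    col    : 0-based column index in the alignment
    refidx : 0 for the frame of the alignment as a whole, otherwise the
             1-based index of the reference sequence (default 1)

    Returns None if the reference sequence has a gap at `col`; how such
    columns should be reported is a choice made by this sketch.
    """
    if refidx == 0:
        # Frame of the alignment itself: every column counts.
        return col + 1
    ref_row = rows[refidx - 1]
    if ref_row[col] == "-":
        return None
    # Count the non-gap characters of the reference up to and including col.
    return sum(1 for ch in ref_row[: col + 1] if ch != "-")


if __name__ == "__main__":
    alignment = ["AC--GT", "ACTTG-"]
    for column in range(len(alignment[0])):
        print(column, column_to_frame(alignment, column))
```

Because gap columns in the reference do not advance its coordinate, a window defined in alignment columns can span fewer reference bases than its nominal size.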
May 2016 | phastOdds 1.4 |