DOKK / manpages / debian 11 / phast / phastOdds.1.en
PHASTODDS(1) User Commands PHASTODDS(1)

phastOdds - Compute log-odds scores based on two phylogenetic models or phylo-HMMs,

Compute log-odds scores based on two phylogenetic models or phylo-HMMs, one for features of interest (e.g., coding exons, conserved regions) and one for background. Will either (1) compute a score for each feature in an input set, and output the same set of features with scores; or (2) output a separate score for each position in fixed-step WIG format (http://genome.ucsc.edu/goldenPath/help/wiggle.html); or (3) compute scores in a sliding window of designated size, and output a three-column file, with the index of the center of each window followed by the score for that window on the positive strand, then the score for that window on the negative strand. The default is to assume a reference sequence alignment, with the reference sequence appearing first; feature coordinates are assumed to be defined with respect to the reference sequence (see --refidx).

phastOdds [OPTIONS] --background-mods <bmods> [--background-hmm <bhmm>] --feature-mods <fmods> [--feature-hmm <fhmm>] ( --features <feats> | --window <size> ) <alignment>

Arguments <bmods> and <fmods> should be comma-delimited lists of phylogenetic models in .mod format (as produced by phyloFit), <feats> may be in GFF, BED, or genepred format, and <alignment> may be in FASTA format or an alternative format specified by --msa-format. HMM files should be in the format used by exoniphy.

(See below for more details on options)

1. Compute conservation scores for features in a GFF file, based on a

model for conserved sites (conserved.mod) vs. a model of neutral evolution (neutral.mod). (These models may be estimated with phyloFit or phastCons.)

phastOdds --background-mods neutral.mod --feature-mods conserved.mod --features features.gff alignment.fa > scores.gff

Features could alternatively be specified in BED or genepred format (format will be auto-recognized). The program can be made to produce BED-formatted output with --output-bed.

2. Compute conservation scores in a sliding window of size 100.

phastOdds --background-mods neutral.mod --feature-mods conserved.mod --window 100 alignment.fa > scores.dat

(Window is advanced one site at a time. Window boundaries are defined in the coordinate frame of the multiple alignment, but center coordinates are converted to the frame of the reference sequence as they are output.)

3. Compute a "coding potential" score for features in a BED file, based on a phylo-HMM for coding regions versus a phylo-HMM for noncoding DNA, with states for conserved and nonconserved sequences.

phastOdds --background-mods codon1.mod,codon2.mod,codon3.mod --background-hmm coding.hmm --feature-mods neutral.mod,conserved-noncoding.mod --feature-hmm noncoding.hmm --features features.bed --output-bed alignment.fa > scores.bed

--background-mods, -b <backgd_mods>

(Required) Comma-delimited list of tree model (*.mod) files for background. If used with --background-hmm, order of models must correspond to order of states in HMM.

--background-hmm, -B <backgd.hmm>

If there is only one backgound tree
model, a trivial (single-state) HMM will be assumed.

--feature-mods, -f <feat_mods> (Required) Comma-delimited list of tree model (*.mod) files for features. If used with --feature-hmm, order of models must correspond to order of states in HMM.

--feature-hmm, -F <feat.hmm> HMM for features. If there is only one tree model for features, a trivial (single-state) HMM will be assumed.

--features, -g <feats.gff>

(Required unless -w or -y) File defining features to be scored (GFF, BED, or genepred).

--window, -w <size> (Can be used instead of -g or -y) Compute scores in a sliding window of the specified size.

--base-by-base, -y

(Can be used instead of -g or -y) Output base-by-base scores, in the coordinate frame of the reference sequence (or of the sequence specified by --refidx). Output is in fixed-step WIG format (http://genome.ucsc.edu/goldenPath/help/wiggle.html). This option can only be used with individual phylogenetic models, not with sets of models and a (nontrivial) HMM.

--window-wig, -W <size>

(Can be used instead of -g or -y) Like --window but outputs scores in fixed-step WIG format, as with --base-by-base. Scores for the positive strand only are output.

--msa-format, -i <type>

May be FASTA, PHYLIP, MPM, SS, or
MAF (default is to guess format from file contents).

--refidx, -r <ref_seq> Index of reference sequence for coordinates. Use 0 to indicate the coordinate system of the alignment as a whole. Default is 1, for first sequence.

--output-bed, -d

(For use with -g) Generate output in bed format rather than GFF.

--verbose, -v Verbose mode. Print messages to stderr describing what the program is doing.

--help, -h

Print this help message.
May 2016 phastOdds 1.4