MSA_VIEW(1) | User Commands | MSA_VIEW(1) |
msa_view - Provides various kinds of "views" of one or more multiple alignments
Provides various kinds of "views" of one or more multiple alignments. Can extract a sub-alignment from an alignment (by row or by column) or combine several alignments into one. Can also extract the sufficient statistics for phylogenetic analysis from an alignment, optionally accounting for site categories defined by an auxiliary annotations file. Supports various other functions, including gap stripping, column randomization, and reordering of sequences. Reads and writes a few common formats, so it can be used for file conversion (by default, the output is the entire input alignment).
(See below for more details on options)
1. Convert alignment formats (default input and output is FASTA)
2. Obtain a sub-alignment by position, using the coordinate frame of the first sequence in the alignment.
3. Obtain a sub-alignment by sequence.
(can also specify sequences by name, e.g., --seqs cow,rat,pig)
4. Concatenate alignments.
(source alignments may have different subsets of sequences and may use different sequence orders; here, human,mouse,rat defines full set and order in output alignment)
5. Extract sufficient statistics from a FASTA file.
6. Extract sufficient statistics from a MAF file for a complete human chromosome. (Can be used by phyloFit.)
7. As in (6), but include information about regions of the reference sequence not present in the MAF file, and include a representation of the order in which alignment columns occur (needed by programs such as phastCons or exoniphy).
8. As in (6), but collect statistics for pairs of adjacent sites (can be used by phyloFit to estimate a dinucleotide model).
9. Pool sufficient statistics from several human chromosomes.
10. Extract separate sufficient statistics for the three codon positions, as defined by annotations in a GFF file.
11. As in (10), but re-orient genes on - strand so that stats reflect + strand. Assume genes are defined by tag "transcript_id".
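Examples 5 through 11 refer to "sufficient statistics". As a rough illustration only (not msa_view's implementation, and not its SS file format), the Python sketch below assumes these amount to counts of distinct alignment columns, or of tuples of adjacent columns as in example 8, optionally kept together with the order in which the tuples occur as in example 7. The function name column_tuple_counts and the toy alignment are hypothetical, and the sketch simply skips the leading positions that cannot form a full tuple.

```python
from collections import Counter

def column_tuple_counts(rows, tuple_size=1):
    """Count distinct tuples of adjacent alignment columns.

    rows: list of equal-length aligned sequences (one string per row).
    Returns (counts, order): counts maps each tuple to its number of
    occurrences; order lists the tuple seen at each position, in
    alignment order.
    """
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("all rows must have the same length")
    # One string per column, reading down the rows.
    columns = ["".join(r[i] for r in rows) for i in range(length)]
    # The tuple ending at position i covers columns i-tuple_size+1 .. i.
    order = [
        tuple(columns[i - tuple_size + 1 : i + 1])
        for i in range(tuple_size - 1, length)
    ]
    return Counter(order), order

# Hypothetical three-sequence alignment.
rows = ["ACGTAC", "ACGTAC", "ATGTAT"]
counts, order = column_tuple_counts(rows)                 # single columns
pair_counts, _ = column_tuple_counts(rows, tuple_size=2)  # adjacent pairs
```

Dropping the order list leaves only the counts, which is the intuition behind an unordered representation: enough for analyses that treat columns independently, but not for programs that need to know where each column falls.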
--start, -s <start_col>
--end, -e <end_col>
--seqs, -l <seq_list> Comma-separated list of sequences to include (default) or exclude (if --exclude). Indicate by sequence number or name (numbering starts with 1 and is evaluated *after* --order is applied).
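The selection rule for --seqs can be sketched in Python as follows. This is an illustration under assumptions, not msa_view's code: the helper select_rows is hypothetical, it assumes the selected rows keep their order from the alignment, and it treats any all-digit item as a sequence number.

```python
def select_rows(names, rows, seq_list, exclude=False):
    """Keep (or, with exclude=True, drop) the rows named in seq_list.

    seq_list is a comma-separated string of 1-based sequence numbers
    and/or sequence names.
    """
    picked = set()
    for item in seq_list.split(","):
        if item.isdigit():
            picked.add(int(item) - 1)       # numbering starts with 1
        else:
            picked.add(names.index(item))   # raises ValueError if unknown
    # Assumption: surviving rows stay in their original alignment order.
    keep = [i for i in range(len(names)) if (i in picked) != exclude]
    return [names[i] for i in keep], [rows[i] for i in keep]

# Hypothetical alignment with four rows.
names = ["human", "cow", "rat", "pig"]
rows = ["AC", "AG", "AT", "AA"]
by_name = select_rows(names, rows, "cow,rat,pig")
by_number_excluded = select_rows(names, rows, "1,3", exclude=True)
```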
--exclude, -x
--refidx, -r <ref_seq>
--aggregate, -A <name_list>
--split-all, -X <filename root>
--in-format, -i PHYLIP|FASTA|MPM|MAF|SS
--out-format, -o PHYLIP|FASTA|MPM|SS (Default FASTA) Output file format.
--alphabet, -a <alphabet_string>
--soft-masked, -f
--unmask, -u
--pretty, -P Pretty-print alignment (use '.' when character matches corresponding character in first sequence). Ignored if --out-format SS is selected.
--gap-strip, -G ALL|ANY|<s> Strip columns containing all gaps, any gaps, or a gap in the specified sequence (<s>). Indexing starts at one and refers to the list *after* any sequences have been added or subtracted (via --seqs and --exclude or --order).
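The three --gap-strip modes can be illustrated with a short Python sketch. It is not msa_view's implementation; the function strip_gap_columns is hypothetical, and it assumes that "-" is the gap character and that the specified sequence is given as a 1-based integer.

```python
def strip_gap_columns(rows, mode, gap="-"):
    """Drop alignment columns according to mode.

    mode: "ALL" (column is entirely gaps), "ANY" (column has at least
    one gap), or a 1-based row index (that row has a gap in the column).
    """
    def drop(col):
        if mode == "ALL":
            return all(c == gap for c in col)
        if mode == "ANY":
            return any(c == gap for c in col)
        return col[mode - 1] == gap         # indexing starts at one

    kept = [col for col in zip(*rows) if not drop(col)]
    # Transpose the surviving columns back into rows.
    return ["".join(col[r] for col in kept) for r in range(len(rows))]

# Hypothetical alignment: column 4 is all gaps.
rows = ["A-C-G", "A-C-T", "AGC--"]
no_empty_columns = strip_gap_columns(rows, "ALL")
gapless = strip_gap_columns(rows, "ANY")
relative_to_first = strip_gap_columns(rows, 1)
```

Stripping relative to one sequence leaves an alignment whose columns correspond one-to-one with the bases of that sequence, which is convenient when coordinates are expressed in its frame.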
--collapse-missing, -p
--mark-missing, -K <maxlen> Convert all gaps of length greater than <maxlen> to "*" characters. If --refidx is specified (with a positive index), gaps in the designated reference sequence will not be altered. This is a useful heuristic for distinguishing between microindels and regions of missing data (e.g., due to large-scale indels, incomplete assemblies, or highly diverged sequences).
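The --mark-missing heuristic can be sketched for a single row as below. This is a minimal illustration, assuming "-" as the gap character; the function mark_missing is hypothetical and is not taken from msa_view.

```python
import re

def mark_missing(seq, maxlen, gap="-", missing="*"):
    """Replace every run of gaps longer than maxlen with missing-data chars."""
    pattern = re.compile(re.escape(gap) + "+")

    def convert(match):
        run = match.group()
        # Runs of length <= maxlen are treated as microindels and kept.
        return missing * len(run) if len(run) > maxlen else run

    return pattern.sub(convert, seq)

# Hypothetical row: a 2-base gap (kept) and a 5-base gap (converted).
marked = mark_missing("AC--GT-----A", 3)
```

To mirror the behavior described for --refidx, a caller would apply the function to every row except the designated reference sequence.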
--missing-as-indels, -m
--order, -O <name_list> Change order of rows in alignment to match sequence names specified in name_list. If a name appears in name_list but not in the alignment, a row of gaps will be inserted. This option is applied to the alignment *before* --seqs, --refidx, and --gap-strip are applied.
--reverse-complement, -V
--randomize, -R Randomly permute the columns of the source alignment (done *before* taking sub-alignment). Requires an ordered representation of the alignment (take care when using with --in-format SS|MAF: the full alignment will be created from the sufficient statistics).
--fill-Ns, -N <s:b-e>
--summary-only, -S Report only summary statistics, rather than complete alignment. Statistics are for alignment that would otherwise be output (i.e., after other options have been applied).
--window-summary, -w <win_size> Like -S, but output summary statistics for non-overlapping windows of the specified size.
(Sufficient statistics)
--tuple-size, -T <tup_size> (For use with --out-format SS). Represent an alignment in terms of tuples of columns of the designated size. Useful
--unordered-ss, -z
--refseq, -M <fname>
--keep-overlapping, -k
(Site categories: all options require --out-format SS)
--features, -g <gff_fname>
--catmap, -c <fname>|<string>
--cats-cycle, -Y <cycle_size> (alternative to --features and --catmap) Assign site categories in cycles of the specified size, e.g., as 1,2,3,...,1,2,3 (for cycle_size == 3). Useful for in-frame coding sequence, or to partition a data set into nonoverlapping tuples of columns (use with --do-cats).
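The cyclic category assignment described for --cats-cycle can be sketched in Python as follows. The helper cycle_categories is hypothetical, and the sketch assumes the cycle starts at the first column of the alignment.

```python
def cycle_categories(n_columns, cycle_size):
    """Assign categories 1..cycle_size to columns, repeating in order."""
    return [(i % cycle_size) + 1 for i in range(n_columns)]

# For in-frame coding sequence, a cycle of 3 labels codon positions.
cats = cycle_categories(7, 3)
# 0-based indices of the columns that fall in category 3.
third_positions = [i for i, c in enumerate(cats) if c == 3]
```

Restricting later processing to a subset of these categories is the role the text gives to --do-cats.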
--do-cats, -C <cat_list>
--codons, -D Extract sufficient statistics for in-frame codons. Implies --tuple-size 3 --cats-cycle 3 --do-cats 3. Not appropriate
--reverse-groups, -W <tag>
--4d, -4
--clean-coding, -L <seqname>
--clean-indels, -I <nseqs>
--help, -h Print this help message.
May 2016 | msa_view 1.4 |