sortmerna - tool for filtering, mapping and OTU-picking NGS
reads
sortmerna --ref db.fasta,db.idx --reads
file.fa --aligned base_name_output [OPTIONS]
SortMeRNA is a biological sequence analysis tool for filtering,
mapping and OTU-picking NGS reads. The core algorithm is based on
approximate seeds and allows for fast and sensitive analyses of nucleotide
sequences. The main application of SortMeRNA is filtering rRNA from
metatranscriptomic data. Additional applications include OTU-picking and
taxonomy assignment available through QIIME v1.9+ (http://qiime.org -
v1.9.0-rc1).
SortMeRNA takes as input a file of reads (fasta or fastq format)
and one or multiple rRNA database file(s), and sorts apart rRNA and rejected
reads into two files specified by the user. Optionally, it can provide high
quality local alignments of rRNA reads against the rRNA database. SortMeRNA
works with Illumina, 454, Ion Torrent and PacBio data, and can produce SAM
and BLAST-like alignments.
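The invocation described above can be assembled programmatically. The following is a minimal sketch, not part of SortMeRNA: the helper name and the output base name "rejected" are hypothetical, and it assumes that BOOL options are given as bare switches. It only builds the argument list and does not run the program.

```python
import shlex


def build_sortmerna_command(ref_fasta, ref_index, reads, aligned_base,
                            other_base=None, fastx=False, log=False):
    """Build a sortmerna argument list from options documented on this page."""
    cmd = [
        "sortmerna",
        "--ref", f"{ref_fasta},{ref_index}",  # FASTA reference file, index file
        "--reads", reads,                     # FASTA/FASTQ reads file
        "--aligned", aligned_base,            # base name for aligned reads
    ]
    if other_base is not None:
        cmd += ["--other", other_base]        # base name for rejected reads
    if fastx:
        cmd.append("--fastx")                 # assumption: BOOL options are bare switches
    if log:
        cmd.append("--log")
    return cmd


# File names follow the synopsis; "rejected" is a made-up base name.
cmd = build_sortmerna_command("db.fasta", "db.idx", "file.fa",
                              "base_name_output",
                              other_base="rejected", fastx=True)
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run() avoids shell quoting problems with unusual file names.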
- --ref
STRING,STRING
- FASTA reference file, index file
Example:
--ref /path/to/file1.fasta,/path/to/index1
If passing multiple reference sequence files, separate them by ':'
Example:
--ref
/path/f1.fasta,/path/index1:/path/f2.fasta,/path/index2
- --reads
STRING
- FASTA/FASTQ reads file
- --aligned
STRING
- aligned reads filepath + base file name (appropriate extension will be
added)
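The --ref value format described above (a comma between a FASTA file and its index, a colon between database pairs) can be generated rather than typed by hand. This is a small sketch with a hypothetical helper name. Rejecting paths that contain the separator characters is an added precaution, not a documented SortMeRNA rule.

```python
def format_ref_argument(pairs):
    """Join (fasta, index) pairs into a single --ref value.

    Each pair becomes 'fasta,index'; pairs are separated by ':'.
    """
    parts = []
    for fasta, index in pairs:
        for path in (fasta, index):
            # Precaution: these characters would make the value ambiguous.
            if "," in path or ":" in path:
                raise ValueError(f"path contains a separator character: {path!r}")
        parts.append(f"{fasta},{index}")
    if not parts:
        raise ValueError("at least one (fasta, index) pair is required")
    return ":".join(parts)


ref_value = format_ref_argument([
    ("/path/f1.fasta", "/path/index1"),
    ("/path/f2.fasta", "/path/index2"),
])
print(ref_value)
```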
- --other
STRING
- rejected reads filepath + base file name (appropriate extension will be
added)
- --fastx
BOOL
- output FASTA/FASTQ file (default: off, for aligned and/or rejected
reads)
- --sam
BOOL
- output SAM alignment (default: off, for aligned reads only)
- --SQ BOOL
- add SQ tags to the SAM file (default: off)
- --blast
INT
- output alignments in various Blast-like formats
0 - pairwise
1 - tabular (Blast -m 8 format)
2 - tabular + column for CIGAR
3 - tabular + columns for CIGAR and query coverage
- --log
BOOL
- output overall statistics (default: off)
- --num_alignments
INT
- report first INT alignments per read reaching E-value (default: -1,
--num_alignments 0 signifies all alignments will be output)
- or (default)
- --best
INT
- report INT best alignments per read reaching E-value (default: 1) by
searching --min_lis INT candidate alignments (--best
0 signifies all candidate alignments will be searched)
- --min_lis
INT
- search all alignments having the first INT longest LIS (default: 2). LIS
stands for Longest Increasing Subsequence; it is computed using seeds'
positions to expand hits into longer matches prior to Smith-Waterman
alignment.
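To illustrate the LIS idea behind --min_lis, the sketch below computes the length of a longest strictly increasing subsequence with the textbook patience-sorting method. It is not SortMeRNA's implementation; the function name and the toy list of seed positions are hypothetical.

```python
from bisect import bisect_left


def lis_length(positions):
    """Length of the longest strictly increasing subsequence of positions."""
    # tails[k] holds the smallest possible last value of an increasing
    # subsequence of length k + 1 seen so far.
    tails = []
    for p in positions:
        i = bisect_left(tails, p)
        if i == len(tails):
            tails.append(p)   # p extends the longest subsequence found so far
        else:
            tails[i] = p      # p gives a smaller tail for length i + 1
    return len(tails)


# Toy reference positions of seed hits, listed in read order.
seed_positions = [3, 1, 4, 1, 5, 9, 2, 6]
print(lis_length(seed_positions))  # 4
```

A long increasing run means many seeds hit the reference in the same order as they occur on the read, which is what makes the region a plausible candidate for full alignment.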
- --print_all_reads
- output null alignment strings for non-aligned reads (default: off) to SAM
and/or BLAST tabular files
- --paired_in
BOOL
- both paired-end reads go in --aligned fasta/q file (default: off,
interleaved reads only, see Section 4.2.4 of User Manual)
- --paired_out
BOOL
- both paired-end reads go in --other fasta/q file (default: off,
interleaved reads only, see Section 4.2.4 of User Manual)
- --match
INT
- SW score (positive integer) for a match (default: 2)
- --mismatch
INT
- SW penalty (negative integer) for a mismatch (default: -3)
- --gap_open
INT
- SW penalty (positive integer) for introducing a gap (default: 5)
- --gap_ext
INT
- SW penalty (positive integer) for extending a gap (default: 2)
- -N INT
- SW penalty for ambiguous letters (N's) (default: scored as
--mismatch)
- -F BOOL
- search only the forward strand (default: off)
- -R BOOL
- search only the reverse-complementary strand (default: off)
- -a INT
- number of threads to use (default: 1)
- -e DOUBLE
- E-value threshold (default: 1)
- -m INT
- INT Mbytes for loading the reads into memory (default: 1024, maximum
-m INT is 5872)
- -v BOOL
- verbose (default: off)
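To show how the Smith-Waterman parameters above combine, the sketch below scores an already-gapped pairwise alignment using the documented defaults (--match 2, --mismatch -3, --gap_open 5, --gap_ext 2, N scored as a mismatch). It rests on two assumptions not stated on this page: the first position of a gap costs gap_open and each further position costs gap_ext, and the -N value is added as a signed score like --mismatch. The function name is hypothetical, and this is not SortMeRNA's aligner.

```python
def score_alignment(read_row, ref_row, match=2, mismatch=-3,
                    gap_open=5, gap_ext=2, n_penalty=None):
    """Score two equal-length alignment rows; '-' marks a gap."""
    if len(read_row) != len(ref_row):
        raise ValueError("alignment rows must have the same length")
    if n_penalty is None:
        n_penalty = mismatch          # documented default for -N
    score = 0
    gap_side = None                   # which row the current gap run is in
    for a, b in zip(read_row.upper(), ref_row.upper()):
        if a == "-" or b == "-":
            side = "read" if a == "-" else "ref"
            # Assumed convention: opening costs gap_open, extending gap_ext.
            score -= gap_ext if side == gap_side else gap_open
            gap_side = side
            continue
        gap_side = None
        if a == "N" or b == "N":
            score += n_penalty
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


print(score_alignment("ACGT", "ACGT"))      # 8
print(score_alignment("AC--GT", "ACTTGT"))  # 1
```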
- --id DOUBLE
- %id similarity threshold (the alignment must still pass the E-value
threshold, default: 0.97)
- --coverage
DOUBLE
- %query coverage threshold (the alignment must still pass the E-value
threshold, default: 0.97)
- --de_novo_otu
BOOL
- FASTA/FASTQ file for reads matching database < %id
(set using --id) and < %cov (set using --coverage)
(alignment must still pass the E-value threshold, default: off)
- --otu_map
BOOL
- output OTU map (input to QIIME's make_otu_table.py, default: off)
see SortMeRNA user manual for more details
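The --id and --coverage thresholds above can be pictured as a simple filter on each alignment. The sketch below is illustrative only. It assumes that %id is the number of matching columns divided by the alignment length, that query coverage is the number of aligned read bases divided by the read length, and that the thresholds are inclusive. None of these definitions is spelled out on this page, and the function name is hypothetical.

```python
def passes_otu_thresholds(matches, alignment_length,
                          aligned_read_bases, read_length,
                          min_id=0.97, min_coverage=0.97):
    """Return True if an alignment meets both --id and --coverage."""
    if alignment_length <= 0 or read_length <= 0:
        raise ValueError("lengths must be positive")
    identity = matches / alignment_length         # assumed definition of %id
    coverage = aligned_read_bases / read_length   # assumed definition of coverage
    return identity >= min_id and coverage >= min_coverage


print(passes_otu_thresholds(98, 100, 100, 100))  # True
print(passes_otu_thresholds(96, 100, 100, 100))  # False
```

In both cases the alignment must still pass the E-value threshold first, as the option descriptions state.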
- --passes
INT
- three intervals at which to place the seed on the read (L is the seed
length set in indexdb_rna(1), default: L,L/2,3)
- --edges
INT
- number (or percent if INT followed by % sign) of nucleotides to add to
each edge of the read prior to SW local alignment (default: 4)
- --num_seeds
INT
- number of seeds matched before searching for candidate LIS (default:
2)
- --full_search
BOOL
- search for all 0-error and 1-error seed matches in the index rather than
stopping after finding a 0-error match (<1% gain in sensitivity with up
to a four-fold decrease in speed, default: off)
- --pid
BOOL
- add pid to output file names (default: off)
- -h BOOL
- help
- --version
BOOL
- SortMeRNA version number
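As a worked example for the --passes default (L,L/2,3) described above, the sketch below derives the three intervals from a seed length L and lists where a seed would start on a read for a given interval. The helper names are hypothetical, and reading an "interval" as the step between consecutive seed start positions is an interpretation of the option text, not a description of SortMeRNA's code.

```python
def default_passes(seed_length):
    """Default --passes intervals (L, L/2, 3) for seed length L."""
    # Integer division; exact when the seed length is even.
    return (seed_length, seed_length // 2, 3)


def seed_starts(read_length, seed_length, interval):
    """0-based start positions of seeds placed every `interval` nucleotides."""
    # The last seed must still fit entirely inside the read.
    return list(range(0, read_length - seed_length + 1, interval))


passes = default_passes(18)
print(passes)                      # (18, 9, 3)
for interval in passes:
    print(interval, seed_starts(50, 18, interval))
```

Smaller intervals place more seeds on the same read, which is why later passes cost more time than the first.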