segemehl - Heuristic mapping of short sequences
segemehl [-besVOc] -d <file> [<file>] [-q <file>] [-p <file>]
    [-i <file>] [-j <file>] [-x <file>] [-y <file>] [-G <file>]
    [-g <string>] [-t <n>] [-o <string>] [-u <file>] [-B <string>]
    [-F <n>] [-S [<basename>]] [-A <n>] [-D <n>] [-E <double>] [-H]
    [-m <n>] [-Z <n>] [-W <n>] [-U <n>] [-l <f>] [-w <double>]
    [-X <n>] [-J <n>] [-I <n>] [-M <n>] [-n <n>] [-r <n>]
    [--skipidcheck] [--showalign] [--nohead]
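In practice the synopsis is usually used in two steps: build an index once with -x, then map reads against that stored index with -i. The Python sketch below assembles such command lines using only options documented on this page. The helper names (index_command, map_command) and the file names are hypothetical, and the resulting lists are meant to be handed to subprocess.run rather than executed by the sketch itself.

```python
"""Sketch: assembling segemehl command lines from documented options.

The helper names are hypothetical; only the flags come from this manpage.
"""
from typing import List, Optional


def index_command(genome: str, index_file: str) -> List[str]:
    # -x generates the db index from the fasta database given with -d
    # and stores it to disk.
    return ["segemehl", "-x", index_file, "-d", genome]


def map_command(genome: str, index_file: str, reads: str,
                mates: Optional[str] = None, threads: int = 1,
                split: bool = False) -> List[str]:
    # -i reuses a previously generated index; -q names the query reads.
    cmd = ["segemehl", "-i", index_file, "-d", genome, "-q", reads]
    if mates is not None:
        cmd += ["-p", mates]          # mate pair sequences (paired end)
    if threads > 1:
        cmd += ["-t", str(threads)]   # the default is a single thread
    if split:
        cmd.append("-S")              # detect split/spliced reads
    return cmd


if __name__ == "__main__":
    # Print the commands; pass the lists to subprocess.run(...) to run them.
    print(" ".join(index_command("genome.fa", "genome.idx")))
    print(" ".join(map_command("genome.fa", "genome.idx", "reads.fq",
                               threads=4, split=True)))
```

Keeping the arguments as a list (rather than one shell string) avoids quoting problems with file names that contain spaces.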
Segemehl is software for mapping short sequencer reads to reference
genomes. It implements a matching strategy based on enhanced suffix
arrays (ESA) and accepts FASTA and FASTQ queries, including gzip- and
bgzip-compressed files. In addition to aligning reads from standard DNA-
and RNA-seq protocols, it can map bisulfite-converted reads (protocols of
Lister et al. and Cokus et al.) and implements a split-read mapping
strategy. Segemehl writes its alignments in SAM or BAM format. For
split-read mapping, additional BED files are written to disk; these can be
summarized with the postprocessing tool haarz. For bisulfite-converted
reads, haarz can also call raw methylation rates.
In brief, for each suffix of a read, segemehl aims to find the
best-scoring seed. Seeds may contain insertions, deletions, and mismatches
(differences). The number of differences allowed within a single seed is
user-controlled (-D) and is crucial for the runtime of the program.
Subsequently, seeds whose E-value falls below the user-defined threshold
(-E) are passed on to an exact semi-global alignment procedure. Finally,
reads that reach the user-defined minimum accuracy (-A, in percent) are
reported to the user.
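The seed filtering and accuracy check described above can be sketched in Python as follows. This is an illustration of the two thresholds, not segemehl's implementation: the Seed and Alignment types, the align callback, the boundary handling (<= and >=), and the definition of accuracy as the percentage of read bases that are matches are all assumptions. The defaults mirror -E (5) and -A (90).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Seed:
    position: int   # hypothetical: seed start in the reference
    evalue: float   # E-value of the seed match


@dataclass
class Alignment:
    matches: int        # matching columns of the semi-global alignment
    read_length: int    # length of the read


def accuracy(aln: Alignment) -> float:
    # Assumption: accuracy = percentage of read bases that are matches.
    return 100.0 * aln.matches / aln.read_length


def map_read(seeds: List[Seed],
             align: Callable[[Seed], Alignment],
             max_evalue: float = 5.0,
             min_accuracy: float = 90.0) -> List[Alignment]:
    reported = []
    for seed in seeds:
        if seed.evalue > max_evalue:
            # Seeds above the E-value threshold never reach the
            # (comparatively expensive) semi-global alignment step.
            continue
        aln = align(seed)
        if accuracy(aln) >= min_accuracy:
            reported.append(aln)
    return reported
```

Filtering on the E-value before aligning is what keeps the exact alignment step affordable: only promising seeds pay for it.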
- -d, --database <file> [<file>]
- list of path/filename(s) of fasta database sequence(s)
- -q, --query <file>
- path/filename of query sequences (default:none)
- -p, --mate <file>
- path/filename of mate pair sequences (default:none)
- -i, --index <file>
- path/filename of db index (default:none)
- -j, --index2 <file>
- path/filename of second db index (default:none)
- -x, --generate <file>
- generate db index and store to disk (default:none)
- -y, --generate2 <file>
- generate second db index and store to disk (default:none)
- -G, --readgroupfile <file>
- filename to read @RG header (default:none)
- -g, --readgroupid <string>
- read group id (default:none)
- -t, --threads <n>
- start <n> threads (default:1)
- -F, --bisulfite <n>
- bisulfite alignment with the methylC-seq protocol of Lister et al. (=1)
or the bs-seq protocol of Cokus et al. (=2) (default:0)
- -S, --splits [<basename>]
- detect split/spliced reads (default:none)
- -A, --accuracy <n>
- min percentage of matches per read in semi-global alignment
(default:90)
- -D, --differences <n>
- search seeds initially with <n> differences (default:1)
- -E, --evalue <double>
- max evalue (default:5.000000)
- -H, --hitstrategy
- report only best scoring hits (=1) or all (=0) (default:1)
- -m, --minsize <n>
- minimum length of queries (default:12)
- -Z, --minfraglen <n>
- min length of a spliced fragment (default:20)
- -W, --minsplicecover <n>
- min coverage for spliced transcripts (default:80)
- -U, --minfragscore <n>
- min score of a spliced fragment (default:18)
- -l, --splicescorescale <f>
- report a spliced alignment with score s only if <f>*s is larger than the
score of the next-best spliced alignment (default:0.900000)
- -w, --maxsplitevalue <double>
- max evalue for splits (default:50.000000)
- -X, --dropoff <n>
- dropoff parameter for extension (default:8)
- -J, --jump <n>
- search seeds with jump size <n> (0=automatic) (default:0)
- -O, --order
- sort the output by chromosome and position (might take a while!)
- -I, --maxpairinsertsize <n>
- maximum size of the inserts (paired end) in case of multiple hits
(default:200000)
- -M, --maxinterval <n>
- maximum width of a suffix array interval, i.e. a query seed will be
omitted if it matches more than <n> times (default:100)
- -c, --checkidx
- check index
- -n, --extensionpenalty <n>
- penalty for a mismatch during extension (default:4)
- -r, --maxout <n>
- maximum number of alignments that will be reported. If set to zero, all
alignments will be reported (default:0)
- --skipidcheck
- do not check whether the FASTQ IDs of mates/paired ends match; instead,
the ID of the first mate (-q) is used for output.
- --showalign
- show alignments
- --nohead
- do not output header
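Several of the reporting options above state simple selection rules. The Python sketch below restates four of them (-M, -l, -H, -r) as plain functions so the thresholds are easy to check. It is not segemehl's code: the function names, the (name, score) hit representation, keeping all hits tied at the top score, reporting a spliced alignment when no competitor exists, and the order in which the rules are chained in the usage example are assumptions.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def keep_seed(interval_width: int, max_interval: int = 100) -> bool:
    """-M: omit a seed whose suffix array interval is wider than
    max_interval, i.e. a seed matching more than max_interval times."""
    return interval_width <= max_interval


def report_spliced(score: float, next_best: Optional[float],
                   scale: float = 0.9) -> bool:
    """-l: report a spliced alignment with score s only if scale * s is
    larger than the next-best spliced alignment's score.
    Assumption: with no competing spliced alignment, it is reported."""
    if next_best is None:
        return True
    return scale * score > next_best


def select_hits(hits: Sequence[Tuple[str, int]],
                best_only: bool = True) -> List[Tuple[str, int]]:
    """-H: keep only the best-scoring hits (1, default) or all hits (0).
    Hits are hypothetical (name, score) pairs; top-score ties are kept."""
    if not hits or not best_only:
        return list(hits)
    top = max(score for _, score in hits)
    return [hit for hit in hits if hit[1] == top]


def limit_output(alignments: Sequence[T], max_out: int = 0) -> List[T]:
    """-r: report at most max_out alignments; 0 means report all."""
    if max_out == 0:
        return list(alignments)
    return list(alignments[:max_out])


if __name__ == "__main__":
    hits = [("chr1:100", 50), ("chr2:300", 48), ("chr5:20", 50)]
    # Chaining -H and then -r here is an illustrative assumption.
    print(limit_output(select_hits(hits), max_out=1))  # [('chr1:100', 50)]
```

With the default scale of 0.9, a spliced alignment scoring 100 is reported against a runner-up of 85 (90 > 85) but not against one of 95.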
Please report bugs to steve@bioinf.uni-leipzig.de
http://www.bioinf.uni-leipzig.de/Software/segemehl/
- 2008 Bioinformatik Leipzig
- 2018 Leibniz Institute on Aging (FLI)
This manpage was written by Andreas Tille for the Debian
distribution and can be used for any other usage of the program.