snap-aligner_index - scalable nucleotide alignment program
snap-aligner index <input.fa> <output-dir> [<options>]
Welcome to SNAP version 1.0.0.
- -s
- Seed size (default: 27)
- -h
- Hash table slack (default: 0.3)
- -t
- Specify the maximum number of threads to use. The default is the number
of cores. Do not leave a space after -t, e.g., -t16.
- -B<chars>
- Specify characters to use as chromosome name terminators in the FASTA
header line; these characters and anything after them are not part of the
chromosome name. All of the characters must be given in a single -B
switch. For example, with -B_|, the FASTA header line '>chr1|Chromosome 1'
generates a chromosome named 'chr1' (quote the switch in a shell, e.g.
'-B_|', since | is the pipe character). Use -bSpace, below, to make space
and tab terminators.
- -bSpace
- Indicates that the space and tab characters are terminators for chromosome
names (see -B above). This may be used in addition to other
terminators specified by -B. -B and -bSpace are case
sensitive.
- -p
- Specify the number of Ns to put as padding between chromosomes. This must
be at least as large as the largest edit distance you'll ever use, and
there's a performance advantage to making it bigger than any read you'll
process. The default is 500. Specify the amount of padding directly after
-p without a space, e.g., -p1000.
- -H
- Build a histogram of seed popularity. This is for information only; it's
not used by SNAP. Specify the histogram file name directly after -H
without leaving a space.
- -exact
- Compute hash table sizes exactly. This slows down the index build, but
usually results in smaller indices.
- -keysize
- The number of bytes to use for the hash table key. Larger values increase
SNAP's memory footprint, but allow larger seeds. By default it's
autoselected based on the seed size.
- -large
- Build a larger index that's a little faster, particularly for runs with
quick/inaccurate parameters. This increases the index size by about 30%,
depending on the other index parameters and the contents of the reference
genome.
- -locationSize
- The size of the genome locations stored in the index. This can be from 4
to 8 bytes. The locations need to be big enough not only to index the
genome, but also to allow some space for representing seeds that occur
multiple times. For the human genome, four-byte locations suffice if the
seed size is 20 or larger, but smaller seeds need 5 (or more). Making the
location size bigger than necessary just wastes (lots of) space, so unless
you're doing something quite unusual, the right answer is 4 or 5. The
default depends on the seed size: 4 if it is 20 or greater, 5 otherwise.
- -sm
- Use a temporary file so the build works better in smaller memory. This
only helps a little, but it can make the difference if you're close to the
limit. In particular, the build will generally use less memory than the
finished index needs, so if this doesn't work you won't be able to use the
index anyway. However, if you have sufficient memory to begin with, this
option just slows down the index build by doing extra, useless I/O.
- -AutoAlt-
- Don't automatically mark ALT contigs. Otherwise, any contig whose name
ends in '_alt' (regardless of capitalization) or starts with 'HLA-' is
marked ALT. Others are not.
- -maxAltContigSize
- Specify a size at or below which all contigs are automatically marked ALT,
unless overridden by name using the options below.
- -altContigName
- Specify the (case-independent) name of a contig to mark as ALT. You can
supply this parameter as often as you'd like.
- -altContigFile
- Specify the name of a file with a list of ALT contig names, one per line.
You may specify this as often as you'd like.
- -nonAltContigName
- Specify the name of a contig that is not an ALT, regardless of its size.
- -nonAltContigFile
- Specify the name of a file that contains a list of contigs (one per line)
that will not be marked ALT, regardless of size.
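The chromosome-naming rule for -B and -bSpace can be illustrated with a short Python sketch. It is not SNAP's code: the function chromosome_name and its parameters are hypothetical, and it assumes the name is everything after the leading '>' up to the first terminator character, or the whole header line if no terminator occurs.

```python
def chromosome_name(header: str, terminators: str = "", space_terminates: bool = False) -> str:
    """Hypothetical illustration of how -B<chars> and -bSpace shorten a FASTA header.

    terminators plays the role of the characters given to -B (e.g. "_|");
    space_terminates models -bSpace (space and tab also end the name).
    """
    stop = set(terminators)
    if space_terminates:
        stop.update(" \t")
    line = header.rstrip("\r\n")
    if line.startswith(">"):
        line = line[1:]
    # Keep characters up to, but not including, the first terminator.
    for i, ch in enumerate(line):
        if ch in stop:
            return line[:i]
    return line


print(chromosome_name(">chr1|Chromosome 1", terminators="_|"))        # chr1
print(chromosome_name(">chr1 Chromosome 1", space_terminates=True))  # chr1
```

Without any terminators the sketch keeps the whole header (minus '>'), which is why long descriptive headers usually call for -B, -bSpace, or both.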
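The two conditions stated for -p can be expressed as a small planning check: the padding must be at least the largest edit distance you will use, and it is preferable for it to exceed the longest read. The helper padding_status below is hypothetical and not part of SNAP; it simply restates those two comparisons.

```python
from typing import Tuple


def padding_status(padding: int, max_edit_distance: int, longest_read: int) -> Tuple[bool, bool]:
    """Hypothetical check of a -p value against the rules described for -p.

    Returns (meets_requirement, meets_recommendation):
      meets_requirement:    padding is at least the largest edit distance used.
      meets_recommendation: padding is bigger than the longest read, which the
                            -p description says gives a performance advantage.
    """
    meets_requirement = padding >= max_edit_distance
    meets_recommendation = padding > longest_read
    return meets_requirement, meets_recommendation


print(padding_status(500, max_edit_distance=20, longest_read=250))   # (True, True)
print(padding_status(500, max_edit_distance=20, longest_read=1000))  # (True, False)
```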
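To make "seed popularity" (-H) concrete, the sketch below counts how often each seed, taken here as a substring of seed-size bases, occurs in a sequence, then tallies how many distinct seeds occur once, twice, and so on. It is an illustration only: the function name is hypothetical, it assumes forward-strand seeds and skips seeds containing N, and it does not reproduce the format of the file SNAP writes for -H.

```python
from collections import Counter
from typing import Dict


def seed_popularity_histogram(sequence: str, seed_size: int) -> Dict[int, int]:
    """Hypothetical sketch: map occurrence count -> number of distinct seeds with that count."""
    sequence = sequence.upper()
    seed_counts: Counter = Counter()
    for i in range(len(sequence) - seed_size + 1):
        seed = sequence[i:i + seed_size]
        if "N" in seed:  # assumption: seeds overlapping N are ignored
            continue
        seed_counts[seed] += 1
    # Tally the occurrence counts themselves, sorted by count.
    return dict(sorted(Counter(seed_counts.values()).items()))


print(seed_popularity_histogram("ACGTACGT", 4))  # {1: 3, 2: 1}
```

In this toy sequence ACGT occurs twice and CGTA, GTAC and TACG once each, so three seeds have popularity 1 and one has popularity 2.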
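The default -locationSize rule, and the arithmetic behind the human-genome guidance, can be sketched as below. The function names are hypothetical, and the genome length of about 3.1 billion bases is an approximate figure used only for illustration; the sketch does not model how SNAP actually spends the spare location values on repeated seeds.

```python
# Hypothetical helpers illustrating the -locationSize guidance; not SNAP code.


def default_location_size(seed_size: int) -> int:
    """Default location size in bytes: 4 for seeds of 20 or more, otherwise 5."""
    return 4 if seed_size >= 20 else 5


def location_capacity(location_size: int) -> int:
    """Number of distinct values a location of location_size bytes can represent."""
    if not 4 <= location_size <= 8:
        raise ValueError("location size must be from 4 to 8 bytes")
    return 2 ** (8 * location_size)


# Approximate human genome length; an illustrative assumption, not a SNAP constant.
HUMAN_GENOME_BASES = 3_100_000_000

print(default_location_size(27))                  # 4 (the default seed size)
print(location_capacity(4))                       # 4294967296
print(location_capacity(4) - HUMAN_GENOME_BASES)  # 1194967296 values left over
```

Four bytes cover the genome with roughly a billion values to spare; that leftover room is what the description above refers to as space for representing seeds that occur multiple times.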
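The ALT-marking options interact, and this page does not spell out every precedence rule. The Python sketch below is one hypothetical reading, not SNAP's implementation. The function is_alt_contig and its parameters are invented for illustration. It assumes the following: names from -nonAltContigName/-nonAltContigFile and -altContigName/-altContigFile override the automatic rules, with the non-ALT list winning a conflict; non-ALT names are compared case-independently like ALT names; and -AutoAlt- disables only the name-based rule, not -maxAltContigSize.

```python
from typing import Iterable, Optional


def is_alt_contig(
    name: str,
    size: int,
    auto_alt: bool = True,                      # False models -AutoAlt-
    max_alt_contig_size: Optional[int] = None,  # models -maxAltContigSize; None = not given
    alt_names: Iterable[str] = (),              # entries from -altContigName / -altContigFile
    non_alt_names: Iterable[str] = (),          # entries from -nonAltContigName / -nonAltContigFile
) -> bool:
    """Hypothetical precedence for SNAP's ALT-marking options (see assumptions above)."""
    lowered = name.lower()
    # Assumption: explicit names override the automatic rules; non-ALT wins a conflict.
    if lowered in {n.lower() for n in non_alt_names}:
        return False
    if lowered in {n.lower() for n in alt_names}:
        return True
    # Automatic name rule: '_alt' suffix (any case) or 'HLA-' prefix.
    if auto_alt and (lowered.endswith("_alt") or name.startswith("HLA-")):
        return True
    # Size rule: contigs at or below the threshold are marked ALT.
    return max_alt_contig_size is not None and size <= max_alt_contig_size


# Illustrative contig names and sizes.
print(is_alt_contig("chr6_GL000250v2_alt", 4_000_000))                             # True
print(is_alt_contig("chrM", 16_569, max_alt_contig_size=20_000))                   # True
print(is_alt_contig("chrM", 16_569, max_alt_contig_size=20_000,
                    non_alt_names=["chrM"]))                                        # False
```

The last call shows why -nonAltContigName exists: a small but primary contig such as chrM would otherwise fall under a -maxAltContigSize threshold.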