bbnorm.sh - Kmer-based error-correction and normalization tool
bbnorm.sh in=<input> out=<reads to keep>
outt=<reads to toss> hist=<histogram output>
Normalizes read depth based on kmer counts. Can also
error-correct, bin reads by kmer depth, and generate a kmer depth histogram.
However, Tadpole has superior error-correction to BBNorm. Please read
bbmap/docs/guides/BBNormGuide.txt for more information.
- in=null
- Primary input. Use in2 for paired reads in a second file
- in2=null
- Second input file for paired reads in two files
- Additional files to use for input (generating hash table) but not for
output
- fastareadlen=2^31
- Break up FASTA reads longer than this. Can be useful when processing
scaffolded genomes
- tablereads=-1
- Use at most this many reads when building the hashtable (-1 means
all)
- kmersample=1
- Process every nth kmer, and skip the rest
- readsample=1
- Process every nth read, and skip the rest
- interleaved=auto
- May be set to true or false to force the input read file to ovverride
autodetection of the input file as paired interleaved.
- qin=auto
- ASCII offset for input quality. May be 33 (Sanger), 64 (Illumina), or
auto.
- out=<file>
- File for normalized or corrected reads. Use out2 for paired reads in a
second file
- outt=<file>
- (outtoss) File for reads that were excluded from primary output
- reads=-1
- Only process this number of reads, then quit (-1 means all)
- sampleoutput=t
- Use sampling on output as well as input (not used if sample rates are
1)
- keepall=f
- Set to true to keep all reads (e.g. if you just want error
correction).
- zerobin=f
- Set to true if you want kmers with a count of 0 to go in the 0 bin instead
of the 1 bin in histograms.
- Default is false, to prevent confusion about how there can be 0-count
kmers. The reason is that based on the 'minq' and 'minprob' settings, some
kmers may be excluded from the bloom filter.
- tmpdir=
- This will specify a directory for temp files (only needed for multipass
runs). If null, they will be written to the output directory.
- usetempdir=t
- Allows enabling/disabling of temporary directory; if disabled, temp files
will be written to the output directory.
- qout=auto
- ASCII offset for output quality. May be 33 (Sanger), 64 (Illumina), or
auto (same as input).
- rename=f
- Rename reads based on their kmer depth.
- k=31
- Kmer length (values under 32 are most efficient, but arbitrarily high
values are supported)
- bits=32
- Bits per cell in bloom filter; must be 2, 4, 8, 16, or 32. Maximum kmer
depth recorded is 2^cbits. Automatically reduced to 16 in 2-pass.
- Large values decrease accuracy for a fixed amount of memory, so use the
lowest number you can that will still capture highest-depth kmers.
- hashes=3
- Number of times each kmer is hashed and stored. Higher is slower.
- Higher is MORE accurate if there is enough memory, and LESS accurate if
there is not enough memory.
- prefilter=f
- True is slower, but generally more accurate; filters out low-depth kmers
from the main hashtable. The prefilter is more memory-efficient because it
uses 2-bit cells.
- prehashes=2
- Number of hashes for prefilter.
- prefilterbits=2
- (pbits) Bits per cell in prefilter.
- prefiltersize=0.35
- Fraction of memory to allocate to prefilter.
- buildpasses=1
- More passes can sometimes increase accuracy by iteratively removing
low-depth kmers
- minq=6
- Ignore kmers containing bases with quality below this
- minprob=0.5
- Ignore kmers with overall probability of correctness below this
- threads=auto
- (t) Spawn exactly X hashing threads (default is number of logical
processors). Total active threads may exceed X due to I/O threads.
- rdk=t
- (removeduplicatekmers) When true, a kmer's count will only be incremented
once per read pair, even if that kmer occurs more than once.
- fixspikes=f
- (fs) Do a slower, high-precision bloom filter lookup of kmers that appear
to have an abnormally high depth due to collisions.
- target=100
- (tgt) Target normalization depth. NOTE: All depth parameters control kmer
depth, not read depth.
- For kmer depth Dk, read depth Dr, read length R, and kmer size K:
Dr=Dk*(R/(R-K+1))
- maxdepth=-1
- (max) Reads will not be downsampled when below this depth, even if they
are above the target depth.
- mindepth=5
- (min) Kmers with depth below this number will not be included when
calculating the depth of a read.
- minkmers=15
- (mgkpr) Reads must have at least this many kmers over min depth to be
retained. Aka 'mingoodkmersperread'.
- percentile=54.0
- (dp) Read depth is by default inferred from the 54th percentile of kmer
depth, but this may be changed to any number 1-100.
- uselowerdepth=t
- (uld) For pairs, use the depth of the lower read as the depth proxy.
- deterministic=t
- (dr) Generate random numbers deterministically to ensure identical output
between multiple runs. May decrease speed with a huge number of
threads.
- passes=2
- (p) 1 pass is the basic mode. 2 passes (default) allows greater accuracy,
error detection, better contol of output depth.
- hdp=90.0
- (highdepthpercentile) Position in sorted kmer depth array used as proxy of
a read's high kmer depth.
- ldp=25.0
- (lowdepthpercentile) Position in sorted kmer depth array used as proxy of
a read's low kmer depth.
- tossbadreads=f
- (tbr) Throw away reads detected as containing errors.
- requirebothbad=f
- (rbb) Only toss bad pairs if both reads are bad.
- errordetectratio=125
- (edr) Reads with a ratio of at least this much between their high and low
depth kmers will be classified as error reads.
- highthresh=12
- (ht) Threshold for high kmer. A high kmer at this or above are considered
non-error.
- lowthresh=3
- (lt) Threshold for low kmer. Kmers at this and below are always considered
errors.
- ecc=f
- Set to true to correct errors. NOTE: Tadpole is now preferred for ecc as
it does a better job.
- ecclimit=3
- Correct up to this many errors per read. If more are detected, the read
will remain unchanged.
- errorcorrectratio=140
- (ecr) Adjacent kmers with a depth ratio of at least this much between will
be classified as an error.
- echighthresh=22
- (echt) Threshold for high kmer. A kmer at this or above may be considered
non-error.
- eclowthresh=2
- (eclt) Threshold for low kmer. Kmers at this and below are considered
errors.
- eccmaxqual=127
- Do not correct bases with quality above this value.
- aec=f
- (aggressiveErrorCorrection) Sets more aggressive values of ecr=100,
ecclimit=7, echt=16, eclt=3.
- cec=f
- (conservativeErrorCorrection) Sets more conservative values of ecr=180,
ecclimit=2, echt=30, eclt=1, sl=4, pl=4.
- meo=f
- (markErrorsOnly) Marks errors by reducing quality value of suspected
errors; does not correct anything.
- mue=t
- (markUncorrectableErrors) Marks errors only on uncorrectable reads;
requires 'ecc=t'.
- overlap=f
- (ecco) Error correct by read overlap.
- hist=<file>
- Specify a file to write the input kmer depth histogram.
- histout=<file>
- Specify a file to write the output kmer depth histogram.
- histcol=3
- (histogramcolumns) Number of histogram columns, 2 or 3.
- pzc=f
- (printzerocoverage) Print lines in the histogram with zero coverage.
- histlen=1048576
- Max kmer depth displayed in histogram. Also affects statistics displayed,
but does not affect normalization.
- peaks=<file>
- Write the peaks to this file. Default is stdout.
- minHeight=2
- (h) Ignore peaks shorter than this.
- minVolume=5
- (v) Ignore peaks with less area than this.
- minWidth=3
- (w) Ignore peaks narrower than this.
- minPeak=2
- (minp) Ignore peaks with an X-value below this.
- maxPeak=BIG
- (maxp) Ignore peaks with an X-value above this.
- maxPeakCount=8
- (maxpc) Print up to this many peaks (prioritizing height).
- -Xmx
- This will set Java's memory usage, overriding autodetection.
- -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will
specify 200 megs. The max is typically 85% of physical memory.
- -eoom
- This flag will cause the process to exit if an out-of-memory exception
occurs. Requires Java 8u92+.
- -da
- Disable assertions.
Written by Brian Bushnell (Last modified October 19, 2017)
Please contact Brian Bushnell at bbushnell@lbl.gov if you
encounter any problems, or post at:
http://seqanswers.com/forums/showthread.php?t=41057
This manpage was written by Andreas Tille for the Debian
distribution and can be used for any other usage of the program.