reformat.sh - Reformats reads between
fasta/fastq/scarf/fasta+qual/sam, interleaved/paired, and ASCII-33/64
reformat.sh in=<file> in2=<file2>
out=<outfile> out2=<outfile2>
Reformats reads to change ASCII quality encoding, interleaving,
file format, or compression format. Optionally performs additional functions
such as quality trimming, subsetting, and subsampling. Supports fastq,
fasta, fasta+qual, scarf, oneline, sam, bam, gzip, bz2. Please read
bbmap/docs/guides/ReformatGuide.txt for more information.
in2 and out2 are for paired reads and are optional. If input is
paired and there is only one output file, it will be written
interleaved.
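As a rough illustration of what "written interleaved" means, the sketch below interleaves two tiny paired fastq files using only standard tools. The file names and reads are hypothetical, and the pure-shell pipeline only mimics the record order; it is not how reformat.sh is implemented.

```shell
# Hypothetical two-read paired input.
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n' > r1.fq
printf '@r1/2\nTTAA\n+\nIIII\n@r2/2\nCCGG\n+\nIIII\n' > r2.fq

# With BBTools installed, the call described above would be:
#   reformat.sh in=r1.fq in2=r2.fq out=interleaved.fq

# Pure-shell sketch: join each 4-line record onto one tab-separated line,
# place the mates side by side, then split back into lines.
paste - - - - < r1.fq > r1.tab
paste - - - - < r2.fq > r2.tab
paste r1.tab r2.tab | tr '\t' '\n' > interleaved.fq
```

In the result, each read 1 record is followed directly by its mate from the second file.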
- ow=f
- (overwrite) Overwrites files that already exist.
- app=f
- (append) Append to files that already exist.
- zl=4
- (ziplevel) Set compression level, 1 (low) to 9 (max).
- int=f
- (interleaved) Determines whether INPUT file is considered
interleaved.
- fastawrap=70
- Length of lines in fasta output.
- fastareadlen=0
- Set to a non-zero number to break fasta files into reads of at most this
length.
- fastaminlen=1
- Ignore fasta reads shorter than this.
- qin=auto
- ASCII offset for input quality. May be 33 (Sanger), 64 (Illumina), or
auto.
- qout=auto
- ASCII offset for output quality. May be 33 (Sanger), 64 (Illumina), or
auto (same as input).
- qfake=30
- Quality value used for fasta to fastq reformatting.
- qfin=<.qual file>
- Read qualities from this qual file, for the reads coming from
'in=<fasta file>'
- qfin2=<.qual file>
- Read qualities from this qual file, for the reads coming from
'in2=<fasta file>'
- qfout=<.qual file>
- Write qualities to this qual file, for the reads going to
'out=<fasta file>'
- qfout2=<.qual file>
- Write qualities to this qual file, for the reads going to
'out2=<fasta file>'
- outsingle=<file>
- (outs) If a read is longer than minlength and its mate is shorter, the
longer one goes here.
- deleteinput=f
- Delete input upon successful completion.
- ref=<file>
- Optional reference fasta for sam processing.
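The qin and qout offsets above differ by 31 (64 minus 33), so converting between them shifts every quality character by that amount. The sketch below shows the shift on a single hypothetical quality string using tr. It only illustrates the encoding and is not the tool's own code.

```shell
# With BBTools installed, a whole file would be converted with something like:
#   reformat.sh in=reads64.fq out=reads33.fq qin=64 qout=33
# (reads64.fq and reads33.fq are hypothetical file names.)

# Hypothetical ASCII-64 quality string: Q40, Q40, Q2, Q0.
q64='hhB@'

# Map characters 64..104 ('@'..'h') onto 33..73 ('!'..'I').
q33=$(printf '%s' "$q64" | LC_ALL=C tr '@-h' '!-I')
echo "$q33"   # II#!
```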
- verifypaired=f
- (vpair) When true, checks reads to see if the names look paired. Prints an
error message if not.
- verifyinterleaved=f
- (vint) Sets 'vpair' to true and 'interleaved' to true.
- allowidenticalnames=f
- (ain) When verifying pair names, allows identical names, instead of
requiring /1 and /2 or 1: and 2:
- tossbrokenreads=f
- (tbr) Discard reads that have different numbers of bases and qualities. By
default this will be detected and cause a crash.
- ignorebadquality=f
- (ibq) Fix out-of-range quality values instead of crashing with a
warning.
- addslash=f
- Append ' /1' and ' /2' to read names, if not already present. Please
include the flag 'int=t' if the reads are interleaved.
- spaceslash=t
- Put a space before the slash in addslash mode.
- addcolon=f
- Append ' 1:' and ' 2:' to read names, if not already present. Please
include the flag 'int=t' if the reads are interleaved.
- underscore=f
- Change whitespace in read names to underscores.
- rcomp=f
- (rc) Reverse-complement reads.
- rcompmate=f
- (rcm) Reverse-complement read 2 only.
- changequality=t
- (cq) N bases always get a quality of 0 and ACGT bases get a min quality of
2.
- quantize=f
- Quantize qualities to a subset of values like NextSeq. Can also be used
with comma-delimited list, like quantize=0,8,13,22,27,32,37
- tuc=f
- (touppercase) Change lowercase letters in reads to uppercase.
- uniquenames=f
- Make duplicate names unique by appending _<number>.
- remap=
- A set of pairs: remap=CTGN will transform C>T and G>N.
- Use remap1 and remap2 to specify read 1 or 2.
- iupacToN=f
- (itn) Convert non-ACGTN symbols to N.
- monitor=f
- Kill this process if it crashes or hangs. monitor=600,0.01 would kill it
after 600 seconds under 1% CPU usage.
- crashjunk=t
- Crash when encountering reads with invalid bases.
- tossjunk=f
- Discard reads with invalid characters as bases.
- fixjunk=f
- Convert invalid bases to N.
- Convert nonstandard header characters to standard ASCII.
- recalibrate=f
- (recal) Recalibrate quality scores. Must first generate matrices with
CalcTrueQuality.
- maxcalledquality=41
- Quality scores capped at this upper bound.
- mincalledquality=2
- Quality scores of ACGT bases will be raised to at least this lower bound.
- trimreaddescription=f
- (trd) Trim the names of reads after the first whitespace.
- trimrname=f
- For sam/bam files, trim rname/rnext fields after the first space.
- Replace characters in headers such as space, *, and | to make them valid
file names.
- warnifnosequence=t
- For fasta, issue a warning if a sequenceless header is encountered.
- warnfirsttimeonly=t
- Issue a warning for only the first sequenceless header.
- utot=f
- Convert U to T (for RNA -> DNA translation).
- padleft=0
- Pad the left end of sequences with this many symbols.
- padright=0
- Pad the right end of sequences with this many symbols.
- pad=0
- Set padleft and padright to the same value.
- padsymbol=N
- Symbol to use for padding.
- reads=-1
- Set to a positive number to only process this many INPUT reads (or pairs),
then quit.
- skipreads=-1
- Skip (discard) this many INPUT reads before processing the rest.
- samplerate=1
- Randomly output only this fraction of reads; 1 means sampling is
disabled.
- sampleseed=-1
- Set to a positive number to use that prng seed for sampling (allowing
deterministic sampling).
- samplereadstarget=0
- (srt) Exact number of OUTPUT reads (or pairs) desired.
- samplebasestarget=0
- (sbt) Exact number of OUTPUT bases desired.
- Important: srt/sbt flags should not be used with stdin, samplerate, qtrim,
minlength, or minavgquality.
- upsample=f
- Allow srt/sbt to upsample (duplicate reads) when the target is greater
than input.
- prioritizelength=f
- If true, calculate a length threshold to reach the target, and retain all
reads of at least that length (must set srt or sbt).
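The effect of combining samplerate with a fixed sampleseed can be sketched with awk: the same seed yields the same subset on every run. The file name, rate, and seed below are hypothetical, and awk's random number generator is not the one reformat.sh uses, so the reads selected will differ from the tool's.

```shell
# Hypothetical 4-read fastq file.
printf '@a\nAC\n+\nII\n@b\nGT\n+\nII\n@c\nTT\n+\nII\n@d\nGG\n+\nII\n' > reads.fq

# With BBTools installed, the call described above would be:
#   reformat.sh in=reads.fq out=sampled.fq samplerate=0.5 sampleseed=7

# Sketch of seeded sampling: decide once per 4-line record whether to keep it.
sample() {
  awk -v rate="$1" -v seed="$2" '
    BEGIN { srand(seed) }
    NR % 4 == 1 { keep = (rand() < rate) }
    keep' reads.fq
}

sample 0.5 7 > run1.fq
sample 0.5 7 > run2.fq
cmp run1.fq run2.fq && echo "same seed, same subset"
```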
- qtrim=f
- Trim read ends to remove bases with quality below trimq.
- Values: t (trim both ends), f (neither end), r (right end only), l (left
end only), w (sliding window).
- trimq=6
- Regions with average quality BELOW this will be trimmed. Can be a
floating-point number like 7.3.
- minlength=0
- (ml) Reads shorter than this after trimming will be discarded. Pairs will
be discarded only if both are shorter.
- mlf=0
- (mlf) Reads shorter than this fraction of original length after trimming
will be discarded.
- maxlength=0
- If nonzero, reads longer than this after trimming will be discarded.
- breaklength=0
- If nonzero, reads longer than this will be broken into multiple reads of
this length. Does not work for paired reads.
- requirebothbad=t
- (rbb) Only discard pairs if both reads are shorter than minlen.
- invertfilters=f
- (invert) Output failing reads instead of passing reads.
- minavgquality=0
- (maq) Reads with average quality (after trimming) below this will be
discarded.
- maqb=0
- If positive, calculate maq from this many initial bases.
- chastityfilter=f
- (cf) Reads with names containing ' 1:Y:' or ' 2:Y:' will be
discarded.
- barcodefilter=f
- Remove reads with unexpected barcodes if barcodes is set, or barcodes
containing 'N' otherwise.
- A barcode must be the last part of the read header.
- barcodes=
- Comma-delimited list of barcodes or files of barcodes.
- maxns=-1
- If 0 or greater, reads with more Ns than this (after trimming) will be
discarded.
- minconsecutivebases=0
- (mcb) Discard reads without at least this many consecutive called
bases.
- forcetrimleft=0
- (ftl) If nonzero, trim left bases of the read to this position (exclusive,
0-based).
- forcetrimright=0
- (ftr) If nonzero, trim right bases of the read after this position
(exclusive, 0-based).
- forcetrimright2=0
- (ftr2) If positive, trim this many bases on the right end.
- forcetrimmod=5
- (ftm) If positive, trim length to be equal to zero modulo this
number.
- mingc=0
- Discard reads with GC content below this.
- maxgc=1
- Discard reads with GC content above this.
- gcpairs=t
- Use average GC of paired reads.
- Also affects gchist.
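The forcetrimmod rule above ("length equal to zero modulo this number") comes down to one line of arithmetic. A minimal worked example, with a hypothetical read length:

```shell
len=151   # hypothetical read length
ftm=5     # forcetrimmod value

# Largest length not exceeding len that is a multiple of ftm.
trimmed=$(( len - len % ftm ))
echo "forcetrimmod=$ftm: $len -> $trimmed bases"
```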
- mappedonly=f
- Toss unmapped reads.
- unmappedonly=f
- Toss mapped reads.
- pairedonly=f
- Toss reads that are not mapped as proper pairs.
- unpairedonly=f
- Toss reads that are mapped as proper pairs.
- primaryonly=f
- Toss secondary alignments. Set this to true for sam to fastq
conversion.
- minmapq=-1
- If non-negative, toss reads with mapq under this.
- maxmapq=-1
- If non-negative, toss reads with mapq over this.
- requiredbits=0
- (rbits) Toss sam lines with any of these flag bits unset. Similar to
samtools -f.
- filterbits=0
- (fbits) Toss sam lines with any of these flag bits set. Similar to
samtools -F.
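The requiredbits and filterbits tests can be sketched with shell bit arithmetic, assuming they follow the usual samtools -f / -F meaning as stated above. The helper name keep_line is hypothetical; the flag values use the standard SAM bit assignments.

```shell
# keep_line FLAG REQUIREDBITS FILTERBITS
# Exit status 0 if a sam line with this flag would be kept.
keep_line() {
  flag=$1; rbits=$2; fbits=$3
  # All required bits must be set, and no filter bit may be set.
  [ $(( flag & rbits )) -eq "$rbits" ] && [ $(( flag & fbits )) -eq 0 ]
}

# 99 = 1 + 2 + 32 + 64: paired, proper pair, mate reversed, first in pair.
keep_line 99 2 4 && echo "flag 99: kept"
# 77 = 1 + 4 + 8 + 64: paired, unmapped, mate unmapped, first in pair.
keep_line 77 2 4 || echo "flag 77: tossed"
```

Here requiredbits=2 demands a proper pair and filterbits=4 rejects unmapped reads.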
- stoptag=f
- Set to true to write a tag indicating read stop location, prefixed by
YS:i:
- sam=
- Set to 'sam=1.3' to convert '=' and 'X' cigar symbols (from sam 1.4+
format) to 'M'.
- Set to 'sam=1.4' to convert 'M' to '=' and 'X' (sam=1.4 requires MD tags
to be present, or ref to be specified).
The filters below require = and X symbols in cigar strings, or MD tags, or
a reference fasta. -1 means disabled; to filter reads with any of a
symbol type, set to 0.
- subfilter=-1
- Discard reads with more than this many substitutions.
- insfilter=-1
- Discard reads with more than this many insertions.
- delfilter=-1
- Discard reads with more than this many deletions.
- indelfilter=-1
- Discard reads with more than this many indels.
- editfilter=-1
- Discard reads with more than this many edits.
- inslenfilter=-1
- Discard reads with an insertion longer than this.
- dellenfilter=-1
- Discard reads with a deletion longer than this.
- idfilter=-1.0
- Discard reads with identity below this.
- clipfilter=-1
- Discard reads with more than this many soft-clipped bases.
- k=0
- If positive, count the total number of kmers.
- cardinality=f
- (loglog) Count unique kmers using the LogLog algorithm.
- loglogbuckets=1999
- Use this many buckets for cardinality estimation.
The # symbol will be replaced by 1 and 2. The % symbol in out
will be replaced by the input name minus extensions. For example:
- reformat.sh in=read#.fq out=%.fa
...is equivalent to:
- reformat.sh in1=read1.fq in2=read2.fq out1=read1.fa out2=read2.fa
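The expansion can be mimicked in plain shell to see how the two symbols interact. This is only a sketch of the naming rule as described here, assuming "minus extensions" means everything from the first dot onward is dropped; the variable names are hypothetical.

```shell
in_pattern='read#.fq'
out_pattern='%.fa'

ins=''
outs=''
for n in 1 2; do
  # Replace # with the read number.
  in_file=$(printf '%s' "$in_pattern" | sed "s/#/$n/")
  # Input name minus extensions.
  base=${in_file%%.*}
  # Replace % with that base name.
  out_file=$(printf '%s' "$out_pattern" | sed "s/%/$base/")
  ins="$ins in$n=$in_file"
  outs="$outs out$n=$out_file"
done

echo "reformat.sh$ins$outs"
```

The printed command line matches the expanded form given above.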
- -Xmx
- This will set Java's memory usage, overriding autodetection.
- -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will
specify 200 megs. The max is typically 85% of physical memory.
- -eoom
- This flag will cause the process to exit if an out-of-memory exception
occurs. Requires Java 8u92+.
- -da
- Disable assertions.
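A small sketch of choosing an explicit -Xmx value at the 85% ceiling mentioned above. The memory size and file names are hypothetical; on a real system the total would come from the operating system.

```shell
total_mb=16000                       # hypothetical physical memory, in megabytes
xmx_mb=$(( total_mb * 85 / 100 ))    # 85% of physical memory

# Hypothetical file names; the command is only printed, not run.
cmd="reformat.sh -Xmx${xmx_mb}m in=reads.fq out=reads.fa"
echo "$cmd"
```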
Written by Brian Bushnell (Last modified February 21, 2019)
Please contact Brian Bushnell at bbushnell@lbl.gov if you
encounter any problems.
This manpage was written by Andreas Tille for the Debian
distribution and can be used for any other usage of the program.