cutadapt - remove adapter sequences from high-throughput
sequencing reads
- cutadapt -a ADAPTER [options] [-o output.fastq]
input.fastq
For paired-end reads:
- cutadapt -a ADAPT1 -A ADAPT2 [options] -o
out1.fastq -p out2.fastq in1.fastq in2.fastq
Replace "ADAPTER" with the actual sequence of your 3'
adapter. IUPAC wildcard characters are supported. The reverse complement is
*not* automatically searched. All reads from input.fastq will be written to
output.fastq with the adapter sequence removed. Adapter matching is
error-tolerant. Multiple adapter sequences can be given (use further
-a options), but only the best-matching adapter will be removed.
Input may also be in FASTA format. Compressed input and output are
supported and auto-detected from the file name (.gz, .xz, .bz2). Use the
file name '-' for standard input/output. Without the -o option,
output is sent to standard output.
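The file-name-based detection of compressed files described above can be sketched in Python as follows. This is only an illustration of the idea, not cutadapt's own code: the function name open_by_extension is hypothetical, and the special file name '-' for standard input/output is not handled.

```python
import bz2
import gzip
import lzma


def open_by_extension(path, mode="rt"):
    """Open a file, choosing a (de)compressor from the file name suffix.

    Hypothetical helper for illustration; not part of cutadapt.
    """
    # The three suffixes are the ones named in the description above.
    openers = {".gz": gzip.open, ".xz": lzma.open, ".bz2": bz2.open}
    for suffix, opener in openers.items():
        if path.endswith(suffix):
            return opener(path, mode)
    # No known suffix: treat the file as uncompressed text.
    return open(path, mode)
```

With such a helper, a name like reads.fastq.gz would be read or written through gzip, while reads.fastq would be opened as plain text.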
- --help
- show all command-line options
- --version
- show program's version number and exit
- -h, --help
- show this help message and exit
- --debug
- Print debugging information.
- -f FORMAT,
--format=FORMAT
- Input file format; can be either 'fasta', 'fastq' or 'sra-fastq'. Ignored
when reading csfasta/qual files. Default: auto-detect from file name
extension.
- Finding adapters:
- Parameters -a, -g, -b specify adapters to be removed
from each read (or from the first read in a pair if data is paired). If
specified multiple times, only the best matching adapter is trimmed (but
see the --times option). When the special notation 'file:FILE' is
used, adapter sequences are read from the given FASTA file.
- -a ADAPTER,
--adapter=ADAPTER
- Sequence of an adapter ligated to the 3' end (paired data: of the first
read). The adapter and subsequent bases are trimmed. If a '$' character is
appended ('anchoring'), the adapter is only found if it is a suffix of the
read.
- -g ADAPTER,
--front=ADAPTER
- Sequence of an adapter ligated to the 5' end (paired data: of the first
read). The adapter and any preceding bases are trimmed. Partial matches at
the 5' end are allowed. If a '^' character is prepended ('anchoring'), the
adapter is only found if it is a prefix of the read.
- -b ADAPTER,
--anywhere=ADAPTER
- Sequence of an adapter that may be ligated to the 5' or 3' end (paired
data: of the first read). Both types of matches as described under
-a and -g are allowed. If the first base of the read is part
of the match, the behavior is as with -g, otherwise as with
-a. This option is mostly for rescuing failed library preparations.
Do not use it if you know which end your adapter was ligated to!
- -e ERROR_RATE,
--error-rate=ERROR_RATE
- Maximum allowed error rate (number of errors divided by the length of the
matching region). Default: 0.1
- --no-indels
- Allow only mismatches in alignments. Default: allow both mismatches and
indels
- -n COUNT,
--times=COUNT
- Remove up to COUNT adapters from each read. Default: 1
- -O MINLENGTH,
--overlap=MINLENGTH
- If the overlap between the read and the adapter is shorter than MINLENGTH,
the read is not modified. Reduces the number of bases trimmed due to random
adapter matches. Default: 3
- --match-read-wildcards
- Interpret IUPAC wildcards in reads. Default: False
- -N,
--no-match-adapter-wildcards
- Do not interpret IUPAC wildcards in adapters.
- --no-trim
- Match and redirect reads to output/untrimmed-output as usual, but do not
remove adapters.
- --mask-adapter
- Mask adapters with 'N' characters instead of trimming them.
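How -e and -O interact can be shown with a little arithmetic. The Python sketch below only restates the two rules given above (errors divided by the length of the matching region, and a minimum overlap); the function name match_is_accepted is hypothetical, and the alignment step that produces the error count is assumed to have happened elsewhere.

```python
def match_is_accepted(errors, match_length, error_rate=0.1, min_overlap=3):
    """Decide whether an adapter match passes the -e and -O thresholds.

    Hypothetical helper; defaults mirror the documented defaults
    (error rate 0.1, minimum overlap 3).
    """
    # -O: overlaps shorter than MINLENGTH leave the read unmodified.
    if match_length <= 0 or match_length < min_overlap:
        return False
    # -e: number of errors divided by the length of the matching region.
    return errors / match_length <= error_rate


print(match_is_accepted(1, 10))  # 1/10 is within 0.1 -> True
print(match_is_accepted(1, 9))   # 1/9 exceeds 0.1 -> False
print(match_is_accepted(0, 2))   # overlap shorter than 3 -> False
```

Under these rules and the default settings, a match needs at least 10 aligned bases before a single error is tolerated.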
- Additional read modifications:
- -u LENGTH,
--cut=LENGTH
- Remove bases from each read (first read only if paired). If LENGTH is
positive, remove bases from the beginning. If LENGTH is negative, remove
bases from the end. Can be used twice if LENGTHs have different
signs.
- -q [5'CUTOFF,]3'CUTOFF,
--quality-cutoff=[5'CUTOFF,]3'CUTOFF
- Trim low-quality bases from 5' and/or 3' ends of each read before adapter
removal. Applied to both reads if data is paired. If one value is given,
only the 3' end is trimmed. If two comma-separated cutoffs are given, the
5' end is trimmed with the first cutoff, the 3' end with the second.
- --nextseq-trim=3'CUTOFF
- NextSeq-specific quality trimming (each read). Also trims dark cycles
appearing as high-quality G bases (EXPERIMENTAL).
- --quality-base=QUALITY_BASE
- Assume that quality values in FASTQ are encoded as ascii(quality +
QUALITY_BASE). This needs to be set to 64 for some old Illumina FASTQ
files. Default: 33
- --trim-n
- Trim N's on ends of reads.
- -x PREFIX,
--prefix=PREFIX
- Add this prefix to read names. Use {name} to insert the name of the
matching adapter.
- -y SUFFIX,
--suffix=SUFFIX
- Add this suffix to read names; can also include {name}
- --strip-suffix=STRIP_SUFFIX
- Remove this suffix from read names if present. Can be given multiple
times.
- --length-tag=TAG
- Search for TAG followed by a decimal number in the description field of
the read. Replace the decimal number with the correct length of the
trimmed read. For example, use --length-tag 'length=' to correct
fields like 'length=123'.
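Three of the modifications above (-u, --quality-base and --length-tag) are simple enough to illustrate directly. The following Python sketch follows the descriptions given here and is not taken from cutadapt itself; all function names are hypothetical.

```python
import re


def cut_read(sequence, length):
    """-u: positive LENGTH removes bases from the start, negative from the end."""
    if length > 0:
        return sequence[length:]
    if length < 0:
        return sequence[:length]
    return sequence


def decode_qualities(quality_string, quality_base=33):
    """--quality-base: qualities are stored as ascii(quality + QUALITY_BASE)."""
    return [ord(char) - quality_base for char in quality_string]


def fix_length_tag(description, tag, new_length):
    """--length-tag: replace the decimal number following TAG."""
    pattern = re.escape(tag) + r"\d+"
    return re.sub(pattern, lambda match: f"{tag}{new_length}", description)


print(cut_read("ACGTACGT", 3))    # TACGT
print(cut_read("ACGTACGT", -2))   # ACGTAC
print(decode_qualities("II5"))    # [40, 40, 20]
print(fix_length_tag("read1 length=123", "length=", 50))  # read1 length=50
```

Passing quality_base=64 to decode_qualities corresponds to the old Illumina encoding mentioned under --quality-base.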
- Filtering of processed reads:
- --discard-trimmed,
--discard
- Discard reads that contain an adapter. Also use -O to avoid
discarding too many randomly matching reads!
- --discard-untrimmed,
--trimmed-only
- Discard reads that do not contain the adapter.
- -m LENGTH,
--minimum-length=LENGTH
- Discard trimmed reads that are shorter than LENGTH. Reads that are too
short even before adapter removal are also discarded. In colorspace, an
initial primer is not counted. Default: 0
- -M LENGTH,
--maximum-length=LENGTH
- Discard trimmed reads that are longer than LENGTH. Reads that are too long
even before adapter removal are also discarded. In colorspace, an initial
primer is not counted. Default: no limit
- --max-n=COUNT
- Discard reads with too many N bases. If COUNT is an integer, it is treated
as the absolute number of N bases. If it is between 0 and 1, it is treated
as the proportion of N's allowed in a read.
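The two interpretations of COUNT for --max-n can be sketched as below. This Python snippet is an illustration under stated assumptions, not cutadapt's implementation: the function name exceeds_max_n is hypothetical, COUNT is taken as it would appear on the command line, and lowercase 'n' is counted as well.

```python
def exceeds_max_n(sequence, count):
    """Return True if the read has too many N bases for the given COUNT.

    Hypothetical helper: an integer COUNT is an absolute limit,
    a fractional COUNT is the allowed proportion of N bases.
    """
    n_bases = sequence.upper().count("N")
    text = str(count)
    try:
        # An integer COUNT is the absolute number of N bases allowed.
        return n_bases > int(text)
    except ValueError:
        # Otherwise COUNT is the allowed proportion of the read length.
        return n_bases > float(text) * len(sequence)


print(exceeds_max_n("ACNNT", "1"))    # 2 N bases, limit 1 -> True
print(exceeds_max_n("ACNNT", "0.5"))  # 2 of 5 is below 50% -> False
```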
- Output:
- --quiet
- Print only error messages.
- -o FILE,
--output=FILE
- Write trimmed reads to FILE. FASTQ or FASTA format is chosen depending on
input. The summary report is sent to standard output. Use '{name}' in FILE
to demultiplex reads into multiple files. Default: write to standard
output
- --info-file=FILE
- Write information about each read and its adapter matches into FILE. See
the documentation for the file format.
- -r FILE,
--rest-file=FILE
- When the adapter matches in the middle of a read, write the rest (after
the adapter) into FILE.
- --wildcard-file=FILE
- When the adapter has N bases (wildcards), write adapter bases matching
wildcard positions to FILE. When there are indels in the alignment, this
will often not be accurate.
- --too-short-output=FILE
- Write reads that are too short (according to length specified by
-m) to FILE. Default: discard reads
- --too-long-output=FILE
- Write reads that are too long (according to length specified by -M)
to FILE. Default: discard reads
- --untrimmed-output=FILE
- Write reads that do not contain the adapter to FILE. Default: output to
same file as trimmed reads
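The '{name}' placeholder accepted by -o can be pictured as a template that is expanded once per adapter name. The Python sketch below groups reads by the resulting file name. It is a simplified model only: route_reads and its input format are hypothetical, and reads without any adapter match are deliberately left out.

```python
def route_reads(template, matches):
    """Group read IDs by the output file their adapter name selects.

    Hypothetical helper. `matches` is a list of (read_id, adapter_name)
    pairs for reads in which an adapter was found.
    """
    files = {}
    for read_id, adapter_name in matches:
        # Expand the placeholder with the name of the matching adapter.
        path = template.replace("{name}", adapter_name)
        files.setdefault(path, []).append(read_id)
    return files


routed = route_reads(
    "trimmed-{name}.fastq.gz",
    [("r1", "barcode1"), ("r2", "barcode2"), ("r3", "barcode1")],
)
print(routed["trimmed-barcode1.fastq.gz"])  # ['r1', 'r3']
```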
- Colorspace options:
- -c,
--colorspace
- Enable colorspace mode: Also trim the color that is adjacent to the found
adapter.
- -d,
--double-encode
- Double-encode colors (map 0,1,2,3,4 to A,C,G,T,N).
- -t,
--trim-primer
- Trim primer base and the first color (which is the transition to the first
nucleotide)
- --strip-f3
- Strip the _F3 suffix of read names
- --maq,
--bwa
- MAQ- and BWA-compatible colorspace output. This enables -c,
-d, -t, --strip-f3 and -y '/1'.
- --no-zero-cap
- Do not change negative quality values to zero in colorspace data. By
default, they are changed to zero, since many tools have problems with
negative qualities.
- -z,
--zero-cap
- Change negative quality values to zero. This is enabled by default when
-c/--colorspace is also enabled. Use --no-zero-cap to disable
it.
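Double-encoding and zero-capping are both simple per-character or per-value mappings. The Python sketch below follows the descriptions of -d and -z above; the names are hypothetical, and the input to double_encode is assumed to be a pure color string without a leading primer base.

```python
# Mapping stated for -d: colors 0,1,2,3,4 become A,C,G,T,N.
COLOR_TO_BASE = str.maketrans("01234", "ACGTN")


def double_encode(colors):
    """-d: rewrite a color string using nucleotide letters."""
    return colors.translate(COLOR_TO_BASE)


def zero_cap(qualities):
    """-z: change negative quality values to zero."""
    return [max(quality, 0) for quality in qualities]


print(double_encode("0123"))   # ACGT
print(zero_cap([-1, 0, 30]))   # [0, 0, 30]
```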
- Paired-end options:
- The -A/-G/-B/-U options work like their -a/-g/-b/-u
counterparts, but are applied to the second read in each pair.
- -A ADAPTER
- 3' adapter to be removed from second read in a pair.
- -G ADAPTER
- 5' adapter to be removed from second read in a pair.
- -B ADAPTER
- 5'/3' adapter to be removed from second read in a pair.
- -U LENGTH
- Remove LENGTH bases from second read in a pair (see --cut).
- -p FILE,
--paired-output=FILE
- Write second read in a pair to FILE.
- --pair-filter=(any|both)
- Which of the reads in a pair must match the filtering
criterion for the pair to be filtered. Default: any
- --interleaved
- Read and write interleaved paired-end reads.
- --untrimmed-paired-output=FILE
- Write second read in a pair to this FILE when no adapter was found in the
first read. Use this option together with --untrimmed-output when
trimming paired-end reads. Default: output to same file as trimmed
reads
- --too-short-paired-output=FILE
- Write second read in a pair to this file if pair is too short. Use
together with --too-short-output.
- --too-long-paired-output=FILE
- Write second read in a pair to this file if pair is too long. Use together
with --too-long-output.
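The difference between the two --pair-filter modes can be written out as a small truth function. The Python sketch below is an illustration based on the option description, not cutadapt's code; pair_is_filtered is a hypothetical name, and each argument says whether that read matches the filtering criterion (for example, being shorter than the -m length).

```python
def pair_is_filtered(read1_matches, read2_matches, pair_filter="any"):
    """Decide whether a read pair is filtered under --pair-filter.

    Hypothetical helper; 'any' is the documented default.
    """
    if pair_filter == "any":
        # One matching read is enough to filter the whole pair.
        return read1_matches or read2_matches
    if pair_filter == "both":
        # Both reads must match before the pair is filtered.
        return read1_matches and read2_matches
    raise ValueError("pair_filter must be 'any' or 'both'")


minimum_length = 20
read1, read2 = "ACGT" * 3, "ACGT" * 10  # lengths 12 and 40
too_short = (len(read1) < minimum_length, len(read2) < minimum_length)
print(pair_is_filtered(*too_short, pair_filter="any"))   # True
print(pair_is_filtered(*too_short, pair_filter="both"))  # False
```

In this example only the first read is too short, so the pair is dropped with 'any' but kept with 'both'.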
See http://cutadapt.readthedocs.org/ for full documentation.
Marcel Martin. Cutadapt removes adapter sequences from
high-throughput sequencing reads. EMBnet.Journal, 17(1):10-12, May 2011.
http://dx.doi.org/10.14806/ej.17.1.200
This manpage was written by Andreas Tille for the Debian
distribution and can be used for any other usage of the program.