featureCounts - a highly efficient and accurate read summarization program
featureCounts [options] -a <annotation_file> -o <output_file> input_file1 [input_file2] ...
Version 1.6.3
## Mandatory arguments:
- -a <string>
- Name of an annotation file. GTF/GFF format by default. See -F
option for more format information. Inbuilt annotations (SAF format) are
available in the 'annotation' directory of the package. Gzipped files are also
accepted.
- -o <string>
- Name of the output file that contains the read counts. A separate file with
summary statistics of the counting results is also written
('<string>.summary').
- input_file1 [input_file2] ...
- A list of SAM or BAM format files. They can be either name or location
sorted. If no files are provided, <stdin> input is expected. Location-sorted
paired-end reads are automatically sorted by read names.
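
As a rough illustration of the usage line and the mandatory arguments above, the following Python sketch assembles a basic featureCounts command. The helper `build_command` and the file names (`genes.gtf`, `counts.txt`, `sample1.bam`, `sample2.bam`) are hypothetical; only `-a`, `-o` and the `.summary` suffix come from the text above.

```python
import shlex


def build_command(annotation, output, inputs):
    """Return the argument list for a basic featureCounts call.

    Only the mandatory arguments described above are used.
    """
    return ["featureCounts", "-a", annotation, "-o", output, *inputs]


# Hypothetical file names, for illustration only.
argv = build_command("genes.gtf", "counts.txt", ["sample1.bam", "sample2.bam"])
command_line = shlex.join(argv)

# The summary statistics are described as going to '<string>.summary'.
summary_path = "counts.txt" + ".summary"

print(command_line)
print(summary_path)
```

The list form can be handed to `subprocess.run` as-is, which avoids shell quoting problems with unusual file names.
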
## Optional arguments:
# Annotation
- -F <string>
- Specify format of the provided annotation file. Acceptable formats include
'GTF' (or compatible GFF format) and 'SAF'. 'GTF' by default. For SAF
format, please refer to Users Guide.
- -t <string>
- Specify feature type in GTF annotation. 'exon' by default. Features used
for read counting will be extracted from annotation using the provided
value.
- -g <string>
- Specify attribute type in GTF annotation. 'gene_id' by default.
Meta-features used for read counting will be extracted from annotation
using the provided value.
- --extraAttributes <string>
- Extract extra attribute types from the provided GTF annotation and include
them in the counting output. These attribute types will not be used to
group features. If more than one attribute type is provided they should be
separated by comma.
- -A <string>
- Provide a chromosome name alias file to match chr names in annotation with
those in the reads. This should be a two-column comma-delimited text file.
Its first column should include chr names in the annotation and its second
column should include chr names in the reads. Chr names are case
sensitive. No column header should be included in the file.
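
To make the alias file layout concrete, here is a minimal Python sketch that renders such a file. The helper `alias_file_text` and the chromosome names are hypothetical; the sketch assumes an annotation that uses names such as `chr1` while the reads use `1`.

```python
import csv
import io


def alias_file_text(pairs):
    """Render (annotation_name, read_name) pairs as comma-delimited lines.

    No header row is written, matching the description above.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for annotation_name, read_name in pairs:
        writer.writerow([annotation_name, read_name])
    return buffer.getvalue()


# Hypothetical names: first column from the annotation, second from the reads.
text = alias_file_text([("chr1", "1"), ("chr2", "2"), ("chrM", "MT")])
print(text, end="")
```

Because chromosome names are case sensitive, the names should be copied exactly as they appear in the annotation and in the SAM/BAM header.
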
# Level of summarization
- -f
- Perform read counting at feature level (e.g. counting reads for exons
rather than genes).
# Overlap between reads and features
- -O
- Assign reads to all their overlapping meta-features (or features if
-f is specified).
- --minOverlap <int>
- Minimum number of overlapping bases in a read that is required for read
assignment. 1 by default. Number of overlapping bases is counted from both
reads if paired end. If a negative value is provided, then a gap of up to
specified size will be allowed between read and the feature that the read
is assigned to.
- --fracOverlap <float>
- Minimum fraction of overlapping bases in a read that is required for read
assignment. Value should be within range [0,1]. 0 by default. Number of
overlapping bases is counted from both reads if paired end. Both this option
and '--minOverlap' option need to be satisfied for read assignment.
- --fracOverlapFeature <float>
- Minimum fraction of overlapping bases in a feature that is required for read
assignment. Value should be within range [0,1]. 0 by default.
- --largestOverlap
- Assign reads to a meta-feature/feature that has the largest number of
overlapping bases.
- --nonOverlap <int>
- Maximum number of non-overlapping bases in a read (or a read pair) that is
allowed when being assigned to a feature. No limit is set by default.
- --nonOverlapFeature <int>
- Maximum number of non-overlapping bases in a feature that is allowed in read
assignment. No limit is set by default.
- --readExtension5 <int>
- Reads are extended upstream by <int> bases from their 5' end.
- --readExtension3 <int>
- Reads are extended downstream by <int> bases from their 3' end.
- --read2pos <5:3>
- Reduce reads to their 5' most base or 3' most base. Read counting is then
performed based on the single base the read is reduced to.
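
The interaction between '--minOverlap' and '--fracOverlap' can be sketched in Python as below. This is a simplified model and not the program's implementation: it assumes a single-end, ungapped read, closed 1-based coordinates and a positive '--minOverlap' value, and the function names are hypothetical.

```python
def overlapping_bases(read_start, read_end, feat_start, feat_end):
    """Number of bases shared by two closed, 1-based intervals."""
    return max(0, min(read_end, feat_end) - max(read_start, feat_start) + 1)


def passes_overlap_filters(read, feature, min_overlap=1, frac_overlap=0.0):
    """Apply both thresholds; both must be satisfied, as stated above."""
    shared = overlapping_bases(read[0], read[1], feature[0], feature[1])
    read_length = read[1] - read[0] + 1
    return shared >= min_overlap and shared / read_length >= frac_overlap


# A 50-base read whose last 10 bases fall inside the feature.
read = (101, 150)
feature = (141, 300)
print(overlapping_bases(*read, *feature))
print(passes_overlap_filters(read, feature))
print(passes_overlap_filters(read, feature, frac_overlap=0.5))
```

With the default thresholds the read is assigned, while requiring half of the read to overlap rejects it, since only 10 of its 50 bases lie inside the feature.
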
# Multi-mapping reads
- -M
- Multi-mapping reads will also be counted. For a multi-mapping read, all its
reported alignments will be counted. The 'NH' tag in BAM/SAM input is used
to detect multi-mapping reads.
# Fractional counting
- --fraction
- Assign fractional counts to features. This option must be used together
with '-M' or '-O' or both. When '-M' is specified, each reported alignment
from a multi-mapping read (identified via 'NH' tag) will carry a
fractional count of 1/x, instead of 1 (one), where x is the total number
of alignments reported for the same read. When '-O' is specified, each
overlapping feature will receive a fractional count of 1/y, where y is the
total number of features overlapping with the read. When both '-M' and
'-O' are specified, each alignment will carry a fractional count of
1/(x*y).
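
The arithmetic behind '--fraction' can be written out as a short Python sketch. The function `fractional_count` and its parameter names are hypothetical; `x` and `y` follow the definitions given above.

```python
from fractions import Fraction


def fractional_count(num_alignments=1, num_features=1, use_m=False, use_o=False):
    """Count carried by one alignment for one overlapping feature.

    x is the number of reported alignments of the read (used with -M);
    y is the number of features overlapping the read (used with -O).
    """
    if not (use_m or use_o):
        raise ValueError("--fraction must be combined with -M, -O or both")
    x = num_alignments if use_m else 1
    y = num_features if use_o else 1
    return Fraction(1, x * y)


print(fractional_count(num_alignments=4, use_m=True))                # 1/4
print(fractional_count(num_features=2, use_o=True))                  # 1/2
print(fractional_count(4, 2, use_m=True, use_o=True))                # 1/8
```

Exact fractions are used here only to keep the arithmetic readable.
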
# Read filtering
- -Q <int>
- The minimum mapping quality score a read must satisfy in order to be
counted. For paired-end reads, at least one end should satisfy this
criterion. 0 by default.
- --splitOnly
- Count split alignments only (i.e. alignments with CIGAR string containing
'N'). An example of split alignments is exon-spanning reads in RNA-seq
data.
- --nonSplitOnly
- If specified, only non-split alignments (CIGAR strings do not contain
letter 'N') will be counted. All the other alignments will be
ignored.
- --primary
- Count primary alignments only. Primary alignments are identified using bit
0x100 in the SAM/BAM FLAG field (the bit is not set for primary alignments).
- --ignoreDup
- Ignore duplicate reads in read counting. Duplicate reads are identified
using bit 0x400 in BAM/SAM FLAG field. The whole read pair is ignored if
one of the reads is a duplicate read for paired end data.
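
The filters in this section rely on a few fields of a SAM/BAM record. The Python sketch below shows how those fields could be tested; the function names are hypothetical, and the code only mirrors the descriptions above rather than the program's internals.

```python
SECONDARY = 0x100  # SAM FLAG bit set on alignments that are not primary
DUPLICATE = 0x400  # SAM FLAG bit set on duplicate reads


def is_primary(flag):
    """True when bit 0x100 is not set in the FLAG field."""
    return (flag & SECONDARY) == 0


def is_duplicate(flag):
    """True when bit 0x400 is set in the FLAG field."""
    return (flag & DUPLICATE) != 0


def is_split(cigar):
    """True when the CIGAR string contains an 'N' operation."""
    return "N" in cigar


def passes_mapq(mapqs, threshold=0):
    """For a pair, at least one end must reach the threshold (see -Q)."""
    return any(q >= threshold for q in mapqs)


print(is_primary(0), is_primary(256))
print(is_duplicate(1024), is_duplicate(16))
print(is_split("50M1000N50M"), is_split("100M"))
print(passes_mapq([3, 40], threshold=10))
```
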
# Strandness
- -s <int or string>
- Perform strand-specific read counting. A single integer value (applied to
all input files) or a string of comma-separated values (applied to each
corresponding input file) should be provided. Possible values include: 0
(unstranded), 1 (stranded) and 2 (reversely stranded). Default value is 0
(i.e. unstranded read counting carried out for all input files).
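
How a '-s' value maps onto the input files can be sketched in Python as follows. The helper `strand_modes` is hypothetical, and rejecting a list whose length differs from the number of input files is an assumption of this sketch, not documented behaviour of the program.

```python
def strand_modes(value, num_files):
    """Expand a -s value into one strand mode per input file."""
    modes = [int(part) for part in str(value).split(",")]
    if any(mode not in (0, 1, 2) for mode in modes):
        raise ValueError("strand mode must be 0, 1 or 2")
    if len(modes) == 1:
        # A single value applies to all input files.
        return modes * num_files
    if len(modes) != num_files:
        # Assumption of this sketch: one value per input file is required.
        raise ValueError("expected one strand mode per input file")
    return modes


print(strand_modes("0", 3))
print(strand_modes("1,2,0", 3))
```
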
# Exon-exon junctions
- -J
- Count number of reads supporting each exon-exon junction. Junctions are
identified from those exon-spanning reads in the input (containing 'N' in
CIGAR string). Counting results are saved to a file named
'<output_file>.jcounts'.
- -G <string>
- Provide the name of a FASTA-format file that contains the reference
sequences used in read mapping that produced the provided SAM/BAM files.
This optional argument can be used with '-J' option to improve read
counting for junctions.
# Parameters specific to paired end reads
- -p
- If specified, fragments (or templates) will be counted instead of reads.
This option is only applicable for paired-end reads.
- -B
- Only count read pairs that have both ends aligned.
- -P
- Check validity of paired-end distance when counting read pairs. Use
-d and -D to set thresholds.
- -d <int>
- Minimum fragment/template length, 50 by default.
- -D <int>
- Maximum fragment/template length, 600 by default.
- -C
- Do not count read pairs that have their two ends mapping to different
chromosomes or mapping to same chromosome but on different strands.
- --donotsort
- Do not sort reads in BAM/SAM input. Note that reads from the same pair are
required to be located next to each other in the input.
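
The distance check enabled by '-P' can be pictured with a short Python sketch using the default thresholds of '-d' and '-D'. The function name is hypothetical, and treating both bounds as inclusive is an assumption of this sketch.

```python
def fragment_length_ok(length, min_length=50, max_length=600):
    """Check a fragment/template length against the -d and -D thresholds.

    Both bounds are treated as inclusive here, which is an assumption.
    """
    return min_length <= length <= max_length


print(fragment_length_ok(300))
print(fragment_length_ok(30))
print(fragment_length_ok(1000, max_length=2000))
```
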
# Number of CPU threads
- -T <int>
- Number of threads. 1 by default.
# Read groups
- --byReadGroup
- Assign reads by read group. "RG" tag is required to be present
in the input BAM/SAM files.
# Long reads
- -L
- Count long reads such as Nanopore and PacBio reads. Long read counting can
only run in one thread and only reads (not read-pairs) can be counted.
There is no limitation on the number of 'M' operations allowed in a CIGAR
string in long read counting.
# Assignment results for each read
- -R <format>
- Output detailed assignment results for each read or read pair. Results are
saved to a file that is in one of the following formats: CORE, SAM and
BAM. See Users Guide for more info about these formats.
- --Rpath <string>
- Specify a directory to save the detailed assignment results. If
unspecified, the directory where counting results are saved is used.
# Miscellaneous
- --tmpDir <string>
- Directory under which intermediate files are saved (later removed). By
default, intermediate files will be saved to the directory specified in
'-o' argument.
- --maxMOp <int>
- Maximum number of 'M' operations allowed in a CIGAR string. 10 by default.
Both 'X' and '=' are treated as 'M' and adjacent 'M' operations are merged
in the CIGAR string.
- --verbose
- Output verbose information for debugging, such as unmatched
chromosome/contig names.
- -v
- Output version of the program.
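
The '--maxMOp' description above states that 'X' and '=' are treated as 'M' and that adjacent 'M' operations are merged. A minimal Python sketch of that counting rule follows; the function name is hypothetical, and "adjacent" is taken to mean directly consecutive operations in the CIGAR string.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def merged_m_operations(cigar):
    """Count 'M' operations after mapping 'X' and '=' to 'M' and merging runs."""
    count = 0
    previous_was_m = False
    for _length, op in CIGAR_OP.findall(cigar):
        is_m = op in "M=X"
        if is_m and not previous_was_m:
            count += 1
        previous_was_m = is_m
    return count


print(merged_m_operations("50M"))
print(merged_m_operations("10=1X10="))
print(merged_m_operations("20M5I20M"))
print(merged_m_operations("50M1000N50M"))
```

A read whose merged count exceeds the '--maxMOp' value would be beyond the limit described above; in this sketch `10=1X10=` counts as a single operation, while `20M5I20M` counts as two.
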