QTLtools-quan(1) | Bioinformatics tools | QTLtools-quan(1) |
QTLtools quan - Quantify gene and exon expression from RNA-seq
QTLtools quan --bam [in.sam|in.bam|in.cram] --gtf gene_annotation.gtf --out-prefix output [OPTIONS]
This mode quantifies the expression of the genes and exons in the provided --gtf file using the RNA-seq reads in the --bam file. The method counts the number of reads overlapping the exons in the --gtf file. First, all exons of a gene are converted into meta-exons: overlapping exons are merged into a single exon that spans all of them. Any overlap between a read and an exon is considered a match; that is, a read does not need to lie entirely between the start and end positions of an exon to count towards that exon's quantification.
A split read that aligns to multiple exons contributes to each exon it overlaps in proportion to the fraction of the read that overlaps that exon. Thus a split read contributes less than a single count to each of the exons it overlaps. A read that overlaps exons of multiple genes counts towards the quantification of every exon it overlaps. If the --bam file contains paired-end reads and the two mates of a pair overlap each other (i.e. have an insert size < 0), then each of these reads contributes less than a single count towards the quantifications unless --no-merge is provided.
The following diagram, with two genes with overlapping exons and one paired-end read where both mates are split reads and overlap each other, illustrates how the quantification works:
            x             x
           / \           / \
+---------+   +---------+   +---------+
| Exon1|1 |   | Exon1|2 |   | Exon1|3 |       Gene1
+---------+   +---------+   +---------+
                          x
                         / \
           +------------+   +-------------+
           |  Exon2|1   |   |   Exon2|2   |   Gene2
           +------------+   +-------------+
                          x
                         / \
           +------------+   +----+            RNAseq Read Mate1
           |--a-||-b-||c|   |-d--||--e-|
                          x
                         / \
                 +------+   +----------+      RNAseq Read Mate2
Left Mate1 = ((b * 0.5) + a) / (a + b + d)
Right Mate1 = (d * 0.5) / (a + b + d)
Left Mate2 = (b * 0.5) / (b + d + e)
Right Mate2 = ((d * 0.5) + e) / (b + d + e)
Exon1|2 = Left Mate1 + Left Mate2
Exon1|3 = Right Mate1 + Right Mate2
Exon2|1 = Left Mate1 + Left Mate2
Exon2|2 = Right Mate1 + Right Mate2
Gene1 = Exon1|2 + Exon1|3
Gene2 = Exon2|1 + Exon2|2
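As a worked illustration of the formulas above, the following Python sketch evaluates them for arbitrary segment lengths. It is not QTLtools code: the function name and the example lengths are made up for illustration, and Exon2|2 is taken to be Right Mate1 + Right Mate2, by symmetry with Exon1|3.

```python
def pair_contributions(a, b, d, e):
    """Evaluate the diagram's formulas for segment lengths a, b, d, e (in bases).

    b and d are halved in the formulas, which is consistent with them being
    the portions of the read pair covered by both mates.
    """
    mate1_total = a + b + d  # denominator used for Mate1
    mate2_total = b + d + e  # denominator used for Mate2

    left_mate1 = (a + 0.5 * b) / mate1_total
    right_mate1 = (0.5 * d) / mate1_total
    left_mate2 = (0.5 * b) / mate2_total
    right_mate2 = (0.5 * d + e) / mate2_total

    left = left_mate1 + left_mate2      # goes to the exons under the left blocks
    right = right_mate1 + right_mate2   # goes to the exons under the right blocks

    exons = {
        "Exon1|2": left,
        "Exon1|3": right,
        "Exon2|1": left,
        "Exon2|2": right,  # assumption: Right Mate1 + Right Mate2
    }
    genes = {
        "Gene1": exons["Exon1|2"] + exons["Exon1|3"],
        "Gene2": exons["Exon2|1"] + exons["Exon2|2"],
    }
    return exons, genes


# Hypothetical segment lengths, chosen only to make the arithmetic easy.
exons, genes = pair_contributions(a=30, b=20, d=20, e=30)
for name, value in {**exons, **genes}.items():
    print(f"{name}\t{value:.4f}")
```

With these example lengths each exon receives 50/70 (about 0.714) and each gene 100/70 (about 1.43), so the read pair contributes less than two counts to a gene.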
The quan mode in version 1.2 and above is not compatible with the quantifications generated by previous versions. This is due to bug fixes and slight adjustments to the way we quantify. DO NOT MIX QUANTIFICATIONS GENERATED BY EARLIER VERSIONS OF QTLTOOLS WITH QUANTIFICATIONS FROM VERSION 1.2 AND ABOVE AS THIS WILL CREATE A BIAS IN YOUR DATASET.
Unless --no-hash is provided, all output files will include a hash value corresponding to the combination of the specific options used. This is provided so that quantifications from samples that were quantified with different options are not merged, which would create a bias in the dataset.
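The idea of tying a hash to an option combination can be sketched in Python as follows. This is only a conceptual illustration: the hashing scheme QTLtools actually uses is not described here, and the option names, the SHA-1 choice, and the 10-character truncation below are all assumptions.

```python
import hashlib


def options_hash(options):
    """Return a short, deterministic hash for a dict of option settings."""
    # Sort the keys so the same settings always serialize the same way.
    canonical = ";".join(f"{key}={options[key]}" for key in sorted(options))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:10]


# Hypothetical option names; they are not real QTLtools flags.
run_a = {"filter_x": 10, "filter_y": 8}
run_b = {"filter_y": 8, "filter_x": 10}   # same settings, different order
run_c = {"filter_x": 20, "filter_y": 8}   # one setting changed

print(options_hash(run_a) == options_hash(run_b))
print(options_hash(run_a) == options_hash(run_c))
```

Identical settings give the same hash regardless of the order in which they are listed, while changing any setting gives a different one, which is what makes a mismatch between samples easy to spot.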
1 | chr | Phenotype's chromosome |
2 | start | Phenotype's start position (0-based) |
3 | end | Phenotype's end position (1-based) |
4 | gene|exon | The gene or exon ID |
5 | info|geneID | For a gene, information about the gene; for an exon, the ID of its gene. The gene information is separated by semicolons, where L=gene length, T=gene type, R=gene positions, N=gene name |
6 | strand | Phenotype's strand |
7 | sample_name | The sample name of the BAM file |
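A row with the seven fields above could be read in Python roughly as follows. This is a hedged sketch, not a reference parser: it assumes the file is tab-delimited, that the value under the sample_name header is numeric, and that the gene information uses KEY=value pairs. The example row is invented and is not real QTLtools output.

```python
def parse_info(info):
    """Split a semicolon-separated 'L=...;T=...;R=...;N=...' string into a dict.

    For exon rows, where this field holds a gene ID rather than KEY=value
    pairs, the returned dict is empty.
    """
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def parse_row(line):
    """Parse one data row into a dict keyed by the documented field names."""
    cols = line.rstrip("\n").split("\t")  # assumption: tab-delimited
    return {
        "chr": cols[0],
        "start": int(cols[1]),  # 0-based
        "end": int(cols[2]),    # 1-based
        "id": cols[3],
        "info": cols[4],
        "info_fields": parse_info(cols[4]),
        "strand": cols[5],
        "value": float(cols[6]),  # assumption: numeric quantification
    }


# Invented example row, for illustration only.
example = "chr1\t999\t2000\tGENE_A\tL=800;T=protein_coding;R=chr1:1000-2000;N=GeneA\t+\t12.5"
row = parse_row(example)
print(row["id"], row["info_fields"].get("N"), row["value"])
```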
1 | filtered_secondary_alignments_(does_not_count_towards_total_reads) | Number of secondary alignments |
2 | total_reads | Number of reads in the BAM file |
3 | filtered_unmapped | Number of unmapped reads |
4 | filtered_failqc | Number of reads with the failed QC tag |
5 | filtered_duplicate | Number of duplicate reads |
6 | filtered_mapQ_less_than_X | Number of reads below the mapping quality threshold X |
7 | filtered_notpaired | Number of pairs that were not in the correct orientation or were not properly paired |
8 | filtered_mismatches_greater_than_X_Y | Number of reads failing the mismatches-per-read filter (X) and the total-mismatches filter (Y) |
9 | filtered_unmatched_mate_pairs | Number of paired reads whose mate was missing |
10 | total_good | Number of reads that passed all filters |
11 | total_exonic | Number of reads that aligned to exons and passed all filters |
12 | total_exonic_multi_counting | Number of reads that aligned to exons when we count reads that align to multiple exons multiple times |
13 | total_merged_reads | Number of reads where the mate pairs were overlapping and thus were merged |
14 | total_exonic_multi_counting_after_merge_(used_for_rpkm) | Number of reads that aligned to exons when we merge overlapping mate pairs |
15 | good_over_total | Ratio of the number of good reads to the total number of reads |
16 | exonic_over_total | Ratio of the number of exonic reads to the total number of reads |
17 | exonic_over_good | Ratio of the number of exonic reads to the number of good reads |
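The last three statistics can be recomputed from the earlier ones, as in the Python sketch below. Which count serves as the "exonic" numerator is inferred from the field names (total_exonic is assumed here), and the numbers are invented for illustration.

```python
def summary_ratios(stats):
    """Recompute the three ratio fields from the read counts."""
    total = stats["total_reads"]
    good = stats["total_good"]
    exonic = stats["total_exonic"]  # assumption: field 11 is the numerator
    return {
        "good_over_total": good / total,
        "exonic_over_total": exonic / total,
        "exonic_over_good": exonic / good,
    }


# Invented counts, for illustration only.
stats = {"total_reads": 1000, "total_good": 800, "total_exonic": 600}
for name, value in summary_ratios(stats).items():
    print(f"{name}\t{value:.3f}")
```

With these counts the ratios are 0.8, 0.6 and 0.75. The sketch does not guard against zero counts, which a real script would need to handle.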
QTLtools website: <https://qtltools.github.io/qtltools>
Please submit bugs to <https://github.com/qtltools/qtltools>
Delaneau, O., Ongen, H., Brown, A. et al. A complete tool set for molecular QTL discovery and analysis. Nat Commun 8, 15452 (2017). <https://doi.org/10.1038/ncomms15452>
Halit Ongen (halitongen@gmail.com), Olivier Delaneau (olivier.delaneau@gmail.com)
06 May 2020 | QTLtools-v1.3 |