QTLtools-mbv(1) | Bioinformatics tools | QTLtools-mbv(1) |
QTLtools mbv - Match genotypes in a VCF to a BAM file
QTLtools mbv --bam [sample.bam|sample.sam|sample.cram] --vcf [in.vcf|in.bcf|in.vcf.gz] --out output_file [OPTIONS]
This mode checks whether the genotypes in the VCF are observed in the RNA-seq reads in the BAM file, in order to quickly resolve sample mislabeling and to detect cross-sample contamination and PCR amplification bias. The details of the method are described in <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6044394/>. In brief, we measure, for each individual in the VCF, the proportions of heterozygous and homozygous genotypes for which both alleles are captured by the sequencing reads in the BAM file. A 'match' has close to 100% concordance for both measures, whereas a 'mismatch' has significantly lower concordance for both. Increased cross-sample contamination leads to decreased homozygous concordance with no change in heterozygous concordance, whereas increased amplification bias leads to decreased heterozygous concordance with no change in homozygous concordance. We recommend using uniquely mapping reads only, by specifying an appropriate --filter-mapping-quality threshold.
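As an illustration of the invocation described above, the following Python sketch assembles an mbv command line from the options named in this page (--bam, --vcf, --out and --filter-mapping-quality). The helper name build_mbv_command and the file names are hypothetical, and the mapping-quality value is a placeholder: the threshold that selects uniquely mapping reads depends on the aligner used.

```python
import subprocess


def build_mbv_command(bam, vcf, out, mapping_quality=None):
    """Return the argument list for one 'QTLtools mbv' run.

    Hypothetical helper: only options named in this page are used.
    """
    cmd = ["QTLtools", "mbv", "--bam", bam, "--vcf", vcf, "--out", out]
    if mapping_quality is not None:
        # Placeholder threshold; the right value is aligner-specific.
        cmd += ["--filter-mapping-quality", str(mapping_quality)]
    return cmd


def run_mbv(bam, vcf, out, mapping_quality=None):
    """Run the command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(build_mbv_command(bam, vcf, out, mapping_quality), check=True)
```

Calling run_mbv("sample.bam", "in.vcf.gz", "output_file", mapping_quality=10) would execute the tool, provided QTLtools is on the PATH; the value 10 is only an example.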
1 | The sample ID in the VCF against which the sequence data has been matched |
2 | The number of missing genotypes for this sample |
3 | The total number of heterozygous genotypes examined |
4 | The total number of homozygous genotypes examined |
5 | The number of heterozygous genotypes considered for the matching, i.e. those that are covered by more than --filter-minimal-coverage reads |
6 | The number of homozygous genotypes considered for the matching, i.e. those that are covered by more than --filter-minimal-coverage reads |
7 | The number of heterozygous genotypes that match between this sample and the BAM file |
8 | The number of homozygous genotypes that match between this sample and the BAM file |
9 | The percentage of heterozygous genotypes that match between this sample and the BAM file |
10 | The percentage of homozygous genotypes that match between this sample and the BAM file |
11 | The number of heterozygous genotypes with significant allelic imbalance |
You can then plot column 9 against column 10 to identify the genotyped sample in the VCF that best matches your sequence data.
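Besides plotting, the best-matching sample can be picked programmatically. The Python sketch below reads the output file by column position, following the column table above, and returns the sample with the highest combined concordance. It assumes the file is whitespace-delimited; any line whose columns 9 and 10 are not numeric (such as a header line) is skipped. Ranking by the sum of the two percentages is an illustrative choice, not part of the MBV method, and the function names are hypothetical.

```python
import math


def read_mbv(handle):
    """Parse mbv output lines into dicts, using 1-based columns 1, 9 and 10."""
    rows = []
    for line in handle:
        fields = line.split()
        if len(fields) < 11:
            continue  # not a complete 11-column record
        try:
            het_pct = float(fields[8])  # column 9: % heterozygous match
            hom_pct = float(fields[9])  # column 10: % homozygous match
        except ValueError:
            continue  # non-numeric values, e.g. a header line
        if math.isnan(het_pct) or math.isnan(hom_pct):
            continue  # no usable concordance for this sample
        rows.append({"sample": fields[0], "het_pct": het_pct, "hom_pct": hom_pct})
    return rows


def best_match(rows):
    """Return the row with the highest het + hom concordance (assumed ranking)."""
    return max(rows, key=lambda r: r["het_pct"] + r["hom_pct"])
```

For example, with open("output_file") as fh: print(best_match(read_mbv(fh))["sample"]) prints the ID of the top-ranked sample. A true match is expected to stand clearly apart from all other samples on both axes, so it is worth inspecting the runner-up as well.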
QTLtools website: <https://qtltools.github.io/qtltools>
Please submit bugs to <https://github.com/qtltools/qtltools>
Fort A., Panousis N. I., Garieri M. et al. MBV: a method to solve sample mislabeling and detect technical bias in large combined genotype and sequencing assay datasets. Bioinformatics 33(12), 1895, 2017. <https://doi.org/10.1093/bioinformatics/btx074>
Olivier Delaneau (olivier.delaneau@gmail.com), Halit Ongen (halitongen@gmail.com)
06 May 2020 | QTLtools-v1.3 |