QTLtools-trans(1) | Bioinformatics tools | QTLtools-trans(1) |
QTLtools trans - trans QTL analysis
QTLtools trans --vcf [in.vcf|in.vcf.gz|in.bcf|in.bed.gz] --bed quantifications.bed.gz [--nominal | --permute | --sample integer | --adjust in.txt] --out output.txt [OPTIONS]
This mode maps trans (distal) quantitative trait loci (QTLs) that affect the phenotypes, using linear regression. The method is detailed in <https://www.nature.com/articles/ncomms15452>. We first regress out the provided covariates from the phenotype data and then run a linear regression between the phenotype residuals and the genotypes. If --normal and --cov are provided together, the residuals obtained after covariate correction are rank normal transformed. The mode also incorporates an efficient permutation scheme. You can run a nominal pass (--nominal) that lists all genotype-phenotype associations below a given threshold, a permutation pass (--permute, or --sample no_genes_to_sample) that empirically characterizes the null distribution of associations, or an adjustment pass (--adjust) that adjusts the nominal p-values based on permutations.
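The covariate correction described above can be sketched with awk. This is a minimal illustration, not QTLtools code: residual_correlation is a hypothetical helper, and it assumes a hypothetical input layout of one sample per line with three whitespace-separated columns (phenotype, a single covariate, genotype dosage). It regresses the phenotype on the covariate (with an intercept), leaves the genotype untouched as the description above states, and prints the Pearson correlation between the residuals and the genotype. It computes no p-value and handles neither multiple covariates nor missing data, which QTLtools deals with internally.

```shell
# residual_correlation  (hypothetical helper, not part of QTLtools)
# Reads one sample per line on stdin: "phenotype covariate genotype".
# Regresses the phenotype on the single covariate (with an intercept), then
# prints the Pearson correlation between the residuals and the genotype.
residual_correlation() {
    awk '
        { y[NR] = $1; c[NR] = $2; g[NR] = $3; sy += $1; sc += $2; sg += $3 }
        END {
            n = NR; my = sy / n; mc = sc / n; mg = sg / n
            # Least-squares slope of the phenotype on the covariate
            for (i = 1; i <= n; i++) {
                scy += (c[i] - mc) * (y[i] - my)
                scc += (c[i] - mc) ^ 2
            }
            beta = scy / scc
            # Residuals are centred (intercept included); correlate them with the genotype
            for (i = 1; i <= n; i++) {
                res = (y[i] - my) - beta * (c[i] - mc)
                srg += res * (g[i] - mg)
                srr += res * res
                sgg += (g[i] - mg) ^ 2
            }
            printf "%.6f\n", srg / sqrt(srr * sgg)
        }'
}
```

For example, six samples whose phenotype is exactly genotype + 3 × covariate (input lines `3 1 0`, `4 1 1`, `5 1 2`, `-3 -1 0`, `-2 -1 1`, `-1 -1 2`) print 1.000000: once the covariate is regressed out, the residuals track the genotype perfectly.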
In the full permutation scheme (--permute), we permute all phenotypes using the same random number sequence in order to preserve the correlation structure among them. By doing so, the only association we actually break in the data is the one between genotypes and phenotypes. We then run a standard association scan identical to the one used in the nominal pass. In practice, we repeat this for 100 permutations of the phenotype data. We can then estimate the FDR by ranking all nominal p-values in ascending order and, for each of them, counting how many p-values in the permuted data sets are smaller. For example, if 500 p-values in the permuted data sets are smaller than the 100th smallest nominal p-value, we can assume that the FDR for the first 100 associations is around 5% (= 500 / (100 permutations × 100 associations)).
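The FDR estimate above can be sketched in a few lines of shell. This is a minimal illustration, not QTLtools code: perm_fdr is a hypothetical helper that assumes one file with one nominal p-value per line and another pooling the p-values of all permuted data sets (one per line), plus the number of permutations. For each rank k it prints the k-th smallest nominal p-value and the number of strictly smaller permuted p-values divided by (permutations × k).

```shell
# perm_fdr NOMINAL PERMUTED NPERM  (hypothetical helper, not part of QTLtools)
#   NOMINAL  : file with one nominal p-value per line
#   PERMUTED : file pooling the p-values of all permuted data sets, one per line
#   NPERM    : number of permutations (100 in the text above)
# Prints "rank nominal_p fdr" for each nominal p-value, in ascending order.
perm_fdr() {
    local nominal=$1 permuted=$2 nperm=$3
    {
        # Tag nominal values 0 and permuted values 1 so that, on ties, the
        # nominal value sorts first and only strictly smaller permuted values count.
        awk '{ print $1, 0 }' "$nominal"
        awk '{ print $1, 1 }' "$permuted"
    } | LC_ALL=C sort -k1,1g -k2,2n \
      | awk -v nperm="$nperm" '
            $2 == 1 { smaller++; next }   # permuted p-value seen so far
            {
                k++                       # rank of this nominal p-value
                printf "%d %s %.6g\n", k, $1, smaller / (nperm * k)
            }'
}
```

With 100 nominal p-values (0.001 to 0.1) and 500 permuted p-values lying below all of them across 100 permutations, the last line is `100 0.1 0.05`, which reproduces the 5% of the worked example. A single merge-sort pass keeps the count linear after sorting instead of rescanning the permuted values for every rank.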
To enable fast screening in trans, we also designed an approximation of the full permutation scheme, based on what we already do in cis. To make this possible, we assume that the phenotypes are independent and normally distributed (which can be enforced with --normal). The reasoning is as follows: since all phenotypes are normally distributed, they are effectively interchangeable, and the cis region removed from each phenotype is so small compared to the rest of the genome that its phenotype-specific impact is negligible. Hence the number of variants tested and the correlation among them are approximately the same for every phenotype, and so are the phenotypes themselves. We can therefore run permutations on a small number of phenotypes rather than on all of them, which drastically decreases the computational burden, and apply the resulting null distribution to all phenotypes. The implementation draws from the null by permuting randomly chosen phenotypes, testing each for association with all variants in trans and storing its smallest p-value. Repeating this many times (typically 1000) builds a null distribution of the strongest association for a single phenotype. We then make this distribution continuous by fitting a beta distribution, as we do in cis, and use it to adjust every nominal p-value from the initial pass for the number of variants tested. To correct for the number of phenotypes tested, we estimate the FDR as we do in cis, that is, from the best adjusted p-value of each phenotype (one per phenotype). This also yields an adjusted p-value threshold that we use to identify all phenotype-variant pairs that are genome-wide significant. In our experiments, this approach gives results similar to the full permutation scheme, both in terms of FDR estimates and number of discoveries, while running faster.
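The adjustment step can be illustrated with an empirical version of the null. This sketch deliberately swaps the beta fit described above for the raw empirical distribution, so it is not what QTLtools computes: adjust_empirical is a hypothetical helper that takes a file of null draws (the smallest p-value of one permuted phenotype per line) and a file of nominal p-values, and reports for each nominal p-value the proportion of null draws at least as small, with the usual +1 correction.

```shell
# adjust_empirical NULL NOMINAL  (hypothetical helper, not part of QTLtools)
#   NULL    : file with one null draw per line (the smallest p-value obtained
#             for one permuted phenotype)
#   NOMINAL : file with one nominal p-value per line
# Prints "nominal_p adjusted_p" in ascending order of nominal p-value, using the
# empirical null directly instead of a fitted beta distribution.
adjust_empirical() {
    local null=$1 nominal=$2 n
    n=$(grep -c . "$null")            # number of null draws (non-empty lines)
    {
        # Tag null draws 0 and nominal values 1 so that, on ties, null draws
        # sort first and count as "at least as small".
        awk '{ print $1, 0 }' "$null"
        awk '{ print $1, 1 }' "$nominal"
    } | LC_ALL=C sort -k1,1g -k2,2n \
      | awk -v n="$n" '
            $2 == 0 { le++; next }       # null draws <= upcoming nominal values
            { printf "%s %.6g\n", $1, (le + 1) / (n + 1) }'
}
```

The resolution of this empirical adjustment is limited to 1/(N+1) for N null draws, so no nominal p-value can be adjusted below 1/1001 with 1000 draws. That limitation is why fitting a continuous beta distribution to the null, as the text describes, is preferable for genome-wide thresholds.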
Since linear regression assumes normally distributed data, we strongly recommend using the --normal option to rank normal transform the phenotype quantifications, in order to avoid false positive associations driven by outliers. If you use the approximate permutation scheme (--sample), you MUST either use the --normal option or make sure that your phenotypes are normally distributed.
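A rank normal transform of the kind --normal applies can be sketched as below. The details are assumptions rather than QTLtools' exact procedure: each value is mapped to the standard normal quantile of (rank - 0.5)/n, ties are broken arbitrarily, and the inverse normal CDF uses the Abramowitz and Stegun 26.2.23 rational approximation (absolute error below 4.5e-4). rank_normal is a hypothetical helper reading one value per line.

```shell
# rank_normal  (hypothetical helper, not part of QTLtools)
# Reads one numeric value per line on stdin and prints, in the same order, the
# normal quantile of (rank - 0.5) / n for each value. Ties are broken arbitrarily.
rank_normal() {
    awk '{ print NR, $1 }' \
      | LC_ALL=C sort -k2,2g \
      | awk '
            # Upper-tail standard normal quantile for 0 < p <= 0.5
            # (Abramowitz & Stegun 26.2.23, absolute error < 4.5e-4).
            function upper_q(p,    t, num, den) {
                t = sqrt(-2 * log(p))
                num = 2.515517 + 0.802853 * t + 0.010328 * t * t
                den = 1 + 1.432788 * t + 0.189269 * t * t + 0.001308 * t * t * t
                return t - num / den
            }
            { pos[NR] = $1 }               # input position of the NR-th smallest value
            END {
                for (r = 1; r <= NR; r++) {
                    q = (r - 0.5) / NR     # rank mapped into (0, 1)
                    if (q < 0.5) {
                        z = -upper_q(q)
                    } else if (q > 0.5) {
                        z = upper_q(1 - q)
                    } else {
                        z = 0
                    }
                    printf "%d %.6f\n", pos[r], z
                }
            }' \
      | LC_ALL=C sort -k1,1n \
      | cut -d' ' -f2
}
```

Because only ranks survive the transform, an extreme outlier ends up at the same quantile as any other largest value, which is what protects the regression from outlier-driven false positives.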
Column | Description |
1 | The phenotype ID |
2 | The phenotype chromosome |
3 | The start position of the phenotype |
4 | The variant ID |
5 | The variant chromosome |
6 | The start position of the variant |
7 | The nominal p-value of the association between the variant and the phenotype |
8 | The adjusted p-value of the association between the variant and the phenotype. Requires --adjust |
9 | The correlation coefficient between the variant and the phenotype |

Column | Description |
1 | The phenotype ID |
2 | The adjusted p-value of the association between the variant and the phenotype. Requires --adjust |
3 | The nominal p-value of the association between the variant and the phenotype |
4 | The variant ID |

Column | Description |
1 | The index of the bin |
2 | The lower bound of the correlation coefficient for this bin |
3 | The upper bound of the correlation coefficient for this bin |
4 | The upper bound of the p-value for this bin |
5 | The lower bound of the p-value for this bin |
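To illustrate how the first two tables relate, the following sketch keeps, for each phenotype, the row with the smallest nominal p-value from a table laid out like the first one. It assumes whitespace-separated columns in the order listed and no header line; best_per_phenotype is a hypothetical helper, and because it only sees the associations written to that table, its result need not match what QTLtools reports in the second table.

```shell
# best_per_phenotype  (hypothetical helper, not part of QTLtools)
# Reads a table laid out like the first one above on stdin (whitespace-separated,
# no header line assumed) and prints, for each phenotype, the row with the
# smallest nominal p-value (column 7) as: phenotype ID, nominal p-value, variant ID.
best_per_phenotype() {
    awk '
        # Keep the first row seen for a phenotype, or any row with a smaller p-value
        !($1 in best) || $7 + 0 < best[$1] + 0 { best[$1] = $7; var[$1] = $4 }
        END { for (p in best) print p, best[p], var[p] }' \
      | LC_ALL=C sort -k1,1
}
```

The final sort only makes the output order deterministic, since awk iterates over associative arrays in an unspecified order.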
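Full permutation pass with 100 permutations, submitting one job per permutation with qsub:
for j in $(seq 1 100); do
    echo "QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz --permute --normal --out trans.perm$j.txt --seed $j" | qsub
done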
QTLtools website: <https://qtltools.github.io/qtltools>
Please submit bugs to <https://github.com/qtltools/qtltools>
Delaneau, O., Ongen, H., Brown, A. et al. A complete tool set for molecular QTL discovery and analysis. Nat Commun 8, 15452 (2017). <https://doi.org/10.1038/ncomms15452>
Halit Ongen (halitongen@gmail.com), Olivier Delaneau (olivier.delaneau@gmail.com)
06 May 2020 | QTLtools-v1.3 |