QTLtools-cis(1) | Bioinformatics tools | QTLtools-cis(1) |
QTLtools cis - cis QTL analysis
QTLtools cis --vcf [in.vcf|in.vcf.gz|in.bcf|in.bed.gz] --bed quantifications.bed.gz [--nominal float | --permute integer | --mapping in.txt] --out output.txt [OPTIONS]
This mode maps cis (proximal) quantitative trait loci (QTLs) that affect the phenotype, using linear regression. The method is detailed in <https://www.nature.com/articles/ncomms15452>. We first regress out the provided covariates from the phenotype data, followed by running the linear regression between the phenotype residuals and the genotype. If --normal and --cov are provided at the same time, then the residuals after the covariate correction are rank normal transformed. It incorporates an efficient permutation scheme to control for differential multiple testing burden of each phenotype. You can run a nominal pass (--nominal threshold) listing all genotype-phenotype associations below a certain threshold, a permutation pass (--permute no_of_permutations) to empirically characterize the null distribution of associations for each phenotype separately, thus adjusting the nominal p-value of the best association for a phenotype, or a conditional analysis pass (--mapping filename) to discover multiple proximal QTLs with independent effects on a phenotype.
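The two-stage regression described above can be sketched as follows. This is a minimal illustration with a single covariate and made-up data, not QTLtools' implementation: the helper names (`simple_fit`, `residualize`, `association`) are hypothetical, and the real tool handles multiple covariates and also computes p-values.

```python
def mean(xs):
    return sum(xs) / len(xs)

def simple_fit(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def residualize(phenotype, covariate):
    """Stage 1: regress a single covariate out of the phenotype."""
    intercept, slope = simple_fit(covariate, phenotype)
    return [p - (intercept + slope * c) for p, c in zip(phenotype, covariate)]

def association(genotype, residuals):
    """Stage 2: slope and r-squared of the residuals regressed on genotype."""
    intercept, slope = simple_fit(genotype, residuals)
    mr = mean(residuals)
    ss_tot = sum((r - mr) ** 2 for r in residuals)
    ss_res = sum((r - (intercept + slope * g)) ** 2
                 for g, r in zip(genotype, residuals))
    return slope, 1 - ss_res / ss_tot

# Made-up data: phenotype = 1 + 2 * genotype + 3 * covariate, no noise.
genotype = [0, 1, 2, 0, 1, 2]      # allele dosages
covariate = [0, 0, 0, 1, 1, 1]     # e.g. a batch indicator (assumption)
phenotype = [1 + 2 * g + 3 * c for g, c in zip(genotype, covariate)]

residuals = residualize(phenotype, covariate)
slope, r_squared = association(genotype, residuals)
```

In this toy data the covariate is uncorrelated with the genotype, so the second stage recovers the genotype effect used to build the phenotype.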
As multiple molecular phenotypes can belong to higher order biological entities, e.g. exons of genes, QTLtools cis allows grouping of phenotypes to maximize the discoveries in such particular cases. Specifically, QTLtools can either aggregate multiple phenotypes in a given group into a single phenotype via PCA (--grp-pca1) or by taking their mean (--grp-mean), or directly use all individual phenotypes in an extended permutation scheme that accounts for their number and correlation structure (--grp-best). In our experience, --grp-best outperforms the other options for expression QTLs (eQTLs).
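As a rough illustration of the idea behind --grp-mean, the sketch below averages several phenotypes of one group sample by sample. The function name and data are hypothetical, and whether QTLtools normalizes the phenotypes before averaging is not stated here, so treat this only as a sketch of the aggregation step.

```python
def group_mean(phenotypes):
    """Average several phenotype vectors (one value per sample), sample by sample."""
    n_samples = len(phenotypes[0])
    if any(len(p) != n_samples for p in phenotypes):
        raise ValueError("all phenotypes must cover the same samples")
    return [sum(values) / len(values) for values in zip(*phenotypes)]

# Hypothetical exon-level quantifications for three samples of one gene.
exons = {
    "exon1": [1.0, 2.0, 3.0],
    "exon2": [3.0, 4.0, 5.0],
}
gene_level = group_mean(list(exons.values()))
```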
The conditional analysis pass first uses permutations to derive a nominal p-value threshold per phenotype that varies and reflects the number of independent tests per cis-window. Then, it uses a forward-backward stepwise regression to learn the number of independent signals per phenotype, determine the best candidate variant per signal and assign all significant hits to the independent signal they relate to.
Since linear regression assumes normally distributed data, we highly recommend using the --normal option to rank normal transform the phenotype quantifications in order to avoid false positive associations due to outliers.
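To show why a rank normal transform tames outliers, here is a minimal sketch of a rank-based inverse normal transform. The rank offset of 0.5 and the averaging of tied ranks are assumptions made for this example; the exact convention used by --normal may differ.

```python
from statistics import NormalDist

def rank_normal_transform(values):
    """Replace each value by the standard normal quantile of its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # Offset of 0.5 keeps the quantiles strictly inside (0, 1); an assumption.
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]

# The outlier 1000.0 keeps its rank but no longer dominates the scale.
raw = [1.0, 2.0, 3.0, 1000.0, 2.5]
transformed = rank_normal_transform(raw)
```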
Columns of the --nominal output file:
1 | phe_id | grp_id | The phenotype ID, or the phenotype group ID if one of the grouping options is provided. |
2 | phe_chr | The phenotype chromosome |
3 | phe_from | Start position of the phenotype |
4 | phe_to | End position of the phenotype |
5 | phe_strd | The phenotype strand |
5.1 | phe_id | ve_by_pc1 | n_phe_in_grp | Only printed if --grp-best | --grp-pca1 | --grp-mean. The phenotype ID, variance explained by PC1, or number of phenotypes in the phenotype group for --grp-best, --grp-pca1, and --grp-mean, respectively. |
5.2 | n_phe_in_grp | Only printed if --grp-pca1 | --grp-mean. The number of phenotypes in the phenotype group. |
6 | n_var_in_cis | The number of variants in the cis window for this phenotype. |
7 | dist_phe_var | The distance between the variant and the phenotype start positions. |
8 | var_id | The variant ID. |
9 | var_chr | The variant chromosome. |
10 | var_from | The start position of the variant. |
11 | var_to | The end position of the variant. |
12 | nom_pval | The nominal p-value of the association between the variant and the phenotype. |
13 | r_squared | The r squared of the linear regression. |
14 | slope | The beta (slope) of the linear regression. |
14.1 | slope_se | The standard error of the beta. Only printed if --std-err is provided. |
15 | best_hit | Whether this variant was the best hit for this phenotype. |
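The column layout above can be read into named fields along these lines. This is a sketch under stated assumptions: fields are whitespace-delimited, no grouping option or --std-err was used (so the optional columns 5.1, 5.2 and 14.1 are absent), best_hit is encoded as 0 or 1, and the example line is invented.

```python
# Default columns of a nominal pass line, with a converter for each field.
NOMINAL_COLUMNS = [
    ("phe_id", str), ("phe_chr", str), ("phe_from", int), ("phe_to", int),
    ("phe_strd", str), ("n_var_in_cis", int), ("dist_phe_var", int),
    ("var_id", str), ("var_chr", str), ("var_from", int), ("var_to", int),
    ("nom_pval", float), ("r_squared", float), ("slope", float),
    ("best_hit", lambda s: s == "1"),  # assumes a 0/1 flag
]

def parse_nominal_line(line):
    """Turn one whitespace-delimited output line into a dict of typed fields."""
    fields = line.split()
    if len(fields) != len(NOMINAL_COLUMNS):
        raise ValueError(
            f"expected {len(NOMINAL_COLUMNS)} fields, got {len(fields)}"
        )
    return {name: cast(value)
            for (name, cast), value in zip(NOMINAL_COLUMNS, fields)}

# Invented example line, for illustration only.
example = ("geneA chr22 17000000 17005000 + 250 -1200 "
           "rs0001 chr22 16998800 16998800 1.5e-06 0.21 -0.43 1")
record = parse_nominal_line(example)
```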
Columns of the --permute output file:
1 | phe_id | grp_id | The phenotype ID, or the phenotype group ID if one of the grouping options is provided. |
2 | phe_chr | The phenotype chromosome |
3 | phe_from | Start position of the phenotype |
4 | phe_to | End position of the phenotype |
5 | phe_strd | The phenotype strand |
5.1 | phe_id | ve_by_pc1 | n_phe_in_grp | Only printed if --grp-best | --grp-pca1 | --grp-mean. The phenotype ID, variance explained by PC1, or number of phenotypes in the phenotype group for --grp-best, --grp-pca1, and --grp-mean, respectively. |
5.2 | n_phe_in_grp | Only printed if --grp-pca1 | --grp-mean. The number of phenotypes in the phenotype group. |
6 | n_var_in_cis | The number of variants in the cis window for this phenotype. |
7 | dist_phe_var | The distance between the variant and the phenotype start positions. |
8 | var_id | The most significant variant ID. |
9 | var_chr | The most significant variant's chromosome. |
10 | var_from | The start position of the most significant variant. |
11 | var_to | The end position of the most significant variant. |
12 | dof1 | The number of degrees of freedom used to compute the p-values. |
13 | dof2 | Estimated number of degrees of freedom used in beta approximation p-value calculations. |
14 | bml1 | The first shape parameter of the fitted beta distribution (alpha parameter). This should be close to 1. |
15 | bml2 | The second shape parameter of the fitted beta distribution (beta parameter). This corresponds to the effective number of independent tests in the region. |
16 | nom_pval | The nominal p-value of the association between the most significant variant and the phenotype. |
17 | r_squared | The r squared of the linear regression. |
18 | slope | The beta (slope) of the linear regression. |
18.1 | slope_se | The standard error of the beta. Only printed if --std-err is provided. |
19 | adj_emp_pval | Adjusted empirical p-value from permutations. This is the adjusted p-value not using the beta approximation. Simply calculated as: (number of p-values observed during permutations that were smaller than or equal to the nominal p-value + 1) / (number of permutations + 1). The most significant p-value achievable would be 1 / (number of permutations + 1). |
20 | adj_beta_pval | Adjusted empirical p-value given by the fitted beta distribution. We strongly recommend using this adjusted p-value in any downstream analysis. |
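The adj_emp_pval formula given above can be written out as a short function. The function name and the example null p-values are made up for illustration; the list is assumed to hold one p-value per permutation.

```python
def adjusted_empirical_pvalue(nominal_pval, permutation_pvals):
    """(count of permutation p-values <= nominal p-value, plus 1) / (permutations + 1)."""
    n_as_extreme = sum(1 for p in permutation_pvals if p <= nominal_pval)
    return (n_as_extreme + 1) / (len(permutation_pvals) + 1)

# Made-up p-values from 9 permutations.
null_pvals = [0.20, 0.03, 0.50, 0.01, 0.08, 0.30, 0.04, 0.60, 0.15]

adj = adjusted_empirical_pvalue(0.03, null_pvals)     # two null values are <= 0.03
floor = adjusted_empirical_pvalue(1e-10, null_pvals)  # no null value is this small
```

The second call shows the resolution limit: with 9 permutations the smallest achievable value is 1 / (9 + 1), however small the nominal p-value is.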
Columns of the --mapping (conditional analysis) output file:
1 | phe_id | grp_id | The phenotype ID, or the phenotype group ID if one of the grouping options is provided. |
2 | phe_chr | The phenotype chromosome |
3 | phe_from | Start position of the phenotype |
4 | phe_to | End position of the phenotype |
5 | phe_strd | The phenotype strand |
5.1 | phe_id | ve_by_pc1 | n_phe_in_grp | Only printed if --grp-best | --grp-pca1 | --grp-mean. The phenotype ID, variance explained by PC1, or number of phenotypes in the phenotype group for --grp-best, --grp-pca1, and --grp-mean, respectively. |
5.2 | n_phe_in_grp | Only printed if --grp-pca1 | --grp-mean. The number of phenotypes in the phenotype group. |
6 | n_var_in_cis | The number of variants in the cis window for this phenotype. |
7 | dist_phe_var | The distance between the variant and the phenotype start positions. |
8 | var_id | The most significant variant ID. |
9 | var_chr | The most significant variant's chromosome. |
10 | var_from | The start position of the most significant variant. |
11 | var_to | The end position of the most significant variant. |
12 | rank | The rank of the association. This tells you whether the variant has been mapped as belonging to the best signal (rank=0), the second best (rank=1), and so on. As a consequence, the maximum rank value for a given phenotype tells you how many independent signals there are (e.g. a maximum rank of 2 means 3 independent signals). |
13 | fwd_pval | The nominal forward p-value of the association between the most significant variant and the phenotype. |
14 | fwd_r_squared | The r squared of the forward linear regression. |
15 | fwd_slope | The beta (slope) of the forward linear regression. |
15.1 | fwd_slope_se | The standard error of the forward beta. Only printed if --std-err is provided. |
16 | fwd_best_hit | Whether or not this variant was the forward most significant variant. |
17 | fwd_sig | Whether this variant was significant. Currently all variants are significant so this is redundant. |
18 | bwd_pval | The nominal backward p-value of the association between the most significant variant and the phenotype. |
19 | bwd_r_squared | The r squared of the backward linear regression. |
20 | bwd_slope | The beta (slope) of the backward linear regression. |
20.1 | bwd_slope_se | The standard error of the backward beta. Only printed if --std-err is provided. |
21 | bwd_best_hit | Whether or not this variant was the backward most significant variant. |
22 | bwd_sig | Whether this variant was significant. Currently all variants are significant so this is redundant. |
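Since ranks start at 0, the number of independent signals for a phenotype is its maximum rank plus one. A minimal sketch of that bookkeeping, using a hypothetical helper and invented (phe_id, rank) pairs rather than a real output file:

```python
def count_independent_signals(records):
    """Map each phenotype ID to its number of independent signals (max rank + 1)."""
    max_rank = {}
    for phe_id, rank in records:
        max_rank[phe_id] = max(rank, max_rank.get(phe_id, -1))
    return {phe_id: rank + 1 for phe_id, rank in max_rank.items()}

# Invented (phe_id, rank) pairs, one per line of a conditional pass output.
records = [
    ("geneA", 0), ("geneA", 0), ("geneA", 1), ("geneA", 2),
    ("geneB", 0),
]
signals = count_independent_signals(records)
```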
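Nominal pass in 20 chunks, correcting for covariates and rank normal transforming the phenotypes, with each chunk submitted through qsub:
for j in $(seq 1 20); do
echo "QTLtools cis --vcf genotypes.chr22.vcf.gz --bed genes.50percent.chr22.bed.gz --cov genes.covariates.pc50.txt.gz --nominal 0.01 --chunk $j 20 --normal --out nominals_${j}_20.txt" | qsub
done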
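Permutation pass with 5000 permutations in 20 chunks, with each chunk submitted through qsub:
for j in $(seq 1 20); do
echo "QTLtools cis --vcf genotypes.chr22.vcf.gz --bed genes.50percent.chr22.bed.gz --cov genes.covariates.pc50.txt.gz --permute 5000 --chunk $j 20 --normal --out permutations_${j}_20.txt" | qsub
done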
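Conditional analysis to discover independent signals. The per-phenotype thresholds passed to --mapping are first derived from the combined permutation results, here at a 5% FDR: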
Rscript ./script/qtltools_runFDR_cis.R permutations_all.txt.gz 0.05 permutations_all
for j in $(seq 1 20); do
echo "QTLtools cis --vcf genotypes.chr22.vcf.gz --bed genes.50percent.chr22.bed.gz --cov genes.covariates.pc50.txt.gz --mapping permutations_all.thresholds.txt --chunk $j 20 --normal --out conditional_${j}_20.txt" | qsub
done
cat conditional_*.txt > conditional_all.txt
QTLtools website: <https://qtltools.github.io/qtltools>
Please submit bugs to <https://github.com/qtltools/qtltools>
Delaneau, O., Ongen, H., Brown, A. et al. A complete tool set for molecular QTL discovery and analysis. Nat Commun 8, 15452 (2017). <https://doi.org/10.1038/ncomms15452>
Olivier Delaneau (olivier.delaneau@gmail.com), Halit Ongen (halitongen@gmail.com)
06 May 2020 | QTLtools-v1.3 |