QTLtools rtc - Regulatory Trait Concordance score analysis
QTLtools rtc --vcf
[in.vcf|in.vcf.gz|in.bcf|in.bed.gz] --bed
quantifications.bed.gz --hotspots hotspots_b37_hg19.bed
[--gwas-cis | --gwas-trans | --mergeQTL-cis | --mergeQTL-trans]
variants_external.txt qtls_in_this_dataset.txt --out
output.txt [OPTIONS]
The RTC algorithm assesses the likelihood of a shared functional
effect between a GWAS variant and an molQTL by quantifying the change in the
statistical significance of the molQTL after correcting the molQTL phenotype
for the genetic effect of the GWAS variant and comparing its correction
impact to that of all other SNPs in the interval. The method is detailed in
<https://www.nature.com/articles/ng.3981>. When assessing tissue
specificity of molQTLs we use the same method, however in that case the GWAS
variant becomes an molQTL in a different tissue. The RTC method is as
follows: for a GWAS variant falling into the same region flanked by
recombination hotspots (coldspot) with an molQTL, with N number of variants
in a given coldspot:
- 1
- Correct the phenotype for each of the variants in the region separately by
linear regression, resulting in N number of pseudo-phenotypes
(residuals).
- 2
- Redo the molQTL variant association with all of these
pseudo-phenotypes.
- 3
- Sort (decreasing) the resulting p-values and find the rank of the molQTL
to GWAS-pseudo-phenotype among all molQTL to pseudo-phenotype
associations.
- 4
- RTC = (N - Rank of GWAS) / N
This results in the RTC score which ranges from 0 to 1 where
higher values indicate a more likely shared functional effect between the
GWAS and the molQTL variants. An RTC score greater than or equal to 0.9 is
considered a shared functional effect. If there are multiple independent
molQTLs for a given phenotype, RTC for each independent molQTL is assessed
after correcting the phenotype with all the other molQTL variants for that
phenotype. This correction is done using linear regression and taking the
residuals after regressing the phenotype with the other molQTLs.
In order to convert RTC score into a probability of sharing, we
employ two simulations per coldspot region, H0 and H1. The H0 scenario is
when two variants in a coldspot are tagging different functional effects.
For a coldspot that harbours colocalized GWAS and molQTL variants, we pick
two random hidden causal variants. We then find two variants (GWAS and
molQTL) that are linked (default r-squared ≥ 0.5) to the hidden
causal variants. We generate a pseudo phenotype for molQTL based on the
slope and intercept of the observed molQTL and randomly distributed
residuals of the observed molQTL. Subsequently we rerun the RTC analysis
with this new pseudo-phenotype and using the GWAS and molQTL variants.
The H1 scenario is when the two variants are tagging the same
functional variant. The scheme here is exactly the same as the H0 scheme,
except there is only one hidden causal variant and both GWAS and molQTL
variants are randomly selected from variants that are linked to the same
hidden causal variant.
- --vcf
[in.vcf|in.bcf|in.vcf.gz|in.bed.gz]
- Genotypes in VCF/BCF format, or another molecular phenotype in BED format.
If there is a DS field in the genotype FORMAT of a variant (dosage of the
genotype calculated from genotype probabilities, e.g. after imputation),
then this is used as the genotype. If there is only the GT field in the
genotype FORMAT then this is used and it is converted to a dosage.
REQUIRED.
- --bed
quantifications.bed.gz
- Molecular phenotype quantifications in BED format. REQUIRED.
- --out
output.txt
- Output file. REQUIRED.
- --hotspots
recombination_hotspots.bed
- Recombination hotspots in BED format. REQUIRED.
- --cov
covariates.txt
- Covariates to correct the phenotype data with.
- --stats-vcf
[in.vcf|in.bcf|in.vcf.gz]
- Calculate D' and r-squared from this file. Defaults to the --vcf file.
Needs to have phased genotypes for D' calculations.
- --stats-vcf-include-samples
samples.txt
- Samples to include from the --stats-vcf file. One sample ID per line.
- --stats-vcf-exclude-samples
samples.txt
- Samples to exclude from the --stats-vcf file. One sample ID per line.
- --normal
- Rank normal transform the phenotype data so that each phenotype is
normally distributed. RECOMMENDED.
- --conditional
- molQTLs contain independent signals so execute the conditional
analysis.
- --debug
- Print out debugging info to stderr. DON'T USE.
- --warnings
- Print all encountered individual warnings to stdout.
- Add a header to the output file when --chunk or --region is provided.
- --individual-Dprime
- Calculate D' on an individual variant basis. If not provided D' will not
be calculated after first unphased genotype is encountered.
- --mem-est
- Estimate memory usage and exit.
- --mem
[0|1|2|3]
- Keep results of calculations that may be used multiple times in memory. 0
= nothing in mem, 1 = only basic, 2 = all in mem but clean after unlikely
to be reused, 3 = all in mem no cleaning. DEFAULT=0. RECOMMENDED=2.
- --window
integer
- Size of the cis window flanking each phenotype's start position.
DEFAULT=1000000. RECOMMENDED=1000000.
- --sample
integer
- Number of simulated RTC values to try to achieve for each coldspot, for
converting RTC to a probability. At each iteration we try to pick a unique
combination of variants, thus the actual number of sample iterations may
be less than this value, due to the number variants in a region. If you
want to run this analysis, please provide at least 100. DEFAULT=0.
- --max-sample
integer
- Max number of sample iterations trying to reach --sample before quitting.
Provide the actual number not the multiplier. DEFAULT=--sample * 50.
- --R2-threshold
float
- The minimum r-squared required when picking a variant that is linked to
the hidden causal variant(s) when running simulations using --sample.
DEFAULT=0.5.
- --D-prime-threshold
float
- If the pairs of variants fall into different coldspots and have a D'
greater than this, the RTC calculation is extended to multiple coldspot
regions including both variants. Assumes D' can be calculated.
DEFAULT=OFF. NOT RECOMMENDED.
- --grp-best
- Correct for multiple phenotypes within a phenotype group.
- --pheno-col
integer
- 1-based phenotype id column number. DEFAULT=1 or 5 when --grp-best
- --geno-col
integer
- 1-based genotype id column number. DEFAULT=8 or 10 when --grp-best
- --grp-col
integer
- 1-based phenotype group id column number. Only relevant if
--grp-best is in effect. DEFAULT=1
- --rank-col
integer
- 1-based conditional analysis rank column number. Only relevant if
--conditional is in effect. DEFAULT=12 or 14 when --grp-best
- --best-col
integer
- 1-based phenotype column number Only relevant if --conditional is
in effect. DEFAULT=21 or 23 when --grp-best
- --gwas-cis
variants_external.txt qtls_in_this_dataset.txt
- Run RTC for GWAS and cis-molQTL colocalization analysis. Takes two file
names as arguments. The first is the file with GWAS variants of interest
with one variant ID per line. These should match the variants IDs in the
--vcf file. The second is the QTLtools output for the cis run that
was ran using the same --vcf, --bed, and --cov files.
REQUIRED unless (and mutually exclusive with) --gwas-trans,
--mergeQTL-cis, --mergeQTL-trans.
- --gwas-trans
variants_external.txt qtls_in_this_dataset.txt
- Run RTC for GWAS and trans-molQTL colocalization analysis. Takes two file
names as arguments. The first is the file with GWAS variants of interest
with one variant ID per line. These should match the variants IDs in the
--vcf file. The second is the QTLtools output for the trans run
that was ran using the same --vcf, --bed, and --cov
files. You will need to adjust *-col options. REQUIRED unless (and
mutually exclusive with) --gwas-cis, --mergeQTL-cis,
--mergeQTL-trans.
- --mergeQTL-cis
variants_external.txt qtls_in_this_dataset.txt
- Run RTC for cis-molQTL and cis-molQTL colocalization analysis. Takes two
file names as arguments. The first is the file with cis-molQTL variants of
interest discovered in a different dataset, e.g. different tissue, with
one variant ID per line. These should match the variants IDs in the
--vcf file. The second is the QTLtools output for the cis run that
was ran using the same --vcf, --bed, and --cov files.
REQUIRED unless (and mutually exclusive with) --gwas-trans,
--mergeQTL-cis, --mergeQTL-trans.
- --mergeQTL-trans
variants_external.txt qtls_in_this_dataset.txt
- Run RTC for trans-molQTL and trans-molQTL colocalization analysis. Takes
two file names as arguments. The first is the file with trans-molQTL
variants of interest discovered in a different dataset, e.g. different
tissue, with one variant ID per line. These should match the variants IDs
in the --vcf file. The second is the QTLtools output for the trans
run that was ran using the same --vcf, --bed, and
--cov files. You will need to adjust *-col options. REQUIRED
unless (and mutually exclusive with) --gwas-trans,
--gwas-cis, --mergeQTL-cis.
- --chunk integer1
integer2
- For parallelization. Divide the data into integer2 number of chunks
and process chunk number integer1. Chunk 0 will print a header.
Mutually exclusive with --region. Minimum number of chunks has to be at
least the same number of chromosomes in the --bed file.
- --region
chr:start-end
- Genomic region to be processed. E.g. chr4:12334456-16334456, or chr5.
Mutually exclusive with --chunk.
- --out output
file
- Space separated output file with the following columns. Columns after the
22nd are only printed if --sample is provided. We recommend
including chunk 0 to print out a header in order to avoid confusion.
1 |
other_variant |
The variant ID that is external to this dataset, could be the GWAS
variant or another molQTL |
2 |
our_variant |
The molQTL variant ID that is internal to this dataset |
3 |
phenotype |
The phenotype ID |
4 |
phenotype_group |
The phenotype group ID |
5 |
other_variant_chr |
The external variant's chromosome |
6 |
other_variant_start |
The external variant's start position |
7 |
other_variant_rank |
Rank of the external variant. Only relevant if the external variants
are part of an conditional analysis |
8 |
our_variant_chr |
The internal variant's chromosome |
9 |
our_variant_start |
The internal variant's start position |
10 |
our_variant_rank |
Rank of the internal variant. Only relevant if the internal variants
are part of an conditional analysis |
11 |
phenotype_chr |
The phenotype's chromosome |
12 |
phenotype_start |
The start position of the phenotype |
13 |
distance_between_variants |
The distance between the two variants |
14 |
distance_between_other_variant_and_pheno |
The distance between the external variant and the phenotype |
15 |
other_variant_region_index |
The region index of the external variant |
16 |
our_variant_region_index |
The region index of the internal variant |
17 |
region_start |
The start position of the region |
18 |
region_end |
The end position of the region |
19 |
variant_count_in_region |
The number of variants in the region |
20 |
RTC |
The RTC score |
21 |
D' |
The D' of the two variants. Only calculated if there are phased
genotypes |
22 |
r^2 |
The r squared of the two variants |
22 |
p_value |
The p-value of the RTC score |
23 |
unique_picks_H0 |
The number of unique combinations of variants in the H0
simulations |
24 |
unique_picks_H1 |
The number of unique combinations of variants in the H1
simulations |
25 |
rtc_bin_start |
Lower bound of the RTC bin, based on the observed RTC score |
26 |
rtc_bin_end |
Upper bound of the RTC bin, based on the observed RTC score |
27 |
rtc_bin_H0_proportion |
The proportion of H0 simulated values that are between rtc_bin_start
and rtc_bin_end |
28 |
rtc_bin_H1_proportion |
The proportion of H1 simulated values that are between rtc_bin_start
and rtc_bin_end |
29 |
median_r^2 |
The median r-squared in the region |
30 |
median_H0 |
The median RTC score in the H0 simulations |
31 |
median_H1 |
The median RTC score in the H1 simulations |
32 |
H0 |
The RTC scores observed in the H0 simulations |
33 |
H1 |
The RTC scores observed in the H1 simulations |
- o
- Run RTC with GWAS variants and cis-eQTLs correcting for technical
covariates and rank normal transforming the phenotype:
-
- QTLtools rtc --vcf genotypes.chr22.vcf.gz --bed
genes.50percent.chr22.bed.gz --cov genes.covariates.pc50.txt.gz --hotspot
hotspots_b37_hg19.bed --gwas-cis GWAS.b37.txt
permutations_all.significant.txt --normal --out rtc_results.txt
- o
- RTC with GWAS variants and cis-eQTLs and simulations, correcting for
technical covariates, rank normal transforming the phenotype, and running
conditional analysis while keeping data in memory. To facilitate
parallelization on compute cluster, we developed an option to run the
analysis into chunks of molecular phenotypes. For instance, to run
analysis on chunk 12 when splitting the example data set into 20 chunks,
run:
-
- QTLtools rtc --vcf genotypes.chr22.vcf.gz --bed
genes.50percent.chr22.bed.gz --cov genes.covariates.pc50.txt.gz --hotspot
hotspots_b37_hg19.bed --gwas-cis GWAS.b37.txt
conditional_all.significant.txt --normal --conditional --mem 2 --chunk 12
20 --sample 200 --out rtc_results_12_20.txt
- o
- If you want to submit the whole analysis with 20 jobs on a compute
cluster, just run (qsub needs to be changed to the job submission system
used [bsub, psub, etc...]):
-
- for j in $(seq 0 20); do
echo "QTLtools rtc --vcf genotypes.chr22.vcf.gz --bed
genes.50percent.chr22.bed.gz --cov genes.covariates.pc50.txt.gz
--hotspot hotspots_b37_hg19.bed --gwas-cis GWAS.b37.txt
conditional_all.significant.txt --normal --conditional --mem 2 --chunk
$j 20 --sample 200 --out rtc_results_$j_20.txt" | qsub
done
QTLtools(1)
QTLtools website: <https://qtltools.github.io/qtltools>
Please submit bugs to
<https://github.com/qtltools/qtltools>
Ongen H, Brown AA, Delaneau O, et al. Estimating the causal
tissues for complex traits and diseases. Nat Genet.
2017;49(12):1676-1683. doi:10.1038/ng.3981
<https://doi.org/10.1038/ng.3981>
Halit Ongen (halitongen@gmail.com), Olivier Delaneau
(olivier.delaneau@gmail.com)