QTLtools rtc-union - Find the union of QTLs from independent
datasets
QTLtools rtc-union --vcf
[in.vcf|in.vcf.gz|in.bcf|in.bed.gz] ...
--bed quantifications.bed.gz ... --hotspots
hotspots_b37_hg19.bed --results qtl_results_files.txt
... [OPTIONS]
This mode finds the best molQTL (which may or may not be genome-wide
significant) in each region flanked by recombination hotspots (a coldspot),
provided that at least one dataset has a significant molQTL in that coldspot.
First, all the significant molQTLs in all of the datasets are mapped to
coldspots. Then, if a dataset does not have a significant molQTL in a given
coldspot for a given phenotype, the most significant variant associated with
that phenotype in that coldspot is taken for each such missing dataset.
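The gap-finding step described above can be illustrated with a minimal sketch. This is not the QTLtools implementation; the dataset names, phenotype IDs, and coldspot IDs are hypothetical, and the input is assumed to be a simple list of "dataset phenotype coldspot" triples for the significant molQTLs.

```shell
# Hypothetical dataset names and significant hits (dataset phenotype coldspot).
datasets="d1 d2 d3"
hits="d1 geneA cs1
d2 geneA cs1
d3 geneB cs7"

# For every phenotype/coldspot pair that is significant in at least one
# dataset, list the datasets that have no significant hit there. These are
# the cases for which a best (possibly non-significant) variant is needed.
missing=$(printf '%s\n' "$hits" | awk -v ds="$datasets" '
    BEGIN { n = split(ds, names, " ") }
    { seen[$1, $2, $3] = 1; pair[$2 " " $3] = 1 }
    END {
        for (p in pair) {
            split(p, f, " ")
            for (i = 1; i <= n; i++)
                if (!((names[i], f[1], f[2]) in seen))
                    print names[i], f[1], f[2]
        }
    }' | LC_ALL=C sort)
echo "$missing"
```

Each printed line names a dataset together with the phenotype and coldspot it is missing.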
- --vcf
[in.vcf|in.bcf|in.vcf.gz|in.bed.gz]
...
- Genotypes in VCF/BCF format, or another molecular phenotype in BED format.
If there is a DS field in the genotype FORMAT of a variant (dosage of the
genotype calculated from genotype probabilities, e.g. after imputation),
then this is used as the genotype. If there is only the GT field in the
genotype FORMAT, then this is used and converted to a dosage. If a
single file is provided, then all datasets are assumed to share the same
genotypes, and the samples of all datasets are included in this file. If
multiple files are provided, one per dataset, then all --vcf, --bed,
--cov, and --results files MUST be in the same order. E.g., if the first
vcf file is from dataset1, then the first bed, cov, and results files must
also be from dataset1. REQUIRED.
- --bed
quantifications.bed.gz ...
- Molecular phenotype quantifications in BED format for each of the
datasets. All --vcf, --bed, --cov, and --results files MUST be in the
same order. E.g., if the first vcf file is from dataset1, then the first
bed, cov, and results files must also be from dataset1. REQUIRED.
- --results
significant_qtls.txt ...
- Results file with the QTLs in each of the datasets. All --vcf, --bed,
--cov, and --results files MUST be in the same order. E.g., if the first
vcf file is from dataset1, then the first bed, cov, and results files must
also be from dataset1. REQUIRED.
- --hotspots
recombination_hotspots.bed
- Recombination hotspots in BED format. REQUIRED.
- --out-suffix
suffix
- If provided, output files will be suffixed with this string.
- --cov
covariates.txt
- Covariates to correct the phenotype data with for each of the datasets.
All --vcf, --bed, --cov, and --results files MUST be in the same
order. E.g., if the first vcf file is from dataset1, then the first bed,
cov, and results files must also be from dataset1.
- --force
- If the output file exists, overwrite it.
- --normal
- Rank normal transform the phenotype data so that each phenotype is
normally distributed. RECOMMENDED.
- --conditional
- The molQTLs contain independent signals, so run the conditional
analysis.
- --window
integer
- Size of the cis window flanking each phenotype's start position.
DEFAULT=1000000. RECOMMENDED=1000000.
- --pheno-col
integer
- 1-based phenotype id column number. DEFAULT=1
- --geno-col
integer
- 1-based genotype id column number. DEFAULT=8
- --rank-col
integer
- 1-based conditional analysis rank column number. Only relevant if
--conditional is in effect. DEFAULT=12
- --best-col
integer
- 1-based phenotype column number. Only relevant if --conditional is
in effect. DEFAULT=21
- --chunk integer1
integer2
- For parallelization. Divide the data into integer2 chunks and process
chunk number integer1. Chunk 0 will print a header. Mutually exclusive
with --region. The number of chunks must be at least the number of
chromosomes in the --bed file.
- --region
chr:start-end
- Genomic region to be processed. E.g. chr4:12334456-16334456, or chr5.
Mutually exclusive with --chunk.
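Because the --vcf, --bed, --cov, and --results files must all be listed in the same dataset order, one way to avoid mismatches is to generate every list from a single list of dataset names. The following is a minimal sketch; the dataset prefixes and the file extensions attached to them are assumptions for illustration, not a naming scheme required by QTLtools.

```shell
# Hypothetical dataset prefixes; each is assumed to have matching
# .bcf, .bed.gz, .covariates.txt and .txt (results) files.
datasets="dataset1 dataset2 dataset3"

vcf="" bed="" cov="" res=""
for d in $datasets; do
    # Appending inside one loop guarantees the same order in all four lists.
    vcf="$vcf $d.bcf"
    bed="$bed $d.bed.gz"
    cov="$cov $d.covariates.txt"
    res="$res $d.txt"
done

# Assemble the command as a string and print it so the order can be checked
# before anything is run.
cmd="QTLtools rtc-union --vcf$vcf --bed$bed --cov$cov --results$res --hotspots hotspots_b37_hg19.bed"
echo "$cmd"
```

Printing the command rather than running it directly makes it easy to confirm that the first file of each option belongs to the same dataset.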
- output file
- Space-separated output file with the following columns:
1 | Column showing that this is an rtc-union result. Always __UNION__.
2 | The phenotype ID.
3 | The genotype ID. This can say __UNION_FILLER_MAX_INDEP__, __UNION_FILLER_MISS_GENO__, or __UNION_FILLER_MISS_PHENO__, which are fillers for missing cases in one of the datasets.
4 | The rank of the best variant in this coldspot. This is -1 if the variant was discovered in the rtc-union run, and a different value if there was already a significant variant in this coldspot.
5 | Dummy field indicating that this is the best hit per rank.
6 | The p-value of the association. Will be 0 if this was already significant in the dataset.
7 | The coldspot ID.
8 | The coldspot region.
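As an illustration of the column layout above, the following sketch pulls out the variants that were discovered during the rtc-union run (rank -1 in column 4) while skipping the filler genotype IDs. The three input rows are made up for this example, including the values shown for the dummy field, the coldspot region format, and the rank and p-value on the filler row; real output may differ.

```shell
# Made-up rows in the 8-column layout described above (illustrative only).
out=$(mktemp)
cat > "$out" <<'EOF'
__UNION__ gene1 snpA 0 1 0 cs1 chr1:100-200
__UNION__ gene1 snpB -1 1 0.03 cs1 chr1:100-200
__UNION__ gene2 __UNION_FILLER_MISS_GENO__ -1 1 1 cs2 chr1:300-400
EOF

# Keep rows whose rank is -1 and whose genotype ID is not a filler,
# then print phenotype ID, genotype ID, and p-value.
filled=$(awk '$1 == "__UNION__" && $4 == -1 && $3 !~ /^__UNION_FILLER_/ {print $2, $3, $6}' "$out")
echo "$filled"
rm -f "$out"
```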
- Find the union of 3 datasets, correcting for technical covariates and
rank normal transforming the phenotypes, with 20 jobs on a compute cluster
(replace qsub with the job submission command of your system [bsub, psub,
etc.]):
# Backslash-newline inside the quotes keeps the command on one logical line.
for j in $(seq 1 20); do
  echo "QTLtools rtc-union \
    --bed dataset1.bed.gz dataset2.bed.gz dataset3.bed.gz \
    --vcf dataset1.bcf dataset2.bcf dataset3.bcf \
    --cov dataset1.covariates.txt dataset2.covariates.txt dataset3.covariates.txt \
    --results dataset1.txt dataset2.txt dataset3.txt \
    --hotspots hotspots_b37_hg19.bed --normal --conditional \
    --chunk $j 20 --out-suffix .chunk.$j.20.txt" | qsub
done
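Once all chunk jobs have finished, their outputs usually need to be concatenated in chunk order. The helper below is a sketch under a stated assumption: the per-chunk files are taken to be named PREFIX.chunk.J.TOTAL.txt, which is a hypothetical pattern chosen for illustration. The actual names depend on how rtc-union combines its inputs with --out-suffix, so adjust the pattern to match the files you see on disk.

```shell
# merge_chunks PREFIX TOTAL
# Concatenate PREFIX.chunk.J.TOTAL.txt for J = 1..TOTAL in numeric order.
# The naming pattern is an assumption for illustration only.
merge_chunks() {
    prefix=$1
    total=$2
    j=1
    while [ "$j" -le "$total" ]; do
        f="$prefix.chunk.$j.$total.txt"
        if [ -f "$f" ]; then
            cat "$f"
        else
            # Stop early so an incomplete run is not silently merged.
            echo "missing chunk output: $f" >&2
            return 1
        fi
        j=$((j + 1))
    done
}
```

For example, `merge_chunks union 20 > union.all.txt` would write the 20 chunk files to a single file, and would return a non-zero status if any chunk were missing. Counting explicitly from 1 to TOTAL avoids the lexicographic ordering of a shell glob, which would place chunk 10 before chunk 2.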
QTLtools(1)
QTLtools website: <https://qtltools.github.io/qtltools>
Versions up to and including 1.2 suffer from a bug in reading
missing genotypes in VCF/BCF files. This bug affects variants that have a DS
field in their genotype's FORMAT and a missing genotype (DS field is .)
in one of the samples, in which case genotypes for all the samples are set
to missing, effectively removing this variant from the analyses.
Please submit bugs to
<https://github.com/qtltools/qtltools>
Ongen H, Brown AA, Delaneau O, et al. Estimating the causal
tissues for complex traits and diseases. Nat Genet.
2017;49(12):1676-1683. doi:10.1038/ng.3981
<https://doi.org/10.1038/ng.3981>
Halit Ongen (halitongen@gmail.com), Olivier Delaneau
(olivier.delaneau@gmail.com)