QTLtools pca - Conducts PCA
QTLtools pca --vcf [in.vcf|in.vcf.gz|in.bcf] | --bed in.bed.gz --out output.txt [OPTIONS]
This mode performs a Principal Component Analysis (PCA) on either
molecular phenotype quantifications or genotype data. It is typically used
(i) to detect outliers in the data, (ii) to detect stratification in the
data, or (iii) to build a covariate matrix before QTL mapping. QTLtools' PCA
implementation uses singular value decomposition (SVD). When building a
covariate matrix to account for technical covariates, we recommend using
--center and --scale.
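To make the recommendation concrete, here is a minimal Python sketch of what centering and scaling do to a single variable (one phenotype or one variant) across samples. It is not QTLtools code: the function name center_and_scale is hypothetical, and the use of the population standard deviation is an assumption, since this page does not say which estimator QTLtools uses.

```python
from statistics import mean, pstdev


def center_and_scale(values, center=True, scale=True):
    """Standardize one variable across samples (illustrative helper)."""
    # Centering subtracts the mean from each value.
    mu = mean(values) if center else 0.0
    # Scaling divides each value by the standard deviation.
    # Assumption: population standard deviation (pstdev).
    sd = pstdev(values) if scale else 1.0
    if sd == 0:
        raise ValueError("cannot scale a variable with zero variance")
    return [(v - mu) / sd for v in values]


# Made-up quantifications for eight samples.
expression = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
standardized = center_and_scale(expression)
```

After this transformation every variable has mean 0 and unit variance, so variables measured on large scales do not dominate the leading components.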
- --vcf
[in.vcf|in.bcf|in.vcf.gz|in.bed.gz]
- Genotypes in VCF/BCF/BED format. REQUIRED unless --bed.
- --bed
quantifications.bed.gz
- Quantifications in BED format. REQUIRED unless --vcf.
- --out
output_prefix
- Output file prefix. REQUIRED.
- --center
- Center the variables (genotypes or phenotypes) by subtracting the mean
from each value
- --scale
- Scale the variables (genotypes or phenotypes) by dividing each value by
the standard deviation
- --region
chr:start-end
- Genomic region to be processed. E.g. chr4:12334456-16334456, or chr5
- --exclude-chrs
string
- The chromosomes to exclude, given as a space-separated list. Only applies
to --vcf.
DEFAULT="X Y M MT XY chrX chrY chrM chrMT chrXY"
- --maf
float
- Exclude sites with minor allele frequency less than this. Only applies to
--vcf. DEFAULT=0.0
- --distance
integer
- Only include sites separated by this many base pairs. Only applies to
--vcf. DEFAULT=0
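The three site filters that apply to --vcf (--exclude-chrs, --maf and --distance) can be pictured with the following Python sketch. It is an illustration only, not the QTLtools implementation. The names minor_allele_frequency and filter_sites are hypothetical, genotypes are assumed to be alternate-allele counts (0, 1 or 2), and the distance is assumed to be a minimum measured from the last kept site on the same chromosome.

```python
def minor_allele_frequency(dosages):
    """MAF from alternate-allele counts (0, 1, 2); None marks a missing call."""
    called = [d for d in dosages if d is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def filter_sites(sites, maf=0.0, distance=0, exclude_chrs=("X", "Y")):
    """Return (chrom, pos) of kept sites; input is sorted by chrom and pos."""
    kept = []
    last_kept_pos = {}  # chromosome -> position of the last kept site
    for chrom, pos, dosages in sites:
        if chrom in exclude_chrs:
            continue  # excluded chromosome
        if minor_allele_frequency(dosages) < maf:
            continue  # too rare
        # Assumption: a site is dropped when it lies closer than
        # `distance` bp to the previously kept site.
        if chrom in last_kept_pos and pos - last_kept_pos[chrom] < distance:
            continue
        last_kept_pos[chrom] = pos
        kept.append((chrom, pos))
    return kept


# Made-up sites: (chromosome, position, genotypes of four samples).
sites = [
    ("22", 1000, [0, 1, 2, 1]),
    ("22", 3000, [0, 0, 0, 1]),
    ("22", 4000, [1, 1, 0, 2]),
    ("22", 7000, [2, 1, 1, 0]),
    ("X", 500, [1, 1, 1, 1]),
]
kept = filter_sites(sites, maf=0.2, distance=5000)
```

With these made-up inputs, the site at 3000 fails the frequency filter, the site at 4000 is too close to the one kept at 1000, and the X site is excluded by chromosome.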
- .pca
- This file contains the principal components that were calculated. The
name of each principal component, given in the first column, is composed of
the output file prefix, whether the data was centered, whether the data was
scaled, and the principal component number.
- .pca_stats
- This file contains the standard deviation of each principal component, and
the variance and the cumulative variance explained by each PC.
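The three quantities in .pca_stats are related by simple arithmetic, sketched below in Python. This is an illustration rather than a parser: the layout of the file is not described here, the function name variance_explained is hypothetical, and reporting the variance as a proportion of the total is an assumption.

```python
def variance_explained(std_devs):
    """Proportion and cumulative proportion of variance for each PC."""
    # The variance of a component is its standard deviation squared.
    variances = [sd ** 2 for sd in std_devs]
    total = sum(variances)
    proportions = [v / total for v in variances]
    cumulative = []
    running = 0.0
    for p in proportions:
        running += p
        cumulative.append(running)
    return proportions, cumulative


# Made-up standard deviations for three components.
proportions, cumulative = variance_explained([2.0, 1.0, 1.0])
```

A common use of the cumulative values is to decide how many components to carry forward as covariates, for example by keeping components until the cumulative proportion passes a chosen threshold.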
- Running pca on RNAseq quantifications to calculate technical covariates:
  QTLtools pca --bed genes.50percent.chr22.bed.gz --out genes.50percent.chr22 --center --scale
- Running pca on genotypes to detect population stratification:
  QTLtools pca --vcf genotypes.chr22.vcf.gz --out genotypes.chr22 --center --scale --maf 0.05 --distance 5000
QTLtools(1)
QTLtools website: <https://qtltools.github.io/qtltools>
- Versions up to and including 1.2 suffer from a bug in reading missing
genotypes in VCF/BCF files. This bug affects variants that have a DS field
in their genotype's FORMAT and a missing genotype (DS field is .) in one of
the samples, in which case genotypes for all the samples are set to
missing, effectively removing this variant from the analyses.
Please submit bugs to
<https://github.com/qtltools/qtltools>
Halit Ongen (halitongen@gmail.com), Olivier Delaneau
(olivier.delaneau@gmail.com)