bolt - efficient Bayesian mixed-model association testing for large
genome-wide cohorts
The BOLT-LMM software package currently consists of two main
algorithms: the BOLT-LMM algorithm for mixed model association testing and
the BOLT-REML algorithm for variance components analysis (i.e., partitioning
of SNP-heritability and estimation of genetic correlations).
The BOLT-LMM algorithm computes statistics for testing association
between phenotype and genotypes using a linear mixed model. By default,
BOLT-LMM assumes a Bayesian mixture-of-normals prior for the random effect
attributed to SNPs other than the one being tested. This model generalizes
the standard infinitesimal mixed model used by previous mixed model
association methods, providing an opportunity for increased power to detect
associations while controlling false positives. Additionally, BOLT-LMM
applies algorithmic advances to compute mixed model association statistics
much faster than eigendecomposition-based methods, both when using the
Bayesian mixture model and when specialized to standard mixed model
association.
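As a rough sketch of the model described above, the following uses notation that is assumed here and does not appear on this page: y is the phenotype vector, x_test the SNP being tested, X_GRM the matrix of the other modeled SNPs, and p and the two variances are the mixture parameters. Covariates are omitted for brevity.

```latex
y = x_{\text{test}}\,\beta_{\text{test}} + X_{\text{GRM}}\,\beta_{\text{GRM}} + e,
\qquad e \sim N(0, \sigma_e^2 I)

\beta_{\text{GRM},m} \sim p\,N(0, \sigma_{\beta,1}^2) + (1 - p)\,N(0, \sigma_{\beta,2}^2)
```

In this notation, setting the two variances equal collapses the mixture to a single normal prior, which corresponds to the standard infinitesimal model.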
The BOLT-REML algorithm estimates heritability explained by
genotyped SNPs and genetic correlations among multiple traits measured on
the same set of individuals. BOLT-REML applies variance components analysis
to perform these tasks, supporting both multi-component modeling to
partition SNP-heritability and multi-trait modeling to estimate
correlations. BOLT-REML applies a Monte Carlo algorithm that is much faster
than eigendecomposition-based methods for variance components analysis at
large sample sizes.
- -h [ --help ]
- print help message with typical options
- --helpFull
- print help message with full option list
- --bfile arg
- prefix of PLINK .fam, .bim, .bed files
- --bfilegz
arg
- prefix of PLINK .fam.gz, .bim.gz, .bed.gz files
- --fam arg
- PLINK .fam file (note: file names ending in .gz are
auto-[de]compressed)
- --bim arg
- PLINK .bim file(s); for >1, use multiple --bim and/or {i:j},
e.g., data.chr{1:22}.bim
- --bed arg
- PLINK .bed file(s); for >1, use multiple --bed and/or {i:j}
expansion
- --geneticMapFile
arg
- Oxford-format file for interpolating genetic distances:
tables/genetic_map_hg##.txt.gz
- --remove
arg
- file(s) listing individuals to ignore (no header; FID IID must be first
two columns)
- --exclude
arg
- file(s) listing SNPs to ignore (no header; SNP ID must be first
column)
- --maxMissingPerSnp
arg (=0.1)
- QC filter: max missing rate per SNP
- --maxMissingPerIndiv
arg (=0.1)
- QC filter: max missing rate per person
- --phenoFile
arg
- phenotype file (header required; FID IID must be first two columns)
- --phenoCol
arg
- phenotype column header
- --phenoUseFam
- use last (6th) column of .fam file as phenotype
- --covarFile
arg
- covariate file (header required; FID IID must be first two columns)
- --covarCol
arg
- categorical covariate column(s); for >1, use multiple --covarCol
and/or {i:j} expansion
- --qCovarCol
arg
- quantitative covariate column(s); for >1, use multiple
--qCovarCol and/or {i:j} expansion
- --covarUseMissingIndic
- include samples with missing covariates in analysis via missing indicator
method (default: ignore such samples)
- --reml
- run variance components analysis to precisely estimate heritability (does
not compute assoc stats)
- --lmm
- compute assoc stats under the inf model and with Bayesian non-inf prior
(VB approx), if power gain expected
- --lmmInfOnly
- compute mixed model assoc stats under the infinitesimal model
- --lmmForceNonInf
- compute non-inf assoc stats even if BOLT-LMM expects no power gain
- --modelSnps
arg
- file(s) listing SNPs to use in model (i.e., GRM) (default: use all
non-excluded SNPs)
- --LDscoresFile
arg
- LD Scores for calibration of Bayesian assoc stats:
tables/LDSCORE.1000G_EUR.tab.gz
- --numThreads
arg (=1)
- number of computational threads
- --statsFile
arg
- output file for assoc stats at PLINK genotypes
- --dosageFile
arg
- file(s) containing imputed SNP dosages to test for association (see manual
for format)
- --dosageFidIidFile
arg
- file listing FIDs and IIDs of samples in dosageFile(s), one line per
sample
- --statsFileDosageSnps
arg
- output file for assoc stats at dosage format genotypes
- --impute2FileList
arg
- list of [chr file] pairs containing IMPUTE2 SNP probabilities to test for
association
- --impute2FidIidFile
arg
- file listing FIDs and IIDs of samples in IMPUTE2 files, one line per
sample
- --impute2MinMAF
arg (=0)
- MAF threshold on IMPUTE2 genotypes; lower-MAF SNPs will be ignored
- --bgenFile
arg
- file(s) containing Oxford BGEN-format genotypes to test for
association
- --sampleFile
arg
- file containing Oxford sample file corresponding to BGEN file(s)
- --bgenSampleFileList
arg
- list of [bgen sample] file pairs containing BGEN imputed variants to test
for association
- --bgenMinMAF
arg (=0)
- MAF threshold on Oxford BGEN-format genotypes; lower-MAF SNPs will be
ignored
- --bgenMinINFO
arg (=0)
- INFO threshold on Oxford BGEN-format genotypes; lower-INFO SNPs will be
ignored
- --statsFileBgenSnps
arg
- output file for assoc stats at BGEN-format genotypes
- --statsFileImpute2Snps
arg
- output file for assoc stats at IMPUTE2 format genotypes
- --dosage2FileList
arg
- list of [map dosage] file pairs with 2-dosage SNP probabilities
(Ricopili/plink2 --dosage format=2) to test for association
- --statsFileDosage2Snps
arg
- output file for assoc stats at 2-dosage format genotypes
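To show how the options above fit together, here is a minimal Python sketch that previews the {i:j} expansion and assembles an argument list for a default --lmm run. The helper names (expand_range, build_bolt_command) and all file names (data, pheno.txt, height, stats.tab) are hypothetical, and the expansion is assumed to be inclusive at both ends, as the data.chr{1:22}.bim example suggests. It is an illustration only, not BOLT's own code.

```python
import re
import shlex


def expand_range(pattern):
    """Expand one {i:j} range in a file-name pattern into a list of names.

    Assumption: the range is inclusive and i <= j. This only previews which
    files a pattern refers to; bolt performs the expansion itself.
    """
    match = re.search(r"\{(\d+):(\d+)\}", pattern)
    if match is None:
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    head, tail = pattern[:match.start()], pattern[match.end():]
    return [head + str(i) + tail for i in range(start, end + 1)]


def build_bolt_command(prefix, pheno_file, pheno_col, stats_file, threads=1):
    """Assemble an argument list for a default --lmm run (hypothetical paths)."""
    return [
        "bolt",
        "--fam", prefix + ".fam",
        "--bim", prefix + ".chr{1:22}.bim",   # pattern passed through as-is
        "--bed", prefix + ".chr{1:22}.bed",
        "--phenoFile", pheno_file,
        "--phenoCol", pheno_col,
        "--lmm",
        "--numThreads", str(threads),
        "--statsFile", stats_file,
    ]


bim_files = expand_range("data.chr{1:22}.bim")
command = build_bolt_command("data", "pheno.txt", "height", "stats.tab", threads=4)

print(len(bim_files), bim_files[0], bim_files[-1])
print(shlex.join(command))  # shell-quoted form of the command line
```

The printed command line can be pasted into a shell, or the list can be passed to subprocess.run. A real analysis would likely also supply reference files such as --LDscoresFile and --geneticMapFile; see the manual for the requirements of each mode.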
https://data.broadinstitute.org/alkesgroup/BOLT-LMM/
Copyright © 2014-2018 Harvard University. Distributed under
the GNU GPLv3+ open source license.