vcf - Variant Call Format
The Variant Call Format (VCF) is a TAB-delimited format with each
data line consisting of the following fields:
1 | CHROM | CHROMosome name
2 | POS | the left-most POSition of the variant
3 | ID | unique variant IDentifier
4 | REF | the REFerence allele
5 | ALT | the ALTernate allele(s) (comma-separated)
6 | QUAL | variant/reference QUALity
7 | FILTER | FILTERs applied
8 | INFO | INFOrmation related to the variant (semicolon-separated)
9 | FORMAT | FORMAT of the genotype fields (optional; colon-separated)
10+ | SAMPLE | SAMPLE genotypes and per-sample information (optional)
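As an illustration of the field layout above, here is a minimal Python sketch that splits one data line into its columns. The names `VcfRecord` and `parse_vcf_line` are hypothetical and are not part of samtools or bcftools. The sketch assumes that a missing QUAL is written as `.`, as in the VCF specification, and it leaves the INFO string and the per-sample columns unparsed.

```python
from typing import List, NamedTuple, Optional


class VcfRecord(NamedTuple):
    """One VCF data line, following the column order in the table above."""
    chrom: str
    pos: int
    ident: str
    ref: str
    alt: List[str]             # comma-separated in the file
    qual: Optional[float]      # None when the file has "." (assumed missing marker)
    filters: str
    info: str                  # left as the raw semicolon-separated string
    fmt: Optional[List[str]]   # colon-separated keys; None when column 9 is absent
    samples: List[str]         # raw per-sample strings from columns 10+


def parse_vcf_line(line: str) -> VcfRecord:
    """Split a TAB-delimited data line (not a '#' header line) into fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(fields)}")
    chrom, pos, ident, ref, alt, qual, filters, info = fields[:8]
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        ident=ident,
        ref=ref,
        alt=alt.split(","),
        qual=None if qual == "." else float(qual),
        filters=filters,
        info=info,
        fmt=fields[8].split(":") if len(fields) > 8 else None,
        samples=fields[9:],
    )


# Made-up example line, for illustration only.
example = "chr1\t1000\t.\tA\tG,T\t50\t.\tDP=20;AF1=0.5\tGT:PL\t0/1:30,0,40"
record = parse_vcf_line(example)
```

Here `record.alt` is `['G', 'T']` and `record.fmt` is `['GT', 'PL']`. The per-sample strings can be split on `:` and matched against `record.fmt` in a later step.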
The following table gives the INFO tags used by samtools
and bcftools.
- AF1: Max-likelihood estimate of the site allele frequency (AF) of the first ALT allele (double)
- DP: Raw read depth (without quality filtering) (int)
- DP4: Number of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases (int[4])
- FQ: Consensus quality. Positive if sample genotypes differ; negative otherwise (int)
- MQ: Root-Mean-Square mapping quality of covering reads (int)
- PC2: Phred-scaled probability of the AF in group1 samples being larger (first value) or smaller (second value) than in group2 (int[2])
- PCHI2: Posterior weighted chi^2 P-value between group1 and group2 samples (double)
- PV4: P-values for strand bias, baseQ bias, mapQ bias and tail distance bias (double[4])
- QCHI2: Phred-scaled PCHI2 (int)
- RP: Number of permutations yielding a smaller PCHI2 (int)
- CLR: Phred log ratio of genotype likelihoods with and without the trio/pair constraint (int)
- UGT: Most probable genotype configuration without the trio constraint (string)
- CGT: Most probable genotype configuration with the trio constraint (string)
- VDB: Tests variant positions within reads. Intended for filtering RNA-seq artifacts around splice sites (float)
- RPB: Mann-Whitney rank-sum test for tail distance bias (float)
- HWE: Hardy-Weinberg equilibrium test (Wigginton et al.) (float)
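The tags above arrive packed into the semicolon-separated INFO column. The following Python sketch shows one way to unpack them into typed values. The names `INFO_TYPES` and `parse_info` are hypothetical, and the type map is transcribed from the table above rather than taken from any library. The sketch assumes that multi-valued tags such as DP4 are comma-separated and that an entry without `=` is a flag. In a real file the `##INFO` header lines declare each tag's type and should take precedence over this map.

```python
from typing import Any, Dict

# Tag -> (converter, number of values), transcribed from the table above.
# This is an illustrative map, not an API of samtools or bcftools.
INFO_TYPES = {
    "AF1": (float, 1), "DP": (int, 1), "DP4": (int, 4), "FQ": (int, 1),
    "MQ": (int, 1), "PC2": (int, 2), "PCHI2": (float, 1), "PV4": (float, 4),
    "QCHI2": (int, 1), "RP": (int, 1), "CLR": (int, 1), "UGT": (str, 1),
    "CGT": (str, 1), "VDB": (float, 1), "RPB": (float, 1), "HWE": (float, 1),
}


def parse_info(info: str) -> Dict[str, Any]:
    """Turn 'DP=20;AF1=0.5;...' into a dict of typed values."""
    parsed: Dict[str, Any] = {}
    if info in ("", "."):          # "." is assumed to mean "no INFO"
        return parsed
    for entry in info.split(";"):
        if not entry:              # tolerate a stray trailing ';'
            continue
        key, sep, raw = entry.partition("=")
        if not sep:                # assumed: a bare key is a boolean flag
            parsed[key] = True
            continue
        # Tags not listed in the table are kept as raw strings.
        convert, count = INFO_TYPES.get(key, (str, 1))
        if count == 1:
            parsed[key] = convert(raw)
        else:
            values = [convert(v) for v in raw.split(",")]
            if len(values) != count:
                raise ValueError(f"{key}: expected {count} values, got {len(values)}")
            parsed[key] = values
    return parsed


# Made-up INFO string, for illustration only.
info = parse_info("DP=20;AF1=0.5;DP4=5,4,6,5;MQ=40")
```

With this input, `info["DP4"]` is the list `[5, 4, 6, 5]` in the order ref-forward, ref-reverse, alt-forward, alt-reverse, and `info["AF1"]` is the float `0.5`.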