sniffles - structural variation caller using third-generation
sequencing
usage: sniffles --input SORTED_INPUT.bam [--vcf OUTPUT.vcf]
[--snf MERGEABLE_OUTPUT.snf] [--threads 4] [--non-germline]
Sniffles2: A fast structural variant (SV) caller for long-read
sequencing data
- Version 2.0.2
- Contact: moritz.g.smolka@gmail.com
- Usage example A - Call SVs for a single sample:
- sniffles --input sorted_indexed_alignments.bam --vcf
output.vcf
- ... OR, with CRAM input and bgzipped+tabix indexed VCF output:
- sniffles --input sample.cram --vcf output.vcf.gz
- ... OR, producing only a SNF file with SV candidates for later
multi-sample calling:
- sniffles --input sample1.bam --snf sample1.snf
- ... OR, simultaneously producing a single-sample VCF and SNF file for
later multi-sample calling:
- sniffles --input sample1.bam --vcf sample1.vcf.gz
--snf sample1.snf
- ... OR, with additional options to specify tandem repeat annotations (for
improved call accuracy), reference (for DEL sequences) and non-germline
mode for detecting rare SVs:
- sniffles --input sample1.bam --vcf sample1.vcf.gz
--tandem-repeats tandem_repeats.bed --reference genome.fa
--non-germline
- Usage example B - Multi-sample calling:
- Step 1. Create .snf for each sample:
- sniffles --input sample1.bam --snf sample1.snf
- Step 2. Combined calling:
- sniffles --input sample1.snf sample2.snf ... sampleN.snf --vcf
multisample.vcf
- ... OR, using a .tsv file containing a list of .snf files, and custom
sample ids in an optional second column (one sample per line):
- Step 2. Combined calling:
- sniffles --input snf_files_list.tsv --vcf multisample.vcf
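The list file described above could be generated with a short script. This is a minimal sketch, assuming the columns are tab-separated (inferred from the .tsv extension); the helper name write_snf_list and the sample id patientA are hypothetical and only used for illustration.

```python
import csv
import io


def write_snf_list(samples, handle):
    """Write one line per sample: the .snf path, then an optional sample id."""
    # Assumption: columns are separated by tabs, as the .tsv extension suggests.
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for snf_path, sample_id in samples:
        # The second column is optional, so it is left out when no id is given.
        row = [snf_path] if sample_id is None else [snf_path, sample_id]
        writer.writerow(row)


# Hypothetical inputs: one sample with a custom id, one without.
samples = [("sample1.snf", "patientA"), ("sample2.snf", None)]
buffer = io.StringIO()
write_snf_list(samples, buffer)
snf_list_text = buffer.getvalue()
```

To produce a real file, pass an open file handle for snf_files_list.tsv instead of the in-memory buffer, then hand that file to --input.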
- Usage example C - Determine genotypes for a set of known SVs (force
calling):
- sniffles --input sample.bam --genotype-vcf
input_known_svs.vcf --vcf output_genotypes.vcf
- Use --help for full parameter/usage information
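When Sniffles2 is driven from a pipeline script, the usage examples above can be assembled as an argument list. The following is a rough sketch that only uses flags documented on this page; the function build_sniffles_command is a hypothetical helper, not part of Sniffles2.

```python
import shlex


def build_sniffles_command(input_path, vcf=None, snf=None,
                           threads=None, non_germline=False):
    """Assemble a sniffles argument list from flags documented on this page."""
    cmd = ["sniffles", "--input", input_path]
    if vcf is not None:
        cmd += ["--vcf", vcf]
    if snf is not None:
        cmd += ["--snf", snf]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if non_germline:
        cmd.append("--non-germline")
    return cmd


cmd = build_sniffles_command("sample1.bam", vcf="sample1.vcf.gz",
                             snf="sample1.snf", threads=4)
# shlex.join renders the list as a copy-pasteable shell command line.
command_line = shlex.join(cmd)
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a system where sniffles is installed.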
- -i IN [IN ...],
--input IN [IN ...]
- For single-sample calling: A coordinate-sorted and indexed .bam/.cram
(BAM/CRAM format) file containing aligned reads. - OR - For multi-sample
calling: Multiple .snf files (generated before by running Sniffles2 for
individual samples with --snf) (default: None)
- -v OUT.vcf, --vcf
OUT.vcf
- VCF output filename to write the called and refined SVs to. If the given
filename ends with .gz, the VCF file will be automatically bgzipped and a
.tbi index built for it. (default: None)
- --snf OUT.snf
- Sniffles2 file (.snf) output filename to store candidates for later
multi-sample calling (default: None)
- --reference
reference.fasta
- (Optional) Reference sequence the reads were aligned against. To enable
output of deletion SV sequences, this parameter must be set. (default:
None)
- --tandem-repeats
IN.bed
- (Optional) Input .bed file containing tandem repeat annotations for the
reference genome. (default: None)
- --non-germline
- Call non-germline SVs (rare, somatic or mosaic SVs) (default: False)
- --phase
- Determine phase for SV calls (requires the input alignments to be phased)
(default: False)
- -t N, --threads
N
- Number of parallel threads to use (speed-up for multi-core CPUs) (default:
4)
- --minsupport
auto
- Minimum number of supporting reads for a SV to be reported (default: auto,
chosen automatically based on coverage)
- --minsupport-auto-mult
0.1/0.025
- Coverage based minimum support multiplier for germline/non-germline modes
(only for auto minsupport) (default: None)
- --minsvlen
N
- Minimum SV length (in bp) (default: 35)
- --minsvlen-screen-ratio
N
- Minimum length for SV candidates (as fraction of --minsvlen)
(default: 0.95)
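As a worked example of how --minsvlen and --minsvlen-screen-ratio relate, the sketch below multiplies the two values, following the "as fraction of --minsvlen" wording. The function name is hypothetical, and how Sniffles2 rounds or applies the result internally is not stated here.

```python
def candidate_screen_length(minsvlen=35, minsvlen_screen_ratio=0.95):
    """Minimum candidate length in bp, read as ratio * --minsvlen."""
    return minsvlen * minsvlen_screen_ratio


# With the documented defaults, candidates slightly shorter than the final
# 35 bp cutoff are still kept during screening.
default_screen_length = candidate_screen_length()
```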
- --mapq N
- Alignments with mapping quality lower than this value will be ignored
(default: 25)
- --no-qc
- Output all SV candidates, disregarding quality control steps. (default:
False)
- --qc-stdev True
- Apply filtering based on SV start position and length standard deviation
(default: True)
- --qc-stdev-abs-max
N
- Maximum standard deviation for SV start position and length (in bp) (default:
500)
- --qc-strand
False
- Apply filtering based on strand support of SV calls (default: False)
- --qc-coverage
N
- Minimum surrounding region coverage of SV calls (default: 1)
- --long-ins-length
2500
- Insertion SVs longer than this value are considered as hard to detect
based on the aligner and read length and subjected to more sensitive
filtering. (default: 2500)
- --long-del-length
50000
- Deletion SVs longer than this value are subjected to central coverage
drop-based filtering (Not applicable for --non-germline) (default:
50000)
- --long-del-coverage
0.66
- Long deletions with central coverage (in relation to upstream/downstream
coverage) higher than this value will be filtered (Not applicable for
--non-germline) (default: 0.66)
- --long-dup-length
50000
- Duplication SVs longer than this value are subjected to central coverage
increase-based filtering (Not applicable for --non-germline)
(default: 50000)
- --long-dup-coverage
1.33
- Long duplications with central coverage (in relation to
upstream/downstream coverage) lower than this value will be filtered (Not
applicable for --non-germline) (default: 1.33)
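The four --long-del-* and --long-dup-* options above describe a coverage-ratio test. The sketch below illustrates that reading under two assumptions: the flank coverage is taken as the mean of upstream and downstream coverage, and the comparisons are strict. Neither detail is specified on this page, and the function names are hypothetical.

```python
def central_coverage_ratio(central, upstream, downstream):
    """Central coverage relative to the flanks (assumed: mean of both flanks)."""
    flank = (upstream + downstream) / 2
    return central / flank if flank > 0 else None


def fails_long_del_filter(sv_length, ratio,
                          long_del_length=50000, long_del_coverage=0.66):
    """A long deletion is filtered if coverage does not drop enough."""
    return (sv_length > long_del_length and ratio is not None
            and ratio > long_del_coverage)


def fails_long_dup_filter(sv_length, ratio,
                          long_dup_length=50000, long_dup_coverage=1.33):
    """A long duplication is filtered if coverage does not rise enough."""
    return (sv_length > long_dup_length and ratio is not None
            and ratio < long_dup_coverage)


# A 60 kb deletion whose central coverage (28x) barely differs from the
# flanks (30x each) shows no real coverage drop and would be filtered.
ratio = central_coverage_ratio(28, 30, 30)
filtered = fails_long_del_filter(60000, ratio)
```

Per the option descriptions, neither test applies when --non-germline is set.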
- --max-splits-kb
N
- Additional number of splits per kilobase read sequence allowed before
reads are ignored (default: 0.1)
- --max-splits-base
N
- Base number of splits allowed before reads are ignored (in addition to
--max-splits-kb) (default: 3)
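Taken together, --max-splits-base and --max-splits-kb suggest a split limit that grows with read length. The sketch below assumes the limit is the base value plus the per-kilobase value times the read length in kb, and that a read is ignored once its split count exceeds that limit; the exact comparison used by Sniffles2 is not given here, and the function names are hypothetical.

```python
def max_allowed_splits(read_length_bp, max_splits_base=3, max_splits_kb=0.1):
    """Assumed limit: base splits plus a per-kilobase allowance."""
    return max_splits_base + max_splits_kb * (read_length_bp / 1000)


def read_is_ignored(split_count, read_length_bp):
    """Assumed rule: ignore the read once it exceeds the allowed splits."""
    return split_count > max_allowed_splits(read_length_bp)


# With the defaults, a 20 kb read may carry about 3 + 0.1 * 20 = 5 splits.
limit_20kb = max_allowed_splits(20000)
```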
- --min-alignment-length
N
- Reads with alignments shorter than this length (in bp) will be ignored
(default: 1000)
- --phase-conflict-threshold
F
- Maximum fraction of conflicting reads permitted for SV phase information
to be labelled as PASS (only for --phase) (default: 0.1)
- --detect-large-ins
True
- Infer insertions that are longer than most reads and therefore are spanned
by few alignments only. (default: True)
- --cluster-binsize
N
- Initial screening bin size in bp (default: 100)
- --cluster-r
R
- Multiplier for SV start position standard deviation criterion in cluster
merging (default: 2.5)
- --cluster-repeat-h
H
- Multiplier for mean SV length criterion for tandem repeat cluster merging
(default: 1.5)
- --cluster-repeat-h-max
N
- Max. merging distance based on SV length criterion for tandem repeat
cluster merging (default: 1000)
- --cluster-merge-pos
N
- Max. merging distance for insertions and deletions on the same read and
cluster in non-repeat regions (default: 150)
- --cluster-merge-len
F
- Max. size difference for merging SVs as fraction of SV length (default:
0.33)
- --cluster-merge-bnd
N
- Max. merging distance for breakend SV candidates. (default: 1500)
- --genotype-ploidy
N
- Sample ploidy (currently fixed at value 2) (default: 2)
- --genotype-error
N
- Estimated false positive rate for leads (relating to total coverage)
(default: 0.05)
- --sample-id
SAMPLE_ID
- Custom ID for this sample, used for later multi-sample calling (stored in
.snf) (default: None)
- --genotype-vcf
IN.vcf
- Determine the genotypes for all SVs in the given input .vcf file (forced
calling). Re-genotyped .vcf will be written to the output file specified
with --vcf. (default: None)
- --combine-high-confidence
F
- Minimum fraction of samples in which a SV needs to have individually
passed QC for it to be reported in combined output (a value of zero will
report all SVs that pass QC in at least one of the input samples)
(default: 0.0)
- --combine-low-confidence
F
- Minimum fraction of samples in which a SV needs to be present (failed QC)
for it to be reported in combined output (default: 0.2)
- --combine-low-confidence-abs
N
- Minimum absolute number of samples in which a SV needs to be present
(failed QC) for it to be reported in combined output (default: 3)
- --combine-null-min-coverage
N
- Minimum coverage for a sample genotype to be reported as 0/0 (sample
genotypes with coverage below this threshold at the SV location will be
output as ./.) (default: 5)
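The effect of --combine-null-min-coverage on a sample that shows no support for an SV can be sketched as follows. The function name is hypothetical; the threshold comparison follows the description above, where coverage below the threshold yields ./. and anything else yields 0/0.

```python
def null_genotype(coverage, combine_null_min_coverage=5):
    """Genotype string for a sample without evidence for the SV."""
    # Too little coverage: the absence of the SV cannot be asserted.
    if coverage < combine_null_min_coverage:
        return "./."
    return "0/0"


genotypes = [null_genotype(c) for c in (0, 4, 5, 30)]
```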
- --combine-match
N
- Maximum deviation of multiple SVs' start/end positions for them to be
combined across samples. Given by
max_dev=M*sqrt(min(SV_length_a,SV_length_b)), where M is this parameter.
(default: 500)
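The --combine-match formula above can be evaluated directly. This sketch implements max_dev=M*sqrt(min(SV_length_a,SV_length_b)) as written; it assumes lengths are given as positive values in bp, and the helper positions_match is a hypothetical illustration of how the limit might be applied to a pair of positions.

```python
import math


def max_combine_deviation(sv_length_a, sv_length_b, combine_match=500):
    """max_dev = M * sqrt(min(SV_length_a, SV_length_b)), M = --combine-match."""
    return combine_match * math.sqrt(min(sv_length_a, sv_length_b))


def positions_match(pos_a, pos_b, sv_length_a, sv_length_b, combine_match=500):
    """Assumed use: two positions match if they differ by at most max_dev."""
    limit = max_combine_deviation(sv_length_a, sv_length_b, combine_match)
    return abs(pos_a - pos_b) <= limit


# For SVs of 100 bp and 400 bp with the default M of 500:
# max_dev = 500 * sqrt(100) = 5000 bp.
max_dev = max_combine_deviation(100, 400)
```

Because the shorter SV determines the limit, lowering M tightens matching for all SV sizes at once.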
- --combine-consensus
- Output the consensus genotype of all samples (default: False)
- --combine-separate-intra
- Disable combination of SVs within the same sample (default: False)
- --combine-output-filtered
- Include low-confidence / putative non-germline SVs in multi-calling
(default: False)
- --output-rnames
- Output names of all supporting reads for each SV in the RNAMEs info field
(default: False)
- --no-consensus
- Disable consensus sequence generation for insertion SV calls (may improve
performance) (default: False)
- --no-sort
- Do not sort output VCF by genomic coordinates (may slightly improve
performance) (default: False)
- --no-progress
- Disable progress display (default: False)
- --quiet
- Disable all logging, except errors (default: False)
- --max-del-seq-len
N
- Maximum deletion sequence length to be output. Deletion SVs longer than
this value will be written to the output as symbolic SVs. (default:
50000)
- --symbolic
- Output all SVs as symbolic, including insertions and deletions, instead of
reporting nucleotide sequences. (default: False)
This manpage was written by Andreas Tille for the Debian
distribution and
can be used for any other usage of the program.