run_discoSnp++.sh - pipeline of kissnp2 and kissreads for calling
SNPs and small indels from NGS reads without the need for a reference
genome
run_discoSnp++.sh -r read_file_of_files [OPTIONS]
run_discoSnp++.sh is a pipeline of kissnp2 and kissreads for calling
SNPs and small indels from NGS reads without the need for a reference genome.
Version 2.3.X
- MANDATORY:
- -r read_file_of_files
- Example: -r bank.fof, with bank.fof containing the two lines:
- data_sample/reads_sequence1.fasta
- data_sample/reads_sequence2.fasta.gz
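A minimal sketch of creating such a read file of files from the shell, assuming the two example paths above (one read file path per line, as in the bank.fof example):

```shell
#!/bin/sh
# Write one read file path per line into bank.fof (the example file of files).
printf '%s\n' \
  data_sample/reads_sequence1.fasta \
  data_sample/reads_sequence2.fasta.gz > bank.fof

# Show the result.
cat bank.fof
```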
- DISCOSNP++ OPTIONS:
- -g: reuse a previously created graph (.h5 file) with the same prefix
and the same k and c parameters.
- -b value. Branching strategy:
- 0: forbid variants for which either of the two paths is branching (high
precision, lowers the recall in complex genomes). Default value.
- 1: (smart branching) forbid SNPs for which the two paths are branching
(e.g. the two paths can be created either with an 'A' or a 'C' at the same
position).
- 2: no limitation on branching (lowers the precision, high recall).
- -s value. In b2 mode only (i.e. with -b 2): maximal number of symmetrical
crossroads traversed while trying to close a bubble. Default: no limit.
- -D value. discoSnp++ will search for deletions of size from 1 to D
included. Default=100.
- -a value. Maximal size of ambiguity of INDELs. INDELs whose ambiguity is
higher than this value are not output. Default=20.
- -P value. discoSnp++ will search up to P SNPs in a unique bubble.
Default=1.
- -p prefix. All output files will start with this prefix.
Default="discoRes".
- -l: remove low complexity bubbles.
- -k value. Set the length of the k-mers used. Must fit the compiled
value. Default=31.
- -t: extend found polymorphisms with unitigs. Forced usage when using
discoSnpRad.
- -T: extend found polymorphisms with contigs.
- -c value. Set the minimal coverage per read set. Used by kissnp2 (k-mers
with lower coverage are not used) and kissreads (read coherency threshold).
This coverage can be detected automatically per read set or specified per
read set; see the documentation. Default=auto.
- -C value. Set the maximal coverage for each read set. Used by kissnp2
(k-mers with higher coverage are not used). Default=2^31-1.
- -d value. Set the number of authorized substitutions used while mapping
reads on found SNPs (kissreads). Default=1.
- -n: do not compute the genotypes.
- -u: maximal number of threads used.
- -v: verbosity, 0 (no progress output) or 1 (progress output).
Default=1.
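As an illustration, several of the options above can be combined as follows. This is a dry-run sketch under stated assumptions: the prefix my_run and the values given to -P and -u are arbitrary choices, not recommendations, and the command is only printed, not executed. Dropping the leading echo would run it.

```shell
#!/bin/sh
# Dry-run sketch of a discoSnp++ call using only options listed in this page.
# Values are illustrative assumptions; adapt them to your data:
#   -r bank.fof  read file of files (mandatory)
#   -p my_run    prefix of all output files (hypothetical name)
#   -k 31        k-mer size, must fit the compiled value
#   -b 1         smart branching
#   -P 3         search up to 3 SNPs in a unique bubble
#   -l           remove low complexity bubbles
#   -u 8         use at most 8 threads
set -- -r bank.fof -p my_run -k 31 -b 1 -P 3 -l -u 8

# Print the command instead of executing it.
echo run_discoSnp++.sh "$@"
```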
- REFERENCE GENOME AND/OR VCF CREATION OPTIONS:
- -G: reference genome file (fasta or fastq, gzipped or not). In the
absence of this file, the VCF created by VCF_creator won't contain
mapping-related results.
- -R: use the reference file also in the variant calling, not only for
mapping results.
- -B: bwa path, e.g. /home/me/my_programs/bwa-0.7.12/ (note that bwa must be
pre-compiled). Optional unless option -G is used and bwa is not in the
binary path.
- -M: maximal number of mapping errors during the BWA mapping phase.
Useless unless mapping on the reference genome is required (option -G).
Default=4.
- -h: print this message and exit.
- -e: map SNP predictions on the reference genome with their extensions.
Forced usage when using discoSnpRad.
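A similar dry-run sketch for a run that also maps predictions onto a reference genome. The file name ref.fasta and the prefix my_run are hypothetical placeholders; the bwa path is the example given for -B, and -M 4 only restates the default. The command is printed, not executed.

```shell
#!/bin/sh
# Dry-run sketch: discoSnp++ with reference-genome mapping (options from this page).
#   -G ref.fasta   reference genome (hypothetical file name)
#   -B <path>      directory containing a pre-compiled bwa (example path above)
#   -M 4           maximal number of mapping errors (the documented default)
#   -e             map SNP predictions with their extensions
set -- -r bank.fof -p my_run -G ref.fasta \
  -B /home/me/my_programs/bwa-0.7.12/ -M 4 -e

# Print the command instead of executing it.
echo run_discoSnp++.sh "$@"
```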
For any further questions, read the README file or contact us via the
Biostar forum: https://www.biostars.org/t/discosnp/
This manpage was written by Andreas Tille for the Debian
distribution and can be used for any other usage of the program.