mason_frag_sequencing - Fragment Sequencing Simulation
mason_frag_sequencing [OPTIONS] -i IN.fa -o OUT.{fa,fq} [-or OUT2.{fa,fq}]
Given a FASTA file with fragments, simulate sequencing
thereof.
This program is a more lightweight version of mason_sequencing
without support for the application of VCF and fragment sampling. Output of
SAM is also not available. However, it uses the same code for the simulation
of the reads as the more powerful mason_simulator.
You can use mason_frag_sequencing if you want to implement your
own fragmentation behaviour, e.g. if you have implemented your own bias
models.
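As a minimal sketch of what "your own fragmentation behaviour" could look like, the following Python script samples fragments from a reference sequence and formats them as FASTA text suitable for the -i option. The function names sample_fragments and to_fasta, the fragment-length parameters, and the uniform choice of start positions are all assumptions made for illustration; a real bias model would replace the position sampling.

```python
import random


def sample_fragments(reference, num_fragments, mean_len=300, stddev_len=30, seed=0):
    """Sample fragments from `reference` with normally distributed lengths.

    Start positions are uniform here (an assumption); a custom bias model
    would replace the `rng.randint` call below.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(num_fragments):
        # Clamp the sampled length to [1, len(reference)].
        length = int(round(rng.gauss(mean_len, stddev_len)))
        length = max(1, min(length, len(reference)))
        start = rng.randint(0, len(reference) - length)
        fragments.append(reference[start:start + length])
    return fragments


def to_fasta(fragments, line_width=70):
    """Format fragments as FASTA text, one record per fragment."""
    lines = []
    for i, seq in enumerate(fragments):
        lines.append(f">frag{i}")  # hypothetical record naming scheme
        for j in range(0, len(seq), line_width):
            lines.append(seq[j:j + line_width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rng = random.Random(42)
    reference = "".join(rng.choice("ACGT") for _ in range(2000))
    print(to_fasta(sample_fragments(reference, 3))[:200])
```

Writing the returned text to a file such as IN.fa gives an input that matches the synopsis above.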
- -h, --help
- Display the help message.
- --version
- Display version information.
- -q, --quiet
- Low verbosity.
- -v, --verbose
- Higher verbosity.
- -vv, --very-verbose
- Highest verbosity.
- --seed INTEGER
- Seed to use for random number generator. Default: 0.
- -i, --in INPUT_FILE
- Path to input file. Valid filetypes are: .sam[.*], .raw[.*],
.gbk[.*], .frn[.*], .fq[.*], .fna[.*],
.ffn[.*], .fastq[.*], .fasta[.*], .faa[.*],
.fa[.*], .embl[.*], and .bam, where * is any of the
following extensions: gz, bz2, and bgzf for
transparent (de)compression.
- -o, --out OUTPUT_FILE
- Output of single-end/left end reads. Valid filetypes are: .sam[.*],
.raw[.*], .frn[.*], .fq[.*], .fna[.*],
.ffn[.*], .fastq[.*], .fasta[.*], .faa[.*],
.fa[.*], and .bam, where * is any of the following
extensions: gz, bz2, and bgzf for transparent
(de)compression.
- -or, --out-right OUTPUT_FILE
- Output of right reads. Giving this option enables paired-end simulation.
Valid filetypes are: .sam[.*], .raw[.*], .frn[.*],
.fq[.*], .fna[.*], .ffn[.*], .fastq[.*],
.fasta[.*], .faa[.*], .fa[.*], and .bam, where
* is any of the following extensions: gz, bz2, and
bgzf for transparent (de)compression.
- --force-single-end
- Force single-end simulation although --out-right is given.
- --illumina-read-length INTEGER
- Read length for Illumina simulation. In range [1..inf]. Default:
100.
- --illumina-error-profile-file INPUT_FILE
- Path to file with Illumina error profile. The file must be a text file
with floating point numbers separated by space, each giving a positional
error rate. Valid filetype is: .txt.
- --illumina-prob-insert DOUBLE
- Per-base probability of an insertion in Illumina sequencing. In range
[0..1]. Default: 0.00005.
- --illumina-prob-deletion DOUBLE
- Per-base probability of a deletion in Illumina sequencing. In range
[0..1]. Default: 0.00005.
- --illumina-prob-mismatch-scale DOUBLE
- Scaling factor for Illumina mismatch probability. In range [0..inf].
Default: 1.0.
- --illumina-prob-mismatch DOUBLE
- Average per-base mismatch probability in Illumina sequencing. In range
[0.0..1.0]. Default: 0.004.
- --illumina-prob-mismatch-begin DOUBLE
- Per-base mismatch probability of first base in Illumina sequencing. In
range [0.0..1.0]. Default: 0.002.
- --illumina-prob-mismatch-end DOUBLE
- Per-base mismatch probability of last base in Illumina sequencing. In
range [0.0..1.0]. Default: 0.012.
- --illumina-position-raise DOUBLE
- Point where the error curve rises, as a fraction of the read length. In
range [0.0..1.0]. Default: 0.66.
- --illumina-quality-mean-begin DOUBLE
- Mean PHRED quality for non-mismatch bases of first base in Illumina
sequencing. Default: 40.0.
- --illumina-quality-mean-end DOUBLE
- Mean PHRED quality for non-mismatch bases of last base in Illumina
sequencing. Default: 39.5.
- --illumina-quality-stddev-begin DOUBLE
- Standard deviation of PHRED quality for non-mismatch bases of first base
in Illumina sequencing. Default: 0.05.
- --illumina-quality-stddev-end DOUBLE
- Standard deviation of PHRED quality for non-mismatch bases of last base in
Illumina sequencing. Default: 10.0.
- --illumina-mismatch-quality-mean-begin DOUBLE
- Mean PHRED quality for mismatch bases of first base in Illumina
sequencing. Default: 40.0.
- --illumina-mismatch-quality-mean-end DOUBLE
- Mean PHRED quality for mismatch bases of last base in Illumina sequencing.
Default: 30.0.
- --illumina-mismatch-quality-stddev-begin DOUBLE
- Standard deviation of PHRED quality for mismatch bases of first base in
Illumina sequencing. Default: 3.0.
- --illumina-mismatch-quality-stddev-end DOUBLE
- Standard deviation of PHRED quality for mismatch bases of last base in
Illumina sequencing. Default: 15.0.
- --illumina-left-template-fastq INPUT_FILE
- FASTQ file to use for a template for left-end reads. Valid filetypes are:
.sam[.*], .raw[.*], .gbk[.*], .frn[.*],
.fq[.*], .fna[.*], .ffn[.*], .fastq[.*],
.fasta[.*], .faa[.*], .fa[.*], .embl[.*], and
.bam, where * is any of the following extensions: gz,
bz2, and bgzf for transparent (de)compression.
- --illumina-right-template-fastq INPUT_FILE
- FASTQ file to use for a template for right-end reads. Valid filetypes are:
.sam[.*], .raw[.*], .gbk[.*], .frn[.*],
.fq[.*], .fna[.*], .ffn[.*], .fastq[.*],
.fasta[.*], .faa[.*], .fa[.*], .embl[.*], and
.bam, where * is any of the following extensions: gz,
bz2, and bgzf for transparent (de)compression.
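The Illumina mismatch options above (average, begin, end, and the raise position) can be read as describing a positional error curve. The sketch below assumes a piecewise-linear curve with a single kink at the raise position, chosen so that the continuous curve averages to the --illumina-prob-mismatch value; the text above does not state the curve shape, so this is one plausible interpretation and not a description of the tool's internals. The helper format_error_profile is hypothetical and writes rates in the space-separated form described for --illumina-error-profile-file.

```python
def mismatch_curve(read_length=100, avg=0.004, begin=0.002, end=0.012,
                   raise_pos=0.66):
    """Per-position mismatch probabilities, ASSUMING a piecewise-linear curve."""
    if not 0.0 < raise_pos < 1.0:
        raise ValueError("this sketch requires 0 < raise_pos < 1")
    # Height at the kink such that the continuous curve averages to `avg`:
    #   avg = (raise_pos * begin + y_raise + (1 - raise_pos) * end) / 2
    y_raise = 2 * avg - raise_pos * begin - (1 - raise_pos) * end
    curve = []
    for i in range(read_length):
        # Relative position in [0, 1] along the read.
        x = i / (read_length - 1) if read_length > 1 else 0.0
        if x < raise_pos:
            p = begin + (y_raise - begin) * (x / raise_pos)
        else:
            p = y_raise + (end - y_raise) * ((x - raise_pos) / (1 - raise_pos))
        curve.append(p)
    return curve


def format_error_profile(rates):
    """Format positional error rates as space-separated floats."""
    return " ".join(f"{p:.6f}" for p in rates)


if __name__ == "__main__":
    curve = mismatch_curve()
    print(format_error_profile(curve[:5]), "...")
```

With the documented defaults, the kink height works out to 2 * 0.004 - 0.66 * 0.002 - 0.34 * 0.012 = 0.0026 under this assumption.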
- --454-read-length-model STRING
- The model to use for sampling the 454 read length. One of normal
and uniform. Default: normal.
- --454-read-length-min INTEGER
- The minimal read length when the read length is sampled uniformly. In
range [0..inf]. Default: 10.
- --454-read-length-max INTEGER
- The maximal read length when the read length is sampled uniformly. In
range [0..inf]. Default: 600.
- --454-read-length-mean DOUBLE
- The mean read length when the read length is sampled with normal
distribution. In range [0..inf]. Default: 400.
- --454-read-length-stddev DOUBLE
- The read length standard deviation when the read length is sampled with
normal distribution. In range [0..inf]. Default: 40.
- --454-no-sqrt-in-std-dev
- For the error model: if set, (sigma = k * r) is used; otherwise (sigma =
k * sqrt(r)) is used.
- --454-proportionality-factor DOUBLE
- Proportionality factor for calculating the standard deviation proportional
to the read length. In range [0..inf]. Default: 0.15.
- --454-background-noise-mean DOUBLE
- Mean of lognormal distribution to use for the noise. In range [0..inf].
Default: 0.23.
- --454-background-noise-stddev DOUBLE
- Standard deviation of lognormal distribution to use for the noise. In
range [0..inf]. Default: 0.15.
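The two formulas given for --454-no-sqrt-in-std-dev can be written out as a short Python function. Here k stands for the --454-proportionality-factor value and r for the quantity the standard deviation scales with; the function name flow_sigma is hypothetical, and the sketch only restates the formulas without claiming anything about how the simulator uses the result.

```python
import math


def flow_sigma(r, k=0.15, no_sqrt=False):
    """Standard deviation for the 454 error model, per the two formulas above.

    no_sqrt=False -> sigma = k * sqrt(r)   (default behaviour)
    no_sqrt=True  -> sigma = k * r         (--454-no-sqrt-in-std-dev set)
    """
    if r < 0:
        raise ValueError("r must be non-negative")
    return k * r if no_sqrt else k * math.sqrt(r)


if __name__ == "__main__":
    for r in (1, 4, 9):
        print(r, flow_sigma(r), flow_sigma(r, no_sqrt=True))
```

The two variants agree at r = 1 and diverge for larger r, where the square-root form grows more slowly.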
Simulation of base qualities is disabled when writing out FASTA
files. Simulation of paired-end sequencing is enabled when specifying two
output files.
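A small Python helper can make the single-end versus paired-end distinction explicit when scripting the tool. The sketch below only assembles an argument list from options documented above (-i, -o, -or, --seed, --illumina-read-length); the function name build_command and the file names are placeholders, and nothing is executed.

```python
import shlex


def build_command(fragments, out_left, out_right=None, seed=0, read_length=100):
    """Assemble a mason_frag_sequencing argument list.

    Passing `out_right` adds -or, which enables paired-end simulation.
    """
    cmd = [
        "mason_frag_sequencing",
        "--seed", str(seed),
        "--illumina-read-length", str(read_length),
        "-i", fragments,
        "-o", out_left,
    ]
    if out_right is not None:
        cmd += ["-or", out_right]
    return cmd


if __name__ == "__main__":
    # Single-end, FASTA output (no base qualities are simulated).
    print(shlex.join(build_command("frags.fa", "reads.fa")))
    # Paired-end, FASTQ output.
    print(shlex.join(build_command("frags.fa", "left.fq", "right.fq", seed=7)))
```

The resulting list can be passed to subprocess.run once the tool is installed and the input file exists.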
You can use the --mate-orientation option to set the relative
orientation of the reads when doing paired-end sequencing. The valid values
are listed below.
- FR
- Reads are inward-facing, the same as Illumina paired-end reads: R1 --> <-- R2.
- RF
- Reads are outward-facing, the same as Illumina mate-pair reads: R1 <-- --> R2.
- FF
- Reads are on the same strand: R1 --> --> R2.
- FF2
- Reads are on the same strand but the "right" reads are sequenced
to the left of the "left" reads, same as 454 paired: R2 --> --> R1.
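The arrow diagrams above can be illustrated with a short Python sketch that derives both reads from a single fragment. It assumes that one read comes from each end of the fragment and that a reversed arrow means the reverse complement of that end; the function names and this reading of the diagrams are illustrative assumptions, not a description of the simulator's code.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def paired_reads(fragment, read_length, orientation="FR"):
    """Return (R1, R2) for a fragment under the given mate orientation."""
    if not 1 <= read_length <= len(fragment):
        raise ValueError("read_length must be in [1, len(fragment)]")
    left = fragment[:read_length]                    # left end of the fragment
    right = fragment[len(fragment) - read_length:]   # right end of the fragment
    if orientation == "FR":    # R1 --> <-- R2
        return left, reverse_complement(right)
    if orientation == "RF":    # R1 <-- --> R2
        return reverse_complement(left), right
    if orientation == "FF":    # R1 --> --> R2
        return left, right
    if orientation == "FF2":   # R2 --> --> R1
        return right, left
    raise ValueError(f"unknown orientation: {orientation}")


if __name__ == "__main__":
    for o in ("FR", "RF", "FF", "FF2"):
        print(o, paired_reads("AACCGGTTAC", 3, o))
```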