miraconvert - convert assembly and sequencing file types
miraconvert [-f <fromtype>] [-t <totype> [-t
<totype> ...]] [-aAbCdhimMsuZ] [-cflnNoPqrtvxXyYz {...}] {infile}
{outfile} [<totype> <totype> ...]
- -f
<fromtype>
- load this type of project files, where fromtype is:
- caf a complete assembly or single sequences from CAF
- maf a complete assembly or single sequences from MAF
- fasta sequences from a FASTA file
- fastq sequences from a FASTQ file
- gb[f|k|ff] sequences from a GenBank file
- phd sequences from a PHD file
- fofnexp sequences in EXP files from file of filenames
- -t
<totype>
- write the sequences/assembly to this type (multiple mentions of -t
are allowed):
- ace sequences or complete assembly to ACE
- caf sequences or complete assembly to CAF
- maf sequences or complete assembly to MAF
- sam complete assembly to SAM
- samnbb like above, but leaving out reference (backbones) in mapping
assemblies
- gb[f|k|ff] sequences or consensus to GenBank
- gff3 consensus to GFF3
- wig assembly coverage info to wiggle file
- gcwig assembly gc content info to wiggle file
- fasta sequences or consensus to FASTA file (qualities to
.qual)
- fastq sequences or consensus to FASTQ file
- exp sequences or complete assembly to EXP files in directories.
Complete assemblies are suited for gap4 import as directed assembly. Note:
using caf2gap to import into gap4 is recommended though
- text complete assembly to text alignment (only when -f is
caf, maf or gbf)
- html complete assembly to HTML (only when -f is caf,
maf or gbf)
- tcs complete assembly to tcs
- hsnp surrounding of SNP tags (SROc, SAOc, SIOc) to HTML (only when
-f is caf, maf or gbf)
- asnp analysis of SNP tags (only when -f is caf,
maf or gbf)
- cstats contig statistics file like from MIRA (only when source
contains contigs)
- crlist contig read list file like from MIRA (only when source
contains contigs)
- maskedfasta reads where sequencing vector is masked out (with X) to
FASTA file (qualities to .qual)
- scaf sequences or complete assembly to single sequences CAF
- -a
- Append to target files instead of rewriting
- -A
- Do not Adjust sequence case
- When reading formats which define clipping points, and saving to formats
which do not have clipping information, miraconvert normally adjusts the
case of read sequences: lower case for clipped parts, upper case for
unclipped parts of reads. Use -A if you do not want this. See also
-C.
- Applies only to files/formats which do not contain contigs.
- -b
- Blind data
- Replaces all bases in reads/contigs with a 'c'
- -C
- Perform hard clip to reads
- When reading formats which define clipping points, will save only the
unclipped part into the result file.
- Applies only to files/formats which do not contain contigs.
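The difference between the default case adjustment and the hard clip of -C can be illustrated with a small Python sketch. This is not miraconvert's code: the function names are hypothetical, and the clipping points are assumed to be given as a 0-based, half-open interval [clip_left, clip_right).

```python
def adjust_case(seq: str, clip_left: int, clip_right: int) -> str:
    """Default behaviour as described: clipped parts lower case, unclipped part upper case."""
    return (
        seq[:clip_left].lower()
        + seq[clip_left:clip_right].upper()
        + seq[clip_right:].lower()
    )


def hard_clip(seq: str, clip_left: int, clip_right: int) -> str:
    """Behaviour described for -C: keep only the unclipped part of the read."""
    return seq[clip_left:clip_right]


if __name__ == "__main__":
    read = "ACGTACGTAC"
    # Assumed clipping points: the first 2 and the last 2 bases are clipped.
    print(adjust_case(read, 2, 8))
    print(hard_clip(read, 2, 8))
```

With -A, neither transformation would be applied and the sequence would be written as stored.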
- -d
- Delete gap only columns
- When output is contigs: delete columns that are entirely gaps (like after
having deleted reads during editing in gap4 or similar)
- When output is reads: delete gaps in reads
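As an illustration of what deleting gap-only columns means, here is a minimal Python sketch. It is not taken from MIRA: the function names are hypothetical, the alignment is assumed to be a list of equally long strings, and '*' is assumed as the gap character.

```python
def delete_gap_only_columns(rows, gap="*"):
    """Remove alignment columns in which every row holds a gap."""
    if not rows:
        return []
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows of the alignment must have the same length")
    # Keep a column as soon as at least one row has a real base in it.
    keep = [i for i in range(width) if any(row[i] != gap for row in rows)]
    return ["".join(row[i] for i in keep) for row in rows]


def delete_gaps_in_read(seq, gap="*"):
    """For single reads: simply drop every gap character."""
    return seq.replace(gap, "")


if __name__ == "__main__":
    contig_rows = ["AC*GT", "AC*G*", "A**GT"]
    print(delete_gap_only_columns(contig_rows))
    print(delete_gaps_in_read("A**GT"))
```

Only the third column is removed in the contig case, because the other gaps still align against real bases in at least one read.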
- -F
- Filter read groups to different files
- Works only for input files with readgroups (CAF/MAF). 3 (or 4) files are
generated: one or two for paired, one for unpaired and one for debris
reads.
- Reads in the paired file are interlaced by default; use -F twice to
create separate files.
- -m
- Make contigs (only for -t = caf or maf)
- Encase single reads as contig singlets into the CAF/MAF file.
- -n <filename>
- when given, selects only reads or contigs given by name in that file.
- -N <filename>
- like -n, but sorts output according to order given in file.
- -i
- when -n is used, inverts the selection
- -o <quality>
- FASTQ quality Offset (only for -f = 'fastq')
- Offset of quality values in FASTQ file. The default of 33 loads Sanger/Phred
style files; using 0 tries to recognise the offset automatically.
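To make the meaning of the offset concrete, the following Python sketch decodes a FASTQ quality line by subtracting the offset from each character code. The function name is hypothetical and the automatic recognition triggered by an offset of 0 is deliberately not modelled, since this page does not describe how it works.

```python
def decode_qualities(quality_line, offset=33):
    """Convert a FASTQ quality string to a list of integer quality values."""
    scores = [ord(ch) - offset for ch in quality_line]
    if any(score < 0 for score in scores):
        # A negative value means the offset does not fit this file.
        raise ValueError("quality character below the given offset")
    return scores


if __name__ == "__main__":
    # 'I' is ASCII 73, '5' is ASCII 53, '!' is ASCII 33.
    print(decode_qualities("II5!"))
```

With the default offset of 33, the character '!' corresponds to quality 0 and 'I' to quality 40.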
- -P <string>
- String with MIRA parameters to be parsed
- Useful when setting parameters affecting consensus calling like
-CO:mrpg etc.
- E.g.: -P "454_SETTINGS -CO:mrpg=3"
- -q <quality>
- Set default quality for bases in file types without quality values.
Furthermore, do not stop if expected quality files are missing (e.g.
'.fasta')
- -R <name>
- Rename contigs/singlets/reads with given name string to which a counter is
appended.
- Known bug: will create duplicate names if input contains contigs/singlets
as well as free reads, i.e. reads not in contigs nor singlets.
- -S <name>
- (name)Scheme for renaming reads, important for paired-ends. Only 'solexa'
is currently supported.
- -T
- When converting single reads, trim/clip away stretches of N and X at the
ends of reads. Note: remember to use -C to also perform a hard clip
(e.g. with FASTA as output).
- -v
- Print version number and exit
- -Y <integer>
- Yield. Max (clipped/padded) bases to convert.
- When used on reads: output will contain first reads of file where length
of clipped bases totals at least -Y. When used on contigs: output
will contain first contigs of file where length of padded contigs totals
at least -Y.
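The -Y selection described above can be read as "take entries in file order until the running total of bases reaches the limit". A minimal Python sketch of that reading follows; the function name and the (name, length) input format are assumptions made for illustration, and miraconvert's exact cut-off behaviour may differ.

```python
def take_until_yield(entries, max_bases):
    """Select leading entries until their summed length is at least max_bases.

    entries: iterable of (name, length) pairs in file order, where length is
    the clipped read length or the padded contig length.
    """
    selected = []
    total = 0
    for name, length in entries:
        if total >= max_bases:
            break  # the requested yield has been reached
        selected.append(name)
        total += length
    return selected


if __name__ == "__main__":
    reads = [("r1", 100), ("r2", 150), ("r3", 200), ("r4", 50)]
    print(take_until_yield(reads, 240))
```

In this example the first two reads total 250 bases, which is the first point at which the sum is at least 240, so the remaining reads are not written.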
The following switches work only when input (CAF or MAF) contains
contigs. Beware: CAF and MAF can also contain just reads.
- -M
- Do not extract contigs (or their consensus), but the sequence of the reads
they are composed of.
- -r [cCqf]
- Recalculate consensus and / or consensus quality values and / or SNP
feature tags.
- 'c' recalc cons & cons qualities (with IUPAC)
- 'C' recalc cons & cons qualities (forcing non-IUPAC)
- 'q' recalc consensus qualities only
- 'f' recalc SNP features
- Note: only the last of cCq is relevant, f works as a switch
and can be combined with cCq (e.g. "-r C -r f")
- Note: if the CAF/MAF contains multiple strains, recalculation of cons
& cons qualities is forced; you can only influence whether IUPACs are
used or not.
- -s
- split output into multiple files instead of creating a single file
- -u
- 'fillUp strain genomes'
- Fill holes in the genome of one strain (N or @) with sequence from a
consensus of other strains
- Takes effect only with -r and -t gbf or fasta/q. In FASTA/Q:
bases filled up are in lower case. In GBF: bases filled up are in upper
case.
- -Q <integer>
- Defines minimum quality a consensus base of a strain must have; consensus
bases below this will be 'N'. Default: 0
- Only used with -r, and -f is caf/maf and -t is (fasta
or gbf)
- -V <integer>
- Defines minimum coverage a consensus base of a strain must have; bases
with coverage below this will be 'N'. Default: 0
- Only used with -r, and -t is (fasta or gbf)
- -x <integer>
- Minimum contig or unclipped read length
- When loading, discard all contigs / reads with a length less than this
value. Default: 0 (=switched off)
- Note: not applied to reads in contigs!
- -X <integer>
- Similar to -x but applies only to reads and then to the clipped
length.
- -y <integer>
- Minimum average contig coverage
- When loading, discard all contigs with an
average coverage less than this value. Default: 1
- -z <integer>
- Minimum number of reads in contig
- When loading, discard all contigs with a
number of reads less than this value. Default: 0 (=switched off)
- -l <integer>
- when output as text or HTML: number of bases shown in one alignment line.
Default: 60.
- -c <character>
- when output as text or HTML: character used to pad endgaps. Default: ' '
(blank)
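The contig filters -x, -y and -z described above combine as simple minimum thresholds: a contig is discarded as soon as it falls below any one of them. The Python sketch below illustrates that logic only; the ContigSummary class and keep_contig function are hypothetical and not part of MIRA, and the default values are the ones stated on this page.

```python
from dataclasses import dataclass


@dataclass
class ContigSummary:
    """Hypothetical per-contig summary used only for this illustration."""
    name: str
    length: int
    avg_coverage: float
    num_reads: int


def keep_contig(contig, min_length=0, min_coverage=1.0, min_reads=0):
    """Return True if the contig passes the -x, -y and -z style thresholds."""
    return (
        contig.length >= min_length
        and contig.avg_coverage >= min_coverage
        and contig.num_reads >= min_reads
    )


if __name__ == "__main__":
    contigs = [
        ContigSummary("c1", 5000, 25.0, 400),
        ContigSummary("c2", 1500, 30.0, 150),  # shorter than 2000
        ContigSummary("c3", 8000, 4.5, 90),    # coverage below 10
    ]
    # Thresholds corresponding to "-x 2000 -y 10".
    kept = [c.name for c in contigs if keep_contig(c, min_length=2000, min_coverage=10)]
    print(kept)
```

Note that, as stated for -x, such a length filter is not applied to the reads inside contigs.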
- miraconvert source.maf dest.sam
- miraconvert source.caf dest.fasta wig ace
- miraconvert -x 2000 -y 10 source.caf dest.caf
- miraconvert -x 40 -C -F -F source.maf .fastq
mira(1), mirabait(1)
More extensive documentation is provided in the MIRA manual,
available online at
- http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html
On Debian, this can be installed with the mira-doc package and can
then be found at /usr/share/doc/mira-assembler/DefinitiveGuideToMIRA.html.
On other systems, you may want to check in /usr/local/share/mira/doc or run
"locate DefinitiveGuideToMIRA" to find it locally.
You can also subscribe to one of the MIRA mailing lists at
- http://www.chevreux.org/mira_mailinglists.html
After subscribing, mail general questions to the MIRA talk mailing
list:
- mira_talk@freelists.org
To report bugs or ask for features, please use the ticketing
system at:
- http://sourceforge.net/projects/mira-assembler/
Bastien Chevreux <bach@chevreux.org>
This manpage was written by Andreas Tille for the Debian
distribution and can be used for any other usage of the program.