FASTQTOBAM(1) | General Commands Manual | FASTQTOBAM(1) |
fastqtobam - convert FastQ to unmapped BAM
fastqtobam [options]
fastqtobam reads one or two FastQ files and converts them to a BAM file in which each read is marked as unmapped. If no input file name is given, a single FastQ file is read from standard input. If one file name is given, a single FastQ file is read from that file. In both cases, unless the name scheme is set to pairedfiles, the read names in the file are parsed to determine whether the contained reads are paired. If two file names are given, the program expects two synchronous FastQ files, i.e. files in which the first read in the first file is the mate of the first read in the second file, and so on. Input file names can be given either via the I key or after the key=value pairs on the command line. The program accepts the read name formats described below under the key namescheme.
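The notion of two synchronous input files can be illustrated with a minimal Python sketch. It is not the fastqtobam implementation; it assumes well-formed FastQ with exactly four lines per record, and the helper names read_fastq and paired_records as well as the read names in the sample data are hypothetical.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FastQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, not needed here
        quality = handle.readline().rstrip("\n")
        # Drop the leading '@' of the header line to obtain the read name.
        yield header.rstrip("\n")[1:], sequence, quality


def paired_records(first, second):
    """Walk two FastQ streams in lockstep: record i of each stream forms a pair."""
    for a, b in zip_longest(read_fastq(first), read_fastq(second)):
        if a is None or b is None:
            raise ValueError("input files are not synchronous: record counts differ")
        yield a, b


# Hypothetical sample data standing in for two input files.
reads_1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n")
reads_2 = io.StringIO("@r1/2\nTGCA\n+\nIIII\n@r2/2\nCCAT\n+\nIIII\n")
pairs = list(paired_records(reads_1, reads_2))
```

Each element of pairs holds the first and second mate of one read pair. The sketch pairs reads purely by position, which is why files of different lengths are rejected.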
The following key=value pairs can be given:
verbose=<[0|1]> print progress report. By default progress is not reported.
I=<filename>: input file name (data is read from standard input if this option is not given). This key can be given twice.
level=<-1|0|1|9|11>: set compression level of the output BAM file. Valid values are
If libmaus has been compiled with support for igzip (see https://software.intel.com/en-us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-data) then an additional valid value is
md5=<0|1>: md5 checksum creation for output file. Valid values are
md5filename file name for md5 checksum if md5=1.
gz=<[0|1]> input is gzip compressed FastQ. By default input is assumed to be uncompressed FastQ.
threads=<1> additional BAM encoding helper threads.
RGID=<> read group identifier for reads. By default no read group identifier is set. The fields CN, DS, DT, FO, KS, LB, PG, PI, PL, PU and SM of the corresponding @RG header line can be set by using the keys RGCN, RGDS, etc. respectively.
qualityoffset=<33> FastQ quality offset. This value is subtracted from the ASCII character representation to get the quality score value.
qualitymax=<41> maximum valid quality value, 41 by default. Higher values may indicate a wrong setting of the qualityoffset parameter. BAM allows quality values up to the value of 94.
qualityhist=<0> compute a quality histogram and print it on the standard error channel after processing has finished successfully. Lines of the quality histogram are prefixed with [H] and contain tab separated values. The histogram enumerates quality scores from high to low values and has four columns (after the [H] marker). The first column is the ASCII representation of the quality with offset 33, i.e. the symbol ! denotes quality 0. The second column gives the absolute frequency of the value. The third column gives the relative frequency of the value, i.e. the fraction of all quality values equal to this value. The fourth column gives the cumulative relative frequency, i.e. the sum of the relative frequencies of the current line's quality value and all higher quality values.
checkquality=<1> check whether quality values are in range and terminate if an invalid value is encountered.
namescheme=<generic> read name scheme. This determines how read names are parsed. There are four possible options:
chksumfn=<> File name used for storing bamseqchksum like information about the output file. By default no such file is produced.
hash=<crc32prod> Hash used for producing bamseqchksum type information. The information produced is only stored if the chksumfn option is set.
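The descriptions of the qualityoffset, qualitymax, checkquality and qualityhist keys above can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the program's own code: the function names are hypothetical, the lower bound of 0 in the range check is assumed, and the number formatting of the printed columns is a guess.

```python
from collections import Counter


def decode_qualities(quality_string, offset=33, maximum=41):
    """Subtract the offset from each ASCII code and check the range (assumed 0..maximum)."""
    scores = [ord(ch) - offset for ch in quality_string]
    for score in scores:
        if score < 0 or score > maximum:
            raise ValueError(f"quality value {score} out of range 0..{maximum}")
    return scores


def quality_histogram(scores):
    """Return rows (symbol, count, fraction, cumulative fraction), high to low quality."""
    counts = Counter(scores)
    total = sum(counts.values())
    rows = []
    running = 0  # number of values with quality >= the current row's quality
    for score in sorted(counts, reverse=True):
        running += counts[score]
        # The first column always uses offset 33, so '!' denotes quality 0.
        rows.append((chr(score + 33), counts[score],
                     counts[score] / total, running / total))
    return rows


scores = decode_qualities("II5!")  # 'I' is quality 40, '5' is 20, '!' is 0
for symbol, count, fraction, cumulative in quality_histogram(scores):
    print(f"[H]\t{symbol}\t{count}\t{fraction:.4f}\t{cumulative:.4f}")
```

Because the rows run from high to low quality, the cumulative column reaches 1 on the last line. A value above the maximum, such as one produced by decoding offset 64 data with offset 33, raises an error in this sketch, mirroring the termination described for checkquality.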
Written by German Tischler.
Report bugs to <germant@miltenyibiotec.de>
Copyright © 2009-2014 German Tischler, © 2011-2014
Genome Research Limited. License GPLv3+: GNU GPL version 3
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO
WARRANTY, to the extent permitted by law.
July 2013 | BIOBAMBAM |