BAMCOLLATE2(1) | General Commands Manual | BAMCOLLATE2(1) |
bamcollate2 - collate reads in a SAM, BAM or CRAM file by name
bamcollate2 [options]
bamcollate2 reads a SAM, BAM or CRAM file from standard input, collates the contained reads/alignments by name and writes the resulting data to standard output in BAM format.
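As a rough illustration of the key=value calling convention, the following Python sketch assembles an argument list for bamcollate2. The helper name build_argv and the file names are hypothetical and only serve the example; filename=, collate= and O= are keys documented on this page. The sketch only prints the command and does not run it.

```python
def build_argv(options, program="bamcollate2"):
    """Turn a dict of options into a list of key=value arguments."""
    return [program] + [f"{key}={value}" for key, value in options.items()]


# Hypothetical file names, used for illustration only.
argv = build_argv({"filename": "input.bam", "collate": 1, "O": "collated.bam"})
print(" ".join(argv))
```

The resulting list could be handed to subprocess.run as is; because each key=value pair is a single list element, no shell quoting is involved.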
The following key=value pairs can be given:
collate=<0|1|2|3>: Valid values are
filename=<stdin>: input file name (data is read from standard input if this option is not given)
inputformat=<bam>: input file format. All versions of bamcollate2 come with support for the BAM input format. If the program is additionally linked to the io_lib package, then the following options are valid:
level=<-1|0|1|9|11>: set compression level of the output BAM file. Valid values are
If libmaus has been compiled with support for igzip (see https://software.intel.com/en-us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-data) then an additional valid value is
exclude=<SECONDARY>: Do not include reads in the output that have any of the given flags set. The flags are given separated by commas. Valid flags are:
disablevalidation=<0>: Valid values are
colhlog=<18>: base two logarithm of the size of the hash table used for collation. The default value is 18 and should work reasonably well for most input files. Please see the biobambam paper at arxiv.org/abs/1306.0836 for details.
colsbs=<128M>: size of the hash table overflow list in bytes. The default is 128MB and should work reasonably well for most input files. Please see the biobambam paper at arxiv.org/abs/1306.0836 for details.
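Because colhlog is a base two logarithm, each increment doubles the hash table size. The short Python sketch below makes that arithmetic explicit; the helper name is hypothetical, and the sketch says nothing about how bamcollate2 allocates memory or in which unit the size is counted.

```python
def hash_table_size(colhlog=18):
    """Return 2**colhlog, the table size implied by a colhlog value."""
    return 1 << colhlog


print(hash_table_size())    # default colhlog=18 -> 262144
print(hash_table_size(20))  # -> 1048576
```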
T=<bamcollate2_hostname_pid_time>: file name of temporary file used for collation.
ranges=<>: coordinate ranges selected from input. This option is only available for input files in BAM and CRAM format which have a corresponding index file (.bai for BAM, .crai for CRAM) and if input is via file (i.e. the filename argument is set). Valid ranges consist of either
For BAM input multiple ranges are separated by space characters (e.g. ranges="chr1:10000-20000 chr1:30000-40000"). CRAM input supports a single range only.
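The rule for combining ranges can be sketched in Python as follows. The function name and its input_format parameter are hypothetical helpers for this example, not part of bamcollate2; the range strings are the ones used in the example above.

```python
def ranges_argument(ranges, input_format="bam"):
    """Build the ranges=... argument from a list of range strings."""
    ranges = list(ranges)
    # Per the description above, CRAM input accepts one range only.
    if input_format == "cram" and len(ranges) > 1:
        raise ValueError("CRAM input supports a single range only")
    # For BAM input, multiple ranges are separated by space characters.
    return "ranges=" + " ".join(ranges)


print(ranges_argument(["chr1:10000-20000", "chr1:30000-40000"]))
```

When the argument is typed in a shell rather than passed as a list element, the value needs quoting because it contains spaces.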
reference=: file name of the reference for CRAM input files. If this key is unset, then the CRAM file header will be scanned to obtain a reference file name.
md5=<0|1>: md5 checksum creation for output file. Valid values are
md5filename: file name for md5 checksum if md5=1.
index=<0|1>: compute BAM index for output file. Valid values are
indexfilename: file name for BAM index if index=1.
readgroups: comma separated list of read group identifiers to be kept. If not given, all records will be kept. Read group filtering is only available for collate=0 and collate=1 (i.e. this key is ignored for collate=2 and collate=3).
mapqthres: mapping quality threshold. This option is only available for collate=1 (i.e. it is ignored for collate=0 and collate>1). If this key is set, reads are kept if the mapping quality field is at least the given value. For paired end reads it is sufficient for a read or its mate to have a mapping quality above the threshold.
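The mapqthres rule can be read as a simple predicate, sketched here in Python. This is an interpretation, not the program's code: it assumes that "at least" (>=) applies to paired reads as well, although the wording for pairs says "above", and the function name is hypothetical.

```python
def keep_by_mapq(mapq, threshold, mate_mapq=None):
    """Decide whether a read passes the mapping quality threshold.

    Assumption: the comparison is >= for single reads and for pairs.
    """
    if mate_mapq is None:
        # Single read: its own mapping quality decides.
        return mapq >= threshold
    # Paired read: it is enough for either end to reach the threshold.
    return max(mapq, mate_mapq) >= threshold


print(keep_by_mapq(10, 20))                # single read below threshold
print(keep_by_mapq(10, 20, mate_mapq=40))  # mate reaches the threshold
```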
reset: reduce alignments to an unmapped state (see bamreset). This key is only valid for collate=0, collate=1 or collate=3. The default value is 0 for collate=0 and collate=1 and 1 for collate=3.
classes: types of alignment lines to be kept. This key is only valid for collate=1. By default all alignments are kept. The value for this key is a comma separated list consisting of a subset of the following options:
resetheadertext: file name for replacement SAM header. By default the header of the input SAM/BAM/CRAM file is used (and filtered in case of reset=1).
resetaux=<0|1>: remove auxiliary fields if resetaux=1. This key is only available for reset=1. If reset=1 then the default is to remove all aux fields.
auxfilter=<>: comma separated list of aux tags to be kept if reset=1 and resetaux=0. If the key is not set then all tags are kept.
outputformat=<bam>: output file format. All versions of bamcollate2 come with support for the BAM output format. If the program is additionally linked to the io_lib package, then the following options are valid:
O=<[stdout]>: output filename, standard output if unset.
outputthreads=<[1]>: output helper threads, only valid for outputformat=bam.
verbose=<1>: Valid values are
replacereadgroupnames=<>: file name containing a list of read group mappings. Each line in the file corresponds to one read group ID replacement and contains two columns separated by the tab symbol (ASCII code 9). The first column contains the source identifier which will be replaced by the value of the second column in the output file. This option is only valid for collate<2. By default no read group identifier mapping is performed.
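A mapping file in the two-column, tab-separated layout described for replacereadgroupnames could be produced with a few lines of Python. The helper name, the file name and the read group identifiers below are hypothetical and only illustrate the layout.

```python
import os
import tempfile


def write_readgroup_mapping(path, mapping):
    """Write one 'source<TAB>replacement' line per read group ID."""
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        for source, replacement in mapping.items():
            handle.write(f"{source}\t{replacement}\n")


# Hypothetical read group identifiers, written to a scratch directory.
mapping_path = os.path.join(tempfile.mkdtemp(), "rgmap.txt")
write_readgroup_mapping(
    mapping_path, {"rg1": "sampleA_lane1", "rg2": "sampleA_lane2"}
)
print(f"replacereadgroupnames={mapping_path}")
```

The printed key=value pair is what would be passed on the command line, together with collate=0 or collate=1, since the option is only valid for collate<2.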
Written by German Tischler.
Report bugs to <germant@miltenyibiotec.de>
Copyright © 2009-2015 German Tischler, © 2011-2015
Genome Research Limited. License GPLv3+: GNU GPL version 3
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO
WARRANTY, to the extent permitted by law.
February 2015 | BIOBAMBAM |