BAMDOWNSAMPLERANDOM(1) | General Commands Manual | BAMDOWNSAMPLERANDOM(1) |
bamdownsamplerandom - downsample a SAM, BAM or CRAM file
bamdownsamplerandom [options]
bamdownsamplerandom reads a SAM, BAM or CRAM file from standard input, randomly discards reads and writes the remaining reads to standard output in BAM format. For a pair of reads either both ends are discarded or both ends are kept. The order of reads in the output file may be different from the order in the input if the reads in the input file are not collated by their read name.
The following key=value pairs can be given:
p=<1>: probability for a pair of reads or a single end read to be kept. By default all reads are kept.
seed=<>: seed used for the random number generator. By default the current time is used, i.e. each run of the program will select a different subset of reads from an input file. If the behaviour of the program needs to be reproducible a fixed number can be used as the random seed.
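The interplay of p and seed can be illustrated with a minimal Python sketch. It is not the program's implementation: the function name downsample_pairs, the (name, payload) tuples standing in for alignment records, and the use of Python's random.Random are assumptions made for illustration. The sketch makes one keep/discard decision per read name, so both ends of a pair share the same fate, and a fixed seed reproduces the same subset.

```python
import random

def downsample_pairs(reads, p=1.0, seed=None):
    """Keep each read name with probability p.

    reads: iterable of (name, payload) tuples, a hypothetical stand-in
           for alignment records; both ends of a pair share one name.
    seed:  None lets Python pick an unpredictable seed, analogous to the
           time-based default described above; a fixed integer makes the
           selection reproducible.
    """
    rng = random.Random(seed)
    decisions = {}  # read name -> True (keep) / False (discard)
    kept = []
    for name, payload in reads:
        # Draw one random number per name, the first time it is seen.
        if name not in decisions:
            decisions[name] = rng.random() < p
        if decisions[name]:
            kept.append((name, payload))
    return kept
```

Unlike the program, which collates reads by name and may therefore change their order, this sketch remembers every decision in memory and keeps the input order.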
I=<stdin>: input file name (data is read from standard input if this option is not given)
inputformat=<bam>: input file format. All versions of bamdownsamplerandom come with support for the BAM input format. If the program is additionally linked to the io_lib package, then the following options are valid:
level=<-1|0|1|9|11>: set compression level of the output BAM file. Valid values are
If libmaus has been compiled with support for igzip (see https://software.intel.com/en-us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-data) then an additional valid value is
exclude=<SECONDARY,SUPPLEMENTARY>: Do not include reads in the output that have any of the given flags set. The flags are given separated by commas. Valid flags are:
disablevalidation=<0>: Valid values are
colhlog=<18>: base two logarithm of the size of the hash table used for collation (the default value is 18 and should work reasonably well for most input files; please see the biobambam paper at arxiv.org/abs/1306.0836 for details).
colsbs=<128M>: size of hash table overflow list in bytes (the default is 128MB and should work reasonably well for most input files; please see the biobambam paper at arxiv.org/abs/1306.0836 for details).
T=<bamdownsamplerandom_hostname_pid_time>: file name of temporary file used for collation.
ranges=<>: coordinate ranges selected from input. This option is only available for input files in BAM format which have a corresponding index (.bai file) and if input is via file (i.e. the I argument is set). Valid ranges consist either of
Multiple ranges are separated by space characters (e.g. ranges="chr1:10000-20000 chr1:30000-40000").
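The example value above can be split into its parts with a short Python sketch. It covers only the name:start-end form shown in the example; the function name parse_ranges is hypothetical, and the sketch makes no claim about how the program itself parses or interprets the coordinates.

```python
def parse_ranges(value):
    """Split a space-separated ranges string into (reference, start, end) tuples.

    Only the 'name:start-end' form from the example is handled;
    any other token raises ValueError.
    """
    result = []
    for token in value.split():
        # rpartition splits at the last ':' so reference names that
        # themselves contain ':' are tolerated.
        name, sep, span = token.rpartition(":")
        start, dash, end = span.partition("-")
        if not sep or not dash or not name:
            raise ValueError(f"unsupported range: {token!r}")
        result.append((name, int(start), int(end)))
    return result

print(parse_ranges("chr1:10000-20000 chr1:30000-40000"))
# prints [('chr1', 10000, 20000), ('chr1', 30000, 40000)]
```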
reference=<>: file name of the reference for CRAM input files. If this key is unset, then the CRAM file header will be scanned to obtain a reference file name.
tmpfile=<filename>: prefix for temporary files. By default the temporary files are created in the current directory.
outputformat=<bam>: output file format. All versions of bamdownsamplerandom come with support for the BAM output format. If the program is additionally linked to the io_lib package, then the following options are valid:
O=<[stdout]>: output filename, standard output if unset.
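As an illustration of how the key=value options combine on the command line, the following Python sketch assembles an argument list from the p, seed, I and O keys described above. The helper names build_command and run and the file names in.bam and out.bam are hypothetical; actually running the command assumes that bamdownsamplerandom is installed and on the PATH.

```python
import subprocess

def build_command(p, seed=None, input_path=None, output_path=None):
    """Assemble the argument list for bamdownsamplerandom.

    Keys that are left as None are omitted, so the program falls back
    to its defaults (time-based seed, standard input, standard output).
    """
    cmd = ["bamdownsamplerandom", f"p={p}"]
    if seed is not None:
        cmd.append(f"seed={seed}")
    if input_path is not None:
        cmd.append(f"I={input_path}")
    if output_path is not None:
        cmd.append(f"O={output_path}")
    return cmd

def run(cmd):
    """Run the assembled command and raise if it exits with an error."""
    subprocess.run(cmd, check=True)

# Hypothetical file names; keep roughly 10% of pairs, reproducibly.
print(build_command(0.1, seed=42, input_path="in.bam", output_path="out.bam"))
```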
outputthreads=<[1]>: output helper threads, only valid for outputformat=bam.
md5=<0|1>: md5 checksum creation for output file. This option can only be given if outputformat=bam. Then valid values are
md5filename: file name for md5 checksum if md5=1.
index=<0|1>: compute BAM index for output file. This option can only be given if outputformat=bam. Then valid values are
indexfilename: file name for output BAM index if index=1.
hash=<0|1>: use hash of query name instead of a random number for selection. This makes the output depend on how random the hashes produced for the query names are, but it has the advantage of not requiring collation to keep pairs together. In contrast to the default random selection, the order of retained reads does not change for hash=1.
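The idea behind hash-based selection can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program's code: the hash function the program uses is not specified here, so SHA-256 is swapped in, and the names keep_by_name_hash and downsample_by_hash as well as the (name, payload) tuples are hypothetical. Because the decision depends only on the query name, both ends of a pair receive the same decision without any collation, and the retained reads stay in input order.

```python
import hashlib

def keep_by_name_hash(name, p):
    """Map a query name to a number in [0, 1) and keep it if below p.

    SHA-256 is an assumption made for this sketch; any hash that
    spreads query names evenly would serve the same purpose.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Use the top 53 bits so the quotient is an exact float below 1.0.
    value = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return value < p

def downsample_by_hash(reads, p):
    """Filter (name, payload) records in a single pass, preserving order."""
    return [(name, payload) for name, payload in reads
            if keep_by_name_hash(name, p)]
```

The selection is only as uniform as the hash values of the query names, which is the dependence on hash quality noted above.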
Written by German Tischler.
Report bugs to <germant@miltenyibiotec.de>
Copyright © 2009-2014 German Tischler, © 2011-2014
Genome Research Limited. License GPLv3+: GNU GPL version 3
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO
WARRANTY, to the extent permitted by law.
October 2014 | BIOBAMBAM |