BAMINTERVALCOMMENT(1) General Commands Manual BAMINTERVALCOMMENT(1)

bamintervalcomment - annotate alignments with the names of matching intervals

bamintervalcomment [options]

bamintervalcomment reads a BAM, SAM or CRAM file and a file containing a list of named intervals, marks each line in the input with the list of all matching intervals and stores the resulting file in BAM, SAM or CRAM format. The intervals file needs to be given using the intervals key. The file can be either plain or compressed using gzip. The intervals file is expected to contain one interval per line. Each line is assumed to contain a tab separated list of values, where the following columns are used by the program:

Two columns contain a pair of names which form the id of the interval.
The third column gives the name of the reference sequence containing the interval.
Two further columns give the interval on the reference sequence designated by the third column as a pair of non-negative integers. Both borders are included.
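
The layout described above can be read with a few lines of Python. This is a minimal sketch, not the program's own parser. It assumes the column order of the UCSC refFlat format (names in columns 1 and 2, reference name in column 3, interval start and end in columns 5 and 6), which this page does not confirm for the id and border columns. The names Interval, open_maybe_gzip and read_intervals are hypothetical.

```python
import gzip
from typing import Iterator, NamedTuple


class Interval(NamedTuple):
    id_a: str   # first id column (assumed: column 1)
    id_b: str   # second id column (assumed: column 2)
    ref: str    # reference sequence name (column 3)
    start: int  # left border, inclusive (assumed: column 5)
    end: int    # right border, inclusive (assumed: column 6)


def open_maybe_gzip(path: str):
    """Open a plain or gzip-compressed text file, detected by its magic bytes."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_intervals(path: str) -> Iterator[Interval]:
    """Yield one Interval per tab-separated line, skipping blank or short lines."""
    with open_maybe_gzip(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 6:
                continue
            yield Interval(fields[0], fields[1], fields[2],
                           int(fields[4]), int(fields[5]))
```

Detecting gzip by its magic bytes rather than by file extension lets the same function accept either form of the intervals file.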

For each alignment the matching interval designators are stored in the CO (comment) auxiliary field in the form of a semicolon-separated list, where each list element is a pair (A,B) giving the two id columns of the respective interval. An example of an intervals file can be found at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refFlat.txt.gz .
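
The shape of the resulting CO value can be illustrated with a short Python sketch. It is not the program's implementation. It assumes that an interval matches when it overlaps the alignment on the same reference sequence, that alignment and interval coordinates share one coordinate system, and that each pair is written literally as (A,B) with no spaces. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# (id_a, id_b, reference, start, end); both borders are inclusive
IntervalRow = Tuple[str, str, str, int, int]


def matching_intervals(ref: str, aln_start: int, aln_end: int,
                       intervals: Iterable[IntervalRow]) -> List[IntervalRow]:
    """Return intervals on `ref` that overlap [aln_start, aln_end] (assumed criterion)."""
    return [iv for iv in intervals
            if iv[2] == ref and iv[3] <= aln_end and aln_start <= iv[4]]


def co_value(matches: Iterable[IntervalRow]) -> str:
    """Join the id pairs of the matches into a semicolon-separated list."""
    return ";".join(f"({a},{b})" for a, b, *_ in matches)


intervals = [
    ("GENE1", "NM_1", "chr1", 100, 200),
    ("GENE2", "NM_2", "chr1", 150, 300),
    ("GENE3", "NM_3", "chr2", 100, 200),
]
# An alignment covering chr1:180-220 overlaps the first two intervals.
print(co_value(matching_intervals("chr1", 180, 220, intervals)))
# prints (GENE1,NM_1);(GENE2,NM_2)
```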

The following key=value pairs can be given when running the program:

level=<-1|0|1|9|11>: set compression level of the output BAM file. Valid values are

-1:
zlib/gzip default compression level
0:
uncompressed
1:
zlib/gzip level 1 (fast) compression
9:
zlib/gzip level 9 (best) compression

If libmaus has been compiled with support for igzip (see https://software.intel.com/en-us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-data) then an additional valid value is

11:
igzip compression

verbose=<1>: Valid values are

1:
print progress report on standard error
0:
do not print progress report

tmpfile=<filename>: set the prefix for temporary file names

disablevalidation=<0|1>: sets whether input validation is performed. Valid values are

0:
validation is enabled (default)
1:
validation is disabled

md5=<0|1>: md5 checksum creation for output file. This option can only be given if outputformat=bam. Then valid values are

0:
do not compute checksum. This is the default.
1:
compute checksum. If the md5filename key is set, then the checksum is written to the given file. If md5filename is unset, then no checksum will be computed.

md5filename=<filename>: file name for md5 checksum if md5=1.

index=<0|1>: compute BAM index for output file. This option can only be given if outputformat=bam. Then valid values are

0:
do not compute BAM index. This is the default.
1:
compute BAM index. If the indexfilename key is set, then the BAM index is written to the given file. If indexfilename is unset, then no BAM index will be computed.

indexfilename=<filename>: file name for output BAM index if index=1.

inputformat=<bam>: input file format. All versions of bamintervalcomment support the BAM input format. If the program is additionally linked against the io_lib package, the following values are valid:

BAM (see http://samtools.sourceforge.net/SAM1.pdf)
SAM (see http://samtools.sourceforge.net/SAM1.pdf)
CRAM (see http://www.ebi.ac.uk/ena/about/cram_toolkit)

outputformat=<bam>: output file format. All versions of bamintervalcomment support the BAM output format. If the program is additionally linked against the io_lib package, the following values are valid:

BAM (see http://samtools.sourceforge.net/SAM1.pdf)
SAM (see http://samtools.sourceforge.net/SAM1.pdf)
CRAM (see http://www.ebi.ac.uk/ena/about/cram_toolkit). This format is not advisable for data sorted by query name.

I=<[stdin]>: input filename, standard input if unset.

O=<[stdout]>: output filename, standard output if unset.

inputthreads=<[1]>: input helper threads, only valid for inputformat=bam.

outputthreads=<[1]>: output helper threads, only valid for outputformat=bam.

reference=<[]>: reference FastA file for inputformat=cram and outputformat=cram. An index file (.fai) is required.

range=<>: input range to be processed. This option is only valid if the input is a coordinate-sorted and indexed BAM file.

intervals=<>: file name of the intervals file.
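
Because all options are passed as key=value pairs, a command line can be assembled programmatically. The following Python sketch builds an argument list and checks a few of the constraints stated above (the intervals key is needed, level is restricted to the listed values, md5 and index require outputformat=bam). It assumes that bam is the default output format when outputformat is unset. The function build_command and the file names are hypothetical, and the sketch does not run the program.

```python
from typing import Dict, List

# 11 is only accepted if libmaus was compiled with igzip support
VALID_LEVELS = {"-1", "0", "1", "9", "11"}


def build_command(options: Dict[str, str]) -> List[str]:
    """Turn a dict of options into a bamintervalcomment argument list."""
    if "intervals" not in options:
        raise ValueError("the intervals key is required")
    level = options.get("level")
    if level is not None and level not in VALID_LEVELS:
        raise ValueError(f"invalid level: {level}")
    # Assumption: output format is bam unless stated otherwise.
    if options.get("outputformat", "bam") != "bam":
        for key in ("md5", "index"):
            if key in options:
                raise ValueError(f"{key} can only be given if outputformat=bam")
    return ["bamintervalcomment"] + [f"{k}={v}" for k, v in options.items()]


if __name__ == "__main__":
    cmd = build_command({
        "I": "in.bam",
        "O": "out.bam",
        "intervals": "refFlat.txt.gz",
        "index": "1",
        "indexfilename": "out.bam.bai",
    })
    # The list can be handed to subprocess.run(cmd, check=True).
    print(" ".join(cmd))
```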

Written by German Tischler.

Report bugs to <germant@miltenyibiotec.de>

Copyright © 2009-2014 German Tischler, © 2011-2014 Genome Research Limited. License GPLv3+: GNU GPL version 3 <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.

June 2014 BIOBAMBAM