bamseqchksum - produce checksums for primary data in BAM files
bamseqchksum reads a BAM file from standard input and, for each record,
calculates hash digest checksums over
- [1]
- flags and sequence
- [2]
- queryname, flags and sequence
- [3]
- flags, sequence and qualities
- [4]
- flags, sequence and source data related aux tags
where the flags are the least significant byte of the BAM FLAG field,
restricted to the bits for multiple segments, first segment and last
segment. If the reverse complemented bit is set, the sequence is reverse
complemented and the quality string is reversed before checksumming.
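The normalisation described above can be sketched in Python as follows. This is a minimal illustration, not the program's actual code: the function name normalize_record is hypothetical, the bit values are the standard SAM flag bits, and the complement table is deliberately simplified to A, C, G, T and N.

```python
# Standard SAM flag bits (from the SAM specification).
FLAG_PAIRED = 0x1    # template has multiple segments
FLAG_REVERSE = 0x10  # sequence is reverse complemented
FLAG_READ1 = 0x40    # first segment
FLAG_READ2 = 0x80    # last segment

# Only these three bits of the flag field enter the checksum.
CHECKSUM_FLAG_MASK = FLAG_PAIRED | FLAG_READ1 | FLAG_READ2

# Simplified complement table (assumption); other symbols pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def normalize_record(flag, seq, qual):
    """Return (masked flags, sequence, qualities) in original read orientation."""
    if flag & FLAG_REVERSE:
        # Undo the reverse complement applied by the aligner.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return flag & CHECKSUM_FLAG_MASK, seq, qual
```

Because of this step, a read contributes the same bytes to the checksum whether it was stored in forward or reverse orientation.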
Depending on the chosen hash digest function, these checksums are combined
over all non-supplementary and non-secondary BAM alignment records either by
summing modulo some power of 2 or by multiplying modulo a prime number.
Separate sums or products are reported for combinations of all and QC pass
records and for each readgroup.
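Both combination rules are commutative, so the result does not depend on the order of the records. The following Python sketch illustrates this with crc32. The byte layout fed to crc32 (one flag byte followed by the sequence), the function names and the mapping of zero residues to 1 are assumptions made for illustration only, so the values are not expected to match the output of bamseqchksum.

```python
import zlib

MERSENNE_31 = 2**31 - 1  # prime modulus documented for crc32prod


def record_crc(flag_byte, seq):
    # Assumed layout for illustration: one flag byte followed by the sequence.
    return zlib.crc32(bytes([flag_byte]) + seq.encode("ascii"))


def combine_sum(digests, bits=32):
    """Sum of the digests modulo 2^bits."""
    total = 0
    for d in digests:
        total = (total + d) % (1 << bits)
    return total


def combine_product(digests, prime=MERSENNE_31):
    """Product of the digests modulo a prime. Zero residues are mapped to 1
    (an assumption) so that a single record cannot zero the whole product."""
    product = 1
    for d in digests:
        d %= prime
        product = (product * (d if d else 1)) % prime
    return product


records = [(0x41, "ACGT"), (0x81, "TTGA"), (0x00, "GGCC")]
digests = [record_crc(flag, seq) for flag, seq in records]
shuffled = list(reversed(digests))

# The combined values are the same for any ordering of the records.
assert combine_sum(digests) == combine_sum(shuffled)
assert combine_product(digests) == combine_product(shuffled)
```

This order independence is what allows two files holding the same reads in a different sort order to be compared by their checksums.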
The following key=value pairs can be given:
verbose=<0>: Valid values are
- 1:
- print progress report on standard error
- 0:
- do not print progress report
inputformat=<bam>: input file format. All versions of
bamseqchksum come with support for the BAM input format. If the program in
addition is linked to the io_lib package, then the following options are
valid:
- bam:
- BAM (see http://samtools.sourceforge.net/SAM1.pdf)
- sam:
- SAM (see http://samtools.sourceforge.net/SAM1.pdf)
- cram:
- CRAM (see http://www.ebi.ac.uk/ena/about/cram_toolkit)
reference=: file name of the reference for CRAM input
files. If this key is unset, then the CRAM file header will be scanned to
obtain a reference file name.
hash=<crc32prod>: hash digest used to compute
checksums. All versions of biobambam support the following functions:
- crc32prod:
- checksums are computed via crc32 and combined over multiple records by
multiplication modulo the prime number 2^31-1. This is the default and
only option for biobambam versions up to 0.0.174.
- crc32:
- checksums are computed via crc32 and combined by summing up modulo
2^32.
- md5:
- checksums are computed via md5 and combined by summing up modulo
2^128.
- crc32prime32:
- identical to crc32prod (alternate implementation for testing
purposes)
- crc32prime64:
- checksums are computed via crc32 and combined over multiple records by
multiplication modulo the prime number 2^64-59.
- md5prime64:
- checksums are computed via md5 and combined over multiple records by
multiplication modulo the prime number 2^64-59.
- crc32prime96:
- checksums are computed via crc32 and combined over multiple records by
multiplication modulo the prime number 2^96-17.
- md5prime96:
- checksums are computed via md5 and combined over multiple records by
multiplication modulo the prime number 2^96-17.
- crc32prime128:
- checksums are computed via crc32 and combined over multiple records by
multiplication modulo the prime number 2^128-159.
- md5prime128:
- checksums are computed via md5 and combined over multiple records by
multiplication modulo the prime number 2^128-159.
- crc32prime160:
- checksums are computed via crc32 and combined over multiple records by
multiplication modulo the prime number 2^160-47.
- md5prime160:
- checksums are computed via md5 and combined over multiple records by
multiplication modulo the prime number 2^160-47.
- crc32prime192:
- checksums are computed via crc32 and combined over multiple records by
multiplication modulo the prime number 2^192-237.
- md5prime192:
- checksums are computed via md5 and combined over multiple records by
multiplication modulo the prime number 2^192-237.
- crc32prime224:
- checksums are computed via crc32 and combined over multiple records by
multiplication modulo the prime number 2^224-63.
- md5prime224:
- checksums are computed via md5 and combined over multiple records by
multiplication modulo the prime number 2^224-63.
- crc32prime256:
- checksums are computed via crc32 and combined over multiple records by
multiplication modulo the prime number 2^256-189.
- md5prime256:
- checksums are computed via md5 and combined over multiple records by
multiplication modulo the prime number 2^256-189.
- null:
- no checksums are computed and all checksums in the program's output are 0.
This option is for performance testing only.
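The names in the list above follow a regular scheme, which the following Python sketch captures as a lookup from hash name to digest, operation and modulus. It uses only the moduli stated in this list and covers only the functions available in all versions; the function name combination_rule and the layout of the returned tuple are hypothetical.

```python
# Offsets c such that 2^bits - c is the prime documented for <digest>prime<bits>.
PRIME_OFFSETS = {64: 59, 96: 17, 128: 159, 160: 47, 192: 237, 224: 63, 256: 189}

# Output width in bits of the digests available in all versions.
DIGEST_BITS = {"crc32": 32, "md5": 128}


def combination_rule(name):
    """Return (digest, operation, modulus) for a built-in hash name."""
    if name == "null":
        return (None, None, None)  # no checksums are computed
    if name in ("crc32prod", "crc32prime32"):
        return ("crc32", "product", 2**31 - 1)
    if name in DIGEST_BITS:
        return (name, "sum", 2 ** DIGEST_BITS[name])
    digest, sep, bits = name.partition("prime")
    if sep and digest in DIGEST_BITS and bits.isdigit() and int(bits) in PRIME_OFFSETS:
        return (digest, "product", 2 ** int(bits) - PRIME_OFFSETS[int(bits)])
    raise ValueError(f"unknown hash name: {name}")
```

For example, combination_rule("md5prime64") yields the md5 digest combined by multiplication modulo 2^64-59.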
If libmaus is compiled with support for the nettle library, then
the following options are available:
- sha1:
- checksums are computed via sha1 and combined by summing up modulo
2^160.
- sha1prime64:
- checksums are computed via sha1 and combined over multiple records by
multiplication modulo the prime number 2^64-59.
- sha1prime96:
- checksums are computed via sha1 and combined over multiple records by
multiplication modulo the prime number 2^96-17.
- sha1prime128:
- checksums are computed via sha1 and combined over multiple records by
multiplication modulo the prime number 2^128-159.
- sha1prime160:
- checksums are computed via sha1 and combined over multiple records by
multiplication modulo the prime number 2^160-47.
- sha1prime192:
- checksums are computed via sha1 and combined over multiple records by
multiplication modulo the prime number 2^192-237.
- sha1prime224:
- checksums are computed via sha1 and combined over multiple records by
multiplication modulo the prime number 2^224-63.
- sha1prime256:
- checksums are computed via sha1 and combined over multiple records by
multiplication modulo the prime number 2^256-189.
- sha224:
- checksums are computed via sha2-224 and combined by summing up modulo
2^224.
- sha224prime64:
- checksums are computed via sha2-224 and combined over multiple records by
multiplication modulo the prime number 2^64-59.
- sha224prime96:
- checksums are computed via sha2-224 and combined over multiple records by
multiplication modulo the prime number 2^96-17.
- sha224prime128:
- checksums are computed via sha2-224 and combined over multiple records by
multiplication modulo the prime number 2^128-159.
- sha224prime160:
- checksums are computed via sha2-224 and combined over multiple records by
multiplication modulo the prime number 2^160-47.
- sha224prime192:
- checksums are computed via sha2-224 and combined over multiple records by
multiplication modulo the prime number 2^192-237.
- sha224prime224:
- checksums are computed via sha2-224 and combined over multiple records by
multiplication modulo the prime number 2^224-63.
- sha224prime256:
- checksums are computed via sha2-224 and combined over multiple records by
multiplication modulo the prime number 2^256-189.
- sha256:
- checksums are computed via sha2-256 and combined by summing up modulo
2^256.
- sha256prime64:
- checksums are computed via sha2-256 and combined over multiple records by
multiplication modulo the prime number 2^64-59.
- sha256prime96:
- checksums are computed via sha2-256 and combined over multiple records by
multiplication modulo the prime number 2^96-17.
- sha256prime128:
- checksums are computed via sha2-256 and combined over multiple records by
multiplication modulo the prime number 2^128-159.
- sha256prime160:
- checksums are computed via sha2-256 and combined over multiple records by
multiplication modulo the prime number 2^160-47.
- sha256prime192:
- checksums are computed via sha2-256 and combined over multiple records by
multiplication modulo the prime number 2^192-237.
- sha256prime224:
- checksums are computed via sha2-256 and combined over multiple records by
multiplication modulo the prime number 2^224-63.
- sha256prime256:
- checksums are computed via sha2-256 and combined over multiple records by
multiplication modulo the prime number 2^256-189.
- sha384:
- checksums are computed via sha2-384 and combined by summing up modulo
2^384.
- sha384prime64:
- checksums are computed via sha2-384 and combined over multiple records by
multiplication modulo the prime number 2^64-59.
- sha384prime96:
- checksums are computed via sha2-384 and combined over multiple records by
multiplication modulo the prime number 2^96-17.
- sha384prime128:
- checksums are computed via sha2-384 and combined over multiple records by
multiplication modulo the prime number 2^128-159.
- sha384prime160:
- checksums are computed via sha2-384 and combined over multiple records by
multiplication modulo the prime number 2^160-47.
- sha384prime192:
- checksums are computed via sha2-384 and combined over multiple records by
multiplication modulo the prime number 2^192-237.
- sha384prime224:
- checksums are computed via sha2-384 and combined over multiple records by
multiplication modulo the prime number 2^224-63.
- sha384prime256:
- checksums are computed via sha2-384 and combined over multiple records by
multiplication modulo the prime number 2^256-189.
- sha512:
- checksums are computed via sha2-512 and combined by summing up modulo
2^512.
- sha512prime64:
- checksums are computed via sha2-512 and combined over multiple records by
multiplication modulo the prime number 2^64-59.
- sha512prime96:
- checksums are computed via sha2-512 and combined over multiple records by
multiplication modulo the prime number 2^96-17.
- sha512prime128:
- checksums are computed via sha2-512 and combined over multiple records by
multiplication modulo the prime number 2^128-159.
- sha512prime160:
- checksums are computed via sha2-512 and combined over multiple records by
multiplication modulo the prime number 2^160-47.
- sha512prime192:
- checksums are computed via sha2-512 and combined over multiple records by
multiplication modulo the prime number 2^192-237.
- sha512prime224:
- checksums are computed via sha2-512 and combined over multiple records by
multiplication modulo the prime number 2^224-63.
- sha512prime256:
- checksums are computed via sha2-512 and combined over multiple records by
multiplication modulo the prime number 2^256-189.
- sha512primesums:
- checksums are computed via sha2-512 and combined over multiple records by
adding modulo the Mersenne prime number 2^521-1.
- sha512primesums512:
- checksums are computed via sha2-512 and combined over multiple records by
adding modulo 2^512-75.
- murmur3:
- checksums are computed via MurmurHash3_x64_128 (see
https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp) and
combined over multiple records by summing modulo 2^128.
- murmur3primesums128:
- checksums are computed via MurmurHash3_x64_128 (see
https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp) and
combined over multiple records by summing modulo 2^128+51.
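As a usage illustration, the key=value pairs described above can be assembled into a command line from Python. This is a minimal sketch under the assumption that a bamseqchksum binary is on the PATH; the helper names build_command and run_checksums are hypothetical, and only the keys documented here are accepted.

```python
import subprocess

# Keys documented in this manual page.
KNOWN_KEYS = ("verbose", "inputformat", "reference", "hash")


def build_command(**options):
    """Build an argument list such as ['bamseqchksum', 'hash=md5']."""
    unknown = sorted(set(options) - set(KNOWN_KEYS))
    if unknown:
        raise ValueError(f"undocumented keys: {unknown}")
    return ["bamseqchksum"] + [f"{k}={options[k]}" for k in KNOWN_KEYS if k in options]


def run_checksums(path, **options):
    """Feed the file at `path` to bamseqchksum on stdin and return its output."""
    with open(path, "rb") as handle:
        result = subprocess.run(
            build_command(**options),
            stdin=handle,
            capture_output=True,
            check=True,
        )
    return result.stdout.decode()
```

A call such as run_checksums("in.bam", hash="crc32prime64", verbose=0) would then run the program with the file supplied on standard input, as the program expects.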
Written by David Jackson (using code by German Tischler as a
template). Extended to hash digests beyond crc32prod by German Tischler.
Report bugs to <germant@miltenyibiotec.de>
Copyright © 2014-2014 David Jackson, © 2014-2014
Genome Research Limited. Copyright © 2009-2016 German Tischler,
© 2011-2014 Genome Research Limited. License GPLv3+: GNU GPL version
3 <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO
WARRANTY, to the extent permitted by law.