NAME

bamconsensus - compute rough consensus sequence from alignments

SYNOPSIS

bamconsensus reference=ref.fasta [options] <in.bam >out.fasta

DESCRIPTION

bamconsensus reads a BAM, SAM or CRAM file from standard input and computes a rough consensus from the alignments it contains. The input file must be sorted in coordinate order. The consensus is written in FastA format to the standard output channel. The sequence names in the output file are structured as

contig_A_B_C_D_E

where

A is the numeric reference id (0 based)

B is the name of the reference sequence as given in the BAM header

C is the numeric id of the contig among the contigs produced for that reference id

D is the start position on the reference sequence (inclusive)

E is the end position on the reference sequence (exclusive)
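The naming scheme above can be decoded with a short script. The following is a minimal Python sketch, not part of bamconsensus; the names ContigName and parse_contig_name are hypothetical. It assumes that only the reference name B may itself contain underscores, so A is split off from the left and C, D and E from the right.

```python
from typing import NamedTuple


class ContigName(NamedTuple):
    ref_id: int      # A: numeric reference id (0 based)
    ref_name: str    # B: reference sequence name from the BAM header
    contig_id: int   # C: contig id within the reference
    start: int       # D: start on the reference (inclusive)
    end: int         # E: end on the reference (exclusive)

    @property
    def length(self) -> int:
        # Half-open interval [start, end): the span is simply end - start.
        return self.end - self.start


def parse_contig_name(name: str) -> ContigName:
    """Split a name of the form contig_A_B_C_D_E into its fields.

    Assumption: B may contain underscores, A, C, D and E may not.
    """
    prefix = "contig_"
    if not name.startswith(prefix):
        raise ValueError(f"not a bamconsensus contig name: {name!r}")
    # A is the first field after the prefix.
    ref_id, rest = name[len(prefix):].split("_", 1)
    # C, D and E are the last three fields; whatever remains is B.
    ref_name, contig_id, start, end = rest.rsplit("_", 3)
    return ContigName(int(ref_id), ref_name, int(contig_id), int(start), int(end))
```

Because D is inclusive and E is exclusive, the covered reference span is E minus D, which is what the length property returns.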

The reference key, which names the FastA file containing the reference sequences, is required. The consensus is constructed by computing heavy paths in local de Bruijn graphs. Consequently, for diploid or polyploid genomes it is usually a patchwork of the haplotypes present.
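To illustrate why a heavy path yields a patchwork of haplotypes, here is a toy Python sketch. It is not the algorithm implemented in bamconsensus: it uses a simple greedy walk that always follows the most frequent k-mer, and the function names, the tiny k and the example reads are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_heavy_path(reads, k, start):
    """Walk the de Bruijn graph from `start`, a (k-1)-mer.

    Nodes are (k-1)-mers, edges are k-mers weighted by their count.
    At each node the heaviest edge not yet used is followed.
    """
    counts = kmer_counts(reads, k)
    out_edges = defaultdict(list)
    for kmer, weight in counts.items():
        # Each k-mer is an edge from its prefix to its suffix.
        out_edges[kmer[:-1]].append((weight, kmer))

    sequence = start
    node = start
    used = set()
    while True:
        candidates = [(w, km) for w, km in out_edges[node] if km not in used]
        if not candidates:
            break
        _, kmer = max(candidates)   # heaviest edge; ties broken by k-mer order
        used.add(kmer)              # never reuse an edge, so the walk terminates
        sequence += kmer[-1]
        node = kmer[1:]
    return sequence
```

With the reads "ACGTT", "ACGTT" and "ACGAT", k=3 and start "AC", the walk follows the k-mer CGT (seen twice) rather than CGA (seen once). The choice is made independently at each branch, so on longer sequences consecutive branches may be resolved in favour of different haplotypes.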

The following key=value pairs can be given:

reference=<ref.fasta>: reference FastA file (required)

verbose=<1>: Valid values are

1: print progress report on standard error
0: do not print progress report

T=<filename>: set the prefix for temporary file names

k=<32>: k-mer size used for consensus computation (maximum 32).

minlen=<50>: minimum length of alignments used (default 50).

inputformat=<bam>: format of the input file (BAM, SAM or CRAM; default bam)

range=<>: input range to be processed. This option is only valid if the input is a coordinate-sorted and indexed BAM file.
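The key=value pairs above can be assembled programmatically. The following Python sketch is a hypothetical wrapper, not something shipped with biobambam: build_args and run_bamconsensus are illustrative names, and only keys documented above are used. The syntax of the range value is not specified here, so it is passed through unchanged.

```python
import subprocess


def build_args(reference, k=32, minlen=50, verbose=1,
               inputformat="bam", tmp_prefix=None, region=None):
    """Return the argument list for a bamconsensus call."""
    if not 1 <= k <= 32:
        raise ValueError("k must be between 1 and 32")
    args = [
        "bamconsensus",
        f"reference={reference}",   # required
        f"verbose={verbose}",
        f"k={k}",
        f"minlen={minlen}",
        f"inputformat={inputformat}",
    ]
    if tmp_prefix is not None:
        args.append(f"T={tmp_prefix}")
    if region is not None:
        # Documented as valid only for coordinate-sorted, indexed BAM input.
        args.append(f"range={region}")
    return args


def run_bamconsensus(in_path, out_path, **options):
    """Run bamconsensus with the input on stdin and the consensus on stdout."""
    with open(in_path, "rb") as fin, open(out_path, "wb") as fout:
        subprocess.run(build_args(**options), stdin=fin, stdout=fout, check=True)
```

A call such as run_bamconsensus("in.bam", "out.fasta", reference="ref.fasta", k=24) mirrors the redirections shown in the synopsis; it requires bamconsensus to be installed and on the PATH.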

AUTHOR

Written by German Tischler-Höhle.

REPORTING BUGS

Report bugs to <germant@miltenyibiotec.de>

COPYRIGHT

Copyright © 2019 German Tischler. License GPLv3+: GNU GPL version 3 <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.

August 2019

BIOBAMBAM