BuildConsensus.py - Builds a consensus sequence for each set of
input sequences
usage: BuildConsensus.py [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
[-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
[--outname OUT_NAME] [--log LOG_FILE] [--failed] [--fasta]
[--delim DELIMITER DELIMITER DELIMITER] [--nproc NPROC] [-n MIN_COUNT]
[--bf BARCODE_FIELD] [-q MIN_QUAL] [--freq MIN_FREQ] [--maxgap MAX_GAP]
[--pf PRIMER_FIELD] [--prcons PRIMER_FREQ]
[--cf COPY_FIELDS [COPY_FIELDS ...]]
[--act {min,max,sum,set,majority} [{min,max,sum,set,majority} ...]]
[--dep] [--maxdiv MAX_DIVERSITY | --maxerror MAX_ERROR]
Builds a consensus sequence for each set of input sequences
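As a rough illustration of the synopsis above, the following Python sketch assembles one possible invocation as an argument list. The input file name, output prefix, log file name and all threshold values are hypothetical and chosen only for illustration; only the flags themselves come from the usage line.

```python
import shlex

# Hypothetical input file name, used only for illustration.
seq_file = "reads_primers-pass.fastq"

# Flags are taken from the usage line; every value below is an assumed example.
command = [
    "BuildConsensus.py",
    "-s", seq_file,
    "--bf", "BARCODE",      # group reads by the BARCODE annotation
    "--pf", "PRIMER",       # field holding the primer annotations
    "--prcons", "0.6",      # assumed minimum primer frequency
    "--maxerror", "0.1",    # assumed error threshold (exclusive with --maxdiv)
    "--maxgap", "0.5",      # assumed gap frequency cut-off
    "--outname", "sample",  # hypothetical output prefix
    "--log", "consensus.log",
]

# Print the command as a single shell-quoted string.
print(shlex.join(command))
```

The list form can be passed directly to subprocess.run(command, check=True) on a system where the program is installed. Note that --maxdiv and --maxerror are mutually exclusive, so only one of them appears.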
- -s SEQ_FILES [SEQ_FILES
...]
- A list of FASTA/FASTQ files containing sequences to process. (default:
None)
- -o OUT_FILES [OUT_FILES
...]
- Explicit output file name(s). This argument cannot be used with the
--failed, --outdir, or --outname arguments. If
unspecified, the output filename is based on the input
filename(s). (default: None)
- --outdir
OUT_DIR
- Changes the output directory to the location specified. The
input file directory is used if this is not specified. (default:
None)
- --outname
OUT_NAME
- Changes the prefix of the successfully processed output file to the string
specified. May not be specified with multiple input files. (default:
None)
- --log LOG_FILE
- Specify to write verbose logging to a file. May not be specified with
multiple input files. (default: None)
- --failed
- If specified, create files containing records that fail processing.
(default: False)
- --fasta
- Specify to force output as FASTA rather than FASTQ. (default: None)
- --delim DELIMITER
DELIMITER DELIMITER
- A list of the three delimiters that separate annotation blocks, field
names and values, and values within a field, respectively. (default: ('|',
'=', ','))
- --nproc
NPROC
- The number of simultaneous computational processes to execute (CPU cores
to utilize). (default: 4)
- -n MIN_COUNT
- The minimum number of sequences needed to define a valid consensus.
(default: 1)
- --bf BARCODE_FIELD
- Name of the annotation field in the sequence description that holds the
barcode used to group sequences. (default: BARCODE)
- -q MIN_QUAL
- Consensus quality score cut-off under which an ambiguous character is
assigned; does not apply when quality scores are unavailable. (default:
0)
- --freq
MIN_FREQ
- Fraction of character occurrences under which an ambiguous character is
assigned. (default: 0.6)
- --maxgap
MAX_GAP
- If specified, this defines a cut-off for the frequency of allowed gap
values for each position. Positions exceeding the threshold are deleted
from the consensus. If not defined, positions are always retained.
(default: None)
- --pf PRIMER_FIELD
- Specifies the field name of the primer annotations. (default: None)
- --prcons
PRIMER_FREQ
- Specify to define a minimum primer frequency required to assign a
consensus primer, and filter out sequences with minority primers from the
consensus building step. (default: None)
- --cf COPY_FIELDS
[COPY_FIELDS ...]
- Specifies a set of additional annotation fields to copy into the consensus
sequence annotations. (default: None)
- --act
{min,max,sum,set,majority} [{min,max,sum,set,majority} ...]
- List of actions to take for each copy field which defines how each
annotation will be combined into a single value. The actions
"min", "max", "sum" perform the
corresponding mathematical operation on numeric annotations. The action
"set" combines annotations into a comma delimited list of unique
values and adds an annotation named <FIELD>_COUNT specifying the
count of each item in the set. The action "majority" assigns the
most frequent annotation to the consensus annotation and adds an
annotation named <FIELD>_FREQ specifying the frequency of the
majority value. (default: None)
- --dep
- Specify to calculate consensus quality with a non-independence assumption.
(default: False)
- --maxdiv
MAX_DIVERSITY
- Specify to calculate the nucleotide diversity of each read group (average
pairwise error rate) and remove groups exceeding the given diversity
threshold. Diversity is calculated for all positions within the read group,
ignoring any character filtering imposed by the -q, --freq
and --maxgap arguments. Mutually exclusive with --maxerror.
(default: None)
- --maxerror
MAX_ERROR
- Specify to calculate the error rate of each read group (rate of mismatches
from consensus) and remove groups exceeding the given error threshold. The
error rate is calculated against the final consensus sequence, which may
include masked positions due to the -q and --freq arguments
and may have deleted positions due to the --maxgap argument.
Mutually exclusive with --maxdiv. (default: None)
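The relationship between the --freq, --maxdiv and --maxerror descriptions above can be sketched in Python as follows. This is a simplified illustration, not the program's actual implementation: the function names are hypothetical, reads are assumed to be pre-aligned and of equal length, quality scores and gaps are ignored, "N" is assumed as the ambiguous character, and skipping masked positions when computing the error rate is an assumption.

```python
from collections import Counter
from itertools import combinations


def frequency_consensus(reads, min_freq=0.6):
    """Pick the most common character per column; mask with 'N' below min_freq."""
    consensus = []
    for column in zip(*reads):
        char, count = Counter(column).most_common(1)[0]
        consensus.append(char if count / len(column) >= min_freq else "N")
    return "".join(consensus)


def error_rate(reads, consensus):
    """Rate of mismatches from the consensus (assumption: masked positions skipped)."""
    mismatches = total = 0
    for read in reads:
        for read_char, cons_char in zip(read, consensus):
            if cons_char == "N":
                continue
            total += 1
            mismatches += read_char != cons_char
    return mismatches / total if total else 0.0


def diversity(reads):
    """Average pairwise mismatch rate over all positions, with no masking applied."""
    pairs = list(combinations(reads, 2))
    if not pairs:
        return 0.0
    rates = [sum(a != b for a, b in zip(x, y)) / len(x) for x, y in pairs]
    return sum(rates) / len(rates)


# Toy read group sharing one barcode (made-up sequences).
reads = ["ACGT", "ACGT", "ACGA", "ACCA"]
consensus = frequency_consensus(reads)
print(consensus, error_rate(reads, consensus), diversity(reads))
```

In this sketch the diversity is computed from the raw reads alone, while the error rate depends on the consensus and therefore on the masking threshold, which mirrors the distinction drawn in the two option descriptions.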
Output files:
- consensus-pass
- consensus reads.
- consensus-fail
- raw reads failing consensus filtering criteria.
Output annotation fields:
- PRIMER
- a comma delimited list of unique primer annotations found within the
barcode read group.
- PRCOUNT
- a comma delimited list of the corresponding counts of unique primer
annotations.
- PRCONS
- the majority primer within the barcode read group.
- PRFREQ
- the frequency of the majority primer.
- CONSCOUNT
- the count of reads within the barcode read group which contributed to the
consensus sequence. This is the total size of the read group, minus
sequences excluded due to user-defined filtering criteria.
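To make the primer annotation fields above concrete, the following Python sketch derives PRIMER, PRCOUNT, PRCONS and PRFREQ from the headers of a single barcode read group. It is an illustration under stated assumptions, not the program's own code: the helper names are hypothetical, the headers are made up, the default delimiters ('|', '=', ',') are used, and the first annotation block is assumed to be the sequence identifier.

```python
from collections import Counter


def parse_header(header, delim=("|", "=", ",")):
    """Split a header into a dict of annotations (assumes first block is the ID)."""
    block_sep, field_sep, _value_sep = delim
    blocks = header.split(block_sep)
    fields = {"ID": blocks[0]}
    for block in blocks[1:]:
        name, _, value = block.partition(field_sep)
        fields[name] = value
    return fields


def summarize_primers(headers, primer_field="PRIMER", delim=("|", "=", ",")):
    """Build the primer summary annotations for one barcode read group."""
    value_sep = delim[2]
    counts = Counter(parse_header(h, delim)[primer_field] for h in headers)
    majority, majority_count = counts.most_common(1)[0]
    return {
        "PRIMER": value_sep.join(counts),                            # unique primers
        "PRCOUNT": value_sep.join(str(n) for n in counts.values()),  # matching counts
        "PRCONS": majority,                                          # majority primer
        "PRFREQ": majority_count / len(headers),                     # its frequency
    }


# Made-up headers for four reads in one barcode group.
headers = [
    "r1|BARCODE=AAAA|PRIMER=IGHM",
    "r2|BARCODE=AAAA|PRIMER=IGHM",
    "r3|BARCODE=AAAA|PRIMER=IGHG",
    "r4|BARCODE=AAAA|PRIMER=IGHM",
]
print(summarize_primers(headers))
```

With --prcons, a group whose PRFREQ falls below the given minimum would not be assigned a consensus primer; that filtering step is left out of the sketch.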
This manpage was written by Andreas Tille for the Debian distribution and
can be used for any other usage of the program.