maskSeqEnds - Masks ragged leading and trailing edges of aligned DNA sequences
Description
maskSeqEnds takes a vector of DNA sequences, as character strings, and replaces the leading and trailing characters with "N" characters to create a sequence vector with uniformly masked outer sequence segments.
Usage
maskSeqEnds(seq, mask_char = "N", max_mask = NULL, trim = FALSE)
Arguments
- seq
- character vector of DNA sequence strings.
- mask_char
- character to use for masking.
- max_mask
- the maximum number of characters to mask. If set to 0 then no masking will be performed. If set to NULL then the upper masking bound will be automatically determined from the maximum number of observed leading or trailing mask characters ("N" by default) amongst all strings in seq.
- trim
- if TRUE, leading and trailing characters will be cut rather than masked with the mask character ("N" by default).
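The automatic bound described for max_mask can be illustrated with a minimal base R sketch. The helper name end_mask_bounds is hypothetical and is not part of the package. It assumes that mask_char is a single character and that the bounds are the longest observed leading and trailing runs, capped by max_mask when one is given. The package's actual implementation may differ.

```r
# Hypothetical helper (not the package's code): derive the masking bounds
# from the longest leading and trailing runs of `mask_char` in `seq`.
end_mask_bounds <- function(seq, mask_char = "N", max_mask = NULL) {
  # \Q...\E quotes the mask character so that "-" or "." is matched literally
  ch <- paste0("\\Q", mask_char, "\\E")
  # Run length = original length minus length after stripping the run
  left  <- nchar(seq) - nchar(sub(paste0("^(?:", ch, ")+"), "", seq, perl = TRUE))
  right <- nchar(seq) - nchar(sub(paste0("(?:", ch, ")+$"), "", seq, perl = TRUE))
  left_bound  <- max(left)
  right_bound <- max(right)
  # Assumption: a non-NULL max_mask caps both bounds
  if (!is.null(max_mask)) {
    left_bound  <- min(left_bound, max_mask)
    right_bound <- min(right_bound, max_mask)
  }
  c(left = left_bound, right = right_bound)
}

seq <- c("CCCCTGGG", "NAACTGGN", "NNNCTGNN")
end_mask_bounds(seq)                # left = 3, right = 2
end_mask_bounds(seq, max_mask = 1)  # left = 1, right = 1
```

Under these assumptions, max_mask = 0 yields bounds of zero on both sides, which matches the documented "no masking" behavior.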
Value
A modified seq vector with masked (or optionally trimmed) sequences.
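To make the difference between the masked and trimmed return values concrete, here is a rough base R sketch. The helper apply_end_mask and its left and right arguments are hypothetical; it takes the number of characters to mask on each side as given and assumes every sequence is at least left + right characters long. It is an illustration only, not the package's implementation.

```r
# Hypothetical helper (not the package's code): apply fixed end bounds
# to every sequence, either by masking or by trimming.
apply_end_mask <- function(seq, left, right, mask_char = "N", trim = FALSE) {
  n <- nchar(seq)
  if (trim) {
    # Cut `left` characters from the start and `right` from the end
    return(substr(seq, left + 1, n - right))
  }
  # Overwrite the outer segments in place; string lengths are unchanged
  if (left > 0)  substr(seq, 1, left) <- strrep(mask_char, left)
  if (right > 0) substr(seq, n - right + 1, n) <- strrep(mask_char, right)
  seq
}

seq <- c("CCCCTGGG", "NAACTGGN", "NNNCTGNN")
apply_end_mask(seq, left = 3, right = 2)               # masked, lengths preserved
apply_end_mask(seq, left = 3, right = 2, trim = TRUE)  # trimmed, shorter strings
```

In both modes the result has one element per input sequence. Masking preserves each string's length, while trimming shortens each string by left + right characters.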
Examples
# Default behavior uniformly masks ragged ends
seq <- c("CCCCTGGG", "NAACTGGN", "NNNCTGNN")
maskSeqEnds(seq)
[1] "NNNCTGNN" "NNNCTGNN" "NNNCTGNN"
# Setting max_mask=0 performs no masking
maskSeqEnds(seq, max_mask=0)
[1] "CCCCTGGG" "NAACTGGN" "NNNCTGNN"
# Cut ragged sequence ends
maskSeqEnds(seq, trim=TRUE)
[1] "CTG" "CTG" "CTG"
# Set max_mask to limit extent of masking and trimming
maskSeqEnds(seq, max_mask=1)
[1] "NCCCTGGN" "NAACTGGN" "NNNCTGNN"
maskSeqEnds(seq, max_mask=1, trim=TRUE)
[1] "CCCTGG" "AACTGG" "NNCTGN"
# Mask dashes instead of Ns
seq <- c("CCCCTGGG", "-AACTGG-", "---CTG--")
maskSeqEnds(seq, mask_char="-")
[1] "---CTG--" "---CTG--" "---CTG--"
See also
See maskSeqGaps for masking internal gaps. See padSeqEnds for padding sequences of unequal length.