padSeqEnds - Pads ragged ends of aligned DNA sequences
Description
padSeqEnds takes a vector of DNA sequences, as character strings, and pads the end of each sequence with the number of "N" characters needed to give all sequences in the vector a uniform length.
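The padding step described above can be illustrated in base R. This is a minimal sketch, not alakazam's implementation: the helper name pad_to_width is hypothetical, it takes the target width as an explicit argument instead of deriving it from len and mod3, and it assumes pad_char is a single character.

```r
# Hypothetical helper, not part of alakazam: pads each string in `seq` to
# `width` characters with `pad_char` (assumed to be a single character),
# on the right by default or on the left when start = TRUE.
# Strings that are already `width` characters or longer are left unchanged.
pad_to_width <- function(seq, width, start = FALSE, pad_char = "N") {
    # Number of padding characters needed per sequence, clamped at zero
    n_pad <- pmax(width - nchar(seq), 0)
    # strrep() is vectorised over its `times` argument
    padding <- strrep(pad_char, n_pad)
    if (start) paste0(padding, seq) else paste0(seq, padding)
}

seq <- c("CCCCTGGG", "ACCCTG", "CCCC")
pad_to_width(seq, width = 9)
# [1] "CCCCTGGGN" "ACCCTGNNN" "CCCCNNNNN"
pad_to_width(seq, width = 9, start = TRUE)
# [1] "NCCCCTGGG" "NNNACCCTG" "NNNNNCCCC"
```

Building each pad with strrep() and joining with paste0() keeps the whole operation vectorised, so no explicit loop over sequences is needed.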
Usage
padSeqEnds(seq, len = NULL, start = FALSE, pad_char = "N", mod3 = TRUE)
Arguments
- seq: character vector of DNA sequence strings.
- len: length to pad to. Only applies if longer than the maximum length of the data in seq.
- start: if TRUE, pad the beginning of each sequence instead of the end.
- pad_char: character to use for padding.
- mod3: if TRUE, pad sequences to a length that is a multiple of three.
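How len and mod3 combine into a single target width is not spelled out in the argument descriptions. The base-R sketch below assumes that len takes effect only when it exceeds the longest sequence and that mod3 = TRUE then rounds the width up to the next multiple of three. The helper target_width is hypothetical and not part of the package; the real function may order these steps differently.

```r
# Hypothetical helper, not part of alakazam: computes a target width from
# the documented arguments under two assumptions:
#   1. `len` is used only if it is longer than the longest sequence;
#   2. mod3 = TRUE rounds the resulting width up to a multiple of three.
target_width <- function(seq, len = NULL, mod3 = TRUE) {
    width <- max(nchar(seq))
    # `len` only applies if it exceeds the longest sequence
    if (!is.null(len)) {
        width <- max(width, len)
    }
    # Round up to the next multiple of three (e.g. 8 -> 9, 9 -> 9)
    if (mod3) {
        width <- 3 * ceiling(width / 3)
    }
    width
}

seq <- c("CCCCTGGG", "ACCCTG", "CCCC")
target_width(seq)             # 9
target_width(seq, len = 15)   # 15
target_width(seq, mod3 = FALSE)  # 8
```

Under these assumptions the widths agree with the Examples: 9 for the default call and 15 when len = 15.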
Value
A modified seq vector with padded sequences.
Examples
# Default behavior uniformly pads ragged ends
seq <- c("CCCCTGGG", "ACCCTG", "CCCC")
padSeqEnds(seq)
#> [1] "CCCCTGGGN" "ACCCTGNNN" "CCCCNNNNN"
# Pad to fixed length
padSeqEnds(seq, len=15)
#> [1] "CCCCTGGGNNNNNNN" "ACCCTGNNNNNNNNN" "CCCCNNNNNNNNNNN"
# Add padding to the beginning of the sequences instead of the ends
padSeqEnds(seq, start=TRUE)
#> [1] "NCCCCTGGG" "NNNACCCTG" "NNNNNCCCC"
padSeqEnds(seq, len=15, start=TRUE)
#> [1] "NNNNNNNCCCCTGGG" "NNNNNNNNNACCCTG" "NNNNNNNNNNNCCCC"
See also
See maskSeqEnds for creating uniform masking from existing masking.