countPatterns - Count sequence patterns
Description¶
countPatterns
counts the fraction of times a set of character patterns occur
in a set of sequences.
Usage¶
countPatterns(seq, patterns, nt = TRUE, trim = FALSE, label = "region")
Arguments¶
- seq
- character vector of either DNA or amino acid sequences.
- patterns
- list of sequence patterns to count in each sequence. If the list is named, then names will be assigned as the column names of output data.frame.
- nt
- if
TRUE
thenseq
are DNA sequences and and will be translated before performing the pattern search. - trim
- if
TRUE
remove the first and last codon or amino acid from each sequence before the pattern search. IfFALSE
do not modify the input sequences. - label
- string defining a label to add as a prefix to the output column names.
Value¶
A data.frame containing the fraction of times each sequence pattern was found.
Examples¶
seq <- c("TGTCAACAGGCTAACAGTTTCCGGACGTTC",
"TGTCAGCAATATTATATTGCTCCCTTCACTTTC",
"TGTCAAAAGTATAACAGTGCCCCCTGGACGTTC")
patterns <- c("A", "V", "[LI]")
names(patterns) <- c("arg", "val", "iso_leu")
countPatterns(seq, patterns, nt=TRUE, trim=TRUE, label="cdr3")
cdr3_arg cdr3_val cdr3_iso_leu
1 0.1250000 0 0.0000000
2 0.1111111 0 0.1111111
3 0.1111111 0 0.0000000