TFBS::Matrix::PWM(3pm) | User Contributed Perl Documentation | TFBS::Matrix::PWM(3pm) |
TFBS::Matrix::PWM - class for position weight matrices of nucleotide patterns
    my $matrixref = [ [ 0.61, -3.16,  1.83, -3.16,  1.21, -0.06],
                      [-0.15, -2.57, -3.16, -3.16, -2.57, -1.83],
                      [-1.57,  1.85, -2.57, -1.34, -1.57,  1.14],
                      [ 0.31, -3.16, -2.57,  1.76,  0.24, -0.83] ];

    my $pwm = TFBS::Matrix::PWM->new(-matrix => $matrixref,
                                     -name   => "MyProfile",
                                     -ID     => "M0001"
                                    );

    # or

    my $matrixstring = <<ENDMATRIX
     0.61 -3.16  1.83 -3.16  1.21 -0.06
    -0.15 -2.57 -3.16 -3.16 -2.57 -1.83
    -1.57  1.85 -2.57 -1.34 -1.57  1.14
     0.31 -3.16 -2.57  1.76  0.24 -0.83
    ENDMATRIX
    ;

    my $pwm = TFBS::Matrix::PWM->new(-matrixstring => $matrixstring,
                                     -name         => "MyProfile",
                                     -ID           => "M0001"
                                    );
(See documentation of individual TFBS::DB::* modules to learn how to connect to different types of pattern databases and retrieve TFBS::Matrix::* objects from them.)
    my $db_obj = TFBS::DB::JASPAR2->new
                    (-connect => ["dbi:mysql:JASPAR2:myhost",
                                  "myusername", "mypassword"]);

    my $pwm = $db_obj->get_Matrix_by_ID("M0001", "PWM");

    # or

    my $pwm = $db_obj->get_Matrix_by_name("MyProfile", "PWM");
(See documentation of TFBS::MatrixSet to learn how to create objects for storage and manipulation of multiple matrices.)
my @pwm_list = $matrixset->all_patterns(-sort_by=>"name");
my $siteset = $pwm->search_seq(-file =>"myseq.fa", -threshold => "80%");
my $site_pair_set = $pwm->search_aln(-file =>"myalign.aln", -threshold => "80%", -cutoff => "70%", -window => 50);
TFBS::Matrix::PWM is a class whose instances are objects representing position weight matrices (PWMs). A PWM is normally calculated from a raw position frequency matrix (see TFBS::Matrix::PFM for the explanation of position frequency matrices). For example, given the following position frequency matrix:
    A:[ 12     3     0     0     4     0  ]
    C:[  0     0     0    11     7     0  ]
    G:[  0     9    12     0     0     0  ]
    T:[  0     0     0     1     1    12  ]
The standard computational procedure is applied to convert it into the following position weight matrix:
    A:[ 0.61 -3.16  1.83 -3.16  1.21 -0.06]
    C:[-0.15 -2.57 -3.16 -3.16 -2.57 -1.83]
    G:[-1.57  1.85 -2.57 -1.34 -1.57  1.14]
    T:[ 0.31 -3.16 -2.57  1.76  0.24 -0.83]
which contains the "weights" associated with the occurrence of each nucleotide at the given position in a pattern.
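As a rough illustration of what such a conversion can look like, the sketch below turns the frequency matrix above into log2-odds weights. The uniform background of 0.25 per nucleotide, the sqrt(N) pseudocount and the function name pfm_to_pwm are assumptions made for this example and are not taken from the TFBS documentation, so the printed weights need not match the matrix shown above.

```perl
use strict;
use warnings;

# Convert a position frequency matrix (rows A, C, G, T) into log2-odds weights.
# ASSUMPTIONS for illustration only: uniform background (0.25 each) and a
# total pseudocount of sqrt(N) per column, shared according to the background.
sub pfm_to_pwm {
    my ($pfm, $bg) = @_;
    $bg ||= [0.25, 0.25, 0.25, 0.25];
    my $ncols = scalar @{ $pfm->[0] };
    my @pwm;
    for my $col (0 .. $ncols - 1) {
        my $n = 0;
        $n += $pfm->[$_][$col] for 0 .. 3;      # number of sites in this column
        die "empty column $col\n" unless $n > 0;
        my $pseudo = sqrt($n);                   # total pseudocount for the column
        for my $row (0 .. 3) {
            my $p = ($pfm->[$row][$col] + $bg->[$row] * $pseudo) / ($n + $pseudo);
            $pwm[$row][$col] = log($p / $bg->[$row]) / log(2);
        }
    }
    return \@pwm;
}

my $pfm = [
    [12, 3,  0,  0, 4,  0],   # A
    [ 0, 0,  0, 11, 7,  0],   # C
    [ 0, 9, 12,  0, 0,  0],   # G
    [ 0, 0,  0,  1, 1, 12],   # T
];

my $pwm   = pfm_to_pwm($pfm);
my @bases = qw(A C G T);
for my $i (0 .. 3) {
    printf "%s:[%s ]\n", $bases[$i],
        join ' ', map { sprintf '%5.2f', $_ } @{ $pwm->[$i] };
}
```

The pseudocount keeps nucleotides that were never observed at a position from receiving a weight of minus infinity; a different pseudocount or background changes every value in the matrix.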
A TFBS::Matrix::PWM object provides methods for searching nucleotide sequences, and pairwise alignments of nucleotide sequences, with the pattern it represents. Each search returns a set of sites: a TFBS::SiteSet object for a single-sequence search, and a TFBS::SitePairSet object for an alignment search.
Please send bug reports and other comments to the author.
Boris Lenhard <Boris.Lenhard@cgb.ki.se>
The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore.
    Title   : new
    Usage   : my $pwm = TFBS::Matrix::PWM->new(%args)
    Function: constructor for the TFBS::Matrix::PWM object
    Returns : a new TFBS::Matrix::PWM object
    Args    : # you must specify exactly one of the following three:

              -matrix,      # reference to an array of arrays of numbers
                #or
              -matrixstring,# a string containing four lines
                            # of tab- or space-delimited numbers
                #or
              -matrixfile,  # the name of a file containing four lines
                            # of tab- or space-delimited numbers
              #######

              -name,        # string, OPTIONAL
              -ID,          # string, OPTIONAL
              -class,       # string, OPTIONAL
              -tags         # an array reference, OPTIONAL
    Title   : search_seq
    Usage   : my $siteset = $pwm->search_seq(%args)
    Function: scans a nucleotide sequence with the pattern represented
              by the PWM
    Returns : a TFBS::SiteSet object
    Args    : # you must specify exactly one of the following three:

              -file,       # the name of a fasta file (single sequence)
                #or
              -seqobj      # a Bio::Seq object
                           # (more accurately, a Bio::PrimarySeq object
                           # or a subclass thereof)
                #or
              -seqstring   # a string containing the sequence

              -threshold,  # minimum score for the hit, either absolute
                           # (e.g. 11.2) or relative (e.g. "75%")
                           # OPTIONAL: default "80%"

              -subpart     # subpart of the sequence to search, given as
                           #   -subpart => { start => 140,
                           #                 end   => 180 }
                           # where start and end are coordinates in the
                           # sequence; the coordinate range is interpreted
                           # in the BioPerl tradition (1-based, inclusive)
                           # OPTIONAL: by default searches entire sequence
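To make the -threshold argument more concrete, the following self-contained sketch scans a sequence string with a weight matrix without using the TFBS modules. It rests on assumptions: a relative threshold such as "80%" is read as that fraction of the distance between the lowest and highest scores the matrix can produce, only the forward strand is scanned, and the names absolute_threshold and scan_sequence are hypothetical. The actual search_seq implementation may differ on all of these points.

```perl
use strict;
use warnings;
use List::Util qw(max min sum);

my %ROW = (A => 0, C => 1, G => 2, T => 3);   # matrix row for each nucleotide

# Turn a threshold such as "80%" into an absolute score.
# ASSUMPTION: "N%" means min_score + N/100 * (max_score - min_score).
# Absolute thresholds (e.g. 11.2) are returned unchanged.
sub absolute_threshold {
    my ($pwm, $threshold) = @_;
    return $threshold unless $threshold =~ /^\s*([\d.]+)\s*%\s*$/;
    my $pct  = $1;
    my @cols = 0 .. $#{ $pwm->[0] };
    my $max_score = sum(map { my $c = $_; max(map { $_->[$c] } @$pwm) } @cols);
    my $min_score = sum(map { my $c = $_; min(map { $_->[$c] } @$pwm) } @cols);
    return $min_score + ($max_score - $min_score) * $pct / 100;
}

# Score every window of the forward strand and keep those at or above the
# threshold. Coordinates are 1-based and inclusive, as in BioPerl.
sub scan_sequence {
    my ($pwm, $seq, $threshold) = @_;
    my $cutoff = absolute_threshold($pwm, $threshold);
    my $width  = scalar @{ $pwm->[0] };
    my @hits;
    for my $start (0 .. length($seq) - $width) {
        my @bases = split //, uc substr($seq, $start, $width);
        next if grep { !exists $ROW{$_} } @bases;   # skip N and other symbols
        my $score = sum(map { $pwm->[ $ROW{ $bases[$_] } ][$_] } 0 .. $width - 1);
        push @hits, { start => $start + 1, end => $start + $width, score => $score }
            if $score >= $cutoff;
    }
    return @hits;
}

my $pwm = [ [ 0.61, -3.16,  1.83, -3.16,  1.21, -0.06],    # A
            [-0.15, -2.57, -3.16, -3.16, -2.57, -1.83],    # C
            [-1.57,  1.85, -2.57, -1.34, -1.57,  1.14],    # G
            [ 0.31, -3.16, -2.57,  1.76,  0.24, -0.83] ];  # T

my @hits = scan_sequence($pwm, "CCAGATAGCC", "80%");
printf "%d-%d score %.2f\n", @{$_}{qw(start end score)} for @hits;
```

With this matrix, the highest-scoring site is AGATAG, so the example sequence yields a single hit at positions 3 to 8.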
    Title   : search_aln
    Usage   : my $site_pair_set = $pwm->search_aln(%args)
    Function: Scans a pairwise alignment of nucleotide sequences
              with the pattern represented by the PWM: it reports only
              those hits that are present in equivalent positions of both
              sequences and exceed a specified threshold score in both, AND
              are found in regions of the alignment above the specified
              conservation cutoff value.
    Returns : a TFBS::SitePairSet object
    Args    : # you must specify exactly one of the following three:

              -file,       # the name of the alignment file in Clustal
                           # format
                #or
              -alignobj    # a Bio::SimpleAlign object
                           # (more accurately, a Bio::PrimarySeq object
                           # or a subclass thereof)
                #or
              -alignstring # a multi-line string containing the alignment
                           # in Clustal format
              #############

              -threshold,  # minimum score for the hit, either absolute
                           # (e.g. 11.2) or relative (e.g. "75%")
                           # OPTIONAL: default "80%"

              -window,     # size of the sliding window (in nucleotides)
                           # for calculating local conservation in the
                           # alignment
                           # OPTIONAL: default 50

              -cutoff      # conservation cutoff (%) for including the
                           # region in the results of the pattern search
                           # OPTIONAL: default "70%"

              -subpart     # subpart of the alignment to search, given as e.g.
                           #   -subpart => { relative_to => 1,
                           #                 start       => 140,
                           #                 end         => 180 }
                           # where start and end are coordinates in the
                           # sequence indicated by relative_to (1 for the
                           # 1st sequence in the alignment, 2 for the 2nd)
                           # OPTIONAL: by default searches entire alignment

              -conservation
                           # conservation profile, a TFBS::ConservationProfile
                           # OPTIONAL: by default the conservation profile is
                           # computed internally on the fly (less efficient)
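The -window and -cutoff arguments describe a sliding-window conservation filter. The sketch below shows one plausible way to compute such a profile for two aligned strings: the percent identity inside each window of alignment columns. The function name window_identity and the rule that a gap column counts as a mismatch are assumptions made for this example; this page does not describe how TFBS computes conservation internally.

```perl
use strict;
use warnings;

# Percent identity of two aligned (gapped) strings in every window of
# $window alignment columns. Returns one value per window, left to right.
# ASSUMPTION: a column counts as conserved only if both sequences carry the
# same nucleotide; columns containing a gap ('-') count as mismatches.
sub window_identity {
    my ($aln1, $aln2, $window) = @_;
    die "aligned sequences must have equal length\n"
        unless length($aln1) == length($aln2);

    # 1 for a conserved column, 0 otherwise
    my @match = map {
        my ($x, $y) = (substr($aln1, $_, 1), substr($aln2, $_, 1));
        ($x ne '-' && uc($x) eq uc($y)) ? 1 : 0;
    } 0 .. length($aln1) - 1;

    my @identity;
    my $running = 0;                      # conserved columns in current window
    for my $i (0 .. $#match) {
        $running += $match[$i];
        $running -= $match[$i - $window] if $i >= $window;
        push @identity, 100 * $running / $window if $i >= $window - 1;
    }
    return @identity;
}

my @profile = window_identity("ACGTACGTAC", "ACGTTCGT-C", 5);
my $cutoff  = 70;
for my $w (0 .. $#profile) {
    printf "columns %d-%d: %.0f%% %s\n", $w + 1, $w + 5, $profile[$w],
        $profile[$w] >= $cutoff ? "(kept)" : "(below cutoff)";
}
```

Computing the profile once and reusing it for many matrices avoids repeating this pass for every search, which is the saving the -conservation argument refers to.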
The above methods are common to all matrix objects. Please consult TFBS::Matrix to find out how to use them.
2020-11-09 | perl v5.32.0 |