pfmake - generate a profile from a multiple sequence alignment
pfmake [ -0123abces ] [ msf-file | - ] score-matrix [ profile-file ]
[E=#] [F=#] [G=#] [H=#] [I=#] [L=#] [M=#] [S=#] [T=#] [X=#]
pfmake generates a PROSITE profile from a multiple sequence
alignment using methods described by Gribskov et al. (1990), Luethy et al.
(1994), and Thompson et al. (1994), with modifications to exploit the
features of the new profile format. The file containing the multiple
sequence alignment (msf-file) must be in MSF format as generated by GCG
programs or by readseq (checksums are ignored). The score-matrix file must
also be in GCG format. If `-' is specified instead of a real filename, the
multiple sequence alignment is read from the standard input.
If an existing profile is given as input via the third, optional
argument, the parameters of the DISJOINT, NORMALIZATION, and CUT_OFF
blocks will be read from it; all other profile parameters will be
recalculated. Header and footer lines outside the matrix block will also be
transferred from input to output.
If no input profile is given, the disjointness definition will be
set to PROTECT, with borders leaving short unprotected tails (maximum 5
positions) at the beginning and at the end of the profile. Furthermore, one
normalization mode (n-score = raw-score / F, where F is the
output score multiplier; see below) and two cut-off values (level
0: 8.5, level -1: 6.5) will be defined.
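The default normalization and cut-off levels described above can be
illustrated with a short calculation. The following is a minimal sketch,
assuming the default F=100 and assuming that a score "reaches" a level when
its normalized value is greater than or equal to the cut-off. The function
names are hypothetical and are not part of pfmake.

```python
# Illustrative sketch only; helper names are hypothetical, not part of pftools.

DEFAULT_F = 100  # output score multiplier (parameter F)

# Default cut-off levels quoted in the text, in normalized-score units.
DEFAULT_CUTOFFS = {0: 8.5, -1: 6.5}


def normalized_score(raw_score, f=DEFAULT_F):
    """n-score = raw-score / F, the single default normalization mode."""
    return raw_score / f


def highest_level_reached(raw_score, f=DEFAULT_F, cutoffs=DEFAULT_CUTOFFS):
    """Return the highest cut-off level met by the score, or None.

    Assumption: a level is met when n-score >= cut-off value.
    """
    n = normalized_score(raw_score, f)
    reached = [level for level, threshold in cutoffs.items() if n >= threshold]
    return max(reached) if reached else None


print(normalized_score(850))       # 8.5
print(highest_level_reached(700))  # -1
print(highest_level_reached(500))  # None
```

With F=100, the default cut-offs of 8.5 and 6.5 correspond to raw scores of
850 and 650.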
- -0
- Global alignment mode; initiation (termination) at low cost is possible
only if the alignment starts at the beginning (end) of the profile and at
the beginning (end) of the sequence.
- -1
- Domain global alignment mode; initiation (termination) at low cost is
possible only at the beginning (end) of the profile; it may start and end
at any position within the sequence.
- -2
- Semi-global alignment mode; initiation (termination) at low cost is
possible if the alignment starts either at the beginning (end) of the
profile or at the beginning (end) of the sequence. This is the default
alignment mode.
- -3
- Local alignment mode; initiation (termination) at low cost is possible
anywhere. The high-cost initiation/termination score (parameter H)
therefore has no effect in this mode.
- -a
- Causes pfsearch to weight gaps asymmetrically, as in Gribskov et al.
(1990).
- -b
- Block profile mode. By imposing additional constraints on the placement of
insertions and deletions, this mode produces profiles that favor
alignments with insertions and deletions positioned symmetrically around a
few positions. For each gap region, a gap center is defined, which usually
corresponds to the place where gap excision has been applied (see
parameter X). If no gap excision has been applied, the position is
chosen so as to maximize the sum of deletion opening events before, and
deletion closing events after, the gap center. Within a given gap region,
reduced deletion opening penalties are offered only before, reduced
deletion closing penalties only after, and reduced insertion penalties
only at the center. This option is incompatible with options -a and
-e and automatically disables them.
- -c
- Circular profile. The topology of the profile is declared as circular. The
first and the last insert positions are merged by retaining the higher
value of each parameter type.
- -e
- Enables endgap-weighting mode as implemented in the GCG program
ProfileMake. Endgaps in the multiple sequence alignment will be
interpreted as deletions relative to the other sequences and thus be
considered for the delineation of gap regions. The default is no endgap
weighting as introduced by Thompson et al. (1994) in the program
ProfileWeight.
- -s
- Causes pfsearch to weight gaps symmetrically (default mode). The initial
gap opening scores (MD, MI), computed from the maximal gap length
and the command-line parameters E, G, I, and M, will be
divided by two, and the resulting value will be assigned to both gap
opening and gap closing scores (MI, IM, MD, DM).
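The four alignment modes (-0 to -3) listed above differ only in where
initiation or termination at low cost is allowed. The rules can be restated
as a small truth function. This is a sketch of the rules as worded in this
page, not code taken from pfmake; the function names and the "L"/"H" labels
(standing for parameters L and H) are illustrative assumptions.

```python
# Illustrative sketch only; these functions are hypothetical, not part of pftools.

def low_cost_allowed(mode, at_profile_edge, at_sequence_edge):
    """Return True if initiation (or termination) at low cost is possible.

    mode             -- 0 global, 1 domain-global, 2 semi-global, 3 local
    at_profile_edge  -- alignment starts at the beginning (or ends at the
                        end) of the profile
    at_sequence_edge -- alignment starts at the beginning (or ends at the
                        end) of the sequence
    """
    if mode == 0:
        return at_profile_edge and at_sequence_edge
    if mode == 1:
        return at_profile_edge
    if mode == 2:
        return at_profile_edge or at_sequence_edge
    if mode == 3:
        return True
    raise ValueError(f"unknown alignment mode: {mode}")


def boundary_parameter(mode, at_profile_edge, at_sequence_edge):
    """Name the parameter that applies: 'L' (low-cost) or 'H' (high-cost)."""
    return "L" if low_cost_allowed(mode, at_profile_edge, at_sequence_edge) else "H"


# Tabulate all modes for the four combinations of edge conditions.
for mode in range(4):
    row = [boundary_parameter(mode, p, s)
           for p in (True, False) for s in (True, False)]
    print(mode, row)
```

In local mode (-3) every combination yields "L", which is why parameter H
plays no role there.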
- E=#
- Gap extension penalty, see Gribskov et al. (1990). Default: E=0.2
(appropriate for 1/3 bit-scaled blosum45 matrix).
- F=#
- Output score multiplier. On output, all profile scores are multiplied by
this factor and rounded to nearest integers. Default: F=100.
- G=#
- Gap opening penalty, see Gribskov et al. (1990). Default: G=2.1
(appropriate for 1/3 bit-scaled blosum45 matrix).
- H=#
- High-cost initiation/termination score. This score will be applied to all
external and internal initiation and termination scores corresponding to
path matrix positions where initiation or termination at low cost is not
possible according to the alignment mode specified. Default: H=*
(low-value).
- I=#
- Gap penalty multiplier increment, see Gribskov et al. (1990).
Default: I=0.1.
- L=#
- Low-cost initiation/termination score. This score will be applied to all
external and internal initiation and termination scores corresponding to
path matrix positions where initiation or termination at low cost is
possible according to the alignment mode specified. Default: L=0.
- M=#
- Maximum gap penalty multiplier, see Gribskov et al. (1990).
Default: M=0.333.
- S=#
- Score matrix multiplier. On input, the numbers of the score matrix are
multiplied by this factor. Default: S=0.1.
- T=#
- Gap region threshold. This is the minimal fraction of gap characters a
column of the multiple sequence alignment must contain in order to be
considered part of a gap region. Default: T=0.01.
- X=#
- Gap excision threshold. This is the minimal fraction of non-gap characters
a column of the multiple sequence alignment must contain in order to be
converted into a match position. The IM and MI transition
scores of insert positions corresponding to excised columns are set to
zero; the other parameters remain unchanged. Default: X=0.5.
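The roles of the thresholds T and X can be made concrete with a toy
alignment. The sketch below is a simplified illustration under several
assumptions: sequences are unweighted, the gap characters are '.', '-' and
'~', and each column is classified independently. The helper names are
hypothetical, and pfmake itself may apply additional logic (for example,
sequence weights).

```python
# Illustrative sketch only; helper names and gap symbols are assumptions.

GAP_CHARS = {".", "-", "~"}  # assumed gap symbols


def classify_columns(rows, t=0.01, x=0.5):
    """For each alignment column return (in_gap_region, is_match_position).

    in_gap_region     -- fraction of gap characters >= T
    is_match_position -- fraction of non-gap characters >= X
    """
    result = []
    for column in zip(*rows):
        n_gaps = sum(1 for c in column if c in GAP_CHARS)
        gap_fraction = n_gaps / len(column)
        non_gap_fraction = (len(column) - n_gaps) / len(column)
        result.append((gap_fraction >= t, non_gap_fraction >= x))
    return result


alignment = [
    "AC.G",
    "A..G",
    "AT.G",
    "A..G",
]
for index, flags in enumerate(classify_columns(alignment), start=1):
    print(index, flags)
```

In this toy alignment, column 2 is half gaps: it belongs to a gap region
(0.5 >= T) and is still kept as a match position (0.5 >= X), whereas
column 3 consists of gaps only and is not converted into a match position.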
- (1)
- pfmake -b1 sh3.msf blosum45.cmp H=0.6 > sh3_block.prf
Generates a domain-global block profile from a multiple
alignment of SH3 domains using the blosum45 matrix. sh3.msf contains a
multiple alignment of 20 SH3 domains from SWISS-PROT release 32
including sequence weights. blosum45.cmp contains a 1/3 bit-scaled
blosum45 matrix in GCG format. Note that fragment matches (alignments to
parts of the profile) are not prohibited but penalized by the parameter
H=0.6.
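When pfmake is driven from a script, the argument order of the synopsis
(options, alignment file, score matrix, optional input profile, then
NAME=value parameters) can be assembled programmatically. The following is
a minimal sketch; build_pfmake_command is a hypothetical helper, not part
of the pftools distribution, and it only constructs the argument list
without running anything.

```python
# Illustrative sketch only; build_pfmake_command is a hypothetical helper.

VALID_PARAMETERS = set("EFGHILMSTX")  # parameter letters listed in the synopsis


def build_pfmake_command(msf_file, score_matrix, options="",
                         profile_file=None, **parameters):
    """Assemble a pfmake argument list following the synopsis order."""
    command = ["pfmake"]
    if options:
        command.append("-" + options)
    command += [msf_file, score_matrix]
    if profile_file is not None:
        command.append(profile_file)
    for name, value in parameters.items():
        if name not in VALID_PARAMETERS:
            raise ValueError(f"unknown pfmake parameter: {name}")
        command.append(f"{name}={value}")
    return command


command = build_pfmake_command("sh3.msf", "blosum45.cmp", options="b1", H=0.6)
print(" ".join(command))  # pfmake -b1 sh3.msf blosum45.cmp H=0.6
```

The resulting list could be passed to subprocess.run with stdout directed to
a file such as sh3_block.prf, assuming pfmake is installed and on the PATH.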
Bucher P, Karplus K, Moeri N & Hofmann K (1996). A
flexible motif search technique based on generalized
profiles. Comput. Chem. 20:3-24.
Gribskov M, Luethy R & Eisenberg D (1990). Profile
analysis. Meth. Enzymol. 183:146-159.
Luethy R, Xenarios I & Bucher P (1994). Improving the
sensitivity of the sequence profile method. Prot. Sci.
3:139-146.
Thompson JD, Higgins DG & Gibson TJ (1994). Improved
sensitivity of profile searches through the use of sequence
weights and gap excision. Comput. Appl. Biosci.
10:19-29.
Philipp Bucher
Philipp.Bucher@isrec.unil.ch