PFSCALE(1) | General Commands Manual | PFSCALE(1) |
pfscale - fit parameters of an extreme-value distribution to a profile score list
pfscale [ score-list | - ] [ profile-file ] [L=#] [N=#] [P=#] [Q=#]
pfscale fits the two parameters of an extreme-value distribution to a score distribution obtained by searching a sequence database with a profile. score-list is a sorted list of profile match scores generated by pfsearch. The result is written to the standard output.
If the original profile is given as the second argument, the normalization function specified in the profile is updated so that it produces -log10 per-residue E-values. If the second argument is omitted, the output consists of a header line containing the normalization parameters, followed by a modified score list that shows the original scores, the normalized scores, and the corresponding log-cumulative frequencies side by side.
This program implements the significance estimation procedure for profile match scores described in Hofmann & Bucher (1995). It has been used to calculate the normalization parameters of all profiles in PROSITE.
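The following Python sketch illustrates the idea behind such a fit; it is not the code used by pfscale. It relies on the fact that the upper tail of an extreme-value distribution decays approximately exponentially, so the log-cumulative frequency is close to linear in the score. The function names, the use of rank divided by database size as the cumulative per-residue frequency, the ordinary least-squares fit, and the interpretation of P and Q as the upper and lower bounds of the fitted frequency range are all assumptions made for illustration.

```python
import math


def fit_log_linear(scores, n_residues, p_upper, q_lower):
    """Fit -log10(cumulative frequency) = intercept + slope * score.

    Hypothetical helper, not part of pftools.
    scores     : match scores sorted in descending order
    n_residues : database size (assumed to play the role of N)
    p_upper    : largest cumulative frequency included in the fit (assumed P)
    q_lower    : smallest cumulative frequency included in the fit (assumed Q)
    """
    xs, ys = [], []
    for rank, score in enumerate(scores, start=1):
        freq = rank / n_residues  # assumed per-residue cumulative frequency
        if q_lower <= freq <= p_upper:
            xs.append(score)
            ys.append(-math.log10(freq))
    if len(xs) < 2:
        raise ValueError("need at least two points inside the fit range")

    # Ordinary least-squares line through the selected points.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all selected scores are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def normalize(score, intercept, slope):
    """Map a raw score to an estimated -log10 per-residue frequency."""
    return intercept + slope * score
```

Under these assumptions, the two fitted values correspond to the two parameters of a linear normalization function, and normalize() applies them to a raw score.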
derives score-normalization parameters for the SH3 domain profile in sh3.prf. shuffle20.seq contains a window-shuffled derivative of SWISS-PROT release 30 in Pearson/Fasta format (window size 20). The implicit default of N corresponds to the size of this database and thus need not be specified on the command line. The cut-off value C=200 produces about 2000 matches, completely covering the range defined by the command-line parameters P and Q. A suitable cut-off value has to be estimated in advance by computing a few optimal alignment scores for random sequences.
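As an illustration of what a window-shuffled database might look like, the Python sketch below permutes the residues of a sequence within consecutive windows, which preserves local composition while destroying the residue order. The function name and the choice of non-overlapping windows are assumptions; the exact procedure used to build shuffle20.seq is not described here.

```python
import random


def window_shuffle(sequence, window=20, rng=None):
    """Shuffle residues inside consecutive, non-overlapping windows.

    Hypothetical helper, not part of pftools. A trailing window shorter
    than `window` is shuffled on its own.
    """
    rng = rng or random.Random()
    pieces = []
    for start in range(0, len(sequence), window):
        chunk = list(sequence[start:start + window])
        rng.shuffle(chunk)  # in-place permutation of this window only
        pieces.append("".join(chunk))
    return "".join(pieces)
```

Passing a seeded random.Random instance as rng makes the shuffle reproducible.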
Hofmann K & Bucher P (1995). The FHA-domain: a nuclear signalling domain found in protein kinases and transcription factors. Trends Biochem. Sci. 20:347-349.
Philipp Bucher
Philipp.Bucher@isrec.unil.ch
July 1999 | pftools 2.2 |