pfscan - scan a protein or DNA sequence with a profile library
pfscan [ -abflLrsuxyz ] [ seq-file | - ]
[ profile-library-file | - ] [L=#] [W=#]
pfscan compares a protein or nucleic acid sequence against
a profile library. The result is an unsorted list of profile-sequence
matches written to the standard output. A variety of output formats
containing different information can be specified via the options -a, -l,
-L, -r, -u, -s, -x, -y and -z. seq-file contains a
sequence in EMBL/SWISS-PROT format (assumed by default) or in Pearson/Fasta
format (indicated by option -f). profile-library-file contains
a library of profiles in PROSITE format. pfscan can be used as a
filter if - is used instead of one of the input filenames.
- -a
- Report optimal alignment scores for all profiles regardless of the cut-off
value. This option simultaneously forces DISJOINT=UNIQUE.
- -b
- Search the complementary strand of the DNA sequence as well.
- -f
- Input sequence is in Pearson/Fasta format.
- -l
- Indicate the highest cut-off level exceeded by the match score in the
output list.
- -L
- Indicate by character string the highest cut-off level exceeded by the
match score in the output list. Note that the generalized profile format
includes a text string field to specify a name for a cut-off level. The -L
option causes the program to display the first two characters of this text
string (usually something like "!", "?",
"??", etc.) at the beginning of each match description.
- -r
- Use raw scores rather than normalized scores for match selection.
Normalized scores will not be listed in the output.
- -s
- List the sequences of the matched regions as well. The output will be a
Pearson/Fasta-formatted sequence library.
- -u
- Force DISJOINT=UNIQUE.
- -x
- List profile-sequence alignments in pftools PSA format.
- -y
- Display alignments between the profile and the matched sequence regions in
a human-friendly format.
- -z
- Indicate the starting and ending positions of the matched profile range.
The ending position is given as a negative offset from the end of the
profile. Thus the range [1, -1] denotes the entire profile.
- L=#
- Cut-off level to be used for match selection. If level L is not
specified in the profile, the next higher (if L is negative) or
next lower (if L is positive) level specified is used instead.
- W=#
- Output width. Output lines will be truncated after W characters.
Default: W=132.
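Two of the conventions above are small index rules: the L=# fallback to a neighbouring cut-off level, and the -z range with its negative end offset. The following Python sketch illustrates one reading of both. It is not taken from the pfscan source; the function names are hypothetical, and the handling of cases the text does not cover (no neighbouring level exists, or an undefined L=0) is an assumption.

```python
from typing import Iterable, Optional, Tuple


def resolve_cutoff_level(requested: int, defined: Iterable[int]) -> Optional[int]:
    """Pick the cut-off level to use when L=requested is given.

    Follows the rule stated for L=#: an undefined negative level falls back
    to the next higher defined level, an undefined positive level to the
    next lower one. Returning None when no such level exists, or when an
    undefined L=0 is requested, is an assumption of this sketch.
    """
    levels = set(defined)
    if requested in levels:
        return requested
    if requested < 0:
        higher = [lvl for lvl in levels if lvl > requested]
        return min(higher) if higher else None
    if requested > 0:
        lower = [lvl for lvl in levels if lvl < requested]
        return max(lower) if lower else None
    return None


def absolute_profile_range(start: int, end_offset: int,
                           profile_length: int) -> Tuple[int, int]:
    """Convert a -z style range to 1-based absolute profile positions.

    Assumes an end offset of -1 refers to the last profile position, so
    that [1, -1] covers the entire profile.
    """
    if end_offset >= 0:
        raise ValueError("end offset is expected to be negative")
    return start, profile_length + end_offset + 1
```

Under these assumptions, requesting L=2 for a profile that defines only levels -1, 0 and 1 selects level 1, and the range [1, -1] on a profile of length 120 maps to positions 1 through 120.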
- (1)
- pfscan -s GTPA_HUMAN prosite13.prf
Scans the human GAP protein for matches to profiles in PROSITE
release 13. GTPA_HUMAN contains the SWISS-PROT entry P20936|GTPA_HUMAN.
prosite13.prf contains all profile entries of PROSITE release 13. The
output is a Pearson/Fasta-formatted sequence library containing all
sequence regions of the input sequence matching a profile in the profile
library.
- (2)
- pfscan -by CVPBR322 ecp.prf L=2
Scans both strands of plasmid pBR322 for high-scoring (level
2) E. coli promoter matches. CVPBR322 contains EMBL entry
J01749|CVPBR322. ecp.prf contains a profile for E. coli
promoters. The output includes profile-sequence alignments in a
human-friendly format.
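When pfscan is driven from a script, the command lines in the examples above can be assembled programmatically. The sketch below is a hypothetical Python helper, not part of pftools: it follows the argument order shown in the synopsis (option letters, sequence file, profile library, then L= and W=) and assumes a pfscan executable is available on the PATH when run_pfscan is called.

```python
import subprocess
from typing import List, Optional

# Single-letter options documented in this page.
KNOWN_FLAGS = set("abflLrsuxyz")


def build_pfscan_command(seq_file: str, profile_file: str, flags: str = "",
                         level: Optional[int] = None,
                         width: Optional[int] = None) -> List[str]:
    """Assemble an argument list for pfscan (hypothetical helper)."""
    unknown = set(flags) - KNOWN_FLAGS
    if unknown:
        raise ValueError("unknown pfscan option(s): " + "".join(sorted(unknown)))
    cmd = ["pfscan"]
    if flags:
        cmd.append("-" + flags)
    # Either filename may be "-" to read that input from standard input.
    cmd += [seq_file, profile_file]
    if level is not None:
        cmd.append(f"L={level}")
    if width is not None:
        cmd.append(f"W={width}")
    return cmd


def run_pfscan(cmd: List[str]) -> str:
    """Run the command and return its standard output as text.

    Raises subprocess.CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For instance, build_pfscan_command("CVPBR322", "ecp.prf", flags="by", level=2) yields the argument list for example (2), which can then be passed to run_pfscan.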
Philipp Bucher
Philipp.Bucher@isrec.unil.ch