PSA(5) | File formats | PSA(5) |
psa - biological sequence alignment file format
psa is an output format used by the pftools package to describe alignments between biological sequences (DNA or protein) and PROSITE profiles.
psa is related to the widely used fasta biological sequence file format. However, it does more than describe a biological sequence: it is mainly used to record the alignment between a motif descriptor, such as a PROSITE profile, and a given sequence. This information is included in the header and reflected in the structure of the sequence data following the header line.
Each sequence in a psa alignment file or output must be
preceded by a fasta header line.
The general syntax of such a fasta header line is as follows:
>seq_id free_text
The header must start with a '>' character which is
directly followed by the seq_id field. This field is interpreted by
most programs as the sequence's identifier and/or accession
number. It ends at the first encountered whitespace character.
The pftools programs use the free_text field, which follows seq_id, to add
information about the match score, the match position and a description of
the sequence or motif. Please refer to the man page of the corresponding
program for further information about its output format.
The header must fit on a single line. All following lines, up to the next line
starting with a '>' character or the end of the file, are
interpreted as sequence data.
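The header rules above can be sketched as a small reader. This is a minimal illustration, not part of pftools: the function name read_psa and the sample lines in the usage note are hypothetical, and the free_text field is returned unparsed because its layout depends on the program that wrote the file.

```python
import re

def read_psa(lines):
    """Yield (seq_id, free_text, data) for each record in an iterable of lines.

    Hypothetical helper: seq_id is everything between '>' and the first
    whitespace character, free_text is the rest of the header line, and
    data is the concatenation of all lines up to the next header.
    """
    seq_id, free_text, chunks = None, "", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, free_text, "".join(chunks)
            match = re.match(r">(\S*)\s*(.*)", line)
            seq_id = match.group(1)
            free_text = match.group(2).rstrip()
            chunks = []
        elif seq_id is not None:
            # Data lines may have different lengths; join them as they come.
            chunks.append(line.strip())
    if seq_id is not None:
        yield seq_id, free_text, "".join(chunks)
```

For example, list(read_psa(open("result.psa"))) would return one tuple per alignment, where result.psa is a placeholder file name.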
The alignment data between a sequence and a PROSITE profile starts on the
line following the header. This data can span several lines
of different lengths.
The data is formed by upper or lower-case characters of the
corresponding sequence alphabet (DNA or protein). The gap characters
'.' and '-' are also supported.
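Since the data mixes residues with gap characters, a reader that only needs the residues can drop the gaps. The sketch below assumes that removing '.' and '-' and ignoring case is enough for that purpose; the function name ungap is hypothetical.

```python
def ungap(data):
    """Return the residues of an alignment string without gap characters.

    Assumption: '.' and '-' are the only non-residue characters, and case
    carries alignment information only, so the result is upper-cased.
    """
    return "".join(ch for ch in data if ch not in ".-").upper()
```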
The alignment is always at least as long as the matching profile.
Insertions or deletions detected during the motif/sequence alignment step
change the length of the reported data, and can be identified using the
following conventions:
This is an example of the output produced by
pfsearch(1) using the '-x' (i.e. psa output) option. The
first line starting with the '>' character is the fasta
header. It also contains information about the raw score of the
alignment as well as its position in the input sequence.
On the next line you find the alignment proper. Starting at position 6, we
can find an insertion of the 'lns' residues in the
sequence compared to the motif. The last two positions of the motif are
not present in the sequence (i.e. they are deleted). This is
indicated by the presence of two '-' (dash) characters at the end
of the alignment.
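The reading of the example can be reproduced programmatically. The sketch below is an illustration only and rests on assumptions drawn from the description above: lower-case residues are treated as insertions, '-' as deleted motif positions, and upper-case residues as matches. The handling of '.' is not described here, so it is simply labelled as a gap. The function names and the test string in the usage note are invented.

```python
from itertools import groupby

def classify(ch):
    """Label one alignment column (assumed conventions, see lead-in)."""
    if ch == "-":
        return "deletion"
    if ch == ".":
        return "gap"
    if ch.islower():
        return "insertion"
    return "match"

def alignment_runs(alignment):
    """Return (kind, start, text) for each run of same-kind columns.

    start is the 1-based position of the run in the alignment string.
    """
    runs = []
    pos = 1
    for kind, group in groupby(alignment, key=classify):
        text = "".join(group)
        runs.append((kind, pos, text))
        pos += len(text)
    return runs
```

Applied to an invented string such as "ACDEFlnsGHIK--", alignment_runs reports an insertion of 'lns' starting at position 6 and a deletion of two positions at the end, which mirrors the description above.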
xpsa(5), pfsearch(1), pfscan(1), pfw(1), pfmake(1), psa2msa(1)
This manual page was originally written by Volker Flegel.
The pftools package was developed by Philipp Bucher.
Any comments or suggestions should be addressed to
<pftools@sib.swiss>.
April 2003 | pftools 2.3 |