This module is a repository of functions that are used repeatedly when
parsing and formatting SWISS-PROT/TREMBL lines. If more than two line types
need to do approximately the same thing, the shared code is probably in
here.
All functions expect to be called as package->function(param list).
- listFromText
- Takes a piece of text, a separator regex, and a separator that may appear
at the end. Returns an array of the items that were separated in the text by
that separator. Null items are dropped for you.
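The behaviour described for listFromText can be illustrated with a minimal, self-contained sketch. The helper name sketch_list_from_text, the sample text, and the treatment of the trailing separator as a regex are assumptions made for illustration; this is not the module's actual code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for listFromText, written only to illustrate the
# documented behaviour.
sub sketch_list_from_text {
    my ($text, $sep, $end_sep) = @_;
    # Assumption: the separator that may end the text is given as a regex.
    $text =~ s/$end_sep\s*$// if defined $end_sep;
    # Split on the separator regex and drop null (empty) items.
    return grep { length($_) } split(/$sep/, $text);
}

my @items = sketch_list_from_text('Mammalia; ; Primates; Homo.', ';\s*', '\.');
print join('|', @items), "\n";    # prints Mammalia|Primates|Homo
```

The empty item between the two adjacent separators is lost, which matches the "null items are dropped" note above.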
- textFromList
- Takes an array of items, a separator, a terminating string, and a line
width. Returns an array of strings, each ending with the separator or the
terminator and each no wider than the specified width.
Known problem: it seems to do the wrong thing for references, and the cause
is not yet understood. Do not use it for them.
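As a rough illustration of the packing described for textFromList, the sketch below fills lines greedily. The helper name sketch_text_from_list, the single blank placed between items, and the sample data are assumptions; the real function may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical stand-in for textFromList; illustration only.
sub sketch_text_from_list {
    my ($items, $sep, $term, $width) = @_;
    my @lines;
    my $line = '';
    for my $i (0 .. $#$items) {
        # Every item carries the separator, except the last one, which
        # carries the terminator.
        my $piece = $items->[$i] . ($i == $#$items ? $term : $sep);
        if (length($line) && length($line) + 1 + length($piece) > $width) {
            # The piece does not fit any more: start a new line.
            push @lines, $line;
            $line = $piece;
        }
        else {
            # Assumption: pieces on one line are joined by a single blank.
            $line = length($line) ? "$line $piece" : $piece;
        }
    }
    push @lines, $line if length($line);
    return @lines;
}

my @oc_lines = sketch_text_from_list(
    [qw(Eukaryota Metazoa Chordata Mammalia)], ';', '.', 20);
print "$_\n" for @oc_lines;
# prints:
# Eukaryota; Metazoa;
# Chordata; Mammalia.
```

In this sketch an item that is longer than the width on its own is left on an overlong line rather than being split.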
- wrapText
- Takes a string and a length. Returns an array of strings, each no longer
than that length, splitting the string on whitespace.
- wrapOn ($firstLinePrefix, $linePrefix, $colums, $text[, @separators])
- Wraps $text into lines of at most $colums columns and prepends the
prefixes to the lines. @separators is a list of expressions on which to
wrap. The matched expression itself stays on the upper line.
If no @separators are provided, $text is wrapped at whitespace (except in
EC/TC numbers) or at dashes that separate words.
The function first tries to wrap on the first item of @separators, then on
the next, and so on. If no wrap is possible on any element of @separators or
on whitespace, it wraps into lines of exactly $colums characters.
As a special case, the first item of @separators may be a reference to an
array. This is used internally for wrapping FT VARIANT-like lines.
Example:
wrapOn('DE ', 'DE ', 40,
'14-3-3 PROTEIN BETA/ALPHA (PROTEIN KINASE C INHIBITOR PROTEIN-1)',
'\s+')
returns ['14-3-3 PROTEIN BETA/ALPHA (PROTEIN ',
'KINASE C INHIBITOR PROTEIN-1)']
wrapOn('DE ', 'DE ', 40,
'14-3-3 PROTEIN BETA/ALPHA (PROTEIN KINASE C INHIBITOR PROTEIN-1)',
' (?=\()', '\s+')
returns ['14-3-3 PROTEIN BETA/ALPHA ',
'(PROTEIN KINASE C INHIBITOR PROTEIN-1)']
- cleanLine
- Removes the leading line identifier, the three blanks that follow it, and
any trailing spaces from an SP line.
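A minimal sketch of this clean-up, assuming the line identifier is the usual two-character line code. The helper name sketch_clean_line and the sample line are illustrative and not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for cleanLine; illustration only.
sub sketch_clean_line {
    my ($line) = @_;
    # Assumption: the identifier is two word characters followed by
    # exactly three blanks.
    $line =~ s/^\w\w {3}//;
    # Strip trailing whitespace (this sketch also removes a final newline).
    $line =~ s/\s+$//;
    return $line;
}

my $cleaned = sketch_clean_line("DE   14-3-3 PROTEIN BETA/ALPHA   \n");
print "[$cleaned]\n";    # prints [14-3-3 PROTEIN BETA/ALPHA]
```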
- joinWith ($text, $with, $noAddAfter, @list)
- Concatenates $text and @list into one string. Adds $with between the
original elements, unless the string built so far ends with $noAddAfter.
This is used to avoid inserting blanks after hyphens during concatenation,
so unpleasant strings like 'CALMODULIN- DEPENDENT' are avoided.
Unfortunately, strings like 'CARBON-DIOXIDE' are not reassembled correctly.
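The joining rule can be sketched as follows. The helper name sketch_join_with is hypothetical, and treating $noAddAfter as a literal suffix (rather than a regex) is an assumption made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for joinWith; illustration only.
sub sketch_join_with {
    my ($text, $with, $no_add_after, @list) = @_;
    for my $item (@list) {
        # Add the joiner unless the string so far is empty or already ends
        # with $no_add_after (assumed here to be a literal suffix).
        $text .= $with
            if length($text) && $text !~ /\Q$no_add_after\E$/;
        $text .= $item;
    }
    return $text;
}

my $joined = sketch_join_with('CALCIUM/CALMODULIN-', ' ', '-',
                              'DEPENDENT PROTEIN', 'KINASE');
print "$joined\n";    # prints CALCIUM/CALMODULIN-DEPENDENT PROTEIN KINASE
```

No blank is inserted after the hyphen, while the remaining elements are joined with a single blank.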
- insertLineGroup ($textRef, $text, $pattern)
- Inserts text block $text into the text referred to by $textRef. $text
will replace the text block in $textRef matched by $pattern.
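A small sketch of replacing a line group in place through a reference. The helper name sketch_insert_line_group, the return value, and the sample entry are assumptions; what the real function does when nothing matches is not stated here, so the sketch simply leaves the text unchanged in that case.

```perl
use strict;
use warnings;

# Hypothetical stand-in for insertLineGroup; illustration only.
sub sketch_insert_line_group {
    my ($text_ref, $text, $pattern) = @_;
    # Replace the first block matched by $pattern with $text, modifying the
    # referenced string in place. Returns 1 if a block was replaced
    # (the return value is an assumption of this sketch).
    return ($$text_ref =~ s/$pattern/$text/) ? 1 : 0;
}

my $entry = "ID   TEST_HUMAN\n"
          . "DE   OLD NAME.\n"
          . "DE   (OLD SYNONYM).\n"
          . "OS   Homo sapiens.\n";

# Replace the whole group of consecutive DE lines with one new DE line.
my $replaced = sketch_insert_line_group(
    \$entry, "DE   NEW NAME.\n", qr/(?:^DE   .*\n)+/m);
print $entry;
```

After the call, the two old DE lines are gone and the single new DE line sits between the ID and OS lines.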
- uniqueList (@list)
- Returns a list in which all duplicates from @list have been removed.
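One common Perl idiom for removing duplicates is shown below as a sketch. The helper name sketch_unique_list is hypothetical, and keeping the first occurrence of each element in its original order is an assumption; the description above does not say which order the real function returns.

```perl
use strict;
use warnings;

# Hypothetical stand-in for uniqueList; illustration only.
sub sketch_unique_list {
    my %seen;
    # Keep an element only the first time it is seen.
    return grep { !$seen{$_}++ } @_;
}

my @unique = sketch_unique_list(qw(P12345 Q67890 P12345 O11111 Q67890));
print "@unique\n";    # prints P12345 Q67890 O11111
```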
- currentSpDate
- Returns the current date in SWISS-PROT format.
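As an illustration only, the sketch below formats a date as DD-MMM-YYYY with an upper-case month abbreviation (for example 01-NOV-1997). That this is the exact format produced by currentSpDate is an assumption, and the helper name sketch_sp_date and its optional epoch argument are hypothetical.

```perl
use strict;
use warnings;

# Month abbreviations are hard-coded so the result does not depend on the
# locale.
my @MONTHS = qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);

# Hypothetical stand-in for currentSpDate; illustration only.
sub sketch_sp_date {
    my ($epoch) = @_;
    # Default to the current time when no epoch value is passed.
    $epoch = time unless defined $epoch;
    my ($day, $month, $year) = (localtime($epoch))[3, 4, 5];
    # Assumed format: two-digit day, month abbreviation, four-digit year.
    return sprintf('%02d-%s-%04d', $day, $MONTHS[$month], $year + 1900);
}

my $today = sketch_sp_date();
print "$today\n";
```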
- toMixedCase($text, @regexps)
- Converts a text to mixed case, according to one or more regular
expressions. In scalar context, returns the new text; in array context,
also returns the regexp with which the change was performed, or undef on
failure. See the corresponding item in SWISS::GN for more details.