PatMaN - search for approximate patterns in DNA libraries
patman [ option | file ... ]
PatMaN searches for (small) patterns in (huge) DNA
databases, allowing for some mismatches and optionally gaps.
Patterns and databases are read from one or more
fasta(5) files listed as non-option arguments, depending on whether
the -D or -P option last preceded them, and matched against
each other. The output of PatMaN is a table containing one line for
each match, consisting of tab-separated fields:
- name of database sequence,
- name of pattern,
- position of first matched base in database sequence (positions are
1-based: the sequence's beginning has position 1),
- position of last matched base in database sequence,
- strand (+ for literal match, - for reverse complement),
- edit distance (number of mismatches plus number of gaps).
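As an illustration of the output format listed above, the following Python sketch parses one output line into its six tab-separated fields. The names Match and parse_match, as well as the sequence names in the usage note, are hypothetical and not part of PatMaN.

```python
from typing import NamedTuple


class Match(NamedTuple):
    """One line of PatMaN output (field order as documented above)."""
    database: str   # name of database sequence
    pattern: str    # name of pattern
    start: int      # first matched base, 1-based
    end: int        # last matched base, 1-based
    strand: str     # "+" literal match, "-" reverse complement
    edits: int      # mismatches plus gaps


def parse_match(line: str) -> Match:
    """Split a tab-separated output line into a Match record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    database, pattern, start, end, strand, edits = fields
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {strand!r}")
    return Match(database, pattern, int(start), int(end), strand, int(edits))
```

For example, parse_match("chr1\tprobe7\t101\t120\t+\t1\n") yields a record whose matched region spans end - start + 1 = 20 bases of the database sequence.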
- -V, --version
- Print version number and exit.
- -e num, --edits num
- Allow up to num mismatches and/or gaps per match.
- -g num, --gaps num
- Allow up to num gaps per match. Note that gaps count as mismatches,
too, so the -e option should always be set at least as high as the
-g option. Allowing many gaps can incur a considerable
computational cost.
- -D, --databases
- Treat the following files as database. Databases must be in
fasta(5) format. Multiple database files, including
"-" for standard input, are allowed and are read in turn.
- -P, --patterns
- Treat the following files as patterns. Pattern files must be in
fasta(5) format. Multiple pattern files, including
"-" for standard input, are allowed and are all read before
touching the databases.
- -o file, --output file
- Redirect output to file. The file name "-" causes output
to be written to stdout, which is also the default.
- -a, --ambicodes
- Activate the interpretation of ambiguity codes in patterns. This results
in the expansion of any pattern with ambiguity codes into multiple
patterns which can match independently. Compare Unknown Nucleotides
below.
- -s, --singlestrand
- Deactivate matching of reverse complements. Normally, PatMaN tries
to match patterns both literally and after reverse-complementing them;
with this option set, only literal (forward-strand) matches are considered.
- -p num, --prefetch num
- Causes num pointers to be prefetched in advance. This feature can
improve performance if PatMaN has been compiled for a processor
architecture that supports prefetching. The optimum value for your
particular setup has to be determined empirically, but the default should
be reasonably good.
- -l len, --min-length len
- Only consider patterns with a length of at least len. Use this if
your pattern collection contains short sequences that you don't
want lots of possible matches reported for.
- -x num, --chop3 num
- Cut off num bases from the 3' end of each pattern. Use this
for patterns with damaged, edited, etc. 3' ends that should be
ignored. The chopped bases are neither matched nor included in the
reported match regions.
- -X num, --chop5 num
- Cut off num bases from the 5' end of each pattern. Use this
for patterns with damaged, edited, etc. 5' ends that should be
ignored. The chopped bases are neither matched nor included in the
reported match regions.
- -A, --adenine-hack
- Allow adenine to be ignored in patterns. This is essentially equivalent to
not counting gaps in the database, as long as it was an A that was
gapped. Using -A can be computationally extremely expensive, both
in terms of memory and time consumed.
- -q, --quiet
- Suppress warnings (about unrecognized characters in input sequences or
missing input files). Even without -q, at most one such warning is
given per run.
- -v, --verbose
- Prints additional progress information to stderr.
- -d flags, --debug flags
- Sets debugging flags to flags. Flags may be the bitwise
OR of any of the following values, each of which causes some output
to appear on stderr. Some of the values may only work if
PatMaN has been compiled in debug mode. The default value is 1.
- 1
- Print warnings. Equivalent to not setting -q.
- 2
- Print progress information. Equivalent to setting -v.
- 4
- Dump the suffix trie of the patterns. Only available in debug
build.
- 8
- Count number of visited nodes and print that number in each iteration.
Only available in debug build.
- 16
- Print total number of nodes fetched from memory after completing all
databases.
- 32
- Output database sequence while it is being matched.
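The debug values listed above are powers of two, so they can be combined into a single number for -d. A minimal Python sketch of that arithmetic follows. The numeric values are taken from the list above; the names DebugFlag and debug_value, and the member names, are hypothetical labels chosen for this example.

```python
from enum import IntFlag


class DebugFlag(IntFlag):
    """Debug values as documented above; member names are illustrative."""
    WARNINGS = 1        # print warnings (same as not setting -q)
    PROGRESS = 2        # print progress information (same as -v)
    DUMP_TRIE = 4       # dump the suffix trie (debug build only)
    COUNT_VISITED = 8   # count visited nodes (debug build only)
    COUNT_FETCHED = 16  # total nodes fetched from memory
    ECHO_DATABASE = 32  # output database sequence while matching


def debug_value(*flags: DebugFlag) -> int:
    """Combine individual flags into the number passed to -d."""
    value = DebugFlag(0)
    for flag in flags:
        value |= flag
    return int(value)
```

Under these assumptions, warnings plus progress information correspond to the value 3, and warnings plus the fetched-node total correspond to 17.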
Non-option arguments (bare filenames) are treated as either
database or pattern files, depending on whether the -D
or -P option was the last that occurred before the filename. If
neither -D nor -P was given, file names are treated as
pattern files. If no database was given, it is instead read
from standard input. Standard input can be explicitly given as either a
database or a pattern file by using the filename
"-". A warning is given if standard input is selected implicitly
as database; an error message is given if no pattern files
have been named at all.
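The rule for assigning bare filenames can be sketched in Python as follows. This is an illustration of the behaviour described above, not PatMaN's actual argument handling: the function name classify_files and the file names in the usage note are hypothetical, and all options other than -D and -P are assumed to have been removed already.

```python
def classify_files(args):
    """Assign each filename to databases or patterns.

    A filename belongs to whichever of -D / -P was seen last;
    before either is seen, filenames count as pattern files.
    """
    mode = "pattern"  # default when neither -D nor -P was given
    databases, patterns = [], []
    for arg in args:
        if arg in ("-D", "--databases"):
            mode = "database"
        elif arg in ("-P", "--patterns"):
            mode = "pattern"
        elif mode == "database":
            databases.append(arg)  # "-" (standard input) is a valid name
        else:
            patterns.append(arg)
    if not patterns:
        # PatMaN reports an error in this case
        raise ValueError("no pattern files named")
    if not databases:
        # PatMaN warns and reads the database from standard input
        databases.append("-")
    return databases, patterns
```

With these assumptions, classify_files(["-D", "genome.fa", "-P", "a.fa", "b.fa"]) returns (["genome.fa"], ["a.fa", "b.fa"]), and classify_files(["reads.fa"]) falls back to standard input as the database.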
Allowing gaps often causes overlapping matches of single
patterns at almost the same position. PatMaN makes no attempt
to filter these redundant matches. Also note that allowing many gaps, and
especially allowing an arbitrary number of gaps through the -A option,
can slow down PatMaN considerably and cause it to produce enormous
amounts of output. The use of some sort of post-processor to filter these
is highly recommended.
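One possible post-processor is sketched below in Python. It is not part of PatMaN, and the filtering criterion is an assumption: matches of the same pattern on the same database sequence and strand are grouped when their regions overlap, and only the match with the lowest edit distance in each group is kept. The function name filter_overlaps is hypothetical; each match is a tuple in the output field order.

```python
def filter_overlaps(matches):
    """Keep one best match per cluster of overlapping matches.

    Each match is (database, pattern, start, end, strand, edits).
    Ties in edit distance are resolved in favour of the match that
    sorts first by (start, end).
    """
    ordered = sorted(matches, key=lambda m: (m[0], m[1], m[4], m[2], m[3]))
    kept = []
    cluster_end = None  # largest end position seen in the current cluster
    for m in ordered:
        same_group = bool(kept) and (m[0], m[1], m[4]) == (
            kept[-1][0], kept[-1][1], kept[-1][4])
        if same_group and m[2] <= cluster_end:
            # Overlaps the current cluster: extend it, keep the better match.
            cluster_end = max(cluster_end, m[3])
            if m[5] < kept[-1][5]:
                kept[-1] = m
        else:
            # Different sequence, pattern or strand, or no overlap.
            kept.append(m)
            cluster_end = m[3]
    return kept
```

Other criteria, such as keeping the longest match or requiring a minimum overlap, may suit a given analysis better; the clustering loop stays the same and only the comparison changes.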
Unknown nucleotides are most often encoded by the letter N.
If the --ambicodes option is not given, Ns in patterns are
interpreted as unknown nucleotides and can never match without penalty. If
--ambicodes is given, Ns in patterns are expanded just like
the other ambiguity codes, and effectively work as wildcards. Unknown
nucleotides can still be encoded by an X and will never match
anything. The database is treated differently in that anything other than
A, C, G, T and U, including ambiguity
codes, is treated as unknown and can never match without penalty.
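To illustrate what expanding a pattern with ambiguity codes means, the Python sketch below enumerates every concrete sequence a pattern stands for, using the standard IUPAC nucleotide codes. It only shows the idea; how PatMaN performs the expansion internally is not described here, and the names IUPAC and expand are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand(pattern):
    """Return all concrete sequences matched by an ambiguous pattern."""
    choices = []
    for base in pattern.upper():
        if base not in IUPAC:
            # e.g. X: an unknown nucleotide that is not expanded
            raise ValueError(f"cannot expand {base!r}")
        choices.append(IUPAC[base])
    return ["".join(seq) for seq in product(*choices)]
```

The number of expansions is the product of the number of choices at each position, so a pattern such as ARN expands to 1 x 2 x 4 = 8 sequences, and a pattern with several Ns grows by a factor of four for each of them.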
/etc/popt
The system-wide configuration file for
popt(3).
PatMaN identifies itself as "patman" to popt.
~/.popt
Per-user configuration file for
popt(3).
Kay Pruefer <pruefer@eva.mpg.de>
Udo Stenzel <udo_stenzel@eva.mpg.de>