PATMAN(1) User Manuals PATMAN(1)

PatMaN - search for approximate patterns in DNA libraries

patman [ option | file ... ]

PatMaN searches for (small) patterns in (huge) DNA databases, allowing for some mismatches and, optionally, gaps. Patterns and databases are read from one or more fasta(5) files given as non-option arguments; whether a file is read as patterns or as a database depends on whether the -P or the -D option most recently preceded it. The patterns are then matched against the database sequences. The output of PatMaN is a table with one line per match, consisting of these tab-separated fields:

  • name of the database sequence,
  • name of the pattern,
  • position of the first matched base in the database sequence (positions are 1-based, so the first base of the sequence is position 1),
  • position of the last matched base in the database sequence,
  • strand (+ for a literal match, - for a match of the reverse complement),
  • edit distance (number of mismatches plus number of gaps).
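The following Python sketch reads this table into records. It is an illustration, not part of PatMaN: the Match and parse_patman names are hypothetical, and it assumes that sequence and pattern names never contain tab characters.

```python
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class Match:
    """One output line; the field names are our own choice."""
    db_name: str       # name of the database sequence
    pattern_name: str  # name of the pattern
    start: int         # first matched base, 1-based
    end: int           # last matched base, inclusive
    strand: str        # '+' literal match, '-' reverse complement
    edits: int         # mismatches plus gaps


def parse_patman(stream: TextIO) -> Iterator[Match]:
    """Yield one Match per non-empty, tab-separated line."""
    for line in stream:
        line = line.rstrip("\r\n")
        if not line:
            continue
        db, pat, start, end, strand, edits = line.split("\t")
        yield Match(db, pat, int(start), int(end), strand, int(edits))


sample = io.StringIO("chr1\tmiR-1\t101\t122\t+\t1\n")
first = next(parse_patman(sample))
span = first.end - first.start + 1  # both ends inclusive: 22 database bases
```

Because both positions are inclusive, a match covers end - start + 1 database bases; when no gaps are allowed, this equals the length of the (possibly trimmed) pattern.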

Print version number and exit.

Allow up to num mismatches and/or gaps per match.

Allow up to num gaps per match. Note that gaps are also counted against the total set by -e, so the -e option should always be set at least as high as the -g option. Allowing many gaps can incur a considerable computational cost.
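To make the edit-distance limit concrete, the sketch below uses the classic Sellers dynamic-programming algorithm, a different and much simpler technique than the suffix-trie search PatMaN performs. It reports every 1-based end position in a text where a pattern aligns with at most max_edits mismatches plus gaps. It does not enforce a separate gap limit, and end_distances and approx_match_ends are hypothetical names.

```python
def end_distances(pattern: str, text: str) -> list[int]:
    """For every end position j in text (0-based), the smallest number of
    mismatches plus gaps with which the whole pattern aligns to some
    substring of text ending at j (Sellers' semi-global alignment)."""
    # col[i]: cost of aligning pattern[:i] to a suffix of the text read so far.
    # Before any text is read, the first i pattern bases must all be gapped.
    col = list(range(len(pattern) + 1))
    result = []
    for ch in text:
        diag = col[0]
        col[0] = 0  # a match may begin at any text position for free
        for i in range(1, len(pattern) + 1):
            best = min(
                diag + (pattern[i - 1] != ch),  # match or mismatch
                col[i] + 1,                     # gap: extra text base
                col[i - 1] + 1,                 # gap: pattern base skipped
            )
            diag, col[i] = col[i], best
        result.append(col[-1])
    return result


def approx_match_ends(pattern: str, text: str, max_edits: int) -> list[int]:
    """1-based end positions where pattern matches with <= max_edits edits."""
    return [j + 1 for j, d in enumerate(end_distances(pattern, text)) if d <= max_edits]
```

With max_edits set to 0 this finds only exact occurrences; raising it admits both substitutions and single-base gaps, which is why allowing gaps multiplies the number of candidate alignments.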

Treat the following files as database. Databases must be in fasta(5) format. Multiple database files, including "-" for standard input, are allowed and are read in turn.

Treat the following files as patterns. Pattern files must be in fasta(5) format. Multiple pattern files, including "-" for standard input, are allowed; all of them are read before any database is processed.

Redirect output to file. The file name "-" sends output to stdout, which is also the default.

Activate the interpretation of ambiguity codes in patterns. Any pattern containing ambiguity codes is expanded into multiple patterns, which can match independently. See also the discussion of unknown nucleotides below.
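The expansion can be pictured as a Cartesian product over the bases each code stands for. This Python sketch assumes the standard IUPAC nucleotide codes (the man page does not list which codes PatMaN accepts) and passes X through unchanged, since X denotes an unknown nucleotide; the IUPAC table and expand_pattern are hypothetical names.

```python
from itertools import product

# Standard IUPAC nucleotide codes; assumed, not taken from PatMaN's source.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    "X": "X",  # unknown nucleotide: kept as-is, never matches anything
}


def expand_pattern(pattern: str) -> list[str]:
    """All concrete patterns that a pattern with ambiguity codes stands for."""
    choices = [IUPAC[base] for base in pattern.upper()]
    return ["".join(combo) for combo in product(*choices)]
```

For example, expand_pattern("ACR") yields ACA and ACG. Every N multiplies the number of patterns by four, so patterns with many ambiguity codes can enlarge the pattern set quickly.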

Deactivate matching of reverse complements. Normally, PatMaN tries to match patterns both literally and after reverse-complementing them; with this option set, only literal (forward) matches are considered.
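For reference, the reverse complement that the "-" strand refers to can be computed as below. This Python sketch maps A to T, C to G and vice versa (and U to A), leaves any other character unchanged, and shows which orientations would be searched with and without the single-strand behaviour described above; reverse_complement, orientations and the single_strand parameter are hypothetical.

```python
# A<->T, C<->G, U->A; any other character (e.g. N) is left unchanged.
COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def orientations(pattern: str, single_strand: bool = False):
    """Yield (strand, sequence) pairs to search for, labelled as in the
    strand column of the output."""
    yield "+", pattern
    if not single_strand:
        yield "-", reverse_complement(pattern)
```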

Causes num pointers to be prefetched in advance. This can improve performance if PatMaN has been compiled for a processor architecture that supports prefetching. The optimum value for a particular setup has to be determined empirically, but the default should be reasonably good.

Only consider patterns with a length of at least len. Use this if your pattern collection contains short sequences for which you do not want large numbers of matches reported.

Cut off num bases from the 3' end of each pattern. Use this for patterns whose 3' ends are damaged, edited, or otherwise should be ignored. The chopped bases are neither matched nor included in the reported match regions.

Cut off num bases from the 5' end of each pattern. Use this for patterns whose 5' ends are damaged, edited, or otherwise should be ignored. The chopped bases are neither matched nor included in the reported match regions.
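The effect of the two trimming options on a pattern can be sketched as below. This assumes, as is conventional for fasta(5), that pattern sequences are written 5' to 3', so the 5' end is the start of the string; trim_pattern and its parameters are hypothetical names.

```python
def trim_pattern(seq: str, trim5: int = 0, trim3: int = 0) -> str:
    """Remove trim5 bases from the 5' end and trim3 bases from the 3' end.

    Returns an empty string if the trims remove the whole pattern.
    """
    stop = len(seq) - trim3
    return seq[trim5:stop] if stop > trim5 else ""
```

Only the remaining bases take part in matching, so with no gaps a 22-base pattern trimmed by 2 bases at its 3' end produces matches spanning 20 database bases.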

Allow adenine to be ignored in patterns. This is essentially equivalent to not counting gaps in the database, as long as the gapped base is an A. Using -A can be extremely expensive computationally, both in memory and in time.

Suppress warnings (about unrecognized characters in input sequences or missing input files). Even without -q, at most one such warning is given per run.

Print additional progress information to stderr.

Set debugging flags to flags. Flags may be the bitwise OR of any of the following values, each of which causes some output to appear on stderr. Some of the values may only work if PatMaN has been compiled in debug mode. The default value is 1.

1
Print warnings. Equivalent to not setting -q.

2
Print progress information. Equivalent to setting -v.

4
Dump the suffix trie of the patterns. Only available in a debug build.

8
Count the number of visited nodes and print that number in each iteration. Only available in a debug build.

16
Print the total number of nodes fetched from memory after all databases have been processed.

32
Output each database sequence while it is being matched.
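Because each value is a distinct power of two, several outputs are selected by ORing their values, e.g. 1 | 2 | 16 = 19 for warnings, progress information and the final node count. The Python sketch below decodes such a value; the member names are our own labels for the table entries, not identifiers used by PatMaN.

```python
from enum import IntFlag


class DebugFlag(IntFlag):
    WARNINGS = 1        # print warnings
    PROGRESS = 2        # print progress information
    DUMP_TRIE = 4       # dump the suffix trie (debug build only)
    COUNT_VISITED = 8   # count visited nodes (debug build only)
    TOTAL_FETCHED = 16  # total nodes fetched from memory
    ECHO_DATABASE = 32  # output database sequence while matching


def decode(value: int) -> list[str]:
    """Names of all flags set in value, in table order."""
    return [flag.name for flag in DebugFlag if value & flag]
```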

Non-option arguments (bare filenames) are treated either as database or as pattern files, depending on whether -D or -P was the last of the two to occur before the filename. If neither -D nor -P was given before it, a filename is treated as a pattern file. If no database was given, it is instead read from standard input. Standard input can be given explicitly as either a database or a pattern file by using the filename "-". A warning is given if standard input is selected implicitly as the database; an error message is given if no pattern files have been named at all.
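The rule for bare filenames amounts to a small state machine. The Python sketch below applies it to an argument list that is assumed to contain only -D, -P and filenames (option values, such as the number following -e, are not handled); classify_files is a hypothetical name, and it raises ValueError where PatMaN would print an error message.

```python
def classify_files(args: list[str]) -> tuple[list[str], list[str]]:
    """Split bare filenames into (databases, patterns).

    Each filename goes to whichever of -D or -P most recently preceded it;
    before either appears, filenames are pattern files. "-" is stdin.
    """
    databases: list[str] = []
    patterns: list[str] = []
    target = patterns
    for arg in args:
        if arg == "-D":
            target = databases
        elif arg == "-P":
            target = patterns
        elif arg == "-" or not arg.startswith("-"):
            target.append(arg)
        # any other option is ignored in this sketch
    if not patterns:
        raise ValueError("no pattern files given")
    if not databases:
        databases.append("-")  # implicit stdin; PatMaN warns in this case
    return databases, patterns
```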

Allowing gaps often causes overlapping matches of a single pattern at almost the same position. PatMaN makes no attempt to filter these redundant matches. Also note that allowing many gaps, and especially allowing an arbitrary number of gaps through the -A hack, can slow PatMaN down considerably and cause it to produce enormous amounts of output. Using some sort of post-processor to filter these matches is highly recommended.
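One possible post-processor is sketched below in Python. It groups matches by database sequence, pattern and strand, chains matches whose regions overlap, and keeps the match with the smallest edit distance from each chain (ties go to the leftmost). The tuple layout follows the output fields listed above; the function name and the keep-the-best policy are our own choices. Because of the chaining, two matches that do not overlap each other directly can still end up in the same cluster through a third.

```python
from itertools import groupby

# A match is (db_name, pattern_name, start, end, strand, edits) with 1-based
# inclusive coordinates, in the same order as PatMaN's output columns.


def collapse_overlaps(matches):
    def same_target(m):
        return (m[0], m[1], m[4])

    def best(cluster):
        return min(cluster, key=lambda m: (m[5], m[2]))

    ordered = sorted(matches, key=lambda m: (same_target(m), m[2], m[3]))
    kept = []
    for _, group in groupby(ordered, key=same_target):
        cluster, cluster_end = [], 0
        for m in group:
            if cluster and m[2] > cluster_end:  # starts after the cluster ends
                kept.append(best(cluster))
                cluster = []
            cluster_end = max(cluster_end, m[3]) if cluster else m[3]
            cluster.append(m)
        kept.append(best(cluster))  # groups are never empty
    return kept
```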

Unknown nucleotides are most often encoded by the letter N. If the --ambicodes option is not given, Ns in patterns are interpreted as unknown nucleotides and can never match without penalty. If --ambicodes is given, Ns in patterns are expanded just like the other ambiguity codes and effectively work as wildcards. Unknown nucleotides can still be encoded by an X, which never matches anything. The database is treated differently: anything other than A, C, G, T and U, including ambiguity codes, is treated as unknown and can never match without penalty.
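The comparison of a single pattern base with a single database base can be sketched as follows. This is an illustration under stated assumptions, not PatMaN's code: it assumes patterns have already been expanded when --ambicodes is given, treats U as equivalent to T, compares case-insensitively, and treats any other pattern character (N, X, or an unexpanded ambiguity code) as unknown; base_cost is a hypothetical name.

```python
KNOWN = {"A", "C", "G", "T"}


def base_cost(pattern_base: str, db_base: str) -> int:
    """0 if the bases match without penalty, otherwise 1 (a mismatch)."""
    # Assumption: U is equivalent to T, and case does not matter.
    p = pattern_base.upper().replace("U", "T")
    d = db_base.upper().replace("U", "T")
    if p not in KNOWN or d not in KNOWN:
        return 1  # unknown on either side can never match for free
    return 0 if p == d else 1
```

In particular, an N in the database costs one edit even against an N in the pattern, whether or not --ambicodes is given.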

/etc/popt

The system-wide configuration file for popt(3). PatMaN identifies itself as "patman" to popt.

~/.popt

Per-user configuration file for popt(3).

None known.

Kay Pruefer <pruefer@eva.mpg.de>
Udo Stenzel <udo_stenzel@eva.mpg.de>

popt(3), fasta(5)

JANUARY 2008 Applications