PATGEN(1) General Commands Manual PATGEN(1)
patgen - generate patterns for TeX hyphenation
patgen dictionary_file pattern_file patout_file translate_file
This manual page is not meant to be exhaustive. For more detail, see the Info file or the manual "Web2C: A TeX implementation", available as part of the TeX Live distribution or at http://tug.org/web2c.
The patgen program reads the dictionary_file, which contains a list of hyphenated words, and the pattern_file, which contains previously generated patterns (if any) for a particular language (not a complete TeX source file; see below). It produces the patout_file with the previously generated plus the newly generated hyphenation patterns for that language. The translate_file defines language-specific values for the parameters left_hyphen_min and right_hyphen_min used by TeX's hyphenation algorithm, as well as the external representation of the lower and upper case version(s) of all `letters' of that language. Further details of the pattern generation process, such as hyphenation levels and pattern lengths, are requested interactively from the user's terminal. Optionally, patgen creates a new dictionary file pattmp.n showing the good and bad hyphens found by the generated patterns, where n is the highest hyphenation level.
The patterns generated by patgen can be read by initex for use in hyphenating words. For a real-life example of patgen's output, see $TEXMFMAIN/tex/generic/hyphen/hyphen.tex, which contains the patterns TeX uses for English by default. At some sites, patterns for (many) other languages may be available, and the local tex programs may have them preloaded.
All filenames must be complete; no adding of default extensions or path searching is done.
The hyphens in a word are indicated by `-', `*', or `.' (or their replacements as defined in the translate file) for hyphens yet to be found, `good' hyphens (correctly found by the patterns), and `bad' hyphens (erroneously found by the patterns) respectively; when reading a dictionary file `*' is treated like `-' and `.' is ignored.
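The marker rules above can be sketched in a few lines of Python. This is only an illustration, not patgen's own code: the function name read_dictionary_word is hypothetical, the default marker characters are assumed (no replacements from the translate file), and the entry is assumed to contain nothing but letters and hyphen markers.

```python
def read_dictionary_word(entry):
    """Return (word, hyphen_positions) for one dictionary entry.

    Mirrors the reading rule described above: `*' is treated like `-',
    and `.' is ignored. A position is the number of letters that
    precede the hyphen.
    """
    letters = []
    hyphen_positions = []
    for ch in entry:
        if ch in "-*":
            # Both markers denote a valid hyphen when the file is read.
            hyphen_positions.append(len(letters))
        elif ch == ".":
            # A `bad' hyphen from an earlier run carries no information.
            continue
        else:
            letters.append(ch)
    return "".join(letters), hyphen_positions
```

For example, read_dictionary_word("hy-phen*a.tion") yields the word "hyphenation" with hyphens after 2 and 6 letters; the `.' before "tion" is dropped.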
The pattern_file read by patgen is not a complete TeX source file such as:

    % this is a pattern file read by TeX.
    \patterns{%
    ...
    }

It can only contain the actual patterns, i.e., the `...'.
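Because the pattern_file may hold only the patterns themselves, a TeX pattern source has to be stripped of its wrapper before patgen can reuse it. The following Python sketch shows one way to do that. The function name extract_patterns is hypothetical, and the sketch assumes a single \patterns{...} group with no nested braces; real pattern sources may need more care.

```python
import re


def extract_patterns(tex_source):
    """Return the bare patterns found inside \\patterns{...} as a list."""
    # Remove TeX comments: an unescaped % and everything after it on the line.
    without_comments = re.sub(r"(?<!\\)%.*", "", tex_source)
    # Assumption: exactly one \patterns group, containing no nested braces.
    match = re.search(r"\\patterns\s*\{([^{}]*)\}", without_comments)
    if match is None:
        raise ValueError("no \\patterns{...} group found")
    # Patterns are separated by whitespace inside the group.
    return match.group(1).split()
```

The resulting list still has to be written out in the layout patgen expects for its pattern_file; that layout is not covered by this sketch.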
Each following line of the translate file defines one `letter': an arbitrary delimiter character in column 1, followed by one or more external representations of that character (first the `lower' case one used for output), each one terminated by the delimiter, with the whole sequence terminated by another delimiter.
If the translate file is empty, the values left_hyphen_min=2, right_hyphen_min=3, and the 26 lower case letters a...z with their upper case representations A...Z are assumed.
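The letter-line layout and the empty-file defaults can be illustrated with a short Python sketch. The names parse_letter_line and default_letters are hypothetical, and the sketch covers only the `letter' lines and the defaults described above, not the rest of the translate file.

```python
import string

# Defaults assumed when the translate file is empty (see above).
DEFAULT_LEFT_HYPHEN_MIN = 2
DEFAULT_RIGHT_HYPHEN_MIN = 3


def parse_letter_line(line):
    """Split one `letter' line into its external representations.

    Column 1 holds the delimiter; each representation ends with the
    delimiter, and an extra delimiter ends the whole sequence.
    """
    if not line:
        raise ValueError("empty letter line")
    delimiter = line[0]
    representations = []
    for field in line[1:].split(delimiter):
        if field == "":
            # Two delimiters in a row: end of the sequence.
            break
        representations.append(field)
    if not representations:
        raise ValueError("letter line defines no representation")
    return representations


def default_letters():
    """Return the 26 default letters, lower case form first."""
    return [[lower, lower.upper()] for lower in string.ascii_lowercase]
```

With a blank as delimiter, the line " a A  " parses to ["a", "A"]; with `/' as delimiter, "/a/A//" gives the same result. The first entry of each list is the lower case form used for output.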
At the terminal, patgen first requests the integer values of hyph_start and hyph_finish, the lowest and highest hyphenation levels for which patterns are to be generated. The value of hyph_start should be larger than any hyphenation level already present in pattern_file.
Then, for each hyphenation level, it requests the integer values of pat_start and pat_finish, the smallest and largest pattern lengths to be analyzed, as well as good weight, bad weight, and threshold: the weights for good and bad hyphens and a weight threshold for useful patterns.
Finally, it asks whether to produce a hyphenated word list: `y' or `Y' means yes, anything else means no.
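For unattended runs it can help to prepare the terminal answers in advance. The Python sketch below assembles them in the order described above. The function name build_terminal_answers is hypothetical, and the grouping of values per line (one line per group, values separated by blanks) is an assumption about how patgen reads its input, so check it against an interactive session first.

```python
def build_terminal_answers(hyph_start, hyph_finish, levels, make_word_list):
    """Return the text to feed to patgen's terminal prompts.

    levels holds one tuple per hyphenation level, in order:
    (pat_start, pat_finish, good_weight, bad_weight, threshold).
    """
    if hyph_start > hyph_finish:
        raise ValueError("hyph_start must not exceed hyph_finish")
    if len(levels) != hyph_finish - hyph_start + 1:
        raise ValueError("need one parameter tuple per hyphenation level")
    lines = [f"{hyph_start} {hyph_finish}"]
    for pat_start, pat_finish, good_weight, bad_weight, threshold in levels:
        # Assumed layout: pattern lengths on one line, weights on the next.
        lines.append(f"{pat_start} {pat_finish}")
        lines.append(f"{good_weight} {bad_weight} {threshold}")
    # `y' or `Y' requests the hyphenated word list; anything else declines.
    lines.append("y" if make_word_list else "n")
    return "\n".join(lines) + "\n"
```

The returned string could then be supplied on standard input when running patgen with its four complete filenames, for example through Python's subprocess.run with the input and text arguments.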
Frank Liang and Peter Breitenlohner, patgen.web.
Frank Liang, Word hy-phen-a-tion by com-puter, STAN-CS-83-977, Stanford University Ph.D. thesis, 1983, http://tug.org/docs/liang.
Donald E. Knuth, The TeXbook, Addison-Wesley, 1986, ISBN 0-201-13447-0, Appendix H.
Frank Liang wrote the first version of this program. Peter Breitenlohner made a substantial revision in 1991 for TeX 3. The first version was published as the appendix to the TeXware technical report. Howard Trickey originally ported it to Unix.
16 June 2015 Web2C 2019/dev