CATDVI(1) | General Commands Manual | CATDVI(1)
catdvi - a DVI to plain text converter
catdvi [-d debuglevel, --debug=debuglevel] [-e outenc, --output-encoding=outenc] [-p pagespec, --first-page=pagespec] [-l pagespec, --last-page=pagespec] [-N, --list-page-numbers] [-s, --sequential] [-U, --show-unknown-glyphs] [-h, --help] [--version] [--copyright] [dvi-file]
This manual page documents catdvi version 0.14.
catdvi reads the DVI (typesetter DeVice Independent) file dvi-file and dumps a plain text approximation of the document it describes to stdout. If the argument dvi-file is omitted or a dash (`-'), catdvi will read from stdin. Several output encodings (different character sets of the plain text output) are supported, most notably UTF-8.
The current version of catdvi is a work in progress; it may not be robust enough for production use, but it already works well with linear English text. Many mathematical symbols (e.g. the uppercase Greek letters) and moderately complex formulae also come out right.
The program needs to read the TFM (TeX Font Metric) files corresponding to the fonts used in the DVI file. These are located (and, if necessary and possible, created on the fly) through the Kpathsea library.
In order to correctly translate a DVI file to text, the input encoding of the fonts used in it (i.e. a meaning-preserving mapping from font code points to Unicode) must be known. There are a lot of different font encodings in use. At the time of writing, catdvi understands the following input encodings:
Perfect translation from unmarked-up DVI to plain text is impossible, since DVI describes only the layout of a page, whereas a translator such as this really needs to know where words and paragraphs end and, more importantly, which glyphs should be aligned vertically and which should not. The current alignment algorithm tries to preserve the relative horizontal positions of word beginnings; this works well in most cases. Word breaks are detected using simple heuristics; paragraphs are not detected at all (and no paragraph fill is attempted).
The price of alignment is that the output will likely be more than 80 columns wide, even though catdvi tries very hard not to use more columns than strictly necessary. Output is usually less than 120 columns wide and almost always less than 132 columns wide. It may be a good idea to switch your terminal to one of these modes if possible.
The program follows the usual GNU command line syntax, with long options starting with two dashes.
A (possibly negative) number num specifies a TeX page number, which is stored as the so-called count0 value in the DVI file for every page. Plain TeX uses negative page numbers for roman-numbered frontmatter (title page, preface, TOC, etc.) so the count0 values compare as
A number prefixed by an equals sign (`=num') specifies a physical page, i.e. the num-th page appearing in the DVI file. Numbering starts with 1. Note that with the long form of the option you actually need two equals signs, one as part of the long option and one as part of the page specification. Example:
The third form of a page specification, two numbers separated by a colon (`num1:num2'), is useful for documents with separately numbered parts, e.g. chapters. It refers to the page with count0 value equal to num2 that catdvi believes to be in part num1. Since those part numbers are not stored in the DVI file, the program has to guess them: an internal chapter counter is increased by one every time the count0 value of the current page is not greater (in the ordering above) than that of the previous page. The counter is initialized to 1 if the first page has a negative count0 value and to 0 otherwise. (A document with separately numbered parts will probably have separately numbered frontmatter as well, and then this rule keeps the internal counter equal to the real-world part numbers.)
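The part-guessing rule can be illustrated with a short Python sketch. This is not catdvi's actual code (the program itself is not written in Python); the function names count0_key and guess_parts are hypothetical. The ordering used here is an assumption: negative (frontmatter) count0 values are taken to precede all non-negative ones and to be ordered among themselves by increasing absolute value.

```python
from typing import Iterable, List, Optional, Tuple


def count0_key(count0: int) -> Tuple[int, int]:
    """Sort key for the ASSUMED count0 ordering (not taken from catdvi):
    negative values come first, ordered by absolute value; non-negative
    values follow in ordinary numeric order."""
    if count0 < 0:
        return (0, -count0)
    return (1, count0)


def guess_parts(count0_values: Iterable[int]) -> List[Tuple[int, int]]:
    """Return a (part, count0) pair for every page, in file order."""
    result: List[Tuple[int, int]] = []
    part = 0
    previous: Optional[int] = None
    for count0 in count0_values:
        if previous is None:
            # First page: start at 1 for negative count0, else at 0.
            part = 1 if count0 < 0 else 0
        elif count0_key(count0) <= count0_key(previous):
            # Page number did not increase, so a new part is assumed.
            part += 1
        result.append((part, count0))
        previous = count0
    return result


if __name__ == "__main__":
    # Two parts, each with its own frontmatter and body numbering.
    for part, count0 in guess_parts([-1, -2, 1, 2, 3, -1, 1, 2]):
        print(f"{part}:{count0}")
```

With this sketch, a page specification such as `2:1' would correspond to the first pair (2, 1) in the returned list, if one exists.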
The usual environment variables TFMFONTS, TEXFONTS, etc. for Kpathsea font search and creation apply. Refer to the Kpathsea documentation for details.
xdvi(1), dvips(1), tex(1), mktextfm(1), the Kpathsea texinfo documentation, utf-8(7).
These things do not work (yet):
Watch out for these:
catdvi was written by Antti-Juhani Kaijanaho <gaia@iki.fi>, based on a skeletal version by J.H.M. Dassen (Ray). Bjoern Brill <brill@fs.math.uni-frankfurt.de> made further improvements and currently maintains the program.
The manual page was compiled by Bjoern Brill, using material written by the first two program authors.
8 November 2002