NAME
Text::Reflow - Perl module for reflowing text files using Knuth's paragraphing algorithm.
SYNOPSIS
    use Text::Reflow qw(reflow_file reflow_string reflow_array);
    reflow_file($infile, $outfile, key => value, ...);
    $output = reflow_string($input, key => value, ...);
    $output = reflow_array(\@input, key => value, ...);
DESCRIPTION
These routines reflow the paragraphs in the given file, filehandle, string or array, using Knuth's paragraphing algorithm (as used in TeX) to pick "good" places to break the lines.
Each routine takes ASCII text with paragraphs separated by blank lines and reflows each paragraph. If two or more lines in a row are "indented", they are assumed to be a quoted poem and are passed through unchanged (but see the skipindented option below).
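The poem-detection rule, together with the skipindented option described under KEYWORD OPTIONS, can be sketched as a per-line classification. This is a minimal illustration, not Text::Reflow's code: the helper verbatim_flags is hypothetical, and it assumes that "indented" simply means "starts with whitespace", which may differ from the module's own test.

```perl
use strict;
use warnings;

# Hypothetical helper: return one flag per line, 1 meaning "pass through
# unchanged", following the three skipindented settings.
# Assumption: a line is "indented" if it starts with whitespace.
sub verbatim_flags {
    my ($lines, $skipindented) = @_;
    my @ind  = map { /^\s/ ? 1 : 0 } @$lines;
    my @keep = (0) x @ind;
    return @keep if $skipindented == 0;    # flow every line
    return @ind  if $skipindented == 1;    # keep every indented line
    # skipindented == 2: keep only runs of two or more adjacent indented lines
    my $i = 0;
    while ($i < @ind) {
        if (!$ind[$i]) { $i++; next }
        my $j = $i;
        $j++ while $j + 1 < @ind && $ind[$j + 1];    # extend the indented run
        @keep[$i .. $j] = (1) x ($j - $i + 1) if $j > $i;
        $i = $j + 1;
    }
    return @keep;
}
```

With the default setting of 2, an indented first line of prose is still reflowed, while a two-line indented verse is left alone.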
The reflow algorithm tries to keep the lines the same length, but it also prefers to break at punctuation and avoids breaking within a proper name or after certain connectives ("a", "the", etc.). The result has a more "ragged" right margin than the output of "fmt" or "Text::Wrap", but it is easier to read because fewer phrases are broken across line breaks.
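The length-balancing part of this trade-off can be illustrated with a simplified dynamic-programming line breaker. This is a sketch under stated assumptions, not Text::Reflow's implementation: the function break_paragraph is hypothetical, each line's cost is assumed to be the squared difference between its length and the optimum, no line may exceed the maximum (except a single overlong word), the last line is free, and none of the punctuation, name or connective penalties are modelled.

```perl
use strict;
use warnings;

# Simplified minimum-raggedness line breaking (no semantic penalties).
# Returns the list of output lines for one paragraph.
sub break_paragraph {
    my ($text, $optimum, $maximum) = @_;
    my @w = split ' ', $text;
    my $n = @w;
    return () unless $n;
    # $best[$i]: minimal cost of setting words $i..$n-1
    # $next[$i]: index of the first word on the line after the one starting at $i
    my @best = (0) x ($n + 1);
    my @next = ($n) x ($n + 1);
    for (my $i = $n - 1; $i >= 0; $i--) {
        $best[$i] = 9**9**9;    # infinity
        my $len = -1;
        for my $j ($i .. $n - 1) {
            $len += 1 + length $w[$j];    # length of a line holding words $i..$j
            last if $len > $maximum && $j > $i;    # too long: stop extending
            my $cost  = $j == $n - 1 ? 0 : ($optimum - $len) ** 2;  # last line free
            my $total = $cost + $best[$j + 1];
            if ($total < $best[$i]) {
                $best[$i] = $total;
                $next[$i] = $j + 1;
            }
        }
    }
    # Follow the recorded breaks from the first word.
    my @lines;
    for (my $i = 0; $i < $n; $i = $next[$i]) {
        push @lines, join ' ', @w[$i .. $next[$i] - 1];
    }
    return @lines;
}
```

For example, break_paragraph($text, 65, 75) uses the default optimum and maximum listed under KEYWORD OPTIONS. Unlike a greedy filler, which puts as many words as will fit on each line, the dynamic program compares every feasible set of breaks, so it may shorten an early line to avoid a very short or very long one later.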
For "reflow_file", if $infile is the empty string, the input is taken from STDIN; if $outfile is the empty string, the output is written to STDOUT. Otherwise, $infile and $outfile may each be a file name, a FileHandle reference or a FileHandle glob.
A typical invocation is:
    reflow_file("myfile", "");
which reflows the whole of myfile and prints the result to STDOUT.
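The argument conventions for reflow_file can be sketched as a pair of handle-resolving helpers. The names input_handle and output_handle are hypothetical and are not part of Text::Reflow. The sketch only mirrors the rules stated above: an empty string selects STDIN or STDOUT, a reference or glob is used as an already-open handle, and any other string is opened as a file name.

```perl
use strict;
use warnings;

# Hypothetical helper for the input argument.
sub input_handle {
    my ($in) = @_;
    return $in if ref($in) || ref(\$in) eq 'GLOB';    # FileHandle ref or glob
    return \*STDIN if $in eq '';                      # empty string: STDIN
    open my $fh, '<', $in or die "Cannot open '$in': $!";
    return $fh;
}

# Hypothetical helper for the output argument.
sub output_handle {
    my ($out) = @_;
    return $out if ref($out) || ref(\$out) eq 'GLOB';
    return \*STDOUT if $out eq '';                    # empty string: STDOUT
    open my $fh, '>', $out or die "Cannot open '$out': $!";
    return $fh;
}
```

Checking for a reference or glob first means a glob such as *STDIN is never mistaken for a file name.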
KEYWORD OPTIONS
The behaviour of Text::Reflow can be adjusted by setting various keyword options. These can be set globally by assigning to the appropriate variable in the Text::Reflow package, for example:
    $Text::Reflow::maximum = 80;
    $Text::Reflow::optimum = 75;
which sets the maximum line length to 80 characters and the optimum line length to 75 characters for all subsequent reflow operations. Alternatively, they can be passed to any of the reflow_* functions as keyword parameters, for example:
    $out = reflow_string($in, maximum => 80, optimum => 75);
in which case the new values apply only to that call.
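The relationship between global package variables and per-call keyword parameters can be sketched as a hash merge in which later keys win. The package Sketch::Reflow and its effective_options function are hypothetical; only two options are shown, and the real module may combine its options differently.

```perl
use strict;
use warnings;

{
    package Sketch::Reflow;    # hypothetical stand-in, not Text::Reflow itself

    # Global defaults held in package variables (values from the option list).
    our $maximum  = 75;
    our $semantic = 30;

    # Per-call keyword arguments come after the defaults, so any key given
    # in the call overrides the global value for this call only.
    sub effective_options {
        my %opt = (maximum => $maximum, semantic => $semantic, @_);
        return \%opt;
    }
}
```

Calling Sketch::Reflow::effective_options(maximum => 80) yields a maximum of 80 for that call, while $Sketch::Reflow::maximum keeps its value for later calls.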
The following options are currently implemented, with their
default values:
- optimum => [65]
  The optimum line length in characters. This can be either a number or a reference to an array of numbers. In the latter case, each value is tried in turn for each paragraph, and the one that gives the best overall paragraph is chosen. This produces less ragged paragraphs, but some paragraphs will be wider or narrower overall than others.
- maximum => 75
  The maximum allowed line length.
- indent => ""
  Each line of output has this string prepended. "indent => string" is equivalent to "indent1 => string, indent2 => string".
- indent1 => ""
  A string used to indent the first line of each paragraph.
- indent2 => ""
  A string used to indent the second and subsequent lines of each paragraph.
- quote => ""
  Characters to strip from the beginning of each line before processing. To reflow a quoted email message and then restore the quotes, you might use:
      quote => "> ", indent => "> "
- skipto => ""
  Skip to the first line starting with the given pattern before starting to reflow. This is useful for skipping Project Gutenberg headers or tables of contents.
- skipindented => 2
  If skipindented is 0, all indented lines are flowed in with the surrounding paragraph. If it is 1, any indented line is left unreflowed. If it is 2, any two or more adjacent indented lines are left unreflowed. The default value lets poetry pass through unchanged while still allowing an indented first line of a paragraph to be reflowed.
- noreflow => ""
  A pattern indicating lines that should not be reflowed. For example, a table of contents might contain a line of dots. The option
      noreflow => '(\.\s*){4}\.'
  prevents reflowing of any line containing five or more consecutive dots (optionally separated by whitespace).
- frenchspacing => 'n'
  Normally two spaces are put at the end of a sentence or a clause. Setting "frenchspacing" to 'y' (the name is taken from the TeX macro of the same name) disables this feature.
- oneparagraph => 'n'
  Set this to 'y' if you want the whole input to be flowed into a single paragraph, ignoring blank lines in the input.
- semantic => 30
  The extent to which semantic factors matter (breaking at punctuation, avoiding a break within a clause, etc.). Set this to zero to minimise the raggedness of the right margin, at the expense of readability.
- namebreak => 10
  Penalty for splitting a name across lines.
- sentence => 20
  Penalty for sentence widows and orphans (i.e. breaking a line immediately after the first word of a sentence, or immediately before the last word of a sentence).
- independent => 10
  Penalty for independent clause widows and orphans.
- dependent => 6
  Penalty for dependent clause widows and orphans.
- shortlast => 5
  Penalty for a short last line in a paragraph (one or two words).
- connpenalty => 1
  Multiplier for the "negative penalty" for breaking at a connective. In other words, increasing this value makes connectives an even more attractive place to break a line.
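The quote and indent round trip for email can be sketched as two small helpers placed around the reflow step. The names strip_quotes and add_indent are hypothetical. Following the option description, the sketch assumes that quote is a set of characters removed from the start of each line; one consequence of that reading is that nested quoting ("> > ") is stripped entirely and only the single indent string is put back.

```perl
use strict;
use warnings;

# Hypothetical helper: remove any leading run of the characters in $quote.
sub strip_quotes {
    my ($quote, @lines) = @_;
    return @lines unless length $quote;          # empty quote: nothing to strip
    my $class = '[' . quotemeta($quote) . ']+';  # e.g. "> " becomes [\>\ ]+
    return map { s/^$class//r } @lines;          # /r needs Perl 5.14 or later
}

# Hypothetical helper: prepend $indent to each (already reflowed) line.
sub add_indent {
    my ($indent, @lines) = @_;
    return map { $indent . $_ } @lines;
}
```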
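The skipto option can be sketched as splitting the input at the first line that starts with the pattern. The helper split_at_skipto is hypothetical. It returns the skipped lines separately because the text above does not say whether Text::Reflow discards them or copies them through, and it treats the pattern as a Perl regular expression anchored at the start of the line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns two array refs, (lines before the first match,
# lines from the first match onwards).
sub split_at_skipto {
    my ($pattern, @lines) = @_;
    return ([], [@lines]) unless length $pattern;    # no skipto: keep everything
    for my $i (0 .. $#lines) {
        if ($lines[$i] =~ /^(?:$pattern)/) {
            return ([@lines[0 .. $i - 1]], [@lines[$i .. $#lines]]);
        }
    }
    return ([@lines], []);    # this sketch skips everything if nothing matches
}
```

For a Project Gutenberg text, a call such as split_at_skipto('CHAPTER', @lines) would set aside the licence header and contents before the first chapter heading.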
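The example noreflow pattern can be checked directly with Perl's regex engine. This small sketch (the helper is_noreflow is hypothetical) shows which table-of-contents style lines the pattern would protect, assuming it is applied as an unanchored match against each line.

```perl
use strict;
use warnings;

# The pattern from the noreflow example: four dots, each optionally followed
# by whitespace, and then a fifth dot.
my $noreflow = qr/(\.\s*){4}\./;

sub is_noreflow {
    my ($line) = @_;
    return $line =~ $noreflow ? 1 : 0;
}
```

Because whitespace is allowed between the dots, both "Chapter 1 ........ 7" and "Introduction . . . . . 3" match, while an ordinary ellipsis ("Wait... what?") does not.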
AUTHORS
The original "reflow" Perl script was written by Michael Larsen, larsen@edu.upenn.math.
It was modified, enhanced and converted to a Perl module with XSUB by Martin Ward, martin@gkc.org.uk.
SEE ALSO
perl(1).
See "TeX: The Program" by Donald Knuth for a description of the algorithm used.