docstrip_util - Docstrip-related utilities
package require Tcl 8.4
package require docstrip ?1.2?
package require docstrip::util ?1.3.1?
pkgProvide name version terminals
pkgIndex ?terminal ...?
fileoptions ?option value ...?
docstrip::util::index_from_catalogue dir pattern ?option value ...?
docstrip::util::modules_from_catalogue target source ?option value ...?
docstrip::util::classical_preamble metaprefix message target ?source terminals ...?
docstrip::util::classical_postamble metaprefix message target ?source terminals ...?
docstrip::util::packages_provided text ?setup-script?
docstrip::util::ddt2man text
docstrip::util::guards subcmd text
docstrip::util::patch source-var terminals fromtext diff ?option value ...?
docstrip::util::thefile filename ?option value ...?
docstrip::util::import_unidiff diff-text ?warning-var?
The docstrip::util package is meant for collecting various
utility procedures that are mainly useful at installation or development
time. It is separate from the base package to avoid overhead when the latter
is used to source code.
As with raw ".tcl" files, the code lines in docstrip
source files can be searched for package declarations, and corresponding
indices can be constructed from them. A complication, however, is that one cannot tell from the
code blocks themselves which will fit together to make a working package;
normally that information would be found in an accompanying
".ins" file, but parsing one of those is not an easy task.
Therefore docstrip::util introduces an alternative encoding of such
information, in the form of a declarative Tcl script: the catalogue
(of the contents in a source file).
The special commands which are available inside a catalogue
are:
- pkgProvide
name version terminals
- Declares that the code for a package with name name and version
version is made up from those modules in the source file which are
selected by the terminals list of guard expression terminals. This
code should preferably not contain a package provide command
for the package, as one will be provided by the package loading
mechanisms.
- pkgIndex
?terminal ...?
- Declares that the code for a package is made up from those modules in the
source file which are selected by the listed guard expression
terminals. The name and version of this package are determined from
package provide command(s) found in that code (hence there
must be such a command in there).
- fileoptions
?option value ...?
- Declares the fconfigure options that should be in force when
reading the source; this can usually be ignored for pure ASCII files, but
if the file needs to be interpreted according to some other
-encoding then this is how to specify it. The command should
normally appear first in the catalogue, as it takes effect only for
commands following it.
Other Tcl commands are supported too — a catalogue is
parsed by being evaluated in a safe interpreter — but they are rarely
needed. To allow for future extensions, unknown commands in the catalogue
are silently ignored.
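The evaluation model described above can be imitated in a few lines of plain Tcl. The following is only a sketch of the idea, not the code used by docstrip::util: the catdemo namespace and its procedures are hypothetical names, and swallowing errors with catch is an assumption made for brevity.

```tcl
# Sketch only: imitates how a catalogue could be evaluated in a safe
# interpreter. The catdemo namespace is hypothetical, not part of docstrip.
namespace eval catdemo { variable entries {} }

# Record one catalogue command together with its arguments.
proc catdemo::record {kind args} {
    variable entries
    lappend entries [linsert $args 0 $kind]
}

proc catdemo::parse {catalogue} {
    variable entries {}
    set c [interp create -safe]
    # The three catalogue commands are aliases into the main interpreter.
    foreach cmd {pkgProvide pkgIndex fileoptions} {
        interp alias $c $cmd {} ::catdemo::record $cmd
    }
    # Unknown commands are silently ignored.
    $c eval {proc unknown args {}}
    # Assumption of this sketch: other errors simply end the evaluation.
    catch {$c eval $catalogue}
    interp delete $c
    return $entries
}

set demo [catdemo::parse {
    fileoptions -encoding utf-8
    pkgProvide foo::bar 1.0 {foobar load}
    someFutureCommand with arguments
    pkgIndex foobar load
}]
foreach entry $demo { puts $entry }
```

The call to someFutureCommand leaves no trace in the result, which mirrors the rule that unknown commands are ignored.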
To simplify distribution of catalogues together with their source
files, the catalogue is stored in the source file itself as a module
selected by the terminal 'docstrip.tcl::catalogue'. This supports
both the style of collecting all catalogue lines in one place and the style
of putting each catalogue line in close proximity of the code that it
declares.
Putting catalogue entries next to the code they declare may look
as follows
% First there's the catalogue entry
% \begin{tcl}
%<docstrip.tcl::catalogue>pkgProvide foo::bar 1.0 {foobar load}
% \end{tcl}
% second a metacomment used to include a copyright message
% \begin{macrocode}
%<*foobar>
%% This file is placed in the public domain.
% \end{macrocode}
% third the package implementation
% \begin{tcl}
namespace eval foo::bar {
# ... some clever piece of Tcl code elided ...
% \end{tcl}
% which at some point may have variant code to make use of a
% |load|able extension
% \begin{tcl}
%<*load>
load [file rootname [info script]][info sharedlibextension]
%</load>
%<*!load>
# ... even more clever scripted counterpart of the extension
# also elided ...
%</!load>
}
%</foobar>
% \end{tcl}
% and that's it!
The corresponding set-up with pkgIndex would be
% First there's the catalogue entry
% \begin{tcl}
%<docstrip.tcl::catalogue>pkgIndex foobar load
% \end{tcl}
% second a metacomment used to include a copyright message
% \begin{tcl}
%<*foobar>
%% This file is placed in the public domain.
% \end{tcl}
% third the package implementation
% \begin{tcl}
package provide foo::bar 1.0
namespace eval foo::bar {
# ... some clever piece of Tcl code elided ...
% \end{tcl}
% which at some point may have variant code to make use of a
% |load|able extension
% \begin{tcl}
%<*load>
load [file rootname [info script]][info sharedlibextension]
%</load>
%<*!load>
# ... even more clever scripted counterpart of the extension
# also elided ...
%</!load>
}
%</foobar>
% \end{tcl}
% and that's it!
- docstrip::util::index_from_catalogue
dir pattern ?option value ...?
- This command is a sibling of the standard pkg_mkIndex command, in
that it adds package entries to "pkgIndex.tcl" files. The
difference is that it indexes docstrip-style source files rather
than raw ".tcl" or loadable library files. Only packages
listed in the catalogue of a file are considered.
The dir argument is the directory in which to look for
files (and whose "pkgIndex.tcl" file should be
amended). The pattern argument is a glob pattern of files
to look into; a typical value would be *.dtx or
*.{dtx,ddt}. Remaining arguments are option-value pairs, where
the supported options are:
- -recursein
dirpattern
- If this option is given, then the index_from_catalogue operation
will be repeated in each subdirectory whose name matches the
dirpattern. -recursein * will cause the entire
subtree rooted at dir to be indexed.
- -sourceconf
dictionary
- Specify fileoptions to use when reading the catalogues of files
(and also for reading the packages if the catalogue does not contain a
fileoptions command). Defaults to being empty. Primarily useful if
your system encoding is very different from that of the source file (e.g.,
one is a two-byte encoding and the other is a one-byte encoding).
ascii and utf-8 are not very different in that sense.
- -options
terminals
- The terminals is a list of terminals in addition to
docstrip.tcl::catalogue that should be held as true when extracting
the catalogue. Defaults to being empty. This makes it possible to make use
of "variant sections" in the catalogue itself, e.g. guard some
entries with an extra "experimental" guard and thus prevent them from
appearing in the index unless that is generated with
"experimental" among the -options.
- -report
boolean
- If the boolean is true then the return value will be a textual,
probably multiline, report on what was done. Defaults to false, in which
case there is no particular return value.
- -reportcmd
commandPrefix
- Every item in the report is handed as an extra argument to the command
prefix. Since index_from_catalogue would typically be used at a
rather high level in installation scripts and the like, the
commandPrefix defaults to "puts stdout".
Use list to effectively disable this feature. The return values
from the prefix are ignored.
- The package ifneeded scripts that are generated contain one
package require docstrip command and one
docstrip::sourcefrom command. If the catalogue entry was of the
pkgProvide kind then the package ifneeded script also
contains the package provide command.
Note that index_from_catalogue never removes anything
from an existing "pkgIndex.tcl" file. Hence you may
need to delete it (or have pkg_mkIndex recreate it from scratch)
before running index_from_catalogue to update some piece of
information, such as a package version number.
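A re-indexing step in an installation script might be sketched as follows. The helper name reindex_sources and its forceRebuild argument are hypothetical; only index_from_catalogue and the options documented above are taken from the package, and the package is required inside the procedure so that the definition itself loads without tcllib.

```tcl
# Hypothetical helper, not part of docstrip::util.
proc reindex_sources {dir {forceRebuild 0}} {
    package require docstrip::util
    if {$forceRebuild} {
        # index_from_catalogue never removes existing entries, so a stale
        # pkgIndex.tcl has to be deleted before it is regenerated.
        file delete [file join $dir pkgIndex.tcl]
    }
    # "-reportcmd list" turns off the default "puts stdout" reporting;
    # "-report 1" returns the report as the result of the command instead.
    return [docstrip::util::index_from_catalogue $dir {*.dtx} \
        -report 1 -reportcmd list]
}
```

Note that deleting "pkgIndex.tcl" also discards entries that were not generated from docstrip sources, so those would have to be recreated, for instance with pkg_mkIndex.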
- docstrip::util::modules_from_catalogue
target source ?option value ...?
- This command is an alternative to index_from_catalogue which
creates Tcl Module (".tm") files rather than
"pkgIndex.tcl" entries. Since this action is more similar
to what docstrip classically does, it has features for putting pre-
and postambles on the generated files.
The source argument is the name of the source file to
generate ".tm" files from. The target argument
is the directory which should count as a module path, i.e., this is what
the relative paths derived from package names are joined to. The
supported options are:
- -preamble
message
- A message to put in the preamble (initial block of comments) of generated
files. Defaults to a space. May be several lines, which are then separated
by newlines. Traditionally used for copyright notices or the like, but
metacomment lines provide an alternative to that.
- -postamble
message
- Like -preamble, but the message is put at the end of the file
instead of the beginning. Defaults to being empty.
- -sourceconf
dictionary
- Specify fileoptions to use when reading the catalogue of the
source (and also for reading the packages if the catalogue does not
contain a fileoptions command). Defaults to being empty. Primarily
useful if your system encoding is very different from that of the source
file (e.g., one is a two-byte encoding and the other is a one-byte
encoding). ascii and utf-8 are not very different in that
sense.
- -options
terminals
- The terminals is a list of terminals in addition to
docstrip.tcl::catalogue that should be held as true when extracting
the catalogue. Defaults to being empty. This makes it possible to make use
of "variant sections" in the catalogue itself, e.g. guard some
entries with an extra "experimental" guard and thus prevent them
from contributing packages unless those are generated with
"experimental" among the -options.
- -formatpreamble
commandPrefix
- Command prefix used to actually format the preamble. Takes four additional
arguments message, targetFilename, sourceFilename,
and terminalList and returns a fully formatted preamble. Defaults
to using classical_preamble with a metaprefix of '##'.
- -formatpostamble
commandPrefix
- Command prefix used to actually format the postamble. Takes four
additional arguments message, targetFilename,
sourceFilename, and terminalList and returns a fully
formatted postamble. Defaults to using classical_postamble with a
metaprefix of '##'.
- -report
boolean
- If the boolean is true (which is the default) then the return value
will be a textual, probably multiline, report on what was done. If it is
false then there is no particular return value.
- -reportcmd
commandPrefix
- Every item in the report is handed as an extra argument to this command
prefix. Defaults to list, which effectively disables this feature.
The return values from the prefix are ignored. Use for example
"puts stdout" to get report items written
immediately to the terminal.
- An existing file of the same name as one to be created will be
overwritten.
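As an illustration of how the options combine, the sketch below generates modules whose pre- and postambles use '#' rather than the default '##' as metaprefix. The helper name build_modules and the preamble text are hypothetical; the command prefixes rely on the documented fact that the four extra arguments match the remaining arguments of classical_preamble and classical_postamble.

```tcl
# Hypothetical helper, not part of docstrip::util.
proc build_modules {source target} {
    package require docstrip::util
    # Each command prefix receives message, targetFilename, sourceFilename
    # and terminalList as additional arguments.
    return [docstrip::util::modules_from_catalogue $target $source \
        -preamble "Generated file; edit the docstrip source instead." \
        -formatpreamble [list ::docstrip::util::classical_preamble {#}] \
        -formatpostamble [list ::docstrip::util::classical_postamble {#}] \
        -reportcmd [list puts stdout]]
}
```

Since -report defaults to true for this command, the return value of build_modules is the textual report.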
- docstrip::util::classical_preamble
metaprefix message target ?source
terminals ...?
- This command returns a preamble in the classical docstrip
style
##
## This is `TARGET',
## generated by the docstrip::util package.
##
## The original source files were:
##
## SOURCE (with options: `foo,bar')
##
## Some message line 1
## line2
## line3
- if called as
docstrip::util::classical_preamble {##}\
"\nSome message line 1\nline2\nline3" TARGET SOURCE {foo bar}
- The command supports preambles for files generated from multiple sources,
even though modules_from_catalogue at present does not need
that.
- docstrip::util::classical_postamble
metaprefix message target ?source
terminals ...?
- This command returns a postamble in the classical docstrip
style
## Some message line 1
## line2
## line3
##
## End of file `TARGET'.
- if called as
docstrip::util::classical_postamble {##}\
"Some message line 1\nline2\nline3" TARGET SOURCE {foo bar}
- In other words, the source and terminals arguments are
ignored, but supported for symmetry with classical_preamble.
- docstrip::util::packages_provided
text ?setup-script?
- This command returns a list where every even index element is the name of
a package provided by text when that is evaluated as a Tcl
script, and the following odd index element is the corresponding version.
It is used to do package indexing of extracted pieces of code, in the
manner of pkg_mkIndex.
One difference from pkg_mkIndex is that the text
gets evaluated in a safe interpreter. package require commands
are silently ignored, as are unknown commands (which includes
source and load). Other errors cause processing of the
text to stop, in which case only those package declarations that
had been encountered before the error will be included in the return
value.
The setup-script argument can be used to customise the
evaluation environment, if the code in text has some very special
needs. The setup-script is evaluated in the local context of the
packages_provided procedure just before the text is
processed. At that time, the name of the slave command for the safe
interpreter that will do this processing is kept in the local variable
c. For example, to copy the contents of the ::env array to
the safe interpreter, one might use a setup-script of
$c eval [list array set env [array get ::env]]
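Because the return value alternates names and versions, it is convenient to walk it with a two-variable foreach. The following is a minimal sketch; the helper name list_provided is hypothetical, and the package is required inside the procedure so that the definition loads without tcllib.

```tcl
# Hypothetical helper, not part of docstrip::util.
proc list_provided {text} {
    package require docstrip::util
    set result {}
    # Even index elements are package names, odd index elements versions.
    foreach {name version} [docstrip::util::packages_provided $text] {
        lappend result "$name $version"
    }
    return $result
}
```

A typical text argument would be code obtained from docstrip::extract, or a plain script such as "package provide foo::bar 1.0".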
Unlike the previous group of commands, which would use
docstrip::extract to extract some code lines and then process those
further, the following commands operate on text consisting of all types of
lines.
- docstrip::util::ddt2man
text
- The ddt2man command reformats text from the general
docstrip format to doctools ".man" format
(Tcl Markup Language for Manpages). The different line types are treated
as follows:
- comment and metacomment lines
- The '%' and '%%' prefixes are removed, the rest of the text is kept as it
is.
- empty lines
- These are kept as they are. (Effectively this means that they will count
as comment lines after a comment line and as code lines after a code
line.)
- code lines
- example_begin and example_end commands are placed at the
beginning and end of every block of consecutive code lines. Brackets in a
code line are converted to lb and rb commands.
- verbatim
guards
- These are processed as usual, so they do not show up in the result but
every line in a verbatim block is treated as a code line.
- other guards
- These are treated as code lines, except that the actual guard is
emphasised.
- At the time of writing, no project has employed doctools markup in
master source files, so experience of what works well is not available. A
source file could however look as follows
% [manpage_begin gcd n 1.0]
% [keywords divisor]
% [keywords math]
% [moddesc {Greatest Common Divisor}]
% [require gcd [opt 1.0]]
% [description]
%
% [list_begin definitions]
% [call [cmd gcd] [arg a] [arg b]]
% The [cmd gcd] procedure takes two arguments [arg a] and [arg b] which
% must be integers and returns their greatest common divisor.
proc gcd {a b} {
% The first step is to take the absolute values of the arguments.
% This relieves us of having to worry about how signs will be treated
% by the remainder operation.
set a [expr {abs($a)}]
set b [expr {abs($b)}]
% The next line does all of Euclid's algorithm! We can make do
% without a temporary variable, since $a is substituted before the
% [lb]set a $b[rb] and thus continues to hold a reference to the
% "old" value of [var a].
while {$b>0} { set b [expr { $a % [set a $b] }] }
% In Tcl 8.3 we might want to use [cmd set] instead of [cmd return]
% to get the slight advantage of byte-compilation.
%<tcl83> set a
%<!tcl83> return $a
}
% [list_end]
%
% [manpage_end]
- If the above text is fed through docstrip::util::ddt2man then the
result will be a syntactically correct doctools manpage, even
though its purpose is a bit different.
It is suggested that master source code files with
doctools markup are given the suffix ".ddt",
hence the "ddt" in ddt2man.
- docstrip::util::guards
subcmd text
- The guards command returns information (mostly of a statistical
nature) about the ordinary docstrip guards that occur in the text.
The subcmd selects what is returned.
- counts
- List the guard expression terminals with counts. The format of the return
value is a dictionary which maps the terminal name to the number of
occurrences of it in the file.
- exprcount
- List the guard expressions with counts. The format of the return value is
a dictionary which maps the expression to the number of occurrences of it
in the file.
- exprerr
- List the syntactically incorrect guard expressions (e.g. parentheses do
not match, or a terminal is missing). The return value is a list, with the
elements in no particular order.
- expressions
- List the guard expressions. The return value is a list, with the elements
in no particular order.
- exprmods
- List the guard expressions with modifiers. The format of the return value
is a dictionary where each index is a guard expression and each entry is a
string with one character for every guard line that has this expression.
The characters in the entry specify what modifier was used in that line:
+, -, *, /, or a space (for a guard without modifier). This is the most
primitive form of the information gathered by guards.
- names
- List the guard expression terminals. The return value is a list, with the
elements in no particular order.
- rotten
- List the malformed guard lines (this does not include lines where only the
expression is malformed, though). The format of the return value is a
dictionary which maps line numbers to their contents.
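Several subcommands can be combined into a quick health check of a source file. The sketch below uses only the subcommands and return formats listed above; the helper name guard_report and the wording of its output are hypothetical.

```tcl
# Hypothetical helper, not part of docstrip::util.
proc guard_report {text} {
    package require docstrip::util
    set report {}
    # counts: dictionary from terminal name to number of occurrences.
    foreach {terminal count} [docstrip::util::guards counts $text] {
        lappend report "$terminal: $count"
    }
    # exprerr: plain list of syntactically incorrect expressions.
    foreach expression [docstrip::util::guards exprerr $text] {
        lappend report "bad expression: $expression"
    }
    # rotten: dictionary from line number to line contents.
    foreach {lineno line} [docstrip::util::guards rotten $text] {
        lappend report "malformed guard at line $lineno: $line"
    }
    return [join $report \n]
}
```

The text argument could for instance be obtained with docstrip::util::thefile.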
- docstrip::util::patch
source-var terminals fromtext diff
?option value ...?
- This command tries to apply a diff file (for example a contributed
patch) that was computed for a generated file to the docstrip
source. This can be useful if someone has edited a generated file, thus
mistaking it for the source. This command makes no assumptions
that are specific to the case where the generated file is a Tcl script.
patch requires that the source file to patch is kept as
a list of lines in a variable, and the name of that variable in the
calling context is what goes into the source-var argument. The
terminals is the list of terminals used to extract the file that
has been patched. The diff is the actual diff to apply (in a
format as explained below) and the fromtext is the contents of
the file which served as "from" when the diff was computed.
Options can be used to further control the process.
The process works by "lifting" the hunks in the
diff from generated to source file, and then applying them to the
elements of the source-var. In order to do this lifting, it is
necessary to determine how lines in the fromtext correspond to
elements of the source-var, and that is where the
terminals come in; the source is first extracted under the
given terminals, and the result of that is then matched against
the fromtext. This produces a map which translates line numbers
stated in the diff to element numbers in source-var, which
is what is needed to lift the hunks.
The reason that both the terminals and the
fromtext must be given is twofold. First, it is very difficult to
keep track of how many lines (of preamble, for instance) are supplied in some
way other than by copying lines from source files. Second, a generated file might
contain material from several source files. Both make it impossible to
predict what line number an extracted file would have in the generated
file, so instead the algorithm for computing the line number map looks
for a block of lines in the fromtext which matches what can be
extracted from the source. This matching is affected by the following
options:
- -matching
mode
- How equal must two lines be in order to match? The supported modes
are:
- exact
- Lines must be equal as strings. This is the default.
- anyspace
- All sequences of whitespace characters are converted to single spaces
before comparing.
- nonspace
- Only non-whitespace characters are considered when comparing.
- none
- Any two lines are considered to be equal.
- -metaprefix
string
- The -metaprefix value to use when extracting. Defaults to
"%%", but for Tcl code it is more likely that "#" or
"##" had been used for the generated file.
- -trimlines
boolean
- The -trimlines value to use when extracting. Defaults to true.
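The four -matching modes can be pictured as normalising both lines before a string comparison. The following plain Tcl sketch illustrates that reading of the mode descriptions; it is not the code used by patch, and the procedure names normalise_line and lines_match are hypothetical.

```tcl
# Sketch only: one way to read the -matching modes, not the patch internals.
proc normalise_line {mode line} {
    switch -- $mode {
        exact    { return $line }
        anyspace {
            # Every run of whitespace becomes a single space.
            regsub -all {\s+} $line { } line
            return $line
        }
        nonspace {
            # Whitespace is dropped altogether.
            regsub -all {\s+} $line {} line
            return $line
        }
        none     { return {} }
        default  { error "unknown matching mode \"$mode\"" }
    }
}

proc lines_match {mode a b} {
    expr {[normalise_line $mode $a] eq [normalise_line $mode $b]}
}

foreach mode {exact anyspace nonspace none} {
    puts "$mode: [lines_match $mode "set  a 1" "set a\t1"]"
}
```

Looser modes make it more likely that a reindented generated file can still be matched against the source, at the price of a higher risk of matching the wrong block.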
- The return value is in the form of a unified diff, containing only those
hunks which were not applied or were only partially applied; a comment in
the header of each hunk specifies which case is at hand. It is normally
necessary to manually review both the return value from patch and
the patched text itself, as this command cannot adjust comment lines to
match new content.
An example use would look like
set sourceL [split [docstrip::util::thefile from.dtx] \n]
set terminals {foo bar baz}
set fromtext [docstrip::util::thefile from.tcl]
set difftext [exec diff --unified from.tcl to.tcl]
set leftover [docstrip::util::patch sourceL $terminals $fromtext\
[docstrip::util::import_unidiff $difftext] -metaprefix {#}]
set F [open to.dtx w]; puts $F [join $sourceL \n]; close $F
return $leftover
- Here, "from.dtx" was used as source for
"from.tcl", which someone modified into
"to.tcl". We're trying to construct a
"to.dtx" which can be used as source for
"to.tcl".
- docstrip::util::thefile
filename ?option value ...?
- The thefile command opens the file filename, reads it to
end, closes it, and returns the contents (dropping a final newline if
there is one). The option-value pairs are passed on to fconfigure
to configure the open file channel before anything is read from it.
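In terms of core Tcl commands, the described behaviour corresponds roughly to the sketch below. The procedure name read_whole_file and the demonstration file name are hypothetical, and the sketch is not the actual definition of thefile.

```tcl
# Sketch only: a plain Tcl procedure with the behaviour described above.
proc read_whole_file {filename args} {
    set F [open $filename r]
    if {[llength $args]} {
        # Option-value pairs are handed on to fconfigure.
        eval [list fconfigure $F] $args
    }
    # -nonewline drops a final newline if there is one.
    set text [read -nonewline $F]
    close $F
    return $text
}

# Demonstration using a scratch file in the current directory.
set demoName [file join [pwd] thefile_demo_[pid].txt]
set F [open $demoName w]; puts $F "line 1\nline 2"; close $F
puts [llength [split [read_whole_file $demoName -encoding utf-8] \n]]
file delete $demoName
```

Splitting the result on newlines, as in the demonstration, gives the list-of-lines form that patch expects in its source-var.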
- docstrip::util::import_unidiff
diff-text ?warning-var?
- This command parses a unified (diff flags -U and
--unified) format diff into the list-of-hunks format expected by
docstrip::util::patch. The diff-text argument is the text to
parse and the warning-var is, if specified, the name in the calling
context of a variable to which any warnings about parsing problems will be
appended.
The return value is a list of hunks. Each hunk is a
list of five elements "start1 end1 start2
end2 lines". start1 and end1 are the line
numbers in the "from" file of the first and last lines of the
hunk, respectively. start2 and end2 are the corresponding
line numbers in the "to" file. Line numbers start at 1. The
lines is a list with two elements for each line in the hunk; the
first specifies the type of a line and the second is the actual line
contents. The type is - for lines only in the "from"
file, + for lines that are only in the "to" file, and
0 for lines that are in both.
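The hunk format is easy to process with ordinary list commands. The sketch below builds one hunk by hand, following the description above, and counts its lines by type; the procedure name hunk_summary and the sample data are hypothetical.

```tcl
# Sketch only: inspects a hunk in the format described above.
proc hunk_summary {hunk} {
    foreach {start1 end1 start2 end2 lines} $hunk break
    set removed 0; set added 0; set common 0
    # The lines element alternates between a type and the line contents.
    foreach {type text} $lines {
        if {$type eq "-"} {
            incr removed
        } elseif {$type eq "+"} {
            incr added
        } else {
            incr common
        }
    }
    return [list $removed $added $common]
}

# Lines 3-5 of the "from" file become lines 3-6 of the "to" file.
set hunk [list 3 5 3 6 [list 0 alpha - beta + BETA + gamma 0 delta]]
puts [hunk_summary $hunk]
```

In a consistent hunk, the common and removed lines together span start1 to end1, and the common and added lines together span start2 to end2.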
docstrip, doctools, doctools_fmt
.ddt, .dtx, LaTeX, Tcl module, catalogue, diff, docstrip,
doctools, documentation, literate programming, module, package indexing,
patch, source
Copyright (c) 2003–2010 Lars Hellström <Lars dot Hellstrom at residenset dot net>