Locale::Po4a::Sgml - convert SGML documents from/to PO files
The goal of the po4a (PO for anything) project is to ease translations
(and more interestingly, the maintenance of translations) by using gettext
tools in areas where they were not expected, such as documentation.
Locale::Po4a::Sgml is a module to help the translation of
documentation in the SGML format into other [human] languages.
This module uses onsgmls(1) to parse the SGML files. Make
sure it is installed. Also make sure that the DTDs of the SGML files are
installed on the system.
- debug
- Space separated list of keywords indicating which part you want to debug.
Possible values are: tag, generic, entities and refs.
- verbose
- Give more information about what's going on.
- translate
- Space separated list of extra tags (besides the ones provided by the DTD)
whose content should form an extra msgid.
- section
- Space separated list of extra tags (besides the ones provided by the DTD)
containing other tags, some of them being of category
translate.
- indent
- Space separated list of tags which increase the indentation level.
- verbatim
- The layout within those tags should not be changed. The paragraph won't
get wrapped, and no extra indentation space or new line will be added for
cosmetic purposes.
- empty
- Tags not needing to be closed.
- ignore
- Tags ignored and considered as plain char data by po4a. That is to say
that they can be part of an msgid. For example, <b> is a good
candidate for this category since putting it in the translate section
would create msgids not being whole sentences, which is bad.
- attributes
- A space separated list of attributes that need to be translated. You can
specify the attributes by their name (for example, "lang"), but
you can also prefix it with a tag hierarchy, to specify that this
attribute will only be translated when it is in the specified tag. For
example: <bbb><aaa>lang specifies that the lang attribute will
only be translated if it is in an <aaa> tag, which is in a
<bbb> tag. The tag names are actually regular expressions so you can
also write things like <aaa|bbb>lang to only translate lang
attributes that are in an <aaa> or a <bbb> tag.
- qualify
- A space separated list of attributes for which the translation must be
qualified by the attribute name. Note that this setting automatically adds
the given attribute into the 'attributes' list too.
- force
- Proceed even if the DTD is unknown or if onsgmls finds errors in the input
file.
- include-all
- By default, msgids containing only one entity (like '&version;') are
skipped for the translator's comfort. Activating this option prevents this
optimisation. It can be useful if the document contains a construction
like "<title>&Aacute;</title>", even if I doubt
such things ever happen...
- ignore-inclusion
- Space separated list of entities that won't be inlined. Use this option
with caution: it may cause onsgmls (used internally) to add tags and
render the output document invalid.
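The tag-hierarchy syntax of the attributes option can be illustrated with a short Python sketch. This is not the module's code (the module is written in Perl), and the function names parse_attribute_spec and attribute_matches are hypothetical. The sketch assumes that "in an <aaa> tag, which is in a <bbb> tag" means direct nesting, and that each tag pattern must match a whole tag name; the real matching rules may differ.

```python
import re

def parse_attribute_spec(spec):
    """Split a spec such as '<bbb><aaa>lang' into (['bbb', 'aaa'], 'lang')."""
    tag_patterns = re.findall(r"<([^>]+)>", spec)
    attribute = re.sub(r"<[^>]+>", "", spec)
    return tag_patterns, attribute

def attribute_matches(spec, open_tags, attribute):
    """Tell whether `attribute`, found on the innermost tag of `open_tags`,
    is selected by `spec`. `open_tags` lists the open tags, outermost first.
    Assumption: the spec's tags must be the innermost tags, directly nested."""
    tag_patterns, name = parse_attribute_spec(spec)
    if name != attribute or len(tag_patterns) > len(open_tags):
        return False
    innermost = open_tags[len(open_tags) - len(tag_patterns):]
    # Tag names in the spec are treated as regular expressions.
    return all(re.fullmatch(pattern, tag)
               for pattern, tag in zip(tag_patterns, innermost))

if __name__ == "__main__":
    print(attribute_matches("lang", ["book", "para"], "lang"))
    print(attribute_matches("<bbb><aaa>lang", ["bbb", "aaa"], "lang"))
    print(attribute_matches("<aaa|bbb>lang", ["book", "ccc"], "lang"))
```

A bare attribute name carries no tag patterns, so it is selected wherever it appears; a prefixed one is selected only when the enclosing tags match.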
The result is perfect. I.e., the generated documents are exactly
the same. But there are still some problems:
- The error output of onsgmls is redirected to /dev/null by default, which
is clearly bad. I don't know how to avoid that.
The problem is that I have to "protect" the conditional inclusions
(i.e. the "<! [ %foo [" and "]]>" stuff) from onsgmls. Otherwise
onsgmls eats them, and I don't know how to restore them in the final
document. To prevent that, I rewrite them to "{PO4A-beg-foo}" and
"{PO4A-end}".
The problem with this is that the "{PO4A-end}" and such I add are
invalid in the document (not in a <p> tag or so).
If you want to view the onsgmls output, just add the following
to your command line (or po4a configuration line):
-o debug=onsgmls
- It works only with the DebianDoc and DocBook DTDs. Adding support for a
new DTD should be very easy. The mechanism is the same for every DTD; you
just have to give a list of the existing tags and some of their
characteristics.
I agree, this needs some more documentation, but it is still
considered as beta, and I hate to document stuff which may/will
change.
- Warning, support for DTDs is quite experimental. I did not read any
reference manual to find the definition of every tag. I added tag
definitions to the module until it worked for some documents I found on the
net. If your document uses more tags than mine, it won't work. But as I
said above, fixing that should be quite easy.
I tested DocBook against the SAG (System Administrator
Guide) only, but this document is quite big, and should use most of
DocBook's specific features.
For DebianDoc, I tested some of the manuals from the DDP, but
not all of them yet.
- In case of file inclusion, the string references of messages in PO files
(i.e. lines like "#: en/titletoc.sgml:9460") will be wrong.
This is because I preprocess the file to protect the conditional
inclusions (i.e. the "<! [ %foo [" and "]]>" stuff) and some entities
(like &version;) from onsgmls, because I want them to appear verbatim in
the generated document. For that, I make a temporary copy of the input
file and apply all the changes I want to this copy before passing it to
onsgmls for parsing.
For this to work, I replace the entities asking for a file
inclusion with the content of the given file (so that I can also protect
what needs protecting in a subfile). But nothing is done so far to correct
the references (i.e., filename and line number) afterward. I'm not sure
what the best thing to do is.
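The protection of conditional inclusions described above can be sketched in Python. This is only an illustration of the idea, not the module's actual code (the module is written in Perl): the names protect and restore and both regular expressions are assumptions. The sketch treats every "]]>" as the end of a conditional inclusion, and restore writes the opening delimiter back in one fixed spelling rather than the original one.

```python
import re

# Assumed shape of an opening delimiter: "<![ %foo; [", with optional
# whitespace and an optional semicolon after the entity name.
BEGIN = re.compile(r"<!\s*\[\s*%([\w.-]+);?\s*\[")
END = re.compile(r"\]\]>")

def protect(text):
    """Hide conditional inclusions from the parser with PO4A placeholders."""
    text = BEGIN.sub(lambda m: "{PO4A-beg-" + m.group(1) + "}", text)
    return END.sub("{PO4A-end}", text)

def restore(text):
    """Turn the placeholders back into conditional inclusion delimiters."""
    text = re.sub(r"\{PO4A-beg-([\w.-]+)\}",
                  lambda m: "<![ %" + m.group(1) + "; [", text)
    return text.replace("{PO4A-end}", "]]>")

if __name__ == "__main__":
    source = "<![ %foo; [<p>Only in the foo edition.</p>]]>"
    hidden = protect(source)
    print(hidden)
    print(restore(hidden) == source)
```

The placeholders sit outside any paragraph tag, which is why the text above calls them invalid in the document handed to the parser.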
This module is an adapted version of sgmlspl (SGML postprocessor
for the ONSGMLS parser) which was:
Copyright © 1995 David Megginson <dmeggins@aix1.uottawa.ca>
The adaptation for po4a was done by:
Denis Barbier <barbier@linuxfr.org>
Martin Quinson (mquinson#debian.org)
Copyright © 1995 David Megginson <dmeggins@aix1.uottawa.ca>.
Copyright © 2002-2005 SPI, Inc.
This program is free software; you may redistribute it and/or
modify it under the terms of GPL (see the COPYING file).