rxp [ -abemnNRsStvVx4 ] [ -o b|p|0|1|2|3|i|d
] [ U 0|1|2 ] [ -c encoding ] [ url ]
rxp reads and parses XML from the url (or standard
input if none is provided) and writes it to standard output, optionally
expanding entities, defaulting attributes, and translating to a different
output encoding.
rxp accepts XML 1.0 and 1.1, and the corresponding versions
of XML namespaces. It implements the Oasis XML catalog specification.
Common option combinations are -Nxs to check a document for
well-formedness and namespace well-formedness, and -VNxs to also
check for DTD-validity.
- -a
- Insert declared default values for omitted attributes.
- -v
- Be verbose.
- -V
- Validate the document. Repeating this option will make the program treat
validity errors as well-formedness errors, and exit after the first
validity error (otherwise a warning will be printed for each one).
- -d
- Read the whole DTD (internal and external parts) regardless of any
standalone declaration. Otherwise a declaration
"standalone='yes'" will prevent the external part from being
read (unless validation is selected).
- -N
- Enable XML namespace support. The document will be checked for correct
namespace syntax, and if -b is specified qualified element and
attribute names will be displayed with their URIs.
- -R
- The value of this flag is a time limit in seconds, after which the program
will abort. This is to protect against denial-of-service attacks using
malicious documents.
- -S
- Keep track of xml:space attributes. This will only affect output when
-b is specified.
- -e
- Obsolete, do not use.
- -E
- Do not expand entity references (opposite of old -e flag)
- -s
- Be silent (that is, suppress output). Useful for benchmarking or if you
just want to see the error messages.
- -b
- Print output as "bits".
- -n
- Treat the input as normalised SGML rather than XML. Not intended for
general use.
- -o
- If this flag is p, output is in the default (plain) format. If it
is b, output is printed as "bits" (equivalent to
-b). If it is 0, output is suppressed (equivalent to
-s). If it is 1, 2 or 3, output is in first,
second or third canonical form. If it is i, output is a dump of the
document's infoset. If it is d, output is in a form suitable for
use with "diff"; in particular attributes are sorted into
alphabetical order.
- -m
- Merge PCData across entity references. This will only affect the output
when -b is specified.
- -t
- Read in the input as a tree, rather than bits. Should make no difference
to the output.
- -u base_uri
- Use the specified base URI when resolving system identifiers.
- -U
- This flag controls Unicode normalization checking and is only relevant
when parsing XML 1.1 documents. If it is 0, no checking is done. If
it is 1, rxp checks that the document is fully normalized as
defined by the W3C character model. If it is 2, the document is
checked and any unknown characters (which may be ones corresponding to a
newer version of Unicode than rxp knows about) will also cause an
error.
- -x
- Strict XML mode. This suppresses some warnings (eg entity redefinitions)
but treats all XML well-formedness errors as fatal. This flag implies the
-a flag, and sets the output encoding to UTF-8 unless the -c
flag is given. It sets the output format to first canonical form unless
the -o, -b or -s flag is given.
- -c encoding
- Produce output in the specified character encoding. Known encodings
include ISO-8859-1, UTF-8, ISO-10646-UCS and
UTF-16. 16-bit encoding names my be suffixed with -B or
-L to specify big- or little-endian byte order (the default is the
host byte order). If no -c or -x option is given, output is
in the same encoding as the input document.
- -D name
sysid
- Force use of the document type specified by sysid. The root element
name for validation is name. Any DTD in the document is ignored.
This flag does not imply validation; use -V if required.
- -i
- Do xml:id processing. Attributes named xml:id are recognised as IDs even
if not declared.
- -I
- The same as -i, but in addition xml:id attributes are checked for
uniqueness.
- -z
- Use a shorter format for error messages. Particularly useful when using
the parser in Emacs compilation mode, so that Emacs can find the error
location.
- -4
- Use pre-fifth-edition rules for XML 1.0. XML 1.0 fifth edition extends the
set of allowed name characters to match XML 1.1, and allows unrecognised
version numbers of the form 1.x to be treated as 1.0. the -4 flag
disables these changes.
If the -V flag is given, and the document is well-formed
but not valid, 2 is returned. If the document is not well-formed, or a
system error occurs, 1 is returned. Otherwise 0 is returned. Since the
parser can expand external entities even when not validating, it treats
certain errors which are technically validity errors as well-formedness
errors. If -x is not specified, some well-formedness errors produce
only warnings and do not affect the exit status.
If the environment variable XML_CATALOG_FILES is set, XML
catalog processing is enabled. A catalog can be used to map system and
public identifiers to local files. In particular, this allows copies of
common DTDs to be kept locally, so that rxp does not have to fetch
them over the internet. XML_CATALOG_FILES should be set to a
space-separated list of catalog files. The variable
XML_CATALOG_PREFER may be set to public or system to
set the initial mode for catalog processing; the default is
system.
If the variable RXPURL is set, it is used as the URL of the
document to parse. This may be useful in CGI scripts and the like to avoid
shell parsing of a user-supplied argument.
The variable http_proxy can be used to specify a proxy for
HTTP connections. The syntax is hostname[:port].