SGMLS - class for postprocessing the output from the onsgmls,
sgmls, and nsgmls parsers.
    use SGMLS;

    my $parse = new SGMLS(STDIN);
    my $event = $parse->next_event;
    while ($event) {
        SWITCH: {
            ($event->type eq 'start_element') && do {
                my $element = $event->data;    # An object of class SGMLS_Element
                [[your code for the beginning of an element]]
                last SWITCH;
            };
            ($event->type eq 'end_element') && do {
                my $element = $event->data;    # An object of class SGMLS_Element
                [[your code for the end of an element]]
                last SWITCH;
            };
            ($event->type eq 'cdata') && do {
                my $cdata = $event->data;      # A string
                [[your code for character data]]
                last SWITCH;
            };
            ($event->type eq 'sdata') && do {
                my $sdata = $event->data;      # A string
                [[your code for system data]]
                last SWITCH;
            };
            ($event->type eq 're') && do {
                [[your code for a record end]]
                last SWITCH;
            };
            ($event->type eq 'pi') && do {
                my $pi = $event->data;         # A string
                [[your code for a processing instruction]]
                last SWITCH;
            };
            ($event->type eq 'entity') && do {
                my $entity = $event->data;     # An object of class SGMLS_Entity
                [[your code for an external entity]]
                last SWITCH;
            };
            ($event->type eq 'start_subdoc') && do {
                my $entity = $event->data;     # An object of class SGMLS_Entity
                [[your code for the beginning of a subdoc entity]]
                last SWITCH;
            };
            ($event->type eq 'end_subdoc') && do {
                my $entity = $event->data;     # An object of class SGMLS_Entity
                [[your code for the end of a subdoc entity]]
                last SWITCH;
            };
            ($event->type eq 'conforming') && do {
                [[your code for a conforming document]]
                last SWITCH;
            };
            die "Internal error: unknown event type " . $event->type . "\n";
        }
        $event = $parse->next_event;
    }
The SGMLS package consists of several related classes:
"SGMLS", "SGMLS_Event", "SGMLS_Element",
"SGMLS_Attribute", "SGMLS_Notation", and
"SGMLS_Entity". All of these classes become available when you
specify
    use SGMLS;
Generally, the only object you create explicitly belongs to the
"SGMLS" class; all of the others are created automatically for you
over the course of the parse. Much fuller documentation is available
in the ".sgml" files in the "DOC/" directory of the "SGMLS.pm"
distribution.
The "SGMLS" class holds a single parse. When you create an instance of
it, you pass a file handle as the argument (if you are reading the output
of onsgmls, sgmls or nsgmls from a pipe, the file
handle will ordinarily be "STDIN"):
    my $parse = new SGMLS(STDIN);
The most important method of this class is
"next_event", which reads and returns the
next major event from the input stream. Note that the
"SGMLS" class handles most ESIS
events itself: attributes and entity definitions, for example, are collected
and stored automatically, invisibly to the user. The
"SGMLS" class provides the following methods:
- "next_event()": Return an "SGMLS_Event" object
containing the next major event from the SGML parse.
- "element()": Return an "SGMLS_Element" object
containing the current element in the document.
- "file()": Return a string containing the name of the current
SGML source file (this will work only if the "-l" option was given
to onsgmls, sgmls or nsgmls).
- "line()": Return a string containing the current line number
from the source file (this will work only if the "-l" option was
given to onsgmls, sgmls or nsgmls).
- "appinfo()": Return a string containing the "APPINFO"
parameter (if any) from the SGML declaration.
- "notation(NNAME)": Return an "SGMLS_Notation" object
representing the notation named "NNAME". With newer versions of
nsgmls, all notations are available; otherwise, only the notations
which are actually used will be available.
- "entity(ENAME)": Return an "SGMLS_Entity" object
representing the entity named "ENAME". With newer versions of
nsgmls, all entities are available; otherwise, only external data
entities and internal entities used as attribute values will be
available.
- "ext()": Return a reference to an associative array for
user-defined extensions.
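Since "file()" and "line()" only carry information when the parser was
run with "-l", diagnostics usually need a fallback. The following is a
minimal sketch, not part of SGMLS.pm: the helper name "location" is
hypothetical, and the sketch assumes that a missing file name or line number
comes back as an undefined or empty value.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): build a "file:line" string
# from any object offering the file() and line() methods described above.
# Assumption: without -l, file() or line() yields undef or ''.
sub location {
    my ($source) = @_;
    my $file = $source->file;
    my $line = $source->line;
    return '(unknown location)'
        unless defined($file) && length($file)
            && defined($line) && length($line);
    return "$file:$line";
}

# Possible use inside an event loop:
#   warn location($parse), ": unexpected element\n";
```

Because the helper only calls "file()" and "line()", it should work equally
well with an "SGMLS" object or an "SGMLS_Event" object.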
The "SGMLS_Event" class holds a single major event, as generated by the
"next_event" method of the
"SGMLS" class. It provides the following
methods:
- "type()": Return a string describing the type of event:
"start_element", "end_element", "cdata",
"sdata", "re", "pi", "entity",
"start_subdoc", "end_subdoc", and
"conforming". See "SYNOPSIS", above, for the values
associated with each of these.
- "data()": Return the data associated with the current event (if
any). For "start_element" and "end_element", returns an
"SGMLS_Element" object; for "entity",
"start_subdoc", and "end_subdoc", returns an
"SGMLS_Entity" object; for "cdata", "sdata",
and "pi", returns a string; and for "re" and
"conforming", returns the empty string. See "SYNOPSIS",
above, for an example of this method's use.
- "key()": Return a string key to the event, such as an element or
entity name (otherwise, the same as "data()").
- "file()": Return the current file name, as in the
"SGMLS" class.
- "line()": Return the current line number, as in the
"SGMLS" class.
- "element()": Return the current element, as in the
"SGMLS" class.
- "parse()": Return the "SGMLS" object which generated
the event.
- "entity(ENAME)": Look up an entity, as in the "SGMLS"
class.
- "notation(NNAME)": Look up a notation, as in the
"SGMLS" class.
- "ext()": Return a reference to an associative array for
user-defined extensions.
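The "type()" and "data()" methods are enough to replace a long chain of
string comparisons with a table of handlers. The sketch below shows one way
to do that; "dispatch_event" and the handler table are hypothetical names
introduced for illustration and are not provided by SGMLS.pm.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): call the handler registered
# for the event's type, passing it the event's data and the event itself.
# Returns the handler's result, or nothing when no handler is registered.
sub dispatch_event {
    my ($event, $handlers) = @_;
    my $handler = $handlers->{ $event->type } or return;
    return $handler->($event->data, $event);
}

# Possible use with a parse created as in the SYNOPSIS:
#   my %handlers = (
#       start_element => sub { print '<',  $_[0]->name, ">\n" },
#       end_element   => sub { print '</', $_[0]->name, ">\n" },
#       cdata         => sub { print $_[0] },
#   );
#   while (my $event = $parse->next_event) {
#       dispatch_event($event, \%handlers);
#   }
```

Event types without an entry in the table are skipped silently, so a
program that wants the strictness of the SYNOPSIS loop would need to check
for a missing handler itself.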
The "SGMLS_Element" class represents elements and holds all associated
information (such as the element's attributes). It recognises the following
methods:
- "name()": Return a string containing the name, or Generic
Identifier, of the element, in upper case.
- "parent()": Return the "SGMLS_Element" object for the
element's parent (if any).
- "parse()": Return the "SGMLS" object for the current
parse.
- "attributes()": Return a reference to an associative array of
attribute names and "SGMLS_Attribute" structures. Attribute names
will be all in upper case.
- "attribute_names()": Return an array of strings containing the
names of all attributes defined for the current element, in upper case.
- "attribute(ANAME)": Return the "SGMLS_Attribute"
structure for the attribute "ANAME".
- "set_attribute(ATTRIB)": Add the "SGMLS_Attribute"
object "ATTRIB" to the current element, replacing any other
attribute structure with the same name.
- "in(GI)": Return "true" (i.e. 1) if the string
"GI" is the name of the current element's parent, or
"false" (i.e. 0) if it is not.
- "within(GI)": Return "true" (i.e. 1) if the string
"GI" is the name of any of the ancestors of the current element,
or "false" (i.e. 0) if it is not.
- "ext()": Return a reference to an associative array for
user-defined extensions.
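The "name()" and "parent()" methods can be combined to reconstruct where an
element sits in the document. The following sketch walks up the parent chain
and joins the names; "element_path" is a hypothetical helper, and the sketch
assumes that "parent()" returns a false value for the document element.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): return the names of the
# element and all of its ancestors, outermost first, joined by '/'.
# Assumption: parent() returns a false value when there is no parent.
sub element_path {
    my ($element) = @_;
    my @names;
    while ($element) {
        unshift @names, $element->name;
        $element = $element->parent;
    }
    return join '/', @names;
}

# Possible use in a start_element handler:
#   print element_path($event->data), "\n";    # e.g. DOC/SECTION/PARA
```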
Each instance of an attribute for each
"SGMLS_Element" is an object belonging to the
"SGMLS_Attribute" class, which recognises the following methods:
- "name()": Return a string containing the name of the current
attribute, all in upper case.
- "type()": Return a string containing the type of the current
attribute, all in upper case. Available types are "IMPLIED",
"CDATA", "NOTATION", "ENTITY", and
"TOKEN".
- "value()": Return the value of the current attribute, if any.
This will be an empty string if the type is "IMPLIED", a string of
some sort if the type is "CDATA" or "TOKEN" (if it is
"TOKEN", you may want to split the string into a series of
separate tokens), an "SGMLS_Notation" object if the type is
"NOTATION", or an "SGMLS_Entity" object if the type is
"ENTITY". Note that if the value is "CDATA", it will
not have escape sequences for 8-bit characters, record ends, or SDATA
processed -- that will be your responsibility.
- "is_implied()": Return "true" (i.e. 1) if the value of
the attribute is implied, or "false" (i.e. 0) if it is specified in
the document.
- "set_type(TYPE)": Change the type of the attribute to the string
"TYPE" (which should be all in upper case). Available types are
"IMPLIED", "CDATA", "NOTATION",
"ENTITY", and "TOKEN".
- "set_value(VALUE)": Change the value of the attribute to
"VALUE", which may be a string, an "SGMLS_Entity"
object, or an "SGMLS_Notation" object, depending on the
attribute's type.
- "ext()": Return a reference to an associative array available
for user-defined extensions.
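Because "value()" returns different kinds of data depending on "type()",
code that prints attributes usually has to branch on the type. The sketch
below is one possible approach, following the descriptions above; the helper
name "attribute_strings" is hypothetical, and the handling of "NOTATION" and
"ENTITY" values assumes that each is a single object with a "name()" method,
as stated in the method list.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): return the attribute's value
# as a list of plain strings.
#   IMPLIED          -> empty list
#   CDATA            -> the string, unprocessed (escape sequences untouched)
#   TOKEN            -> the individual whitespace-separated tokens
#   NOTATION, ENTITY -> the name of the notation or entity object
sub attribute_strings {
    my ($attr) = @_;
    return () if $attr->is_implied;
    my $type = $attr->type;
    return split(' ', $attr->value) if $type eq 'TOKEN';
    return ($attr->value)           if $type eq 'CDATA';
    return ($attr->value->name)     if $type eq 'NOTATION' || $type eq 'ENTITY';
    return ();
}

# Possible use in a start_element handler:
#   for my $aname ($element->attribute_names) {
#       my @parts = attribute_strings($element->attribute($aname));
#       print "$aname = @parts\n" if @parts;
#   }
```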
All declared notations appear as objects belonging to the
"SGMLS_Notation" class, which recognises the following methods:
- "name()": Return a string containing the name of the
notation.
- "sysid()": Return a string containing the system identifier of
the notation, if any.
- "pubid()": Return a string containing the public identifier of
the notation, if any.
- "ext()": Return a reference to an associative array available
for user-defined extensions.
All declared entities appear as objects belonging to the
"SGMLS_Entity" class, which recognises the following methods:
- "name()": Return a string containing the name of the entity, in
mixed case.
- "type()": Return a string containing the type of the entity, in
upper case. Available types are "CDATA", "SDATA",
"NDATA" (external entities only), "SUBDOC",
"PI" (newer versions of nsgmls only), or "TEXT"
(newer versions of nsgmls only).
- "value()": Return a string containing the value of the entity,
if it is internal.
- "sysid()": Return a string containing the system identifier of
the entity (if any), if it is external.
- "pubid()": Return a string containing the public identifier of
the entity (if any), if it is external.
- "filenames()": Return an array of strings containing any file
names generated from the identifiers, if the entity is external.
- "notation()": Return the "SGMLS_Notation" object
associated with the entity, if it is external.
- "data_attributes()": Return a reference to an associative array
of data attribute names (in upper case) and the associated
"SGMLS_Attribute" objects for the current entity.
- "data_attribute_names()": Return an array of data attribute
names (in upper case) for the current entity.
- "data_attribute(ANAME)": Return the "SGMLS_Attribute"
object for the data attribute named "ANAME" for the current
entity.
- "set_data_attribute(ATTRIB)": Add the
"SGMLS_Attribute" object "ATTRIB" to the current entity,
replacing any other data attribute with the same name.
- "ext()": Return a reference to an associative array for
user-defined extensions.
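Internal and external entities expose different information ("value()" for
the former, identifiers and file names for the latter), so a report on an
entity has to pick the right accessor. The following sketch shows one way to
do so; "entity_summary" is a hypothetical helper, and it assumes that
"value()" returns an undefined or empty value for an external entity, which
also means an internal entity with an empty value would be reported as
external.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): describe an entity in one
# line, using value() when it is set and filenames() otherwise.
# Assumption: value() is undef or '' for external entities.
sub entity_summary {
    my ($entity) = @_;
    my $label = $entity->name . ' [' . $entity->type . ']';
    my $value = $entity->value;
    return "$label internal: $value" if defined($value) && length($value);
    my @files = $entity->filenames;
    return "$label external: "
         . (@files ? join(', ', @files) : '(no file names)');
}

# Possible use in an entity handler:
#   print entity_summary($event->data), "\n";
```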
Copyright 1994 and 1995 by David Megginson,
"dmeggins@aix1.uottawa.ca". Distributed
under the terms of the GNU General Public License (version 2, 1991) -- see
the file "COPYING", which is included in
the SGMLS.pm distribution.
See also SGMLS::Output and SGMLS::Refs.