4. Using HTML as TOME's native text format
In its current incarnation, HTML does not provide a sufficiently rich text
markup scheme for it to be used to represent PLINTH documents. Even its text
formatting facilities are very basic, and not really up to describing pages
with complex tables and annotated diagrams such as are found in the building
standards document [15] to which PLINTH is initially being applied. However,
a new version of HTML called HTML+ is currently being developed which will
provide tables, subscripts and superscripts, and floating figures.
The greatest problem with HTML as it stands is that the ability to tag nodes
and links with semantic types and attributes is completely lacking. A very
limited form of link typing with just a few fixed types has been proposed for
HTML+, but this is of little use to PLINTH. 'Semantic headers' have also been
suggested for even later versions, but details of what these might contain are
as yet unavailable.
However, defining a new superset of HTML+ for PLINTH's own use, with
all the necessary semantic tagging features, could be a worthwhile
development. SGML is certainly powerful enough to support this. For example,
the text of a node could be marked up as follows:
<node type=section title="Regulation 26" display=window>
  <node type=requirement title="Regulation 26.1" display=region id=26.1>
    text of Regulation 26.1
  </node>
  <node type=scope title="Regulation 26.2" display=region id=26.2>
    text of Regulation 26.2
  </node>
  <link type=application from-id=26.1 to-id=26.2>
</node>
which represents a section "Regulation 26" containing a
requirement clause and a scope clause (to be displayed as
highlighted regions within the section window) which have an
application link between them.
The advantages of using this expertext markup language (ETML) are:
- Since ETML is a superset of HTML(+), PLINTH can read and format the
latter automatically.
- If HTML(+) is extended to allow user-defined tags that can be
ignored by browsers that do not recognise them, as has also been proposed,
then Mosaic and suchlike will be able to format PLINTH data without
being upset by the semantic tags.
- SGML markup can be used for document analysis as well as display. Generic
SGML tools for doing this will be able to work with PLINTH data.
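The second advantage above can be illustrated with a minimal Python sketch of
a reader that knows only a few ordinary tags and skips anything else. It is
an illustration under stated assumptions, not a description of how Mosaic or
any other browser behaves: the class TolerantFormatter, the function
format_plain and the tag set KNOWN_TAGS are hypothetical names, and the
"formatting" is reduced to upper-casing headings.

```python
from html.parser import HTMLParser

# Hypothetical tag set for an HTML-only reader; not taken from any real browser.
KNOWN_TAGS = {"h1", "p"}


class TolerantFormatter(HTMLParser):
    """Collects text, formatting known tags and skipping unknown ones."""

    def __init__(self):
        super().__init__()
        self.lines = []        # formatted output, one entry per text run
        self.ignored = set()   # names of tags that were not recognised
        self.in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_TAGS:
            # Unknown (e.g. semantic) tag: note it and carry on.
            self.ignored.add(tag)
        elif tag == "h1":
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_heading = False

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if text:
            self.lines.append(text.upper() if self.in_heading else text)


def format_plain(markup):
    """Return (formatted lines, ignored tag names) for a markup string."""
    formatter = TolerantFormatter()
    formatter.feed(markup)
    formatter.close()
    return formatter.lines, formatter.ignored
```

The text inside the unrecognised node element is still formatted, because
only the tag itself is skipped, not its content.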
Of course, SGML is a textual data format intended for file-based storage and
transmission. While it can be used 'raw' as TOME's internal representation of
the marked-up text (replacing the current system in which each 32-bit
character cell can be marked individually with its visual properties and
start/end markers for nested nodes) and edited invisibly through a WYSIWYG
interface, the ETML data would still need to be 'compiled' into an efficient
internal frame-based or object-oriented representation (e.g. using the
CONNEKT library as now) before being manipulated by other parts of the PLINTH
system. For example, navigation rules would still need to view a node as an
object with a type and various named attributes, rather than just a string of
text from which the information has to be extracted every time it is needed.
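The 'compilation' step described above can be sketched in Python as follows.
This is only an illustration under stated assumptions: it uses the
standard-library html.parser module rather than a true SGML parser or the
CONNEKT library, and the names Node, ETMLCompiler, compile_etml and
ETML_SAMPLE are hypothetical. The markup is parsed once into objects, so that
later code reads a node's type and attributes directly instead of
re-extracting them from the text.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Node:
    """One ETML node: a semantic type, named attributes, text and children."""
    type: str
    attrs: dict
    text: str = ""
    children: list = field(default_factory=list)
    links: list = field(default_factory=list)  # (link type, from-id, to-id)


class ETMLCompiler(HTMLParser):
    """Builds a tree of Node objects from <node> and <link> tags."""

    def __init__(self):
        super().__init__()
        self.root = Node(type="document", attrs={})
        self.stack = [self.root]   # currently open nodes, innermost last
        self.by_id = {}            # id attribute -> Node

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "node":
            node = Node(type=attrs.pop("type", "untyped"), attrs=attrs)
            self.stack[-1].children.append(node)
            self.stack.append(node)
            if "id" in attrs:
                self.by_id[attrs["id"]] = node
        elif tag == "link":
            # <link> has no end tag, so it is recorded on the enclosing
            # node rather than pushed on the stack.
            self.stack[-1].links.append(
                (attrs.get("type"), attrs.get("from-id"), attrs.get("to-id"))
            )

    def handle_endtag(self, tag):
        if tag == "node" and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].text += data.strip()


def compile_etml(markup):
    """Parse ETML markup once and return the populated compiler."""
    compiler = ETMLCompiler()
    compiler.feed(markup)
    compiler.close()
    return compiler


ETML_SAMPLE = """
<node type=section title="Regulation 26" display=window>
<node type=requirement title="Regulation 26.1" display=region id=26.1>
text of Regulation 26.1
</node>
<node type=scope title="Regulation 26.2" display=region id=26.2>
text of Regulation 26.2
</node>
<link type=application from-id=26.1 to-id=26.2>
</node>
"""

if __name__ == "__main__":
    compiled = compile_etml(ETML_SAMPLE)
    section = compiled.root.children[0]
    print(section.type, section.attrs["title"])
    print([child.type for child in section.children])
```

A navigation rule could then test, say, whether a node's type is
"requirement" by reading a field of the object, with no further parsing of
the text.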
Summary and conclusions...