4. Using HTML as TOME's native text format

In its current incarnation, HTML does not provide a sufficiently rich text markup scheme for it to be used to represent PLINTH documents. Even its text formatting facilities are very basic, and not really up to describing pages with complex tables and annotated diagrams such as are found in the building standards document [15] to which PLINTH is initially being applied. However, a new version of HTML called HTML+ is currently being developed which will provide tables, subscripts and superscripts, and floating figures.

The greatest problem with HTML as it stands is that the ability to tag nodes and links with semantic types and attributes is completely lacking. A very limited form of link typing with just a few fixed types has been proposed for HTML+, but this is of little use to PLINTH. 'Semantic headers' have also been suggested for even later versions, but details of what these might contain are as yet unavailable.

However, defining a new superset of HTML+ for PLINTH's own use, with all the necessary semantic tagging features, could be a worthwhile development. SGML is certainly powerful enough to support this. For example, the text of a node could be marked up as follows:


     <node type=section title="Regulation 26" display=window>
     <node type=requirement title="Regulation 26.1" display=region id=26.1>

          text of Regulation 26.1

     </node>
     <node type=scope title="Regulation 26.2" display=region id=26.2>

          text of Regulation 26.2

     </node>
     <link type=application from-id=26.1 to-id=26.2>
     </node>

which represents a section "Regulation 26" containing a requirement clause and a scope clause (to be displayed as highlighted regions within the section window) with an application link between them. The advantages of using this expertext markup language (ETML) are:

Of course, SGML is a textual data format intended for file-based storage and transmission. While it can be used 'raw' as TOME's internal representation of the marked-up text (replacing the current system in which each 32-bit character cell can be marked individually with its visual properties and start/end markers for nested nodes) and edited invisibly through a WYSIWYG interface, the ETML data would still need to be 'compiled' into an efficient internal frame-based or object-oriented representation (e.g. using the CONNEKT library as now) before being manipulated by other parts of the PLINTH system. For example, navigation rules would still need to view a node as an object with a type and various named attributes, rather than just a string of text from which the information has to be extracted every time it is needed.
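
As a rough illustration of this 'compilation' step, the following Python sketch reads the Regulation 26 markup shown earlier and builds a tree of node objects, each carrying a type and named attributes. It is only a sketch under stated assumptions: the names Node, Link, ETMLCompiler and compile_etml are hypothetical, Python's standard html.parser module stands in for a real SGML parser, and nothing here reflects the CONNEKT library or TOME's actual internals.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# The sample markup from the text, reproduced so the sketch is self-contained.
SAMPLE = """
<node type=section title="Regulation 26" display=window>
<node type=requirement title="Regulation 26.1" display=region id=26.1>
     text of Regulation 26.1
</node>
<node type=scope title="Regulation 26.2" display=region id=26.2>
     text of Regulation 26.2
</node>
<link type=application from-id=26.1 to-id=26.2>
</node>
"""


@dataclass
class Link:
    """Hypothetical typed link between two nodes, identified by their ids."""
    type: str
    from_id: str
    to_id: str


@dataclass
class Node:
    """Hypothetical node object: a type, named attributes, text, and children."""
    type: str
    attrs: dict
    text: str = ""
    children: list = field(default_factory=list)
    links: list = field(default_factory=list)


class ETMLCompiler(HTMLParser):
    """Builds a Node tree from <node> and <link> tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.root = None   # outermost node seen
        self.stack = []    # currently open nodes, innermost last
        self.by_id = {}    # id attribute -> Node, for resolving links later

    def handle_starttag(self, tag, attrs):
        named = dict(attrs)
        if tag == "node":
            node = Node(type=named.pop("type", ""), attrs=named)
            if self.stack:
                self.stack[-1].children.append(node)
            else:
                self.root = node
            if "id" in named:
                self.by_id[named["id"]] = node
            self.stack.append(node)
        elif tag == "link" and self.stack:
            # Links carry no content, so they are attached to the enclosing node.
            self.stack[-1].links.append(
                Link(named.get("type", ""),
                     named.get("from-id", ""),
                     named.get("to-id", "")))

    def handle_endtag(self, tag):
        if tag == "node" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Keep only non-blank text, attached to the innermost open node.
        if self.stack and data.strip():
            self.stack[-1].text += data.strip()


def compile_etml(markup):
    """Parse the markup once and return (root node, id lookup table)."""
    compiler = ETMLCompiler()
    compiler.feed(markup)
    compiler.close()
    return compiler.root, compiler.by_id


if __name__ == "__main__":
    root, by_id = compile_etml(SAMPLE)
    for link in root.links:
        source, target = by_id[link.from_id], by_id[link.to_id]
        print(link.type, ":", source.attrs["title"], "->", target.attrs["title"])
```

Once compiled in this way, a navigation rule could test node.type or look up node.attrs["display"] directly, and follow a link through the id table, without re-scanning the marked-up text each time.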