1. Introduction

PLINTH, the Platform for Intelligent Hypertext [1,2], is a domain-independent expertext [3] shell that supports the authoring and reading of technical documents. Its data model is semantically enriched hypertext, in which the nodes and links of the hyperdocument can be assigned to user-defined taxonomic types and tagged with named attributes. The types mark the function of each node of text (e.g. introduction, background and note in almost all documents; requirement and scope in regulatory codes [1]) and make explicit the structural, logical and rhetorical links between nodes (e.g. part-of, next, annotates and applies-in). Named attributes can be used, for example, to classify and index nodes, to mark the strength of links, and to record the author of a node and its revision number. PLINTH also includes a rule-based network traversal tool, which provides the reader with automatic, intelligent navigation and consultation of semantically rich hypertexts, and supports the author with schemes for their principled creation and extension.
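To make the data model concrete, the following is a minimal sketch in Python of typed nodes and links carrying named attributes. It is an illustration only and is not taken from the PLINTH implementation: the class names (Node, Link, Hyperdocument), the follow method and the sample content are all hypothetical, and types are simplified to plain strings rather than entries in a user-defined taxonomy.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    """A unit of text with a user-defined type and named attributes."""
    node_id: str
    node_type: str  # e.g. "introduction", "requirement", "scope"
    text: str
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Link:
    """A typed, directed link between two nodes, with named attributes."""
    link_type: str  # e.g. "part-of", "next", "annotates", "applies-in"
    source: str     # node_id of the node the link starts from
    target: str     # node_id of the node the link points to
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Hyperdocument:
    """Hypothetical container holding the nodes and links of one document."""
    nodes: Dict[str, Node] = field(default_factory=dict)
    links: List[Link] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_link(self, link: Link) -> None:
        # Refuse links whose endpoints are not nodes of this document.
        if link.source not in self.nodes or link.target not in self.nodes:
            raise KeyError("both link endpoints must be existing nodes")
        self.links.append(link)

    def follow(self, node_id: str, link_type: str) -> List[Node]:
        """Return the nodes reached from node_id via links of one type."""
        return [
            self.nodes[link.target]
            for link in self.links
            if link.source == node_id and link.link_type == link_type
        ]


def build_example() -> Hyperdocument:
    """Build a two-node document; all content here is invented."""
    doc = Hyperdocument()
    doc.add_node(Node("n1", "requirement", "Sample requirement text.",
                      {"author": "A. Author", "revision": 2}))
    doc.add_node(Node("n2", "note", "Sample note on the requirement.",
                      {"author": "A. Author", "revision": 1}))
    # The note annotates the requirement; "strength" is a link attribute.
    doc.add_link(Link("annotates", "n2", "n1", {"strength": 0.8}))
    doc.add_link(Link("next", "n1", "n2"))
    return doc
```

A traversal rule of the kind mentioned above could then select which links to follow by type or attribute, for instance by calling follow("n2", "annotates") to move from a note to the text it annotates.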

This paper and the earlier paper 'The Use of Object-Oriented Databases in PLINTH' [4] look at solutions to the problems of handling the potentially huge hypertext networks used by PLINTH. These problems fall into two categories.

Firstly, there are limitations on main memory, which restrict the size of data structures that can be handled and may make it difficult or impossible to work with very large documents in their entirety. We propose [ibid.] the use of object-oriented database (OODB) technology, which provides efficient and well-structured direct access to data in fast secondary storage (e.g. hard disk), as a potential solution to this problem.

OODBs do not, however, help with the second problem, which is that even the secondary storage available may not be able to cope with the vast number of documents to which a particular application may require access. For example, large commercial, governmental and academic organisations typically have whole libraries of technical documents necessary to their everyday functioning. Even if all of these could be held on-line, each document may well reference hundreds of other relevant works, which are not in the library but to which on-line access would be very useful. The largest organisations typically have many departments, each with its own library of technical reference material. If a document is needed by another department, then either it has to be physically transported between sites, or multiple copies must be stored; neither option is very efficient.

In fact, there are two obvious answers to the problem of lack of on-line storage: (i) obtaining more storage space, and (ii) storing less data. Procedures for (i) being well-known, we will concern ourselves with (ii). What we really mean by 'storing less data' is avoiding local storage of documents that can be accessed electronically, on demand, from an external site. The external site may be another department of the same organisation, or a library on the other side of the world. Ideally, only the publisher of a book, manual or report would ever have to store a copy of it. Those wishing to use the document (and licensed to do so) would be able to transfer it over the network, page by page, as required.

This situation is likely to become increasingly common in future, and the basic technology already exists in the form of the rapidly expanding Internet-based hypermedia network, the World Wide Web. This paper discusses the possibilities for integrating PLINTH with 'the Web'.

The World Wide Web...