HTML is defined in terms of the ISO Standard Generalised Markup Language (SGML) [8,9], which is a language for describing structured document markup schemes. Every SGML document starts with a header consisting of a declaration and prologue, which describe both what constitutes an element in the ASCII data that follows, and how those elements may legally be combined (i.e. the 'morphology' and 'syntax' of the data and markup tags, to use a linguistic metaphor). An SGML parser program will check that the text and markup tag sequence that follows (the document instance) is well-formed with respect to the initial header. However, since all HTML documents share the same SGML header (by definition), this header does not need to be included in every HTML resource on the WWW, or transmitted as part of every http transaction; instead it can be built into the client browsing software.
It is important to note that HTML markup is not intended to describe the exact visual appearance of a hypertext document (e.g. specific fonts, point sizes, paragraph layouts and margin widths). Rather it describes the logical structure of the document (e.g. emphasised text, headings, bulleted lists), and it is up to HTML browsing software like NCSA Mosaic to decide how best to present this to the user.
In any case, the most important feature of HTML is not its ability to describe the internal structure of a document, but its facility to embed cross-references to other WWW resources. This is done by means of link anchor tags, which specify the target resource by means of a universal resource locator, or URL. Full details of URL syntax are available in [10], but basically a URL has the form:
protocol://location/file#destination
where protocol is http, ftp, or gopher (see 2.3 below), location is the Internet machine address (e.g. www.aiai.ed.ac.uk), file is the pathname of the resource on the remote machine (e.g. /~andrewc/home.html) and the optional destination is the name of a source anchor tag marking a point within the target file.
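The decomposition just described can be illustrated with a minimal sketch in Python, using the standard library's urllib.parse module. The function name split_url and the anchor name "contact" are hypothetical, chosen only for illustration; the host and path are the examples given in the text.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Break a URL into the four parts named in the text.

    The dictionary keys follow the text's terminology, not the
    attribute names used by urllib (scheme, netloc, path, fragment).
    """
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "location": parts.netloc,
        "file": parts.path,
        # The destination is optional; report None when it is absent.
        "destination": parts.fragment or None,
    }


if __name__ == "__main__":
    # "#contact" is an assumed anchor name, not taken from a real page.
    print(split_url("http://www.aiai.ed.ac.uk/~andrewc/home.html#contact"))
    print(split_url("http://www.aiai.ed.ac.uk/~andrewc/home.html"))
```

A URL without the trailing #destination part simply yields None for the destination, which mirrors the optional status of that component.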
Again, it is up to the WWW browsing software to decide what to do with this information. NCSA Mosaic, for example, highlights the word or phrase with which the link anchor is associated, and responds to a mouse click on the highlight by loading the resource and displaying it, whereupon the process may be repeated for links from the new page.
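As a rough illustration of the first step a browser must take, the sketch below collects the link anchors in a fragment of HTML together with the text each one is attached to. It uses Python's standard html.parser module and is not a description of how NCSA Mosaic is implemented. The class name AnchorCollector, the function collect_links, the sample markup and the ftp.example.org address are all assumptions made for the example.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (URL, anchor text) pairs from <A HREF="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []      # completed (url, text) pairs
        self._href = None    # URL of the anchor currently open, if any
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_links(html):
    """Return the list of (url, text) pairs found in an HTML string."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    # Hypothetical page fragment; ftp.example.org is a placeholder host.
    sample = (
        '<P>See the <A HREF="http://www.aiai.ed.ac.uk/~andrewc/home.html">'
        'home page</A> or fetch the <A HREF="ftp://ftp.example.org/pub/README">'
        'README file</A>.</P>'
    )
    for url, text in collect_links(sample):
        print(text, "->", url)
```

A browser would then highlight each collected phrase and, on selection, retrieve the associated URL; the sketch stops at the collection step.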
In addition, there is a huge amount of information on the Internet that is available not through http servers, but through other data transfer protocols and programs such as ftp and gopher, which have been around much longer than the WWW. Therefore ftp and gopher access have been integrated into the WWW data model, as we saw in the description of URLs above.