This is an extract from the HTML document written in 1992 by Tim Berners-Lee. My comments are in italics. Copyright information is at the end.
HTML
The WWW system uses marked-up text to represent a
hypertext document for transmission over the network. The
hypertext mark-up language is an SGML format. WWW
parsers should ignore tags which they do not understand,
and ignore attributes which they do not understand of tags
which they do understand.
From the very beginning the SGML-nature of HTML is clear. Notice also the typical RFC design principle of tolerance for robustness and compatibility.
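The tolerance rule can be sketched with Python's standard html.parser module. This is only an illustration: the KNOWN table is a hypothetical, deliberately tiny vocabulary, and the BOGUS attribute is an invented unknown attribute, not something from the 1992 document.

```python
from html.parser import HTMLParser

# Hypothetical vocabulary for this sketch: tag name -> attributes we understand.
KNOWN = {"title": set(), "a": {"name", "href", "type"}, "p": set()}


class TolerantParser(HTMLParser):
    """Records the tags and attributes it understands and skips the rest."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag and attribute names in lower case.
        if tag not in KNOWN:
            return  # unknown tag: ignore it and keep going
        kept = {name: value for name, value in attrs if name in KNOWN[tag]}
        self.seen.append((tag, kept))  # unknown attributes are dropped


parser = TolerantParser()
parser.feed("<A HREF=doc.html BOGUS=1>link</A><BLINK>x</BLINK><P>")
parser.close()
print(parser.seen)  # [('a', {'href': 'doc.html'}), ('p', {})]
```

The unknown BLINK tag and the unknown BOGUS attribute are both skipped without raising an error, which is the behaviour the quoted paragraph asks for.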
Default text
Unless otherwise defined by elements and entities, HTML
text is interpreted as follows.
The text consists of a stream of lines. The division of the
stream of characters into lines is arbitrary, and only made in order to allow the text to be passed through systems which can only handle text with a limited line length. The
recommended line length for transmission is 80 characters,
but htis (sic) is a recommendation only.
The division into lines has no significance (except in the
case of example sections and PLAINTEXT) apart from
indicating a word end. Line breaks between tags have no
significance.
From the very beginning of HTML, all whitespace is collapsed to word breaks. This principle holds true in all the following HTML specs. The exceptions are the PLAINTEXT and XMP tags, which were later replaced (though not with identical behaviour) by the PRE tag.
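A minimal sketch of that collapsing rule, assuming that any run of spaces, tabs, or line breaks counts as a single word break. The function name collapse_whitespace is hypothetical.

```python
def collapse_whitespace(text: str) -> str:
    # str.split() with no argument splits on any run of whitespace,
    # so line breaks and repeated spaces both become single word breaks.
    return " ".join(text.split())


sample = "The division into lines\nhas no   significance\n"
print(collapse_whitespace(sample))  # The division into lines has no significance
```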
The tags
Currently HTML documents are transmitted without the
normal SGML framing tags, but if these are included parsers
will ignore them.
Title
The title of a document is given between title tags:
<TITLE> ... </TITLE>
The text between the opening and the closing tags is a title for the hypertext node. There should only be one title in any node. It should identify the content of the node in a fairly wide context, and should ideally fit on one line.
These principles still hold true currently, since the TITLE is typically used to identify bookmarks in a browser.
The title is not strictly part of the text of the document, but is an attribute of the node. It may not contain anchors,
paragraph marks, or highlighting. The title may be used to
identify the node in a history list, to label the window
displaying the node, etc. It is not normally displayed in the text of a document itself. Contrast titles with headings.
This somewhat fuzzy concept was eventually resolved into the current definition of TITLE as an element inside the HEAD element of an HTML document.
Omitted stuff on the NextID empty tag, a screwy thing generated by a browser for the NeXT system
Base Address
Anchors specify addresses of other documents, in a form
relative to the address of the current document. Normally,
the address of a document is known to the browser
because it was used to access the document. However, if
a document is mailed, or is somehow visible with more than
one address (for example, via its filename and also via its
library name server catalogue number), then the browser
needs to know the base address in order to correctly
deduce external document addresses.
The format of this tag is not yet specified. NOT CURRENTLY
USED
This was eventually resolved by the rarely used BASE element. Document portability is still a problem.
Notice that in this vision a document may have multiple access methods and multiple URLs; a very dynamic concept.
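To illustrate why the base address matters, here is a sketch using urljoin from Python's standard library. It follows the modern URL resolution rules, which postdate the 1992 text, and the example.org addresses are made up for the illustration.

```python
from urllib.parse import urljoin

# Hypothetical base address: where the current document was fetched from.
base = "http://example.org/hypertext/MarkUp/MarkUp.html"

# Relative addresses, as they might appear in HREF attributes.
sibling = urljoin(base, "Tags.html")
parent = urljoin(base, "../Overview.html")

print(sibling)  # http://example.org/hypertext/MarkUp/Tags.html
print(parent)   # http://example.org/hypertext/Overview.html
```

If the same document were mailed or reached under a second address, the relative names would resolve differently unless the document itself declared its base.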
Anchors
The format of an anchor is as follows:
<A NAME=xxx HREF=XXX> ... </A>
The text between the opening tag and the closing tag is
either the start or destination (or both) of a link. Attributes of the anchor tag are as follows.
- HREF
- If the HREF attribute is present, the anchor is
sensitive text: the start of a link. If the reader selects this text, he should be presented with another
document whose network address is defined by the
value of the HREF attribute. The format of the network
address is specified elsewhere. This allows for the
form HREF=#identifier to refer to another anchor in the
same document. If the anchor is in another document,
the attribute is a relative name, relative to the
document's address (or specified base address if
any).
The "elsewhere" refers to a primitive description of the URL. Notice that at this point in time, URLs were not yet called URLs.
- NAME
- The attribute NAME allows the anchor to be the
destination of a link. The value of the parameter is that
part of a hypertext address which follows the hash
sign.
Notice that the NAME attribute of the A tag was deprecated in XHTML 1.0 and removed in XHTML 1.1. ID should be used instead.
- TYPE
- An attribute TYPE may give the relationship
described by the hypertext link. The type is expressed
by a string for extensibility. Strings for types with
particular semantics will be registered by the W3
team. The default relationship if none other is given is void.
This attribute was (to my knowledge) never used, although the concept of typed links in hypertext systems was a fairly standard one in hypertext research.
All attributes are optional, although one of NAME and HREF
is necessary for the anchor to be useful.
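The relationship between HREF and NAME can be sketched with urldefrag from Python's standard library, which splits an address at the hash sign. The file name Glossary.html and the anchor names are invented for the illustration.

```python
from urllib.parse import urldefrag

# An HREF pointing at a named anchor in another document (hypothetical names).
address, name = urldefrag("Glossary.html#sgml")
print(address, name)  # Glossary.html sgml

# The HREF=#identifier form: empty document part, so the target is an
# anchor with that NAME in the current document.
local_address, local_name = urldefrag("#top")
print(repr(local_address), local_name)  # '' top
```

The part after the hash sign is what a browser would match against the NAME attribute of an anchor in the destination document.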
IsIndex
This tag informs the reader that the document is an index
document. As well as reading it, the reader may use a
keyword search.
The node may be queried with a keyword search by suffixing
the node address with a question mark, followed by a list of
keywords separated by plus signs. See the network
address format.
This document predates the definition of the CGI standard. Searching was clearly already an issue, but the definition given is fuzzy.
In the network address format page, you would find this example http://cernvm/FIND/?sgml+cms, which clearly looks CGI-ish.
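The query rule is simple enough to sketch directly. The function name index_query is hypothetical, and the sketch assumes the keywords need no escaping, which the quoted text does not discuss.

```python
def index_query(node_address: str, keywords: list[str]) -> str:
    # Suffix the node address with a question mark, then the keywords
    # separated by plus signs, as the quoted text describes.
    return node_address + "?" + "+".join(keywords)


query = index_query("http://cernvm/FIND/", ["sgml", "cms"])
print(query)  # http://cernvm/FIND/?sgml+cms
```

This reproduces the example address given in the network address format page.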
Plaintext
This tag indicates that all following text is to be taken
literally, up to the end of the file. Plain text is designed to be represented in the same way as example XMP text, with fixed width characters and significant line breaks. Format:
<PLAINTEXT>
This tag allows the rest of a file to be read efficiently without parsing. Its presence is an optimisation. There is no closing
tag.
You will notice the attention given to optimizing HTML rendering. This tag is defunct, probably because there is no way to say that in SGML. Plain text is currently marked with the PRE tag.
Example sections
This section (omitted) introduces the LISTING and XMP tags, whose key difference is that the LISTING tag is portrayed so that at least 132 characters will fit on a line. The XMP tag is portrayed in a font so that at least 80 characters will fit on a line but is otherwise identical to LISTING.
The definition is, in a sense, impure because it marks at the tag level a typographical difference. A cleaner way would have been to use something like a SIZE attribute.
Paragraph
This tag indicates a new paragraph. The exact
representation of this (indentation, leading, etc) is not
defined here, and may be a function of other tags, style
sheets etc. The format is simply
<P>
(In SGML terms, paragraph elements are transmitted in
minimised form).
In XHTML the paragraph element must be explicitly closed (in HTML 4 the end tag is still optional). Notice the presence of the style sheet concept at this very early stage of the WWW.
Headings
Several levels (at least six) of heading are supported. Note that a hypertext document tends to need less levels of
heading than a normal document whose only structure is
given by the nesting of headings. H1 is the highest level of
heading, and is recommended for the start of a hypertext
node. It is suggested that the first heading be one suitable
for a reader who is already browsing in related information,
in contrast to the title tag which should identify the node in a wider context.
These tags are kept as defined in the CERN SGML guide.
Their definition is completely historical, deriving from the
AAP tag set.
The number of levels remained six. The advice on how to write good headings remains valid today.
Omitted: definition of the goofy ADDRESS tag and of some bizarre HP1 HP2 HP3 ... tags for highlighting.
The highlighting tags were never implemented, although one could claim that the evil BLINK abomination was not so far off.
Glossaries
A glossary (or definition list) is a list of paragraphs each of which has a short title alongside it. Apart from glossaries, this format is useful for presenting a set of named elements to the reader.
<DL>
<DT>foo<DD>definition of foo
<DT>bar<DD>definition of bar
</DL>
The DL tag (with DT and DD) appears early, already plagued by unclear use suggestions.
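One way to read the glossary structure is as a list of (term, definition) pairs. The sketch below uses Python's standard html.parser and feeds it the example above. The class name GlossaryParser is hypothetical, and the sketch assumes every DD follows a DT.

```python
from html.parser import HTMLParser


class GlossaryParser(HTMLParser):
    """Collects [term, definition] pairs from DT/DD tags without end tags."""

    def __init__(self):
        super().__init__()
        self.entries = []   # each entry is [term, definition]
        self._field = None  # 0 while reading a term, 1 while reading a definition

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            self.entries.append(["", ""])  # a DT opens a new entry
            self._field = 0
        elif tag == "dd" and self.entries:
            self._field = 1

    def handle_endtag(self, tag):
        if tag == "dl":
            self._field = None  # text after the list is not part of it

    def handle_data(self, data):
        if self._field is not None:
            self.entries[-1][self._field] += data


glossary = GlossaryParser()
glossary.feed("<DL>\n<DT>foo<DD>definition of foo\n<DT>bar<DD>definition of bar\n</DL>")
glossary.close()
pairs = [(term.strip(), definition.strip()) for term, definition in glossary.entries]
print(pairs)  # [('foo', 'definition of foo'), ('bar', 'definition of bar')]
```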
Lists
A list is a sequence of paragraphs, each of which is
preceded by a special mark or sequence number.
The opening list tag must be immediately followed by the first list element. The representation of the list is not defined here, but a bulleted list for unordered lists, and a sequence of numbered paragraphs for an ordered list would be quite appropriate. Other possibilities for interactive display include embedded scrollable browse panels.
Opening list tags are:
- UL
- A list of multi-line paragraphs, typically separated by some white space.
- MENU
- A list of smaller paragraphs. Typically one line per item, with a style more compact than UL.
- DIR
- A list of short elements, less than one line. Typical
style is to arrange in four columns or provide a
browser, etc.
The closing tag must obviously match the opening tag.
Notice here two currently obsolete list types, MENU and DIR, and the absence of OL.
Quoted from http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/MarkUp.html - Copyright © 1995-2001 W3C® (MIT, INRIA, Keio), All Rights Reserved.
This material has been copied in compliance with the copyright owner Document Notice and License, available at http://www.w3.org/Consortium/Legal/copyright-documents-19990405
Thanks to Gritchka and generic-man for pointing out bugs.