The purpose of a DTD (Document Type Definition) is to provide a parser with the necessary rules to confirm a particular document is valid.

SGML has a rich and complex DTD syntax; XML's DTD syntax is much simpler. However, DTDs are likely to be replaced by XML Schema, which is itself written in XML syntax. A DTD is built from a number of syntax elements, as follows.


!DOCTYPE

An XML DTD is referenced from an XML document using a "!DOCTYPE" declaration, which takes one of two forms:
  1. Internal
    <!DOCTYPE dtd-name [ ...declarations... ]>
  2. External
    <!DOCTYPE dtd-name SYSTEM "filename">
Notice that these declarations don't follow XML's own element syntax.
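To see the two forms from a program's point of view, here's a rough Python sketch using the standard library's expat binding. The function name doctype_info and the sample documents are made up for illustration. Expat doesn't validate, and here it doesn't fetch note.dtd either; it just reports the DOCTYPE it found.

```python
import xml.parsers.expat

def doctype_info(xml_text):
    """Report the DOCTYPE name, its SYSTEM identifier (if any) and
    whether the document carries an internal subset [ ... ]."""
    info = {}

    def start_doctype(name, system_id, public_id, has_internal_subset):
        info["name"] = name
        info["system"] = system_id              # None for a purely internal DTD
        info["internal"] = bool(has_internal_subset)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = start_doctype
    parser.Parse(xml_text, True)                # True: this is the final chunk
    return info

internal = '<!DOCTYPE note [ <!ELEMENT note (#PCDATA)> ]><note>hi</note>'
external = '<!DOCTYPE note SYSTEM "note.dtd"><note>hi</note>'
print(doctype_info(internal))  # {'name': 'note', 'system': None, 'internal': True}
print(doctype_info(external))  # {'name': 'note', 'system': 'note.dtd', 'internal': False}
```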

!ELEMENT

Each element used in the XML grammar described by this DTD is defined in an "!ELEMENT" declaration, which has the following format:
<!ELEMENT element-name (content-model)>

The content-model describes the content of this element. The syntax elements are:

Grouping
A content-model definition enclosed in brackets "()" can be treated as syntactically equivalent to a single element, so ordering, alternatives and occurrence can be applied to the whole group.
Ordering
A sequence of elements separated by commas must appear in the indicated order.
Alternatives
If a number of elements are separated by "|", any one (but only one) of them may appear.
Occurrence
If an element is suffixed by "*", it may occur any number of times (zero or more).
If an element is suffixed by "+", it must occur one or more times.
If an element is suffixed by "?", it is optional: it may occur zero times or once.
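Grouping, ordering, alternatives and occurrence are exactly the operators of a regular expression, so one way to check an element's children against a content model is to translate the model into a regex over the child names. Below is a minimal sketch of that idea, not how any particular parser does it; content_model_to_regex and children_match are hypothetical names, and it assumes simple ASCII element names and no #PCDATA.

```python
import re

# One token: an element name, or one piece of content-model punctuation.
TOKEN = re.compile(r"\s*([A-Za-z_][\w.\-]*|[(),|*+?])")

def content_model_to_regex(model):
    """Translate e.g. '(street, city, state, postal?)' into a compiled regex
    that matches child sequences written as '<street><city>...'."""
    out, pos, model = [], 0, model.strip()
    while pos < len(model):
        m = TOKEN.match(model, pos)
        if m is None:
            raise ValueError(f"cannot parse content model at: {model[pos:]!r}")
        tok, pos = m.group(1), m.end()
        if tok == "(":
            out.append("(?:")                   # grouping -> non-capturing group
        elif tok == ",":
            continue                            # ordering -> plain concatenation
        elif tok in {")", "|", "*", "+", "?"}:
            out.append(tok)                     # alternatives/occurrence map 1:1
        else:
            # Wrap each name so a following * + ? applies to the whole name.
            out.append("(?:<" + re.escape(tok) + ">)")
    return re.compile("".join(out))

def children_match(model, child_names):
    """True if the sequence of child element names satisfies the model."""
    pattern = content_model_to_regex(model)
    return pattern.fullmatch("".join(f"<{n}>" for n in child_names)) is not None

print(children_match("(street, city, state, postal?)", ["street", "city", "state"]))  # True
print(children_match("(street, city, state, postal?)", ["city", "street", "state"]))  # False
print(children_match("(customer-details*)", []))                                      # True
```

Mixing "," and "|" in one group isn't allowed in a DTD, so the translation doesn't need to worry about operator precedence between them.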

A number of special, predefined content-models exist that have special meanings:

EMPTY
This indicates that there must be no content for this element.
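ANY
This indicates that any mix of character data and declared elements may form the content for this element.
#PCDATA
This indicates that text (parsed character data) may form the content of this element. It is written in brackets, as (#PCDATA), and can be mixed with elements in the form (#PCDATA|element-name|...)*.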
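The special content models are easy to check directly. Here's a minimal sketch with Python's xml.etree.ElementTree, assuming the model is passed as one of the strings "EMPTY", "(#PCDATA)" or "ANY"; the function names are made up. ElementTree discards comments, so a comment inside an EMPTY element would go unnoticed here.

```python
import xml.etree.ElementTree as ET

def has_character_data(elem):
    """True if elem directly contains any character data, whitespace included."""
    if elem.text:
        return True
    return any(child.tail for child in elem)

def satisfies_special_model(elem, model):
    if model == "EMPTY":
        # No child elements and no text at all, not even whitespace.
        return len(elem) == 0 and not has_character_data(elem)
    if model == "(#PCDATA)":
        # Text only: any child element is a violation.
        return len(elem) == 0
    if model == "ANY":
        # This sketch does not check that the children are declared elements.
        return True
    raise ValueError(f"not a special content model: {model!r}")

print(satisfies_special_model(ET.fromstring("<br/>"), "EMPTY"))               # True
print(satisfies_special_model(ET.fromstring("<br> </br>"), "EMPTY"))          # False
print(satisfies_special_model(ET.fromstring("<name>Jo</name>"), "(#PCDATA)"))  # True
```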

!ATTLIST

XML elements can have attributes. These must be declared in the DTD using the "!ATTLIST" declaration, which has one of the following formats:
<!ATTLIST element-name attribute-name attribute-type keyword>
or
<!ATTLIST element-name attribute-name attribute-type default>

Multiple attributes may be declared in a single declaration by repeating the attribute-name attribute-type keyword/default part as many times as required.
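To make that repeating structure concrete, here's a rough parser for one attribute-list declaration. It's a sketch under simplifying assumptions: only double-quoted defaults, no ">" inside a default value, and no NOTATION types. parse_attlist is a hypothetical helper; it returns the element name plus (attribute-name, attribute-type, default) triples, with the default kept exactly as written.

```python
import re

ATTLIST = re.compile(r"<!ATTLIST\s+([^\s>]+)\s+(.*?)\s*>", re.S)
ATTDEF = re.compile(
    r'\s*([^\s()]+)\s+'                             # attribute name
    r'(\([^)]*\)|[A-Z]+)\s+'                        # type: CDATA, ID, ... or (a|b|c)
    r'(#REQUIRED|#IMPLIED|(?:#FIXED\s+)?"[^"]*")'   # keyword, #FIXED "v", or "v"
)

def parse_attlist(decl):
    m = ATTLIST.fullmatch(decl.strip())
    if m is None:
        raise ValueError("not an ATTLIST declaration")
    element, body = m.group(1), m.group(2)
    attrs, pos = [], 0
    while pos < len(body):                          # one definition per iteration
        d = ATTDEF.match(body, pos)
        if d is None:
            raise ValueError(f"cannot parse attribute definition at: {body[pos:]!r}")
        attrs.append((d.group(1), d.group(2), d.group(3)))
        pos = d.end()
    return element, attrs

print(parse_attlist('<!ATTLIST address country CDATA "US">'))
# ('address', [('country', 'CDATA', '"US"')])
```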

The following values are valid for attribute-type:

CDATA
The value can be any character data.
ID
The value must be a unique identifier: a valid XML name that no other ID attribute in the document uses.
IDREF
The value must be an existing identifier - i.e. this is a reference to an element with the matching value in an ID attribute.
NMTOKEN
The value may contain only letters, digits and the characters ".", "-", "_" and ":" - i.e. valid characters for constructing names or tokens.
ENTITY
The value must be the name of an unparsed entity declared in the DTD.
enumerated values
A bracketed list of valid values separated by "|", e.g. (red|green|blue). The value must be one of the listed values.
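Here's a sketch of how the simpler types might be checked. It's ASCII-only (real XML names also allow many non-ASCII letters), it skips ENTITY, and value_ok and check_id_refs are hypothetical names. ID and IDREF need a document-wide pass, so uniqueness and resolution live in a separate function that is fed every ID and IDREF value found in the document.

```python
import re

# ASCII-only approximations of XML's Nmtoken and Name productions.
NMTOKEN_RE = re.compile(r"[A-Za-z0-9._:\-]+")
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")

def value_ok(att_type, value):
    """Check one attribute value against its declared type."""
    if att_type == "CDATA":
        return True
    if att_type == "NMTOKEN":
        return NMTOKEN_RE.fullmatch(value) is not None
    if att_type in ("ID", "IDREF"):
        return NAME_RE.fullmatch(value) is not None
    if att_type.startswith("("):                    # enumerated, e.g. (red|green)
        choices = [c.strip() for c in att_type.strip("()").split("|")]
        return value in choices
    raise ValueError(f"unsupported attribute type: {att_type}")

def check_id_refs(id_values, idref_values):
    """Return (duplicate IDs, IDREFs that point at no ID) for a whole document."""
    seen, duplicates = set(), set()
    for v in id_values:
        if v in seen:
            duplicates.add(v)
        seen.add(v)
    dangling = {v for v in idref_values if v not in seen}
    return duplicates, dangling

print(check_id_refs(["c1", "c2", "c1"], ["c2", "c9"]))  # ({'c1'}, {'c9'})
```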

The following values are valid for the optional keyword:

#REQUIRED
This attribute must be specified on every instance of this element.
#IMPLIED
This attribute may, optionally, be specified on this element. If it is omitted, no default is supplied and the application decides how to treat its absence.
#FIXED value
This attribute of this element always has the stated value; if it appears in the document, it must have exactly that value.

Finally, instead of a keyword, a default value may be supplied in quotes. The parser uses it whenever the attribute is omitted.
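The keywords and the default value can be sketched as one check-and-fill step for a single element. This assumes a made-up representation for each declaration's default part: "#REQUIRED", "#IMPLIED", ("#FIXED", value) or ("DEFAULT", value); apply_attribute_defaults is a hypothetical name.

```python
def apply_attribute_defaults(attrs, decls):
    """attrs: dict of attributes present on the element.
    decls: list of (attribute-name, default-part) pairs in the form above.
    Returns (effective attributes, list of error messages)."""
    result, errors = dict(attrs), []
    for name, default in decls:
        if default == "#REQUIRED":
            if name not in attrs:
                errors.append(f"missing required attribute {name!r}")
        elif default == "#IMPLIED":
            pass                                   # no default is supplied
        elif isinstance(default, tuple) and default[0] == "#FIXED":
            fixed = default[1]
            if name in attrs and attrs[name] != fixed:
                errors.append(f"{name!r} must be {fixed!r}")
            result[name] = fixed                   # always present with the fixed value
        elif isinstance(default, tuple) and default[0] == "DEFAULT":
            result.setdefault(name, default[1])    # used only when omitted
        else:
            raise ValueError(f"unknown default declaration: {default!r}")
    return result, errors

decls = [("id", "#REQUIRED"), ("country", ("DEFAULT", "US")),
         ("version", ("#FIXED", "1.0")), ("note", "#IMPLIED")]
print(apply_attribute_defaults({"id": "c1"}, decls))
# ({'id': 'c1', 'country': 'US', 'version': '1.0'}, [])
```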


!ENTITY

The XML DTD syntax also allows for "entities" to be defined. Essentially, these represent textual substitutions at one of two levels. They exist in two forms: general entities, which are substituted in the document (such as &amp; in HTML), and parameter entities, which can be referenced only within the DTD itself. They are defined using the "!ENTITY" declaration:
<!ENTITY entity-name entity-def>
where:
entity-name
is the name that will be expanded (e.g. "amp"). For a parameter entity, the name is preceded by a "%" and a space in the declaration, and it is referenced in the DTD as %entity-name;.
entity-def
is the value that will replace the name. This can either be supplied directly as a quoted string or, if preceded by the keyword "SYSTEM", by reference to a URL.
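Since entities are essentially textual substitution, expansion can be sketched as a recursive search-and-replace. This is a simplification: it ignores character references such as &#65;, the predefined entities and SYSTEM entities, and expand_entities is a hypothetical name. Passing marker="%" applies the same idea to parameter-entity references in DTD text. Real parsers do this for you; expat-based parsers such as Python's ElementTree expand internal entities declared in the DTD.

```python
import re

def expand_entities(text, entities, marker="&", _active=()):
    """Replace references like &name; (or %name; when marker='%') with their
    replacement text, expanding nested references and rejecting recursion."""
    pattern = re.compile(re.escape(marker) + r"([A-Za-z_][\w.\-]*);")

    def substitute(m):
        name = m.group(1)
        if name not in entities:
            return m.group(0)                      # leave unknown references as-is
        if name in _active:
            raise ValueError(f"recursive entity reference: {name}")
        return expand_entities(entities[name], entities, marker, _active + (name,))

    return pattern.sub(substitute, text)

ents = {"co": "Holloway Ltd", "sig": "Regards, &co;"}
print(expand_entities("&sig;", ents))                                       # Regards, Holloway Ltd
print(expand_entities("<!ELEMENT name %text;>", {"text": "(#PCDATA)"}, "%"))  # <!ELEMENT name (#PCDATA)>
```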

I was going to supply a DTD describing the XML DTD grammar. However, I don't believe this is possible given what I've described above. Instead, here's a DTD for holloway's customer record file:
<!ELEMENT customer-file (customer-details*)>
<!ELEMENT customer-details (name, address)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (street, city, state, postal?)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT postal (#PCDATA)>
<!ATTLIST customer-details id ID #REQUIRED>
<!ATTLIST address country CDATA "US">
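As a rough check that the customer-file DTD says what it means, here's a hand translation of it into Python. It's a sketch, not a validating parser: the child sequences are written as regexes over "<name>" tokens, undeclared attributes aren't checked, and validate and the sample customer are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the customer-file DTD: allowed child sequences as regexes
# over "<name>" tokens, plus the set of text-only (#PCDATA) elements.
CHILDREN = {
    "customer-file": re.compile(r"(?:<customer-details>)*"),
    "customer-details": re.compile(r"<name><address>"),
    "address": re.compile(r"<street><city><state>(?:<postal>)?"),
}
TEXT_ONLY = {"name", "street", "city", "state", "postal"}

def validate(root):
    """Return a list of problems (empty if valid). Like a validating parser,
    it also fills in the default country="US" on <address> in place."""
    problems, ids = [], set()
    if root.tag != "customer-file":
        problems.append(f"root element is <{root.tag}>, not <customer-file>")
    for elem in root.iter():
        tag = elem.tag
        if tag in TEXT_ONLY:
            if len(elem):
                problems.append(f"<{tag}> may contain text only")
        elif tag in CHILDREN:
            seq = "".join(f"<{child.tag}>" for child in elem)
            if CHILDREN[tag].fullmatch(seq) is None:
                problems.append(f"<{tag}> has unexpected children {seq!r}")
            stray = [elem.text] + [child.tail for child in elem]
            if any(s and s.strip() for s in stray):   # whitespace is allowed
                problems.append(f"<{tag}> may not contain text")
        else:
            problems.append(f"undeclared element <{tag}>")
        if tag == "customer-details":                 # id ID #REQUIRED
            cid = elem.get("id")
            if cid is None:
                problems.append("<customer-details> lacks its required id")
            elif cid in ids:
                problems.append(f"duplicate id {cid!r}")
            else:
                ids.add(cid)
        elif tag == "address":                        # country CDATA "US"
            elem.attrib.setdefault("country", "US")
    return problems

sample = ET.fromstring("""<customer-file>
  <customer-details id="c1">
    <name>A. Customer</name>
    <address><street>1 Main St</street><city>Springfield</city><state>IL</state></address>
  </customer-details>
</customer-file>""")
print(validate(sample))                                        # []
print(sample.find("customer-details/address").get("country"))  # US
```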
I've decided:

Of course, other DTDs could be written against which the example would be valid.


A tutorial lives here: http://zvon.org/xxl/DTDTutorial/General/book.html - there are also some references. The W3C definitions can be found here: http://www.w3.org/TR/REC-xml#sec-logical-struct.

For years around the release of Microsoft's Office 2000, the company was applauded in reviews that said Microsoft had changed its spots and was now supporting an open format... XML!

(After all, it was the story MS had spun, and reviewers are inherently lazy creatures.)

XML is a method for putting structured data into a file. Within it, you choose a DTD (Document Type Definition) that defines the rules for holding the specific data you wish to store. Each DTD defines its own particular subset of XML.

If a DTD isn't published, the format is no more open than a binary file. Although simple examples are quickly dissected and analysed, a programmer has great difficulty knowing that when you bold some text it should be written into the file as an <important> element rather than a <bold> one. If the rules for saving XML-structured information are not published and defined, the XML DTD is still a closed standard... despite being XML.

If I were a paranoid man who slept with the door locked and the moat full to the brim, I might claim that MS noticed the XML buzzword hype and wanted to cash in on the goodwill associated with the "open" meta-language: get in first and spread the unpublished Word2000 DTD as THE STANDARD™ for text documents.
