XML (eXtensible Markup Language) is a subset of SGML (Standard Generalised Markup Language), which came out of IBM in the 1960s or 1970s. Its purpose is to describe the structure of a document in a portable, parseable manner. Different document types can use different markups to describe their content.
SGML requires a DTD (Document Type Definition) before a parser can be applied to a marked-up document. XML, however, can be parsed without a DTD if it is "well formed". If a DTD is supplied, a "valid" document will pass validation and consistency checks against the DTD. The DTD is necessary if the meaning of the tags is required. Note: There are a number of proposals to replace or supplement DTDs with XML-based descriptions, or "schemas". See XML schema for more on this.
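As a rough illustration of "well formed", the sketch below uses the non-validating parser in Python's standard library, which consults no DTD. The helper name is_well_formed and the two sample documents are assumptions made up for this example. Checking "validity" against a DTD would need a validating parser, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as well-formed XML; no DTD is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, invented for this sketch.
GOOD = "<note><to>Reader</to><body>Hello</body></note>"
BAD = "<note><to>Reader</body></note>"  # closing tag does not match <to>

print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```

The first document parses even though nothing declares what "note", "to", or "body" mean; the second is rejected purely on syntax.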
Neither the XML nor the DTD (or XML schema) says anything about how to present the information contained in the document on an output device (a browser, hard copy, and so on). Presentation is separated out into a style sheet. Current implementations use Cascading Style Sheets (CSS). However, much as with DTDs, an XML-based alternative to CSS has been proposed: XSL (eXtensible Stylesheet Language).
Logically, an XML document is a hierarchical collection of elements. Elements can have attributes and can contain other elements or text. Physically, the document is represented by a collection of entities.
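The logical hierarchy can be made visible by walking a parsed document. This is a minimal sketch using Python's standard library. The sample document and the helper name outline are invented for illustration and are not part of any XML standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
DOC = """<book title="Example">
  <chapter number="1">
    <para>First paragraph.</para>
  </chapter>
  <chapter number="2"/>
</book>"""


def outline(element, depth=0):
    """Return one line per element, indented by its nesting depth."""
    lines = ["  " * depth + element.tag + " " + str(dict(element.attrib))]
    for child in element:  # iterating an element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
print("\n".join(outline(root)))
```

Each line of the outline shows an element name followed by its attributes, with indentation reflecting how deeply the element is nested.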
The syntax of an XML element is trivial. There are two forms.
- Non-empty element
- A non-empty element consists of an opening tag, optional content, and a matching closing tag, which is the element name prefixed with "/".
- For example, <em>For example</em>
- Note that the "<" and ">" are familiar from HTML (HyperText Markup Language).
- Empty element
- An empty element consists of a single tag with "/" as its last character before the closing ">".
- For example, <disclaimer type="standard"/>
Attributes are supplied in a similar way to HTML, as label="value" pairs, as in the "empty" example above. Unlike HTML, XML requires the value to be quoted.
Content can be text or further XML elements, depending on the DTD. In the non-empty example above, the text "For example" is the content of the "em" element.
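The two element forms above can be checked with a short sketch that parses the examples from this article using Python's standard library. The helper name parses is an assumption made for this example. The last check shows that an unquoted attribute value, which HTML tolerates, is rejected by an XML parser.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if text is accepted by the XML parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Non-empty element: opening tag, content, matching closing tag.
em = ET.fromstring("<em>For example</em>")
print(em.tag, repr(em.text))

# Empty element: a single tag ending in "/", carrying one attribute.
disclaimer = ET.fromstring('<disclaimer type="standard"/>')
print(disclaimer.tag, disclaimer.attrib, repr(disclaimer.text))

# Attribute values must be quoted in XML.
print(parses("<disclaimer type=standard/>"))
```

Here the content of "em" is the text "For example", while "disclaimer" has an attribute but no content at all.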
A good website to find out more is http://www.ucc.ie/xml/.
A great website, however, is http://www.keller.com/xml/ - a nice short course on how to "do it".