XML (eXtensible Markup Language) is a subset of SGML (Standard Generalised Markup Language). SGML grew out of GML, which was developed at IBM in the late 1960s, and was standardised as ISO 8879 in 1986. XML's purpose is to describe the content of a document in a portable, parseable manner. Different document types can use different markups to describe their content.

SGML requires a DTD (Document Type Definition) in order for a parser to be applied to a marked-up document. XML, however, can be parsed without a DTD as long as it is "well formed". If a DTD is supplied, a "valid" document is one that passes validation and consistency checks against the DTD. A DTD is therefore needed only if the document's tags must be checked against a formal definition of how they may be used. Note: several XML-based alternatives to DTDs, known as "schemas", have been defined, such as W3C XML Schema. See XML schema for more on this.
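A quick way to see the difference between well formed and not is to hand a few fragments to a non-validating parser. The sketch below assumes Python's standard xml.etree.ElementTree module (chosen here only for illustration; the helper name is_well_formed is hypothetical). That parser checks well-formedness but does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML.

    ElementTree's parser is non-validating: it checks well-formedness
    only, so no DTD is needed (and none is enforced).
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Every tag closed and properly nested: well formed.
print(is_well_formed("<em>For example</em>"))   # True
# Closing tag missing: not well formed.
print(is_well_formed("<em>For example"))        # False
# Tags closed in the wrong order: not well formed.
print(is_well_formed("<b><i>text</b></i>"))     # False
```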

Neither the XML nor the DTD (or XML schema) says anything about how to present the information in the document on an output device (browser, hard copy, whatever). That is separated out into a style sheet. XML can be styled with Cascading Style Sheets (CSS), but, much as schemas offer an XML-based alternative to DTDs, there is also an XML-based stylesheet language, XSL (eXtensible Stylesheet Language).

Logically, an XML document is a hierarchical collection of elements. Elements can carry attributes and can contain other elements or text. Physically, a document is stored as one or more units called entities.
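The logical, tree-shaped view can be made visible by walking the elements recursively. This is a minimal sketch using Python's xml.etree.ElementTree; the sample document and its type="postal" attribute are made up for the example, and walk is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; type="postal" is an invented attribute.
DOC = """<ADDRESS type="postal">
  <NUMBER>484</NUMBER>
  <STREET>St Kilda Road</STREET>
</ADDRESS>"""

def walk(elem, depth=0):
    """Return one line per element: indented tag, attributes and text."""
    text = (elem.text or "").strip()
    lines = ["  " * depth + f"{elem.tag} {elem.attrib} {text!r}"]
    for child in elem:  # iterating an element yields its child elements
        lines.extend(walk(child, depth + 1))
    return lines

for line in walk(ET.fromstring(DOC)):
    print(line)
```

Each level of indentation in the output is one level of nesting in the element hierarchy; attributes hang off the element itself rather than appearing as children.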

The syntax of an XML element is trivial. There are two forms.

Non-empty element
A non-empty element consists of an opening tag, optional content, and a matching closing tag, which repeats the element name prefixed with "/".
For example, <em>For example</em>
Note that the "<" and ">" are familiar from HTML (HyperText Markup Language).
Empty element
An empty element consists of a single tag with "/" as its last character before the closing ">".
For example, <disclaimer type="standard"/>
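The two forms can be compared by parsing them. The XML specification also allows an empty element to be written as an opening tag immediately followed by its closing tag, and the sketch below (Python's xml.etree.ElementTree, chosen for illustration) suggests the parser treats that spelling the same as the "/" form.

```python
import xml.etree.ElementTree as ET

non_empty = ET.fromstring("<em>For example</em>")
empty = ET.fromstring('<disclaimer type="standard"/>')
# The same empty element written with an explicit closing tag.
empty_long = ET.fromstring('<disclaimer type="standard"></disclaimer>')

print(non_empty.tag, repr(non_empty.text))  # em 'For example'
print(empty.tag, empty.attrib, empty.text)  # disclaimer {'type': 'standard'} None
# Both spellings produce an element with no content, so they serialise alike.
print(ET.tostring(empty) == ET.tostring(empty_long))  # True
```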

Attributes are supplied in a similar way to HTML, as name="value" pairs, as in the empty example above. Unlike HTML, however, XML requires the value to be enclosed in quotes (single or double).

Content can be text or further XML elements, depending on the DTD. In the non-empty example above, the text "For example" is the content of the "em" element.
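Attributes and content are reached differently once a document is parsed. This sketch assumes Python's xml.etree.ElementTree; the "lang" attribute and the parses helper are hypothetical, used only to show a missing attribute and the quoting rule.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring('<disclaimer type="standard"/>')
print(elem.get("type"))          # standard
print(elem.get("lang", "none"))  # none (hypothetical attribute is absent)

em = ET.fromstring("<em>For example</em>")
print(em.text)                   # For example

def parses(text):
    """True if text is well-formed XML according to ElementTree."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses('<disclaimer type="standard"/>'))  # True
print(parses("<disclaimer type='standard'/>"))  # True: single quotes are fine
print(parses("<disclaimer type=standard/>"))    # False: value not quoted
```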

A good website to find out more is http://www.ucc.ie/xml/.

A great website, however, is http://www.keller.com/xml/ - a nice short course on how to "do it".

At first glance XML looks quite similar to HTML in that it is made up of text, tags and attributes. On closer inspection, though, the two turn out to be quite different. While HTML concerns itself only with how data should be displayed, XML allows a sense of what the data means to be incorporated into the document. For example, you might mark up an address in HTML like this:

<TABLE>
      <TR>
            <TD>484</TD><TD>St Kilda Road</TD>
      </TR>
      <TR>
            <TD>Melbourne</TD>
      </TR>
      <TR>
            <TD>VIC</TD><TD>3000</TD>
      </TR>
</TABLE>


In an XML document, however, it might look like this:

<ADDRESS>
      <NUMBER>484</NUMBER>
      <STREET>St Kilda Road</STREET>
      <CITY>Melbourne</CITY>
      <STATE>VIC</STATE>
      <PCODE>3000</PCODE>
</ADDRESS>


Notice how the XML adds structure and meaning to the data. In the HTML, "St Kilda Road" is just some text in a table; the XML specifies that it is a STREET and is part of an ADDRESS.
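The practical payoff is that a program can ask for data by what it means rather than where it sits. The sketch below, using Python's xml.etree.ElementTree, pulls the street out of both versions of the address. The HTML table only parses here because it happens to close every tag it opens, and even then the street can only be found by position.

```python
import xml.etree.ElementTree as ET

ADDRESS_XML = """<ADDRESS>
      <NUMBER>484</NUMBER>
      <STREET>St Kilda Road</STREET>
      <CITY>Melbourne</CITY>
      <STATE>VIC</STATE>
      <PCODE>3000</PCODE>
</ADDRESS>"""

TABLE_HTML = """<TABLE>
      <TR><TD>484</TD><TD>St Kilda Road</TD></TR>
      <TR><TD>Melbourne</TD></TR>
      <TR><TD>VIC</TD><TD>3000</TD></TR>
</TABLE>"""

address = ET.fromstring(ADDRESS_XML)
# Ask for the street by name.
street = address.findtext("STREET")
postcode = address.findtext("PCODE")
print(street, postcode)   # St Kilda Road 3000

# Or turn the whole record into a dictionary keyed by tag name.
record = {child.tag: child.text for child in address}
print(record["CITY"])     # Melbourne

table = ET.fromstring(TABLE_HTML)
# In the table the street is just "first row, second cell".
print(table[0][1].text)   # St Kilda Road
```

If the table's layout changed (say, the number moved to its own row), the positional lookup would silently return the wrong text, while findtext("STREET") would not care.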

Of course, many different structures and meanings can be applied to the same data. For example, if we weren't really interested in the above data as an address but wanted to perform some sort of syntactic analysis on it, we might specify it in another piece of XML as

<SENTENCE>
      <NUMERAL>484</NUMERAL>
      <NOUN type="proper">
            <ABBREVIATION>St</ABBREVIATION>
            <NAME>Kilda</NAME>
      </NOUN>
      <NOUN>Road</NOUN>
      <NOUN type="proper">Melbourne</NOUN>
      <ABBREVIATION>VIC</ABBREVIATION>
      <NUMERAL>3000</NUMERAL>
</SENTENCE>


in which case we don't see it as an ADDRESS but as NUMERALS, NOUNS and ABBREVIATIONS grouped into a SENTENCE.
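With this markup, the questions a program can easily ask change accordingly. As a sketch (Python's xml.etree.ElementTree plus collections.Counter, chosen for illustration), one can count word classes and collect the proper nouns, including the nested one.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SENTENCE_XML = """<SENTENCE>
      <NUMERAL>484</NUMERAL>
      <NOUN type="proper">
            <ABBREVIATION>St</ABBREVIATION>
            <NAME>Kilda</NAME>
      </NOUN>
      <NOUN>Road</NOUN>
      <NOUN type="proper">Melbourne</NOUN>
      <ABBREVIATION>VIC</ABBREVIATION>
      <NUMERAL>3000</NUMERAL>
</SENTENCE>"""

sentence = ET.fromstring(SENTENCE_XML)

# Count word classes among the direct children of SENTENCE
# (the ABBREVIATION nested inside a NOUN is not a direct child).
counts = Counter(child.tag for child in sentence)
print(counts["NOUN"], counts["NUMERAL"], counts["ABBREVIATION"])  # 3 2 1

# Proper nouns, with any nested parts joined back into one string.
proper = [
    " ".join(t.strip() for t in noun.itertext() if t.strip())
    for noun in sentence.findall("NOUN[@type='proper']")
]
print(proper)  # ['St Kilda', 'Melbourne']
```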

You can do this sort of thing because XML is eXtensible. Unlike HTML, which has a fixed set of tags, XML lets you create new tags to confer whatever meaning and structure you wish on data. In fact, the HTML fragment shown first is also XML; its tags are simply designed to specify how arbitrary text should be laid out for display. All HTML documents could be thought of as XML documents were it not for the fact that XML is a bit stricter. Specifically:

1) All XML must be well formed.
HTML is very forgiving when it comes to syntax (which has led to a lot of very sloppy HTML being produced), but XML isn't. To be well formed, an XML document must, among other things, have a closing tag for every opening tag and close them in the right order. (For a full description of what constitutes a well-formed XML document, see http://www.ucc.ie/xml/#FAQ-WF ) The vast majority of HTML documents out there are not well formed, but if they were, they would all also be XML documents.
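2) You can also require that XML be valid.
If you do, you must provide a Document Type Definition (DTD) for the XML to be validated against. A DTD specifies rules that the elements and attributes in the document must follow to be considered valid: which elements may appear inside which, in what order, and which attributes they may carry. For example, a DTD could require that an ADDRESS contain a NUMBER, STREET, CITY, STATE and PCODE, in that order. A DTD cannot, however, constrain the text inside an element; all it can say about NUMBER is that it contains character data (#PCDATA). To specify that the contents of a NUMBER element as used above must consist of one or more digits optionally followed by a letter, you need an XML schema language such as W3C XML Schema. With such a rule in place, a document containing
<NUMBER>27a</NUMBER>
is valid, but one containing
<NUMBER>ABC</NUMBER>
would not be.
Basically, a DTD (or schema) allows you to formally specify a type of XML document, and hence the structure and meaning conferred on the data.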
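Python's standard library has no DTD or XML Schema validator (the third-party lxml package provides both), so the sketch below only imitates the NUMBER rule by hand: it checks well-formedness first, then tests each NUMBER against a regular expression. The pattern and the numbers_valid helper are hypothetical stand-ins, not a real validator. Strictly, a rule on element text like this is beyond what a DTD can express; it is the kind of constraint W3C XML Schema's pattern facet handles.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule: one or more digits, optionally followed by one letter.
NUMBER_PATTERN = re.compile(r"[0-9]+[A-Za-z]?")

def numbers_valid(text):
    """Check well-formedness, then the hand-coded NUMBER rule.

    A document that is not well formed raises ET.ParseError, since
    validity only makes sense for well-formed documents.
    """
    root = ET.fromstring(text)
    # iter() visits root itself and all descendants with this tag.
    return all(
        NUMBER_PATTERN.fullmatch(elem.text or "") is not None
        for elem in root.iter("NUMBER")
    )

print(numbers_valid("<ADDRESS><NUMBER>27a</NUMBER></ADDRESS>"))  # True
print(numbers_valid("<ADDRESS><NUMBER>ABC</NUMBER></ADDRESS>"))  # False
```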
