At first glance XML looks quite similar to HTML, in that it is made up of text and tags. Upon closer inspection, though, the two show themselves to be quite different. While HTML concerns itself only with how data should be displayed, XML allows a sense of what the data means to be incorporated into the document.
For example, you might mark up an address in HTML like this:
<TD>484</TD><TD>St Kilda Road</TD>
However, in an XML document it might look like this:
<ADDRESS>
<NUMBER>484</NUMBER>
<STREET>St Kilda Road</STREET>
</ADDRESS>
Notice how the XML adds structure to the data. While in the HTML "St Kilda Road" is just some text in a table, the XML specifies that it is a STREET and is part of an ADDRESS.
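To make the difference concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree module. The sample string, including its ADDRESS wrapper and NUMBER element, and the helper name street_of are assumptions made for illustration. The point is only that a program can ask for the street by name, which the anonymous table cells of the HTML version do not allow.

```python
import xml.etree.ElementTree as ET

# Sample document. The ADDRESS wrapper and the NUMBER element are
# assumed here for illustration.
ADDRESS_XML = """
<ADDRESS>
  <NUMBER>484</NUMBER>
  <STREET>St Kilda Road</STREET>
</ADDRESS>
"""

def street_of(xml_text):
    """Return the text of the STREET child of the root element, or None."""
    root = ET.fromstring(xml_text.strip())
    return root.findtext("STREET")

street = street_of(ADDRESS_XML)
print(street)
```

With the HTML table, the same program could only ask for "the second cell" and hope that it holds a street name.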
Of course, there are many different structures and meanings that can be applied to the same data. For example, if we weren't really interested in the above data as an address but wanted to perform some sort of syntactic analysis on it, we might specify it in another piece of XML as
<NUMERAL> 484 </NUMERAL>
in which case we don't see it as an ADDRESS but as NUMERALS, NOUNS and ABBREVIATIONS grouped into a SENTENCE.
You can do this sort of thing because XML is eXtensible. Unlike HTML, which has a static set of tags, XML lets you create new tags to confer whatever meaning and structure you wish on the data. In fact, if you think about it, the HTML fragment first shown is also XML, but the tags used are designed to specify the structure for displaying arbitrary data.
Actually, all HTML documents could be thought of as XML documents if it weren't for the fact that XML is a bit stricter:
1) All XML must be well formed
HTML is very forgiving when it comes to syntax (which has led to a lot of very sloppy HTML being produced), but XML isn't. In order to be well formed, an XML document must, among other things, have a closing tag for every opening tag and close them in the right order, so that elements nest inside one another rather than overlap.
(For a full description of what constitutes a well formed XML document, see http://www.ucc.ie/xml/#FAQ-WF)
The vast majority of HTML documents out there are not well formed, but if they were then they would all also be XML documents.
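As a rough illustration of that strictness, the sketch below feeds three strings to the XML parser in Python's standard library and reports which ones it accepts. The function name is_well_formed and the sample strings are assumptions made for this example. The parser only checks well-formedness here; it says nothing about validity.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as XML, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Every opening tag is closed, in the right order.
good = "<ADDRESS><STREET>St Kilda Road</STREET></ADDRESS>"
# STREET is opened but never closed.
unclosed = "<ADDRESS><STREET>St Kilda Road</ADDRESS>"
# Tags are closed in the wrong order, so the elements overlap.
crossed = "<B><I>St Kilda Road</B></I>"

for sample in (good, unclosed, crossed):
    print(is_well_formed(sample), sample)
```

A typical HTML browser would display all three without complaint, which is exactly the forgiveness described above.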
2) You can specify that XML must also be valid
If you do so then you must provide a Document Type Definition (DTD) for the XML to be validated against. A DTD specifies rules that the elements and attributes in the XML document must follow to be considered valid. For example, you could specify in a DTD that an ADDRESS element must contain a NUMBER element followed by a STREET element. (A DTD describes which elements may appear and in what order; it cannot restrict the characters that make up an element's text.)
Then a document in which every ADDRESS follows that pattern is valid, but one in which an ADDRESS is missing its STREET would not be.
Basically, a DTD allows you to formally specify a type of XML document, and hence the structure and meaning to be conferred on the data.
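Python's standard library does not validate documents against a DTD, so the sketch below is only a hand-written stand-in for one kind of rule a DTD can state, namely which child elements an element must contain and in what order. The rule itself (an ADDRESS holds a NUMBER followed by a STREET), the function name address_is_valid and the sample strings are assumptions made for illustration. A validating parser would read such a rule from the DTD rather than have it coded by hand.

```python
import xml.etree.ElementTree as ET

# The assumed rule, written as a DTD would state it:
#   <!ELEMENT ADDRESS (NUMBER, STREET)>
# that is, an ADDRESS contains one NUMBER followed by one STREET.

def address_is_valid(xml_text):
    """Check only the child-element rule above; this is not a DTD validator."""
    root = ET.fromstring(xml_text)
    if root.tag != "ADDRESS":
        return False
    child_tags = [child.tag for child in root]
    return child_tags == ["NUMBER", "STREET"]

follows_rule = "<ADDRESS><NUMBER>484</NUMBER><STREET>St Kilda Road</STREET></ADDRESS>"
# Well formed, but the STREET element is missing.
breaks_rule = "<ADDRESS><NUMBER>484</NUMBER></ADDRESS>"

print(address_is_valid(follows_rule))
print(address_is_valid(breaks_rule))
```

Both sample strings are well formed; only the first also satisfies the rule, which is the distinction between a well formed document and a valid one.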