Friday, December 5, 2008

Extensible Markup Language (XML)

XML provides a universal, standardised and well-supported mechanism for marking up data, for use on the web and in other applications. Unlike HTML, which is a language based around displaying data in a web browser, XML puts no constraints on the purpose for which the data will be used, but merely describes the structure of the data. XML can therefore be used (and is used) for applications that involve the transfer of data across the Internet either for display or computational purposes.

It is important to understand that XML is not, on its own, a presentation markup language; it is a markup metalanguage, that is, a syntax within which other languages can be defined. Two such languages aimed at displaying information are the Wireless Markup Language (WML) used by WAP phones and XHTML, which provides HTML functionality in XML syntax and is intended to supersede HTML. The Extensible Business Reporting Language (XBRL) is an example of a language developed for transferring data for processing by computer. A major strength of XML is that the same data can be both processed directly by a computer and displayed for human users.

XML has many supporting standards. The Extensible Stylesheet Language (XSL) is used to display XML directly in a web browser or other client software. XML Schema is used to define XML languages for specific uses. XLink and XPointer provide powerful linking facilities between and within XML documents, and the Document Object Model (DOM) provides a standard programming interface.
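As a small illustration of the DOM programming interface mentioned above, the sketch below reads a document with the xml.dom.minidom module from Python's standard library. The sample document and its element names (person, name, town) are invented for this example and do not come from any published schema.

```python
from xml.dom.minidom import parseString

# Hypothetical document; the element names are assumptions for illustration.
SAMPLE = "<person><name>Ann Example</name><town>Leeds</town></person>"


def element_text(document, tag_name):
    """Return the text of the first element called tag_name, or None."""
    nodes = document.getElementsByTagName(tag_name)
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.data


dom = parseString(SAMPLE)
name = element_text(dom, "name")
town = element_text(dom, "town")
print(name, town)
```

The same DOM calls (getElementsByTagName, firstChild) are available in other languages that implement the W3C DOM, which is the point of having a standard interface.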

XML was developed and is maintained by the World Wide Web Consortium (W3C), which also maintains the HTML standards. It is widely supported by industry and has good commercial tool support.

Within the UK Government, the e-GIF mandates the use of XML for both display and data transfer applications. Use of XML is supported by the UK GovTalk™ initiative and the web site at http://www.govtalk.gov.uk.

Summary

XML has three major uses within the UK public sector.

1. XML languages are being used for transferring information between citizens and government, between businesses and government and across government. Examples are the submission of personal and business tax returns.
2. XML is increasingly being used in web site development. For example, where information is drawn from a database, using XML as an intermediate format between the database and the display code speeds development and eases maintenance.
3. XML can be used for archiving data. Rather than storing archive data in proprietary database formats that could make reading old archives hard in the future, careful design of an XML language allows data to be self-describing. Coupling this with the text-based nature of XML ensures that data can be retrieved and understood easily in the future.
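The second use above, XML as an intermediate format between a database and the display code, can be sketched roughly as follows. This is only an illustration under stated assumptions: the office table, its columns and the element names are hypothetical, and an in-memory SQLite database stands in for whatever database a real site would use.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html import escape


def rows_to_xml(connection):
    """Serialise every row of the (hypothetical) office table as XML."""
    root = ET.Element("offices")
    query = "SELECT name, town FROM office ORDER BY name"
    for name, town in connection.execute(query):
        office = ET.SubElement(root, "office")
        ET.SubElement(office, "name").text = name
        ET.SubElement(office, "town").text = town
    return ET.tostring(root, encoding="unicode")


def xml_to_html(xml_text):
    """Render the intermediate XML as an HTML list.

    This function only sees XML; it knows nothing about the database.
    """
    items = [
        "<li>{} ({})</li>".format(
            escape(office.findtext("name")), escape(office.findtext("town"))
        )
        for office in ET.fromstring(xml_text).iter("office")
    ]
    return "<ul>" + "".join(items) + "</ul>"


# Assumed sample data, created in memory so the sketch is self-contained.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE office (name TEXT, town TEXT)")
connection.executemany(
    "INSERT INTO office VALUES (?, ?)",
    [("North Office", "Leeds"), ("South Office", "Bristol")],
)

intermediate = rows_to_xml(connection)
page = xml_to_html(intermediate)
print(page)
```

Because the display code depends only on the intermediate XML, the database layout can change without touching the rendering function, as long as the XML stays the same.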

Web sites can use XML both to provide and consume services. For example, both public and private sector web sites can provide a change of residence service that submits an XML message to multiple public sector organisations that store the information in their databases. While holding a dialogue with the citizen, this service might itself send XML messages to the National Land and Property Gazetteer to ensure that the addresses provided are both unambiguous and valid.

What is XML?

The concept of XML is very different from that of HTML. HTML is an application of the Standard Generalized Markup Language (SGML); that is to say, each revision of HTML is defined in SGML as an SGML ‘document type’. XML, by contrast, is a simplified subset of SGML itself.

The following quote is from the W3C XML WG (Working Group):

“XML is primarily intended to meet the requirements of large-scale web content providers for industry-specific mark up, vendor-neutral data exchange, media-independent publishing, one-on-one marketing, workflow management in collaborative authoring environments, and the processing of web documents by intelligent clients. It is also expected to find use in certain metadata applications. XML is fully internationalised for both European and Asian languages, with all conforming processors required to support the Unicode character set in both its UTF-8 and UTF-16 encodings. The language is designed for the quickest possible client-side processing consistent with its primary purpose as an electronic publishing and data interchange format.

The key to how it works is descriptive mark-up. This allows you to tag data not by its structure in the document or how it will be displayed, but by what kind of data it is. For example, this means it can break an address down into individual descriptive elements such as street, street number, town, and postcode. This is very useful if the address has to be used by different databases each of which records addresses in different ways.”
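To illustrate the address example in the quotation, the sketch below marks up an address by what each part is, then shows two consumers rebuilding it in different shapes. The element names and the sample address are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptive markup; element names are assumptions.
ADDRESS_XML = """
<address>
  <streetNumber>10</streetNumber>
  <street>High Street</street>
  <town>Leeds</town>
  <postcode>LS1 1AA</postcode>
</address>
"""

address = ET.fromstring(ADDRESS_XML)


def single_line(addr):
    """One system might store the whole address as a single string."""
    return "{} {}, {} {}".format(
        addr.findtext("streetNumber"),
        addr.findtext("street"),
        addr.findtext("town"),
        addr.findtext("postcode"),
    )


def split_fields(addr):
    """Another system might keep the street line, town and postcode apart."""
    return {
        "line1": addr.findtext("streetNumber") + " " + addr.findtext("street"),
        "town": addr.findtext("town"),
        "postcode": addr.findtext("postcode"),
    }


print(single_line(address))
print(split_fields(address))
```

Neither consumer has to guess where the street ends and the town begins, because the markup says what each piece of data is.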

For different databases or systems to use this markup, an XML schema has to be developed to establish what the descriptive mark-up will be for each purpose.

For example, as part of developing a cross-departmental schema that included a representation of people’s names, the participating departments would need to agree on the names and meanings of the XML elements to be used. In this example, they might agree on a single element name to represent a person’s first name, to the exclusion of any alternative spellings or synonyms.
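To make the need for agreed element names concrete, the sketch below checks a message against a small agreed vocabulary. The element names here (PersonName, FirstName, LastName, Forename) are purely hypothetical and are not taken from any actual government schema. This is also not XML Schema validation: Python's standard library does not include a schema validator, so the check is reduced to a simple comparison of element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary that the participating departments have agreed on.
AGREED_NAME_ELEMENTS = {"PersonName", "FirstName", "LastName"}


def unexpected_elements(xml_text):
    """Return the sorted element names in xml_text that are not agreed."""
    root = ET.fromstring(xml_text)
    used = {element.tag for element in root.iter()}
    return sorted(used - AGREED_NAME_ELEMENTS)


# A message that follows the agreed names.
conforming = (
    "<PersonName><FirstName>Ann</FirstName>"
    "<LastName>Example</LastName></PersonName>"
)

# A message from a system that chose a different name for the same thing.
divergent = (
    "<PersonName><Forename>Ann</Forename>"
    "<LastName>Example</LastName></PersonName>"
)

print(unexpected_elements(conforming))
print(unexpected_elements(divergent))
```

The second message carries the same information, but a receiving system built around the agreed names would not find the first name in it, which is why the vocabulary has to be fixed in a shared schema.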
