Magazine article Information Today

Preparing Data for the Web with SGML/XML


Why convert electronic data yet again, to yet another electronic format?

Publishing these days is no longer the sedate and deliberate business it was in times gone by. We've joined the information age and change seems to have become the order of the day. No longer, apparently, do we have the luxury of settling into a new technology, getting comfortable with it, and expecting to live with it for 5 or 10 years. We are now synchronized to Internet time, expected to change our data representations and our tool sets as frequently as we change our clothes.

The effect on productivity is pronounced--and definitely not positive. It's more than just a retraining issue, too. The transition to a new data representation may be time-consuming and expensive. Moreover, as you move from one typesetting format to another, or from one word processing format to another, critical information that is embedded not in the text per se, but in its formatting, or in metadata, is often lost or distorted in the transition.

A paragraph that conveyed a message of special significance in the original data because of its appearance--something demarked as important, for example--may no longer convey that message after the data are converted to the format du jour. Your data lose value. At best, that value will be costly to restore. At worst, it may not be recoverable at all.

Just Another Step on the Conversion Treadmill--Not!

One of the ironies of life is that to cure a chronic pain you often need to suffer some immediate discomfort. There is a solution to the pain of chronic conversion. It will not only buy you freedom from the conversion merry-go-round, but will also enhance the value of your information assets and allow you to leverage your data onto new publishing platforms. However--and here is the discomfort--you will have to convert your data one more time to obtain these benefits. You'll have to convert it to SGML or XML.

What Are SGML and XML?

SGML (Standard Generalized Markup Language) is fundamentally different from typesetting and word processing formats in that it focuses on identifying document structure rather than document appearance. SGML has been around for a decade, but it is snowballing in popularity today in the guise of its younger brother, XML (Extensible Markup Language). XML is a streamlined version of SGML, with a number of esoteric features eliminated and with relaxed enforcement of certain structure restrictions.
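One way to see XML's lighter requirements in practice: in XML a DTD is optional, so a document without one can still be processed as long as it is well-formed, meaning every start tag has a matching end tag and elements nest properly. The minimal sketch below illustrates this with Python's standard `xml.etree.ElementTree` module, which checks well-formedness but does not validate against a DTD. The element names `article` and `subhead` and the helper `is_well_formed` are illustrative assumptions, not part of any published tag set.

```python
import xml.etree.ElementTree as ET

# No DTD is declared; XML only requires the document to be well-formed.
well_formed = "<article><subhead>What Are SGML and XML?</subhead></article>"

# The end tag for <subhead> is missing, so this is not well-formed XML.
malformed = "<article><subhead>What Are SGML and XML?</article>"


def is_well_formed(text: str) -> bool:
    """Return True if the XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(well_formed))  # True
print(is_well_formed(malformed))    # False
```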

Both SGML and XML are industry-standard, non-proprietary formats that use ASCII as their base. Text elements are identified using tags, which describe each element's function--the kind of content it contains. For example, the subhead that precedes this section would look like this:

<subhead>What Are SGML and XML?</subhead>

Note that the tag itself is ASCII text, that it delimits both the beginning and the end of the text element (which, here, happens to be a subheading), and that there is no formatting information inherent in the element. In an XML- or SGML-formatted document, all text elements are similarly tagged.
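To make this concrete, the sketch below parses a tiny hypothetical document in which the subhead is wrapped in a `subhead` tag (the tag name this article's DTD is said to define) and lists each element's name and text. It uses Python's standard `xml.etree.ElementTree`; the `article` wrapper, the `para` element, and the `list_elements` helper are assumptions made for illustration. The parsed elements carry no font, size, or spacing information; only their names describe what they contain.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <article>, <para>, and the sample text are illustrative.
doc = """<article>
<subhead>What Are SGML and XML?</subhead>
<para>SGML is fundamentally different from typesetting formats.</para>
</article>"""


def list_elements(xml_text: str) -> list[tuple[str, str]]:
    """Return (tag name, text) pairs for each child of the root element."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]


for tag, text in list_elements(doc):
    # Each element is identified only by its function (its tag name).
    print(f"{tag}: {text}")

# No element carries attributes, let alone formatting instructions.
print(all(not child.attrib for child in ET.fromstring(doc)))  # True
```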

Different tag sets are appropriate for different kinds of documents. A legal brief has different text elements than does a biology textbook. SGML uses a construct called a DTD (Document Type Definition) to define tag sets by naming the tags and describing their interrelationships. Different kinds of documents have different DTDs. The DTD for this article, for example, defines a tag called "subhead" and restricts it to appearing after a major heading. Because the DTD defines tag interrelationships, documents can be validated against their DTDs to establish conformity--to ensure, in other words, that the structure specified for the document is adhered to.
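Python's standard library checks well-formedness but not DTD validity; in practice that job falls to a validating parser (lxml's `etree.DTD` class is one option). As a simplified stand-in, the sketch below hand-codes the single rule mentioned above: a subhead may appear only immediately after a major heading. The tag names `heading`, `subhead`, and `para` and the function `check_subheads` are hypothetical. A real DTD would state the rule declaratively, for example as the content model `<!ELEMENT article (heading, subhead?, para*)+>`, and a validator would enforce every rule in it, not just this one.

```python
import xml.etree.ElementTree as ET


def check_subheads(xml_text: str) -> list[str]:
    """Check only the hypothetical rule 'subhead must directly follow heading'.

    Returns a list of violations; an empty list means the rule is satisfied.
    """
    root = ET.fromstring(xml_text)
    errors = []
    previous = None  # tag name of the preceding sibling, if any
    for position, element in enumerate(root):
        if element.tag == "subhead" and previous != "heading":
            errors.append(
                f"subhead at position {position} follows {previous!r}, not a heading"
            )
        previous = element.tag
    return errors


conforming = ("<article><heading>H</heading><subhead>S</subhead>"
              "<para>P</para></article>")
nonconforming = ("<article><heading>H</heading><para>P</para>"
                 "<subhead>S</subhead></article>")

print(check_subheads(conforming))     # []
print(check_subheads(nonconforming))  # ["subhead at position 2 follows 'para', not a heading"]
```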

Industries are standardizing on tag sets to further refine content identification. The medical industry might define a tag to identify articles dealing with heart disease research. …
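Because such a tag records what an article is about rather than how it looks, finding the relevant articles becomes a simple structural query. The sketch below assumes a hypothetical `topic` element with the value `heart-disease-research` and a made-up three-article collection; neither reflects an actual medical-industry tag set, and `titles_with_topic` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection; <topic> stands in for an industry-defined tag.
collection = """<collection>
  <article><title>Statin therapy outcomes</title><topic>heart-disease-research</topic></article>
  <article><title>Seasonal influenza trends</title><topic>epidemiology</topic></article>
  <article><title>Arrhythmia risk factors</title><topic>heart-disease-research</topic></article>
</collection>"""


def titles_with_topic(xml_text: str, topic: str) -> list[str]:
    """Return the titles of all articles whose <topic> text equals `topic`."""
    root = ET.fromstring(xml_text)
    titles = []
    for article in root.iter("article"):
        if article.findtext("topic") == topic:
            titles.append(article.findtext("title"))
    return titles


print(titles_with_topic(collection, "heart-disease-research"))
# ['Statin therapy outcomes', 'Arrhythmia risk factors']
```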
