Metadata Models of the World Wide Web

Article excerpt

Abstract

The Semantic Web, in standards being developed by the World Wide Web Consortium, is a new way of defining metadata for use and reuse in a networked environment. In this chapter of "RDA Vocabularies for a Twenty-First-Century Data Environment," we'll discuss the definition of metadata and how it involves the creation of domain models (the things and relationships that the metadata will describe) and ontologies (the vocabularies that the metadata will use). The use of standard identifiers, called Uniform Resource Identifiers, creates unambiguous identities for data and statements about data.

**********

The World Wide Web was developed as a web of documents. On this Web, digital documents would link to each other directly, allowing the user to follow the pointers provided by the author from a place in one document to another digital document. In hindsight, it seems obvious that while this ability to navigate the hyperlinks provided is extraordinarily powerful (and achieved something that is not possible in the analog world), the model lacked a key component for discovery, and that is meaningful metadata for the documents themselves. This problem has been partially overcome by the development of search engines that can index the actual text of the documents. Keyword indexing on uncontrolled text, however, lacks precision for searching.

The Semantic Web is a result of the realization that there is information in the documents on the Web that could be extremely valuable if it could be made actionable, that is, if there were a way to interact with the information inside documents, not just the documents themselves. The emphasis of the Semantic Web is on topical information within the Web resources: information about persons, places, things, and events, covering the full range of scientific and humanistic thought. To turn the web of documents into a web of data, the Web needs metadata to represent that information. This metadata will not look like standard bibliographic metadata. Bibliographic metadata represents a document or resource. The purpose of the Semantic Web is not to create metadata that represents documents or resources; it is to create metadata for the informational content of those resources.

While Web documents resemble the granularity of articles more than that of books, there is significant overlap in the topics covered by the Web and by libraries. Yet these remain two separate and distinct information spheres. In part this is because libraries hold primarily physical resources. Yet where libraries and the Web could collaborate through an intermingling of digital resources, they are unable to because they use different technologies. The Web relies entirely on search engines and keyword searches, while libraries create metadata in a library-specific record format (MARC) that is stored in closed databases. The development of metadata solutions that are compatible with Web-based technology and can be used both by libraries and on the open Web creates the possibility of making a connection between the two worlds.

Compared with libraries, the Web community was quite late in realizing the importance of metadata. There may have been an advantage, however, to starting to think about metadata for the first time in a fully automated environment. The Semantic Web community began with a kind of metadata tabula rasa and a natural tendency to think about machine processing of data at a deep level. Its work began with a study of the basic nature of metadata, or at least the basic nature of machine-actionable, networked, interoperable metadata.

Just as the developers of the Internet created the underlying standards that make it possible, such as TCP/IP, the Semantic Web developers sought to develop the basic structure on which all other metadata would be built. This basic structure is called the Resource Description Framework, or RDF. RDF itself relies on the Uniform Resource Identifier, the standard identifier format for the Web, and eXtensible Markup Language (XML), a set of rules for encoding documents and data electronically. …
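
To make the idea of identifier-based statements concrete, here is a minimal sketch in Python. It assumes the common three-part reading of an RDF statement (a subject, a property, and a value). It does not use any RDF library, and it does not show the XML encoding. Every URI below sits under example.org and is hypothetical, as are the names Statement, is_uri, and objects_of; none of them come from the chapter.

```python
from typing import List, NamedTuple


class Statement(NamedTuple):
    """One RDF-style statement (illustrative structure, not a library type)."""
    subject: str    # URI of the thing being described
    predicate: str  # URI of the property, i.e. the relationship
    obj: str        # URI of another thing, or a plain literal value


def is_uri(value: str) -> bool:
    """Simplified check for this sketch: treat only http(s) strings as URIs."""
    return value.startswith(("http://", "https://"))


def objects_of(statements: List[Statement], subject: str, predicate: str) -> List[str]:
    """Return every value stated for the given subject and property."""
    return [s.obj for s in statements
            if s.subject == subject and s.predicate == predicate]


# Hypothetical identifiers, invented for illustration only.
BOOK = "http://example.org/book/moby-dick"
AUTHOR = "http://example.org/person/herman-melville"
TITLE = "http://example.org/vocab/title"
CREATOR = "http://example.org/vocab/creator"

statements = [
    Statement(BOOK, TITLE, "Moby Dick"),  # the value is a literal string
    Statement(BOOK, CREATOR, AUTHOR),     # the value is another identified thing
]

for s in statements:
    kind = "URI" if is_uri(s.obj) else "literal"
    print(f"{s.subject} -> {s.predicate} -> {s.obj} ({kind})")
```

Because the book, the person, and each property carry their own identifiers, a statement made elsewhere about the same person can be matched by URI and does not depend on how a name happens to be spelled.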