The ambiguous interpretation of concepts that describe the meaning of data in data sources (for example, database schemata, extensible markup language [XML] document type definitions [DTDs], Resource Description Framework [RDF] schemata, and hypertext markup language [HTML] form tags) is commonly known as semantic heterogeneity. Semantic heterogeneity, a well-known obstacle to data source integration, is resolved through a process of semantic reconciliation, which matches concepts from heterogeneous data sources. Traditionally, the complexity of semantic reconciliation required that it be performed by a human observer (a designer, a database administrator [DBA], or a user) (Hull 1997). However, manual reconciliation (with or without computer-aided tools) tends to be slow and inefficient in dynamic environments, and it does not scale, because every new or changed data source demands further human effort. The semantic web vision, with its shift toward machine-understandable web resources, has therefore made clear the importance of automatic semantic reconciliation.
As an example, consider web search, an information-seeking process conducted through an interactive interface. This interface may be as simple as a single input field (as in a general-purpose search engine). Web interfaces may also be highly elaborate: consider a car rental or airline reservation interface spanning multiple web pages, with numerous input fields, some of which are content dependent (for example, when a rented car is to be returned to its point of origin, no input field is required for the return location). A web search typically involves scanning and comparing web resources, either directly or through an information portal, a process hampered by the heterogeneity of those resources. Following the semantic web vision, semantic reconciliation should be inherent in the design of smart software agents for information seeking. Such agents can fill in web forms and rewrite user queries by performing semantic reconciliation among different HTML forms.
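To make the agent scenario a little more concrete, the sketch below shows one preliminary step such an agent might take: listing the input fields of an HTML form in the order they appear, using Python's standard `html.parser` module. The class and function names, the rule of skipping hidden and button-like controls, and the sample rental form are hypothetical choices for illustration only.

```python
from html.parser import HTMLParser


class FormFieldExtractor(HTMLParser):
    """Collect the names of user-facing form controls in document order.

    Hypothetical helper: it records the ``name`` attribute of every
    <input>, <select>, and <textarea> element, skipping hidden inputs and
    buttons, which carry no concept the user fills in.
    """

    CONTROL_TAGS = {"input", "select", "textarea"}
    SKIPPED_TYPES = {"hidden", "submit", "button", "reset"}

    def __init__(self) -> None:
        super().__init__()
        self.fields: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.CONTROL_TAGS:
            return
        attributes = dict(attrs)
        # Attribute values may be None (e.g. a bare attribute), so guard.
        if (attributes.get("type") or "").lower() in self.SKIPPED_TYPES:
            return
        name = attributes.get("name")
        if name:
            self.fields.append(name)


def extract_fields(html: str) -> list[str]:
    """Return the field names of all forms in `html`, in document order."""
    parser = FormFieldExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields


# Hypothetical car rental search form.
RENTAL_FORM = """
<form action="/search">
  <input type="text" name="pickup_location">
  <input type="text" name="return_location">
  <input type="text" name="pickup_date">
  <select name="car_class"><option>Compact</option></select>
  <input type="hidden" name="session_id" value="x">
  <input type="submit" value="Search">
</form>
"""

print(extract_fields(RENTAL_FORM))
# ['pickup_location', 'return_location', 'pickup_date', 'car_class']
```

The ordered list of field names is the raw material on which any reconciliation between two such forms would operate.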
To date, many algorithms have been proposed to support either semiautomatic or fully automatic matching of heterogeneous concepts in data sources. Existing matching algorithms make comparisons based on measures that are either syntactic in nature (such as term matching and domain matching) or based on model semantics. By model semantics, we mean the use of structural information provided by the specific data model to enhance the matching process. For example, XML provides a hierarchical structure that can be exploited to identify links among concepts and thereby support smoother web search.
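As a rough illustration of a syntactic measure, the following sketch matches form labels by string similarity after normalizing case and word separators. It uses Python's `difflib.SequenceMatcher`; the normalization rules, the 0.6 threshold, and the sample labels are assumptions made for the example, not a measure taken from any particular matching system.

```python
import re
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Lower-case a label and split camelCase and snake_case into words."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)
    words = re.split(r"[\W_]+", spaced.lower())
    return " ".join(w for w in words if w)


def term_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized labels."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_terms(source, target, threshold=0.6):
    """Map each source label to its most similar target label, if similar enough.

    The threshold of 0.6 is an arbitrary choice for this sketch.
    """
    matches = {}
    for label in source:
        best = max(target, key=lambda t: term_similarity(label, t))
        if term_similarity(label, best) >= threshold:
            matches[label] = best
    return matches


# Hypothetical labels from two car rental forms.
form_a = ["pickupDate", "carClass"]
form_b = ["pickup_date", "return_location", "car_class", "vehicle_type"]
print(match_terms(form_a, form_b))
print(term_similarity("vehicleType", "car_class"))
```

Labels that differ only in spelling conventions, such as `pickupDate` and `pickup_date`, score 1.0, whereas synonyms such as `vehicleType` and `car_class` score well below the threshold, which is the kind of gap that model semantics and application semantics aim to close.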
In this article, we propose the use of application semantics to enhance the process of semantic reconciliation. Application semantics involves those elements of business reasoning that affect the way concepts are presented to users, such as layout. In particular, this article pursues the notion of precedence, in which temporal constraints determine the order in which concepts are presented to the user.
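One way to picture how precedence could inform matching is to score a candidate mapping between two multi-page forms by how many "presented before" relationships it preserves. The sketch below is a hypothetical illustration under stated assumptions (page order alone defines precedence, fields on the same page are unordered, and the score is a simple fraction); it should not be read as the article's own precedence-based algorithm, and all function and field names are invented for the example.

```python
from itertools import combinations


def precedence_pairs(pages):
    """Return the set of (a, b) pairs such that concept a is presented before b.

    `pages` lists the pages of a multi-page form in the order a user sees
    them, each page being a list of field names. Fields on an earlier page
    precede fields on every later page; fields on the same page are treated
    as unordered (an assumption of this sketch).
    """
    pairs = set()
    for i, j in combinations(range(len(pages)), 2):  # every page pair with i < j
        for a in pages[i]:
            for b in pages[j]:
                pairs.add((a, b))
    return pairs


def precedence_agreement(source_pages, target_pages, mapping):
    """Fraction of source precedence pairs that a candidate mapping preserves.

    `mapping` sends source field names to target field names. A score of 1.0
    means every mapped pair keeps its order in the target form.
    """
    target_pairs = precedence_pairs(target_pages)
    relevant = [
        (a, b)
        for a, b in precedence_pairs(source_pages)
        if a in mapping and b in mapping
    ]
    if not relevant:
        return 1.0  # nothing to compare, so nothing is contradicted
    kept = sum((mapping[a], mapping[b]) in target_pairs for a, b in relevant)
    return kept / len(relevant)


# Hypothetical three-page rental forms and two candidate mappings.
form_a = [["pickup_location"], ["pickup_date"], ["driver_name"]]
form_b = [["branch"], ["start_date"], ["renter"]]
good = {"pickup_location": "branch", "pickup_date": "start_date", "driver_name": "renter"}
swapped = {"pickup_location": "branch", "pickup_date": "renter", "driver_name": "start_date"}

print(precedence_agreement(form_a, form_b, good))     # 1.0
print(precedence_agreement(form_a, form_b, swapped))  # 0.6666666666666666
```

The swapped mapping places the driver's name before the pickup date in the target, so it loses one of three ordering relationships. Because the labels `branch` and `renter` share little spelling with their counterparts, a purely syntactic measure gives weak evidence here, which is where an ordering signal of this kind could help as one input among several.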
All matching techniques aim to reveal latent semantics in data model descriptions and to use it to enhance semantic reconciliation. To illustrate the differences between syntactic measures and data model semantics on the one hand and application semantics on the other, consider a specific data model, XML, providing a domain description. Many matching techniques advocate comparing linguistic similarity, based on the assumption that terminology tends to be homogeneous within a single domain of discourse. Linguistic similarity is based on the terms that appear in the XML file. XML also has a hierarchical structure that allows terms to be nested within other terms. This feature is specific to the data model (it does not exist in the relational model, for example) and may drive another approach to matching. The underlying assumption here is that hierarchy is a feature that designers of all applications can use to model the domain of discourse better, and that it can therefore be used to identify similarities. …
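The contrast between linguistic similarity and hierarchy-based matching can be sketched in a few lines. The sketch below is a hypothetical illustration, not a technique taken from this article: it pairs each leaf element of one XML description with the most similar leaf of another, scoring pairs by a weighted mix of tag-name similarity and parent-tag similarity (the `difflib.SequenceMatcher` ratio). The weight of 0.5, the helper names, and the sample reservation documents are all assumptions.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def leaf_contexts(xml_text: str):
    """List (leaf_tag, parent_tag) pairs for every leaf element, in document order."""
    contexts = []

    def walk(node, parent_tag):
        children = list(node)
        if not children:
            contexts.append((node.tag, parent_tag))
        for child in children:
            walk(child, node.tag)

    walk(ET.fromstring(xml_text), "")
    return contexts


def structural_similarity(leaf_a, leaf_b, weight=0.5):
    """Weighted mix of leaf-name similarity and parent-name similarity."""
    (tag_a, parent_a), (tag_b, parent_b) = leaf_a, leaf_b
    return (weight * name_similarity(tag_a, tag_b)
            + (1 - weight) * name_similarity(parent_a, parent_b))


def best_matches(source_xml, target_xml, weight=0.5):
    """Pair each source leaf with its highest-scoring target leaf."""
    source = leaf_contexts(source_xml)
    target = leaf_contexts(target_xml)
    return [
        (s, max(target, key=lambda t: structural_similarity(s, t, weight)))
        for s in source
    ]


# Hypothetical descriptions of the same reservation from two sites.
SOURCE_XML = "<reservation><pickup><city/></pickup><return><city/></return></reservation>"
TARGET_XML = "<booking><pickupInfo><city/></pickupInfo><returnInfo><city/></returnInfo></booking>"

print(best_matches(SOURCE_XML, TARGET_XML))              # names plus hierarchy
print(best_matches(SOURCE_XML, TARGET_XML, weight=1.0))  # names only
```

With `weight=1.0` (names only), every `city` leaf ties with every target `city` leaf and the first candidate wins, so the return city is mismatched; adding the parent context separates the two. The sketch also inherits the assumption stated in the text: it helps only when both designers used the hierarchy in comparable ways.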