Academic journal article Journal of Digital Information Management

Accessing Heritage Documents According to Space Criteria within Digital Libraries

Article excerpt

ABSTRACT: Local cultural heritage document collections are characterized by contents strongly attached to a territory and its land history. Our contribution aims at making content retrieval in such collections more effective whenever a query includes geographic criteria. We propose a unified model for the formal representation of geographic information. This geographic model allows space features to be described independently of their representation mode (text, graphics) in the documents. We have developed a prototype implementing geographic Information Extraction (IE) and geographic Information Retrieval (IR) processes. We process geographic IE with semantic techniques combined with classic IE approaches. We then implement geographic IR with intersection search algorithms: these algorithms search the document collection indexes for all geocoded entities that intersect any entity in the user's query. This paper focuses on IR and visualization proposals relying on the geospatial characteristics of cultural heritage corpora.

Categories and Subject Descriptors

H.3.7 [Digital Libraries]: Collection; H.2.8 [Database Applications]: Spatial databases and GIS; H.3.3 [Information Search and Retrieval]

General Terms

Geographic information retrieval system, Digital document management

Keywords: Geographic Model, Geographic Information Retrieval and Visualization, Non-Structured Documents, Digital Libraries, Cultural Heritage

1. Introduction

Smart space information retrieval and visualization is the main goal of the work presented in this paper. Generally, space information is supported either by RDBMSs (Relational Database Management Systems) and GISs (Geographic Information Systems) for structured data management, or by Electronic Document Management Systems (EDMSs) and Library Management Systems (LMSs) for semi-structured and non-structured data. All these systems aim to provide fast and effective content-based access to a large amount of information. Although GISs contain high-level space operators that are uncommon in conventional DBMSs, they are not sufficient for queries in which the semantics of the search criteria concern space relations [Clementini & al., 1994]. The results are also unsatisfactory with EDMSs, which usually rely on statistical approaches to answer such queries.

The purpose of the Virtual Itineraries in the Pyrenees (PIV: Pyrenees Itineraires Virtuels) project is to manage a repository of electronic versions of books, newspapers, postcards and lithographs from the 19th and 20th centuries. The information is mainly textual and presents many territorial aspects of the Pyrenees (a mountain range in the southwest of France) [Casenave & al., 2004]. This corpus is still relatively unknown: it is accessible only in regional museums and library archives. This is why the local media library supporting this project aims to disseminate these resource collections, whose added value lies in local cultural heritage and, therefore, in their geographic characteristics. To complement statistical and full-text analysis approaches, we propose a more accurate semantic approach to analyze and interpret the geographic information contained in such a corpus (or in a query) [Marquesuzaa & al., 2005], [Etcheverry & al., 2005], [Sallaberry & al., 2006].

Geographically related queries form nearly one fifth of all queries submitted to the Excite search engine, and the most frequently occurring terms in them are place names [Sanderson and Kohler, 2004]. Our contribution focuses on digital libraries and proposes to extend the basic services of existing Library Management Systems with new ones dedicated to geographic information extraction and retrieval (PIV project [Lesbegueries & al., 2006]).

Our contribution aims at better integrating the space dimension into information retrieval systems. More precisely, it focuses on five proposals:

--we propose both a unified space model allowing any space feature to be described and an indexing process adapted to the space features present in the analyzed documents. …
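
The intersection search outlined in the abstract (finding every geocoded entity in the collection index that intersects any entity in the user's query) can be illustrated with a minimal sketch. The excerpt does not specify how the prototype represents geometries, so this sketch assumes axis-aligned bounding boxes; the names `BBox`, `GeoEntity` and `search`, the document identifiers and the coordinates are all hypothetical and are not taken from the PIV project.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in (longitude, latitude) degrees.

    Assumption: the source does not say which geometry type is used;
    a bounding box is the simplest stand-in for a geocoded footprint.
    """
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes intersect unless one lies entirely to one side
        # of the other. Boxes that merely touch count as intersecting.
        return not (
            self.max_lon < other.min_lon
            or other.max_lon < self.min_lon
            or self.max_lat < other.min_lat
            or other.max_lat < self.min_lat
        )


@dataclass(frozen=True)
class GeoEntity:
    """A geocoded entity extracted from a document (hypothetical schema)."""
    doc_id: str
    label: str
    box: BBox


def search(index: Iterable[GeoEntity], query: Iterable[BBox]) -> List[GeoEntity]:
    """Return indexed entities that intersect at least one query box."""
    query_boxes = list(query)
    return [e for e in index if any(e.box.intersects(q) for q in query_boxes)]


# Illustrative index: identifiers, labels and coordinates are made up.
index = [
    GeoEntity("doc-1", "place A", BBox(-0.5, 43.0, -0.3, 43.2)),
    GeoEntity("doc-2", "place B", BBox(0.5, 42.7, 0.7, 42.9)),
    GeoEntity("doc-3", "place C", BBox(-0.4, 43.1, -0.2, 43.4)),
]
query = [BBox(-0.45, 43.15, -0.35, 43.3)]

print([e.doc_id for e in search(index, query)])  # ['doc-1', 'doc-3']
```

This linear scan only illustrates the intersection criterion. A real index over a large collection would more plausibly rely on a spatial access structure or on the spatial operators of a GIS or spatial database, and on richer geometries than bounding boxes.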
