Academic journal article URISA Journal

From Text to Geographic Coordinates: The Current State of Geocoding

INTRODUCTION

The process of geocoding forms a fundamental component of spatial analysis in a wide variety of research disciplines and application domains (e.g., health [Vine et al. 1998, Boulos 2004, Rushton et al. 2006]; crime analysis [Olligschlaeger 1998, Ratcliffe 2001]; political science [Haspel and Knotts 2005]; computer science [Hutchinson and Veenendall 2005b, Bakshi et al. 2004]). This act of turning descriptive locational data such as a postal address or a named place into an absolute geographic reference has become a critical piece of the scientific workflow. However, the geocoding of today is a far cry from the geocoding of the past. Geocoding data that used to cost $4.50 per 1,000 records as recently as the mid-1980s (Krieger 1992) quickly moved to $1.00 per record in 2003 (McElroy et al. 2003), and can now be done for free with online services (e.g., Yahoo! Inc. [2006], Locative Technologies [2006]), with far greater spatial accuracy and match rates.

As the availability and accuracy of reference datasets have increased over the past several decades (Dueker 1974, Werner 1974, Griffin et al. 1990, Higgs and Martin 1995, Martin and Higgs 1996, Johnson 1998a, Martin 1999, Boscoe et al. 2004), geocoding has undergone marked transitions to accommodate and exploit changes in both data format and user expectations. These transitions can clearly be seen in the input, output, and internal processing of the geocoding process. The input data suitable for geocoding have expanded from simple postal addresses (O'Reagan and Saalfeld 1987) to include textual descriptions of relative locations (Levine and Kim 1998, Davis et al. 2003, Hutchinson and Veenendall 2005b). The output capabilities of the geocoding process have moved from simple nominal geographic codes (Tobler 1972, Dueker 1974, Werner 1974, O'Reagan and Saalfeld 1987) to full-fledged three-dimensional (3-D) geospatial entities (Beal 2003, Lee 2004). Likewise, the internal processing mechanisms that produce the geographic output have moved from simple feature assignment (O'Reagan and Saalfeld 1987) to complex interpolation algorithms using a variety of heterogeneous data sources (Bakshi et al. 2004, Hutchinson and Veenendall 2005a, b).
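To make the idea of interpolation concrete, the following is a minimal sketch of linear address-range interpolation, one of the simplest interpolation schemes a geocoder might apply. It is an illustration only, not the method of any work cited here. The `StreetSegment` record and the `interpolate_address` function are hypothetical names introduced for this example, and the sketch assumes that house numbers are spaced evenly along a straight segment, an assumption that real streets frequently violate.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StreetSegment:
    """Hypothetical reference-data record for one side of a street segment."""
    from_number: int             # house number at the segment's start point
    to_number: int               # house number at the segment's end point
    start: Tuple[float, float]   # (x, y) of the start point
    end: Tuple[float, float]     # (x, y) of the end point


def interpolate_address(segment: StreetSegment, house_number: int) -> Tuple[float, float]:
    """Estimate a point for house_number by linear interpolation.

    Assumes house numbers are distributed uniformly along a straight
    segment; this is an illustrative simplification.
    """
    low = min(segment.from_number, segment.to_number)
    high = max(segment.from_number, segment.to_number)
    if not low <= house_number <= high:
        raise ValueError("house number lies outside the segment's address range")

    span = segment.to_number - segment.from_number
    # A single-number range carries no positional information: use the midpoint.
    fraction = 0.5 if span == 0 else (house_number - segment.from_number) / span

    x = segment.start[0] + fraction * (segment.end[0] - segment.start[0])
    y = segment.start[1] + fraction * (segment.end[1] - segment.start[1])
    return (x, y)


segment = StreetSegment(from_number=100, to_number=200,
                        start=(0.0, 0.0), end=(100.0, 50.0))
point = interpolate_address(segment, 150)
```

In this sketch, a house number halfway through the address range is placed halfway along the segment. Any departure of the real parcel layout from the uniform-spacing assumption translates directly into positional error in the output.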

While significantly improving the usability, reliability, and accuracy of the geocoding process, these developments have brought with them a host of issues that a potential user must recognize and be prepared to contend with. Specific issues include the assumptions made during the interpolation process (Dearwent et al. 2001, Karimi et al. 2004), the underlying accuracy of the reference dataset (Gatrell 1989, Block 1995, Drummond 1995, Martin and Higgs 1996, Chung et al. 2004), the uncertainty in the matching algorithm (O'Reagan and Saalfeld 1987, Jaro 1984), and the choice of areal unit to which data are geocoded (Krieger 1992, Geronimus et al. 1995, Geronimus and Bound 1998, Krieger et al. 2002a, 2003). These topics have received considerable research attention in recent years, and a great deal of literature is available. This article surveys the field of geocoding through a cross-disciplinary study of the geocoding literature, focusing foremost on the technical aspects of the process. The changing concept of geocoding will be described, and the fundamental components of the geocoder will be outlined. Potential sources of error in the geocoding process will be explored, and particularly difficult geocoding scenarios requiring further research will be highlighted. The primary contributions of this article are to inform the reader of the state of the art in geocoding through a discussion of its evolution over time and to warn of the difficulties that can arise in the geocoding process if one is not aware of how one's decisions and assumptions can affect the geocoded results. This work should be seen as distinct from the recent work published by Rushton et al. (2006), which also offers a review of the geocoding process, but is focused on its application to health research, in particular cancer studies. …
