Academic journal article Geographical Analysis

A Genetic Approach to Detecting Clusters in Point Data Sets

Article excerpt

Spatial analysis techniques are widely used throughout geography. However, as the size of geographic data sets increases exponentially, limitations to the traditional methods of spatial analysis become apparent. To overcome some of these limitations, many algorithms for exploratory spatial analysis have been developed. This article presents both a new cluster detection method based on a genetic algorithm and Programs for Cluster Detection, a toolkit application containing the new method as well as implementations of three established methods: Openshaw's Geographical Analysis Machine (GAM), case point-centered searching (proposed by Besag and Newell), and randomized GAM (proposed by Fotheringham and Zhan). We compare the effectiveness of cluster detection and the runtime performance of these four methods and Kulldorff's spatial scan statistic on a synthetic point data set simulating incidence of a rare disease among a spatially variable background population. The proposed method has faster average running times than the other methods and significantly reduces overreporting of the underlying clusters, thus reducing the user's postprocessing burden. Therefore, the proposed method improves upon previous methods for automated cluster detection. The results of our method are also compared with those of Map Explorer (MAPEX), a previous attempt to develop a genetic algorithm for cluster detection. The results of these comparisons indicate that our method overcomes many of the problems faced by MAPEX, which, we believe, establishes that genetic algorithms can indeed offer a viable approach to cluster detection.



As geographic data sets grow in both complexity and size, our current spatial analysis toolboxes must expand to include methods that are more computationally efficient and methods that are able to uncover previously unknown patterns or relationships. We concentrate here on methods for detecting and analyzing geographical clusters. Many societal problems, such as understanding urban crime patterns and detecting clusters of rare diseases, require methods for cluster detection and analysis that are able to address very large and complex data sets (Openshaw 1995). The purpose of this article is to explore the utility of a new cluster detection method based on a genetic algorithm, and to compare its performance with several established methods, with the aim of improving computational efficiency and flexibility without sacrificing the basic ability to detect and report clusters.

Cluster analysis of point data has been used within geography for many years and for many different purposes. One application that has received much attention is the detection and analysis of clusters of disease (Pinkel and Nefzger 1959; Ederer, Myers, and Mantel 1964; Mantel 1967; Besag and Newell 1991; Ahrens et al. 2001). In this application, it is not sufficient simply to find where the individual cases of the disease are clustered; one must find regions that have high rates of the disease. This requires that the disease rate be compared with the background population that is susceptible to the disease, so that regions with higher-than-expected disease rates can be found. Consequently, cluster detection methods that fail to account for a spatially variable background population are less useful. Many currently available techniques are also computationally slow and tend to produce many positive solutions (i.e., circles or ellipses) for each cluster in the data set. This overreporting of clusters can lead to a prohibitively large workload for postprocessing. Finally, many methods make assumptions about the distribution and shape of clusters that may not apply to the data set. Therefore, techniques are needed that can account for a spatially variable background population, scale easily to large data sets, minimize or generalize assumptions, and minimize the number of solutions returned per cluster.
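The comparison of a local disease rate against the background population can be illustrated with a minimal sketch. It is not the method proposed in this article. It assumes a single circular search region, a Poisson model for case counts, and an expected count equal to the overall rate multiplied by the population at risk inside the circle. The function names (`poisson_upper_tail`, `circle_excess`) and the toy coordinates are hypothetical and chosen only for illustration.

```python
import math


def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Accumulate P(X <= observed - 1) term by term, then take the complement.
    term = math.exp(-expected)
    cdf = term
    for k in range(1, observed):
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def circle_excess(cases, population, center, radius, overall_rate):
    """Compare observed and expected case counts inside one circle.

    cases: list of (x, y) case locations (illustrative format).
    population: list of (x, y, count) population points (illustrative format).
    Returns (observed, expected, p_value) under the assumed Poisson model.
    """
    cx, cy = center
    r2 = radius * radius

    def inside(x, y):
        return (x - cx) ** 2 + (y - cy) ** 2 <= r2

    observed = sum(1 for x, y in cases if inside(x, y))
    at_risk = sum(n for x, y, n in population if inside(x, y))
    expected = overall_rate * at_risk
    return observed, expected, poisson_upper_tail(observed, expected)


# Toy data (assumed): four population points and five cases.
population = [(0, 0, 1000), (1, 0, 1000), (10, 10, 1000), (11, 10, 1000)]
cases = [(0.1, 0.1), (0.2, -0.1), (0.9, 0.1), (0.5, 0.0), (10.2, 10.1)]
overall_rate = len(cases) / sum(n for _, _, n in population)

west = circle_excess(cases, population, (0.5, 0.0), 1.5, overall_rate)
east = circle_excess(cases, population, (10.5, 10.0), 1.5, overall_rate)
print(west)
print(east)
```

A small tail probability for a circle suggests more cases than the local population would lead one to expect. Because the expected count scales with the population inside the circle, a densely populated area with many cases is not flagged merely for having many cases.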

This article develops an approach to detecting spatial point clusters in a background population that is based on a genetic algorithm, which is a heuristic search process. …
