Geographically Weighted Discriminant Analysis

Article excerpt

In this article, we propose a novel analysis technique for geographical data, Geographically Weighted Discriminant Analysis. This approach adapts the method of Geographically Weighted Regression (GWR) to allow the modeling and prediction of categorical response variables. As with GWR, the relationship between predictor and response variables may vary over space, and calibration is achieved using a moving kernel window approach. The methodology is outlined and illustrated with an example analysis of voting patterns in the 2005 UK general election. The example shows that similar social conditions can lead to different voting outcomes in different parts of England and Wales. Also discussed are techniques for visualizing the results of the analysis and methods for choosing the extent of the moving kernel window.


In this article, an extension to discriminant analysis is proposed, which allows the discrimination rule to vary over space. The term Geographically Weighted Discriminant Analysis (GWDA) is proposed for this new method. The motivation here is similar to that for Geographically Weighted Regression (GWR): in some situations, relationships between variables are not universal, but dependent on location. A major distinction between the two techniques is that, while GWR attempts to predict a measurement or ratio scale variable y given a set of predictors x = {x_1, ..., x_m}, GWDA attempts to predict a categorical y variable. This is not the first article to suggest applying geographical weighting to categorical data. Following the suggestions of Fotheringham, Brunsdon, and Charlton (2002), Atkinson et al. (2003) explore the relationships between riverbank erosion and various environmental variables using geographically weighted logistic regression, and Paez (2006) examines geographical variations in land use/transportation relationships using a geographically weighted probit model. Discriminant analysis and logistic regression are similar in that both are used to predict group membership from a set of predictor variables; however, the assumptions underlying each technique are rather different. Logistic regression may be the method of choice when the dependent variable has two groups, because of its more relaxed assumptions (Maddala 1983).

In common with existing discussions of discriminant analysis, it is helpful to regard the data used here as having been drawn from a number of distinct populations, one for each unique category of y. The task of GWDA is then to assess which population a given, unlabeled {x} is likely to have come from. The difference between GWDA and standard discriminant analysis is that for GWDA this decision is made taking the geographical location of {x} into account. The GWDA technique proposed here exploits the fact that linear and quadratic discriminant analyses (LDA and QDA) rely only on the mean vector and covariance matrix of {x} for each population (LDA additionally assumes that the covariance matrix is the same for all m populations). This being the case, the route to localizing LDA and QDA is via geographically weighted means and covariances, such as those proposed in Brunsdon, Fotheringham, and Charlton (2002) and Fotheringham, Brunsdon, and Charlton (2002). In the next section, LDA and QDA are reviewed. Following this, the extension of these techniques to forms of GWDA is considered. Finally, an empirical example using voting data in England and Wales is presented.
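
To make the idea of geographically weighted means and covariances concrete, the following is a minimal sketch in Python with NumPy. It is not taken from the article: the Gaussian kernel, the normalization of the weights to sum to one, and the names `gaussian_kernel_weights`, `gw_mean_cov`, and `bandwidth` are all assumptions made for illustration. In a GWDA setting, a calculation of this kind would presumably be carried out separately for the observations of each population.

```python
import numpy as np


def gaussian_kernel_weights(coords, u, bandwidth):
    """Weight each observation by its distance from location u.

    Assumption: a Gaussian kernel; other kernel shapes are possible.
    """
    d2 = np.sum((coords - u) ** 2, axis=1)  # squared distances to u
    return np.exp(-0.5 * d2 / bandwidth ** 2)


def gw_mean_cov(X, coords, u, bandwidth):
    """Geographically weighted mean vector and covariance matrix at u.

    X      : (n, p) array of predictor values
    coords : (n, 2) array of observation locations
    u      : (2,) location at which the local summary is wanted
    """
    w = gaussian_kernel_weights(coords, u, bandwidth)
    w = w / w.sum()                      # assumption: weights sum to one
    mean = w @ X                         # weighted mean vector
    centered = X - mean
    cov = (centered * w[:, None]).T @ centered  # weighted covariance
    return mean, cov


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))          # synthetic predictors
    coords = rng.uniform(size=(50, 2))    # synthetic locations
    local_mean, local_cov = gw_mean_cov(X, coords, np.array([0.5, 0.5]), 0.2)
    print(local_mean.shape, local_cov.shape)
```

As the bandwidth grows very large, the weights become nearly equal and the local summaries approach the ordinary (global) mean and the maximum-likelihood covariance, which is why the choice of window extent matters.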

A review of discriminant analysis

Discriminant analysis is a technique used to identify which population a certain observation vector x belongs to, given a list of possible populations {1, ..., m} and a training set of observations {x_ij}, where ij indicates the ith observation from population j. The simplest case occurs when m = 2, so that a binary classification has to be made. In this case, Fisher (1936) used a decision-theoretic approach to show that an optimal decision rule is to assign to population 1 if

[[[f. …