Location-Specific Cumulative Distribution Function (LSCDF): An Alternative to Spatial Correlation Analysis
Wong, David W. S., Geographical Analysis
Quite often, geographical analysis involves comparing the spatial distributions of two variables or attributes. A typical method is to calculate the correlation coefficient of the two variables for corresponding areal units. Putting aside the fact that the correlation coefficient is aspatial in nature (swapping attribute values between spatial units will not alter its value) and the issue of spatial dependency (the potential existence of spatial autocorrelation) among observations, another major problem with using correlation measures for analyzing spatial data is the modifiable areal unit problem (MAUP), especially its scale effect. Results from correlation analysis vary with the spatial resolution at which the data are gathered. This paper presents an approach to spatial correlation analysis for count variables that compares their cumulative spatial distributions. Using the concept of the cumulative distribution function (CDF) in classical statistics, this paper shows that the location-specific CDF (LSCDF) and its associated K-S-like statistic, which indicates the magnitude of difference between the two spatial distributions, are highly consistent over different levels of spatial scale. The application of the LSCDF approach is not restricted to isotropic spatial processes, and the statistic provides a rather conservative conclusion. In addition, given any origin from which to construct LSCDFs, the LSCDFs can provide a geographic description of the two spatial distributions. By combining LSCDFs derived from different origins, a comprehensive understanding of the two distributions for the entire study area is developed. This approach to correlation analysis may offer a direction for future investigation of the MAUP.
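To make the idea of comparing cumulative spatial distributions concrete, the following is a minimal sketch, not the paper's exact procedure. It assumes that areal units are represented by point coordinates, that units are accumulated in order of Euclidean distance from a chosen origin, and that the K-S-like statistic is the largest absolute gap between the two cumulative proportions. The function names `lscdf` and `ks_like` and the toy data are hypothetical.

```python
import math

def lscdf(coords, counts, origin):
    """Cumulative share of a count variable, accumulated outward from `origin`.

    Assumption (not stated in the abstract): units are ordered by the
    Euclidean distance of their representative point from the origin.
    Units at the same distance are accumulated together.
    """
    total = sum(counts)
    by_dist = {}
    for (x, y), c in zip(coords, counts):
        d = math.hypot(x - origin[0], y - origin[1])
        by_dist[d] = by_dist.get(d, 0) + c
    running, cdf = 0, []
    for d in sorted(by_dist):
        running += by_dist[d]
        cdf.append((d, running / total))
    return cdf

def ks_like(coords, counts_a, counts_b, origin):
    """Largest absolute gap between the two LSCDFs built from one origin."""
    cdf_a = lscdf(coords, counts_a, origin)
    cdf_b = lscdf(coords, counts_b, origin)
    # Both CDFs share the same distance steps because they share coords.
    return max(abs(pa - pb) for (_, pa), (_, pb) in zip(cdf_a, cdf_b))

# Toy data: four units along a line, two count variables with opposite gradients.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
group_a = [10, 20, 30, 40]
group_b = [40, 30, 20, 10]
d_stat = ks_like(coords, group_a, group_b, origin=(0, 0))
```

With these toy values the statistic is roughly 0.4, and it drops to 0 when one variable is a constant multiple of the other. Repeating the calculation from several origins would be one way to approximate the combination of LSCDFs that the abstract mentions.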
One of the central themes in geographical analysis is to compare and evaluate the spatial distributions of different phenomena or variables in order to determine whether they are related to each other. In the physical environment, one may be interested in how the spatial pattern of soil fertility affects crop yield in different locations, or how soil characteristics affect vegetative cover in different regions. In urban analysis, one may be interested in whether there is an association between income level and the level of educational attainment in different parts of the city. In segregation studies, the issue is quite often reduced to comparing the spatial distribution patterns of different ethnic or racial groups to determine whether their distributions resemble each other. A common approach to answering these questions is to conduct a correlation analysis, in addition to a few other less common methods (Unwin 1981).
Correlation analysis has been accepted so extensively among geographers and geoscientists that its validity is seldom questioned. This type of analysis treats spatial data as aspatial data. Correlation analysis, however, has to adhere to the assumptions of classical statistics, and one of those assumptions is the independence of observations. Observations in spatial data frequently violate this assumption of independence (Anselin and Griffith 1988) because geographical data are probably spatially autocorrelated to a certain degree. A related issue is that a correlation coefficient is not a spatial measure. Swapping attribute values between areal units will not change the correlation level, although the spatial patterns of the attributes could be completely different (Goodchild 1992). Putting aside these issues, another limitation of correlation analysis is that the results are sensitive to the spatial scale at which the data are gathered and tabulated. When correlation analysis and many other analytical tools are used on data gathered at different levels of spatial resolution, the results are probably inconsistent over scales. This is the so-called scale effect under the umbrella of the modifiable areal unit problem (MAUP). For instance, using the 1990 census data of the United States, the correlation coefficient of white and black population counts by states is 0. …