Geographical analysis often involves comparing the spatial distributions of two variables or attributes. A typical method is to calculate the correlation coefficient of the two variables for corresponding areal units. Putting aside the fact that the correlation coefficient is aspatial in nature (swapping attribute values between spatial units will not alter its value) and the issue of spatial dependency (the potential existence of spatial autocorrelation) among observations, another major problem with using correlation measures to analyze spatial data is the modifiable areal unit problem (MAUP), especially its scale effect: results from correlation analysis vary with the spatial resolution at which the data are gathered. This paper presents an approach to spatial correlation analysis for count variables that compares their cumulative spatial distributions. Using the concept of the cumulative distribution function (CDF) from classical statistics, this paper shows that the location-specific CDF (LSCDF) and its associated K-S-like statistic, which indicates the magnitude of the difference between the two spatial distributions, are highly consistent over different levels of spatial scale. The application of the LSCDF approach is not restricted to isotropic spatial processes, and the statistic provides a rather conservative conclusion. In addition, given any origin from which to construct LSCDFs, the LSCDFs can provide a geographic description of the two spatial distributions. By combining LSCDFs derived from different origins, a comprehensive understanding of the two distributions for the entire study area is developed. This approach to correlation analysis may offer a direction for future investigation of the MAUP.
One of the central themes in geographical analysis is to compare and evaluate the spatial distributions of different phenomena or variables in order to determine whether they are related to each other. In the physical environment, one may be interested in how the spatial pattern of soil fertility affects crop yield in different locations, or how soil characteristics affect vegetative cover in different regions. In urban analysis, one may be interested in whether there is an association between income level and the level of educational attainment in different parts of the city. In segregation studies, the issue is quite often reduced to comparing the spatial distribution patterns of different ethnic or racial groups to determine whether their distributions resemble each other. A common approach to answering these questions is to conduct a correlation analysis, in addition to a few other less common methods (Unwin 1981).
Correlation analysis has been accepted so extensively among geographers and geoscientists that its validity is not often questioned. This type of analysis treats spatial data as aspatial data. Correlation analysis, however, has to adhere to the assumptions of classical statistics, and one of those assumptions is the independence of observations. Observations in spatial data frequently violate this assumption (Anselin and Griffith 1988) because geographical data are probably spatially autocorrelated to a certain degree. A related issue is that a correlation coefficient is not a spatial measure. Swapping attribute values between areal units will not change the correlation level, although the spatial patterns of the attributes could be completely different (Goodchild 1992). Putting aside these issues, another limitation of correlation analysis is that the results are sensitive to the spatial scale at which the data are gathered and tabulated. When correlation analysis and many other analytical tools are applied to data gathered at different levels of spatial resolution, the results are likely to be inconsistent across scales. This is the so-called scale effect under the umbrella of the modifiable areal unit problem (MAUP). For instance, using the 1990 census data of the United States, the correlation coefficient of white and black population counts by state is 0.8239, while at the county level the correlation is 0.7315. Both correlation coefficients are statistically significant at the 0.05 level. The intent of this paper is to develop a correlation technique that is less sensitive to scale changes, so that correlation analysis performed at one scale level can reasonably represent the situations at other scale levels.
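The aspatial property noted above can be checked with a minimal sketch. It reads "swapping attribute values between areal units" as relocating each unit's pair of values to another unit, which is an interpretation rather than a definition given in the text. The counts, the transect layout, and the names `pearson_r`, `x_moved`, and `y_moved` are hypothetical and serve only as illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical counts for 100 areal units along a west-east transect.
# Both variables increase eastward, so each has a clear spatial trend.
n_units = 100
x = [10 + i + rng.randint(0, 20) for i in range(n_units)]
y = [5 + 2 * i + rng.randint(0, 40) for i in range(n_units)]

# Relocate the (x, y) pairs at random: unit k now holds the pair that
# used to sit at unit order[k]. The spatial trend is destroyed.
order = list(range(n_units))
rng.shuffle(order)
x_moved = [x[k] for k in order]
y_moved = [y[k] for k in order]

r_original = pearson_r(x, y)
r_moved = pearson_r(x_moved, y_moved)

print("r with original layout: ", r_original)
print("r after relocating units:", r_moved)
```

The two coefficients agree up to floating-point rounding even though the maps of `x_moved` and `y_moved` no longer show any west-east trend, because the coefficient depends only on which values are paired and not on where the pairs are located.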
Even though the MAUP and its scale effect subproblem (the other subproblem is the aggregation or zoning effect) were identified almost seven decades ago by Gehlke and Biehl (1934), they were not formally addressed until about four decades later by Openshaw (1977, 1978) and Openshaw and Taylor (1979), who also systematically evaluated the impacts of the MAUP on correlation analysis. In brief, the scale effect refers to the inconsistency of analytical results when data gathered and tabulated for areal units of different sizes, or levels of spatial resolution, are analyzed. At a given level of spatial resolution, the study area can be divided by various spatial partitioning schemes, and data tabulated for these different schemes will also provide inconsistent results. This is known as the zoning effect. This paper addresses only the scale effect with respect to correlation analysis.
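The distinction between the two subproblems can be illustrated with a small sketch, under stated assumptions: a hypothetical 16 x 16 grid of base units holding synthetic counts for two groups (`pop_a`, `pop_b`), aggregated into rectangular zones by a helper named `aggregate`. None of these names or data come from the studies cited; the sketch only shows how the same base data yield different coefficients when the zone size changes (scale effect) or when the zone shape changes at a fixed zone size (zoning effect).

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def aggregate(grid, block_rows, block_cols):
    """Sum a 2-D grid of counts over non-overlapping rectangular zones."""
    n_rows, n_cols = len(grid), len(grid[0])
    zones = []
    for r0 in range(0, n_rows, block_rows):
        for c0 in range(0, n_cols, block_cols):
            zones.append(sum(grid[r][c]
                             for r in range(r0, r0 + block_rows)
                             for c in range(c0, c0 + block_cols)))
    return zones


rng = random.Random(1)  # fixed seed so the sketch is repeatable
size = 16

# Synthetic counts: a shared west-east trend plus independent local noise.
pop_a = [[rng.randint(0, 50) + 3 * c for c in range(size)] for _ in range(size)]
pop_b = [[rng.randint(0, 50) + 2 * c for c in range(size)] for _ in range(size)]

# Scale effect: square zones of increasing size (1x1, 2x2, 4x4, 8x8 cells).
scale_r = {}
for b in (1, 2, 4, 8):
    scale_r[b] = pearson_r(aggregate(pop_a, b, b), aggregate(pop_b, b, b))

# Zoning effect: 64 zones of four cells each, partitioned three ways.
zoning_r = {}
for shape in ((2, 2), (1, 4), (4, 1)):
    zoning_r[shape] = pearson_r(aggregate(pop_a, *shape),
                                aggregate(pop_b, *shape))

for b, r in scale_r.items():
    print(f"scale  {b}x{b} zones: r = {r:.4f}")
for shape, r in zoning_r.items():
    print(f"zoning {shape[0]}x{shape[1]} zones: r = {r:.4f}")
```

Aggregation preserves the total counts, so any difference among the printed coefficients comes from the zoning system alone and not from a change in the underlying data.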
Different approaches have been suggested to handle the MAUP. For instance, Fotheringham (1989) proposed acknowledging the MAUP effects whenever possible. This approach is rather straightforward as long as researchers admit the uncertainty of analytical results due to the MAUP effects. With advances in computational technology, it is now feasible to repeat the same analysis with data at various scale levels in order to acknowledge the scale effect. There have also been numerous attempts to obtain consistent results from data gathered at different scale levels through modeling (Holt, Steel, and Tranmer 1996). A very challenging approach, however, is to develop new techniques that can yield scale-insensitive or scale-invariant analytical results.
This paper follows the direction of developing scale-independent spatial analytical tools. As shown by many empirical studies, the correlation coefficient is highly sensitive to the level of spatial aggregation. Because many classical statistical models are based upon the correlation among variables, the scale sensitivity of correlation analysis has highly significant implications. In this paper, I demonstrate that for any two count random variables, the comparison of their location-specific cumulative distribution functions can provide relatively stable results across different scale levels. This research is still exploratory in nature, but does provide encouraging results to support the idea that some techniques may be less scale-sensitive than others.
The next section provides a brief overview of the impacts of the scale effect, with an emphasis on correlation analysis. The second section discusses the concept of the cumulative distribution function (CDF) and its related Kolmogorov-Smirnov (K-S) statistic. Their spatial counterparts, the location-specific CDF (LSCDF) and the D_s statistic, are described in the third section. The fourth section explains why the LSCDF may be less sensitive to scale changes. Using a simulation and two empirical studies, the fifth section shows stable results across different scale levels. The empirical studies also reveal several spatial properties of the LSCDF. They are discussed further in the sixth section, followed by a concluding section.
1. THE SCALE EFFECT ON STATISTICAL ANALYSIS
So far, most research efforts related to the MAUP have been focused on the impacts of the two subproblems, scale and zoning effects. The classical work by Openshaw and Taylor (1979) was shocking because the correlation exhibited an increasing tendency when smaller areal units were aggregated to larger areal units. Using the simplest bivariate regression model, Clark and Avery (1976) show the tip of the iceberg of the MAUP. Later Fotheringham and Wong …