Areal interpolation is the process of estimating the values of one or more variables in a set of target polygons based on known values in a set of source polygons. The need for areal interpolation arises when data from different sources are collected in different areal units. In the United States, for example, spatial data are commonly collected in census zones such as block groups and tracts. Many businesses that make use of spatial data aggregate their data into zip codes, marketing analysis zones or service/trade areas. On the other hand, a useful data source may be aggregated based on natural rather than political boundaries. Because zones such as zip codes, service areas, census tracts and natural boundaries are incompatible with one another, areal interpolation is necessary to make use of all of these data from various sources.
There are many different methods of areal interpolation. Each method is unique in its assumptions about the underlying distribution of the data. The more modern methods make use of ancillary data, which can give insight into the underlying distribution of the variable. The choice of method may depend on factors such as ease of implementation, accuracy, data availability and time. This research is conducted as a comparative analysis of four different areal interpolation methods: the areal weighting method, the pycnophylactic method, a dasymetric method using remote sensing data, and the road network hierarchical weighted method.
These methods differ not only in their assumptions about the distribution of the data, but also in the dimensionality associated with each method. In the spatial data sciences, 0, 1, 2 and 2 1/2 dimensional (D) objects refer to points, lines, polygons and surfaces, respectively. The areal weighting method is a 2-D polygon overlay method that sums the weighted variable over the least common geographic units (LCGUs) falling within each target zone to derive target zone estimates. The LCGUs are the result of overlaying two or more sets of polygons. The pycnophylactic method creates a 2 1/2-D continuously smooth surface of the variable, and predicts target zone estimates as the volume within each zone. The dasymetric method makes use of a 2-D zonal system that represents residential land use types. This method also uses LCGUs, which here are the geometric intersections of the source zones, land use zones and target zones. The network method makes use of 1-D road network data as ancillary data. The interpolated values within each target zone will be compared to known values within the target zones. The variable that is interpolated in this research is population. To the authors' knowledge, these particular methods have not been tested against each other in prior literature.
A large variety of areal interpolation methods exist. For the purposes of this paper, many important methods will be discussed, with an emphasis on those relevant to this research.
Areal Interpolation Methods without Ancillary Data
The following literature focuses on areal interpolation methods that do not make use of ancillary data. The overlay method (Lam, 1983), also commonly referred to as the areal weighting method, interpolates a variable based on the area of intersection between the source and target zones. Intersection zones are created by the overlay of source and target zones. Target zone values are then estimated from the values of the source zones and the proportion of each source zone that falls within the target zone, using the following formula:
$$Z_t = \sum_{s} Z_s \left( \frac{A_{st}}{A_s} \right)$$ (1)
where $A_{st}$ is the area of intersection between source zone $s$ and target zone $t$, and
Z = value of the variable;
A = area;
s, t = source and target zones, respectively.
Although this method does preserve volume, it assumes that the variable is homogeneously distributed within the source zones (Lam, 1983).
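As an illustration of equation (1), the following minimal sketch applies areal weighting to a toy example. It assumes, purely for simplicity, that every zone is an axis-aligned rectangle so that intersection areas can be computed without a GIS library; the zone names, coordinates, values and function names are hypothetical and are not taken from this study.

```python
from typing import Dict, Tuple

# Hypothetical zone representation: (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def rect_area(r: Rect) -> float:
    """Area of an axis-aligned rectangle (A_s in equation 1)."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def intersection_area(a: Rect, b: Rect) -> float:
    """Area of overlap between two rectangles (A_st in equation 1)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def areal_weighting(
    source_zones: Dict[str, Rect],
    source_values: Dict[str, float],
    target_zones: Dict[str, Rect],
) -> Dict[str, float]:
    """Estimate Z_t = sum over s of Z_s * (A_st / A_s) for every target zone."""
    estimates = {}
    for t, t_rect in target_zones.items():
        total = 0.0
        for s, s_rect in source_zones.items():
            a_st = intersection_area(s_rect, t_rect)
            total += source_values[s] * a_st / rect_area(s_rect)
        estimates[t] = total
    return estimates


# Two source zones side by side, and two target zones with a shifted boundary.
sources = {"s1": (0, 0, 2, 2), "s2": (2, 0, 4, 2)}
values = {"s1": 100.0, "s2": 60.0}
targets = {"t1": (0, 0, 3, 2), "t2": (3, 0, 4, 2)}

estimates = areal_weighting(sources, values, targets)
print(estimates)  # t1 receives all of s1 plus half of s2; t2 receives half of s2
```

In this toy case the target estimates sum to the source total (160), which is the volume-preserving property noted above; the even split of s2 between the two target zones reflects the homogeneity assumption.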
The pycnophylactic method, proposed by Tobler (1979), "assumes the existence of a smooth density function which takes into account the effect of adjacent source zones" (Lam, 1983). The method initially assigns each grid cell the value of its source zone divided by the number of cells within that source zone. A new Z value is then computed for each cell as the average of its four neighbors:
$$Z_{i,j} = \frac{1}{4} \left( z_{i,j+1} + z_{i,j-1} + z_{i+1,j} + z_{i-1,j} \right)$$ (2)
The predicted values in each source zone are then compared with the actual values, and adjusted to meet the pycnophylactic condition. The pycnophylactic condition is defined as follows:
$$\iint_{R_i} Z(x,y) \, dx \, dy = H_i$$ (3)
where
[R.sub.i] = the ith region;
[H.sub.i] = the value of the variable in region i; and
Z(x,y) = the density function.
This is an iterative procedure that continues until there is either no significant difference between predicted values and actual values within the source zones, or there have been no significant changes of cell values from the previous iteration. The target zone values can then be interpolated as the sum of the values of cells within each target zone.
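The iterative procedure can be sketched as follows. This is a simplified illustration rather than Tobler's exact algorithm: it assumes a small rectangular grid of zone labels, handles grid edges by repeating the edge cell (an assumption), and enforces the pycnophylactic condition by rescaling each source zone multiplicatively after every smoothing pass. The function name, grid, totals and tolerance are all hypothetical.

```python
def pycnophylactic(zone_grid, zone_totals, max_iter=200, tol=1e-6):
    """Smooth a gridded surface while preserving each source zone's total."""
    rows, cols = len(zone_grid), len(zone_grid[0])

    # Count the cells in each source zone.
    counts = {}
    for row in zone_grid:
        for z in row:
            counts[z] = counts.get(z, 0) + 1

    # Initial surface: zone value divided by the number of cells in the zone.
    surface = [[zone_totals[z] / counts[z] for z in row] for row in zone_grid]

    for _ in range(max_iter):
        # Equation (2): each cell becomes the mean of its four neighbors.
        # Assumption: at the grid edge the missing neighbor is the cell itself.
        smoothed = [[0.0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                up = surface[max(i - 1, 0)][j]
                down = surface[min(i + 1, rows - 1)][j]
                left = surface[i][max(j - 1, 0)]
                right = surface[i][min(j + 1, cols - 1)]
                smoothed[i][j] = 0.25 * (up + down + left + right)

        # Sum of the smoothed cells in each source zone.
        sums = {z: 0.0 for z in counts}
        for i in range(rows):
            for j in range(cols):
                sums[zone_grid[i][j]] += smoothed[i][j]

        # Rescale so each zone again sums to its known total (equation 3).
        max_change = 0.0
        for i in range(rows):
            for j in range(cols):
                z = zone_grid[i][j]
                if sums[z] > 0:
                    new = smoothed[i][j] * zone_totals[z] / sums[z]
                else:
                    new = zone_totals[z] / counts[z]
                max_change = max(max_change, abs(new - surface[i][j]))
                surface[i][j] = new

        # Stop when cell values no longer change significantly.
        if max_change < tol:
            break
    return surface


# Source zones: A covers the two left columns, B the two right columns.
zone_grid = [["A", "A", "B", "B"] for _ in range(4)]
zone_totals = {"A": 80.0, "B": 16.0}
surface = pycnophylactic(zone_grid, zone_totals)

# Target zones: T1 covers columns 0-2, T2 covers column 3.
target_grid = [["T1", "T1", "T1", "T2"] for _ in range(4)]
target_estimates = {"T1": 0.0, "T2": 0.0}
for i in range(4):
    for j in range(4):
        target_estimates[target_grid[i][j]] += surface[i][j]
print(target_estimates)
```

Because the zone totals are restored after every pass, the first stopping criterion described above is satisfied by construction in this sketch, and only the change in cell values is tested for convergence.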
Other methods of areal interpolation that do not make use of ancillary data include the "point-based areal interpolation approach" (Lam, 1983), which uses 0-D interpolation techniques. The points are generally chosen as the centroids of the source polygons. The main criticism of these methods is that they are not volume preserving. More recently, Kyriakidis (2004) has been able to preserve the actual volume of the source zone using the geostatistical method of kriging. Other point-based methods include the "point-in-polygon" …