A Comparative Analysis of Areal Interpolation Methods


Areal interpolation is the process of estimating the values of one or more variables in a set of target polygons based on known values in a set of source polygons. The need for areal interpolation arises when data from different sources are collected in different areal units. In the United States, for example, spatial data are very commonly collected in census zones such as block groups and tracts. Many businesses that use spatial data aggregate their data into zip codes, marketing analysis zones or service/trade areas. Other useful data sources may be aggregated based on natural rather than political boundaries. Because zones such as zip codes, service areas, census tracts and natural boundaries are incompatible with one another, areal interpolation is necessary to make use of the data from these various sources.

There are many different methods of areal interpolation. Each method makes its own assumptions about the underlying distribution of the data. The more modern methods make use of ancillary data, which can give insight into the underlying distribution of the variable. The choice of method may depend on factors such as ease of implementation, accuracy, data availability and time. This research is a comparative analysis of four areal interpolation methods: the areal weighting method, the pycnophylactic method, a dasymetric method using remote sensing data, and the road network hierarchical weighted method.

These methods differ not only in their assumptions about the distribution of the data, but also in the dimensionality associated with each method. In the spatial data sciences, 0-, 1-, 2- and 2 1/2-dimensional (D) objects refer to points, lines, polygons and surfaces, respectively. The areal weighting method is a 2-D polygon overlay method which sums the weighted variable within the least common geographic units (LCGUs) that share a target zone to derive target zone estimates. The LCGUs are the result of overlaying two or more sets of polygons. The pycnophylactic method creates a 2 1/2-D continuously smooth surface of the variable, and predicts target zone estimates as the volume within each zone. The dasymetric method makes use of a 2-D zonal system that represents residential land use types. This method also uses LCGUs, which here are the geometric intersections of the source zones, land use zones and target zones. The network method makes use of 1-D road network data as ancillary data. The interpolated values within each target zone will be compared to known values within the target zones. The variable that is interpolated in this research is population. To the authors' knowledge, these particular methods have not been tested against each other in prior literature.
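The volume-preserving idea behind the pycnophylactic method can be illustrated with a deliberately simplified sketch. It assumes a 1-D row of equal-sized cells rather than a 2 1/2-D surface, and it restores each source zone's total by multiplicative rescaling after every smoothing pass. The function name `pycnophylactic_1d`, the zone labels and the population figures are hypothetical; this is not the implementation used in the research.

```python
def pycnophylactic_1d(zone_of_cell, zone_totals, iterations=200):
    """Smooth a per-cell density while keeping every source-zone total fixed.

    zone_of_cell: source-zone label of each cell, in spatial order.
    zone_totals:  known total of the variable for each source zone.
    """
    n = len(zone_of_cell)
    counts = {z: zone_of_cell.count(z) for z in zone_totals}
    # Start from a uniform density inside each source zone.
    cells = [zone_totals[z] / counts[z] for z in zone_of_cell]
    for _ in range(iterations):
        # Smoothing step: average each cell with its two neighbours
        # (an edge cell reuses its own value for the missing neighbour).
        smoothed = [
            (cells[max(i - 1, 0)] + cells[i] + cells[min(i + 1, n - 1)]) / 3.0
            for i in range(n)
        ]
        # Volume-preserving step: rescale each zone back to its known total.
        sums = {z: 0.0 for z in zone_totals}
        for z, v in zip(zone_of_cell, smoothed):
            sums[z] += v
        cells = [
            v * zone_totals[z] / sums[z] if sums[z] > 0 else v
            for z, v in zip(zone_of_cell, smoothed)
        ]
    return cells


# Two hypothetical source zones of four cells each.
zone_of_cell = ["a"] * 4 + ["b"] * 4
zone_totals = {"a": 40.0, "b": 120.0}
surface = pycnophylactic_1d(zone_of_cell, zone_totals)

# A target zone straddling the boundary (cells 2 to 5) is estimated as the
# "volume" under the smoothed surface, i.e. the sum of its cells.
target_estimate = sum(surface[2:6])
print(surface)
print(target_estimate)
```

Because the rescaling step runs last in every iteration, the cells of each source zone still sum to that zone's known total, while the values change gradually across the zone boundary instead of jumping.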

Prior Literature

A large variety of areal interpolation methods exist. This paper discusses many of the important methods, with an emphasis on those relevant to this research.

Areal Interpolation Methods without Ancillary Data

The following literature focuses on areal interpolation methods that do not make use of ancillary data. The overlay method (Lam, 1983), also commonly referred to as the areal weighting method, interpolates a variable based on the area of intersection between the source and target zones. Intersection zones are created by the overlay of source and target zones. Target zone values are then estimated from the source zone values and the proportion of each source zone that falls within the intersection, using the following formula:

$$Z_t = \sum_{s} Z_s \left( \frac{A_{st}}{A_s} \right) \qquad (1)$$


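where

Z = value of the variable;

A = area, with A_st the area of the intersection of source zone s and target zone t, and A_s the area of source zone s;

s, t = source and target zones, respectively.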

Although this method does preserve volume, it assumes that the variable is homogeneously distributed within the source zones (Lam, 1983). …
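Equation (1) can be worked through in a minimal sketch. It assumes axis-aligned rectangles in place of real zone polygons so that intersection areas can be computed without a GIS library. The `Rect` type, the `areal_weighting` function and the population figures are hypothetical and serve only as illustration.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle standing in for a zone polygon (an assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)

    def intersection_area(self, other: "Rect") -> float:
        width = min(self.xmax, other.xmax) - max(self.xmin, other.xmin)
        height = min(self.ymax, other.ymax) - max(self.ymin, other.ymin)
        return max(0.0, width) * max(0.0, height)


def areal_weighting(sources, source_values, targets):
    """Estimate Z_t = sum over s of Z_s * (A_st / A_s) for every target zone."""
    estimates = []
    for t in targets:
        z_t = 0.0
        for s, z_s in zip(sources, source_values):
            a_st = s.intersection_area(t)  # area of the intersection zone
            if a_st > 0.0:
                z_t += z_s * a_st / s.area()
        estimates.append(z_t)
    return estimates


# Two side-by-side source zones with hypothetical populations.
sources = [Rect(0, 0, 2, 2), Rect(2, 0, 4, 2)]
source_values = [100.0, 300.0]
# Three target strips that together cover the same extent.
targets = [Rect(0, 0, 1, 2), Rect(1, 0, 3, 2), Rect(3, 0, 4, 2)]

estimates = areal_weighting(sources, source_values, targets)
print(estimates)       # [50.0, 200.0, 150.0]
print(sum(estimates))  # 400.0, the same as sum(source_values)
```

The middle target strip takes half of each source zone, so it receives 50 + 150 = 200. The estimates sum to the source total of 400, which shows the volume-preserving property, and the even split of each source zone reflects the homogeneity assumption noted above.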