Academic journal article Cityscape

Understanding and Enhancing the U.S. Department of Housing and Urban Development's ZIP Code Crosswalk Files

ZIP Codes Are Problem Geographies

Organizations use ZIP Codes for many analytical tasks, such as verifying addresses, allocating resources, or creating analytical products (for example, maps, tables, or reports). Although ZIP Codes have legitimate, if limited, uses in analysis, they can adversely affect the results. ZIP Codes are problematic because, unlike other geographies, their boundaries were not created for analytical purposes: ZIP Codes were designed to deliver mail more efficiently, not to serve as geographies for analysis. As a consequence, their boundaries vary in size and shape, which amplifies a common, adverse statistical problem when they are used for analysis. This effect, known as the Modifiable Areal Unit Problem (MAUP), is ever present in analyses that use geography. Several studies document how ZIP Codes are notorious for distorting policy-related analyses (Beyer, Schultz, and Rushton, 2007; Cudnick et al., 2012; Dai, 2010; Grubesic and Matisziw, 2006; Hipp, 2007; Krieger et al., 2002; Montalvo and Reynal-Querol, 2017; Wilson, 2015). In addition, when ZIP Code data are mapped, the choice of thematic mapping method can further misrepresent the results by depicting erroneous patterns (Wilson, 2011). A final deficiency of ZIP Codes for analysis is that they typically carry no social, demographic, or economic data that could be used to convert the record counts they contain into contextualized statistics such as ratios, percentages, rates, or densities. Even when such data are provided, the values are distorted by the same aggregation problems, described below.
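The zoning effect behind the MAUP can be seen with a toy calculation. The sketch below uses four hypothetical small areas with equal populations and made-up event counts, and groups them into two zones in two different ways; the function name `zone_rates` and all values are illustrative assumptions, not data from the article.

```python
"""Illustrate the zoning effect of the Modifiable Areal Unit Problem (MAUP).

All values are hypothetical: four small areas with equal populations
and made-up event counts.
"""

# (population, event_count) for each small area -- hypothetical values
AREAS = {
    "A": (100, 2),
    "B": (100, 18),
    "C": (100, 4),
    "D": (100, 16),
}


def zone_rates(zoning):
    """Return the event rate for each zone in a zoning.

    `zoning` maps a zone name to the list of small areas it contains.
    The rate is total events divided by total population in the zone.
    """
    rates = {}
    for zone, members in zoning.items():
        pop = sum(AREAS[m][0] for m in members)
        events = sum(AREAS[m][1] for m in members)
        rates[zone] = events / pop
    return rates


# Two different ways of drawing boundaries around the same four areas.
zoning_1 = {"Z1": ["A", "B"], "Z2": ["C", "D"]}
zoning_2 = {"Z1": ["A", "C"], "Z2": ["B", "D"]}

if __name__ == "__main__":
    print(zone_rates(zoning_1))  # {'Z1': 0.1, 'Z2': 0.1}
    print(zone_rates(zoning_2))  # {'Z1': 0.03, 'Z2': 0.17}
```

Under the first zoning, each high-rate area (B, D) is paired with a low-rate neighbor, so both zones report 0.10 and the extremes disappear; under the second, the same records yield rates of 0.03 and 0.17.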

First, grouping characteristics by an area boundary that is too large, oddly shaped, or both leads to summary statistics that may not represent the population within the boundary. A second adverse effect is that aggregation can hide extreme values at either end of a characteristic's range (Wilson, 2013). In this instance, extreme differences in a characteristic cancel out, because the summary statistic represents the norm and reveals neither extreme. A third adverse effect, related to the second, is the reversal of a relationship when two characteristics are examined together (Hipp, 2007; Montalvo and Reynal-Querol, 2017; Wilson, 2015). Here, a positive or negative relationship between two characteristics is the reverse of what it would be if a more appropriate area were used in the analysis, because at that scale the characteristics would have been assigned to different areas.
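The reversal effect can be sketched with hypothetical data in which two characteristics, x and y, are perfectly negatively correlated inside each of three areas, yet the area-level averages are perfectly positively correlated. The records, the area labels `T1` to `T3`, and the helper functions are illustrative assumptions; Pearson's correlation is computed by hand so the example needs only the standard library.

```python
import math

# Hypothetical individual-level records: (area, x, y).
RECORDS = [
    ("T1", 1, 3), ("T1", 2, 2), ("T1", 3, 1),
    ("T2", 4, 6), ("T2", 5, 5), ("T2", 6, 4),
    ("T3", 7, 9), ("T3", 8, 8), ("T3", 9, 7),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def within_area_correlations(records):
    """Correlation of x and y computed separately inside each area."""
    result = {}
    for area in sorted({a for a, _, _ in records}):
        xs = [x for a, x, _ in records if a == area]
        ys = [y for a, _, y in records if a == area]
        result[area] = pearson(xs, ys)
    return result


def area_mean_correlation(records):
    """Correlation of the area-level means of x and y (aggregated data)."""
    mean_x, mean_y = [], []
    for area in sorted({a for a, _, _ in records}):
        xs = [x for a, x, _ in records if a == area]
        ys = [y for a, _, y in records if a == area]
        mean_x.append(sum(xs) / len(xs))
        mean_y.append(sum(ys) / len(ys))
    return pearson(mean_x, mean_y)


if __name__ == "__main__":
    print({a: round(r, 2) for a, r in within_area_correlations(RECORDS).items()})
    # {'T1': -1.0, 'T2': -1.0, 'T3': -1.0}
    print(round(area_mean_correlation(RECORDS), 2))  # 1.0
```

An analyst who saw only the aggregated values would conclude that higher x goes with higher y, the opposite of the relationship found within every area.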

About Allocating ZIP Code Data to Other Geographies

Several private companies obtain address data from the U.S. Postal Service (USPS), or ZIP Code geographies from the U.S. Census Bureau, enhance them, and sell the resulting information. These companies add value to the products, making them more robust by attaching geographic information or by creating boundaries for mapping. The boundaries are estimated or modified by delineating areas around topographic point or line landmarks, following each organization's proprietary method, which includes comparisons with the ZIP Code Tabulation Areas (ZCTAs) provided by the Census Bureau. Each company uses its own undisclosed method to create these boundaries, and each claims that its boundaries are the most accurate.

ZIP Codes typically overlap other geographies and cannot always be associated wholly with a single area in another geography. Exhibit 1 illustrates the decision that must be made when allocating address counts from ZIP Codes to census tracts. The map shows a ZIP Code (light gray outline) crosscutting three census tracts (dark gray outline), along with the geographic distribution of addresses within each tract. For many analyses, address counts must be associated with only one area in the other geography; otherwise, addresses are counted multiple times, which distorts the statistical results. …
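The allocation decision in Exhibit 1 can be sketched with a small, hypothetical crosswalk table. Each row gives a ZIP Code, a census tract, and the share of the ZIP Code's residential addresses located in that tract. This mirrors the kind of address ratio reported in HUD's crosswalk files, but the ZIP Codes, tract IDs, field layout, and counts here are made up, and the real file structure should be checked against HUD's documentation. The sketch contrasts three choices: splitting counts proportionally, assigning each ZIP Code wholly to its largest-share tract, and naively giving the full count to every overlapping tract.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (zip_code, tract, res_ratio). res_ratio is
# the share of the ZIP Code's residential addresses that fall in the tract;
# the ratios for each ZIP Code sum to 1.
CROSSWALK = [
    ("00001", "T1", 0.50),
    ("00001", "T2", 0.25),
    ("00001", "T3", 0.25),
    ("00002", "T2", 0.75),
    ("00002", "T3", 0.25),
]

# Hypothetical record counts reported by ZIP Code.
ZIP_COUNTS = {"00001": 1000, "00002": 400}


def allocate_proportional(counts, crosswalk):
    """Split each ZIP Code's count across tracts by its address ratio."""
    totals = defaultdict(float)
    for zip_code, tract, ratio in crosswalk:
        totals[tract] += counts.get(zip_code, 0) * ratio
    return dict(totals)


def allocate_largest_share(counts, crosswalk):
    """Assign each ZIP Code's whole count to the tract with its largest share.

    Ties go to the first row listed for that ZIP Code.
    """
    best = {}
    for zip_code, tract, ratio in crosswalk:
        if zip_code not in best or ratio > best[zip_code][1]:
            best[zip_code] = (tract, ratio)
    totals = defaultdict(int)
    for zip_code, (tract, _) in best.items():
        totals[tract] += counts.get(zip_code, 0)
    return dict(totals)


def allocate_naive(counts, crosswalk):
    """Give the full ZIP Code count to every overlapping tract (double counts)."""
    totals = defaultdict(int)
    for zip_code, tract, _ in crosswalk:
        totals[tract] += counts.get(zip_code, 0)
    return dict(totals)


if __name__ == "__main__":
    print(allocate_proportional(ZIP_COUNTS, CROSSWALK))
    # {'T1': 500.0, 'T2': 550.0, 'T3': 350.0}
    print(allocate_largest_share(ZIP_COUNTS, CROSSWALK))
    # {'T1': 1000, 'T2': 400}
    print(sum(allocate_naive(ZIP_COUNTS, CROSSWALK).values()))  # 3800
```

Both the proportional and largest-share methods preserve the 1,400 total records, whereas the naive method reports 3,800 because each address is counted once for every tract its ZIP Code touches. The trade-off is that proportional allocation yields fractional counts, while largest-share assignment keeps whole counts but credits tract T3 with none of the addresses it actually contains.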
