Academic journal article Canadian Journal of Public Health

Misclassification Errors from Postal Code-Based Geocoding to Assign Census Geography in Nova Scotia, Canada

Article excerpt

Geocoding of data for population health research is now a regular occurrence.1-3 The word "geocoding" has become a common term in the population health discourse to describe the process of translating location information to an actual geographic location. It may be conducted using different types of location information, such as civic addresses, postal codes and place names. However, postal codes are often the only detailed geographic identifiers attached to research data in Canada, such as administrative health data. Even though finer location information, such as civic addresses, is normally collected for these data, it is not made available to researchers for privacy reasons. Because finer location information is absent or unavailable, postal code-based geocoding has become commonplace in population health research in Canada.

Postal codes have a number of limitations as a geographic identifier for geocoding. First, six-digit postal code area boundaries are poorly defined. Only the first three digits of the postal code (the forward sortation area or FSA) have geographic boundaries for which Statistics Canada provides tabulations. Use of only FSAs as geographic areas limits research because many FSAs are very large. For many research applications, smaller and more socially meaningful geographic areas are required. Second, postal codes do not necessarily indicate place of residence. They may indicate a post office or mailbox, a problem that will become more common with diminished home delivery. Third, postal codes do not correspond cleanly to standard geographic classifications used by Statistics Canada. Since researchers routinely rely on census or survey data to obtain population denominators or ecological variables, it is important that area-level socio-demographic and health data can be accurately linked between data sources.

To address these problems, Statistics Canada has developed geocoding software and supporting data to impute geographic location in census-defined and electoral areas from six-digit postal codes (PCCF+, or the Postal Code Conversion File Plus). PCCF+ consists of SAS and Stata programs with associated data files.4 It allocates postal codes to a variety of census administrative areas, including dissemination areas (DAs), census tracts and census subdivisions (CSDs). Whenever a postal code matches more than one area, PCCF+ probabilistically allocates the postal code to one of the possible areas that overlap with the postal code boundary, using a supplemental file of estimated population weights.5 PCCF+ also produces an estimated latitude and longitude, which is the centroid of the smallest allocable census area (e.g., a block face in urban areas).
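The population-weighted allocation described above can be illustrated with a minimal Python sketch. It is not the PCCF+ code, which is distributed as SAS and Stata programs. The postal codes, area identifiers, weights and the function name `allocate` are all invented for illustration, and the sketch assumes only that each postal code comes with a list of candidate areas and estimated population weights.

```python
import random

# Hypothetical lookup: postal code -> list of (area ID, estimated population weight).
# All values are invented for illustration; none come from the PCCF+ data files.
WEIGHTS = {
    "B0J2C0": [("12070001", 0.6), ("12070002", 0.3), ("12070003", 0.1)],
    "B3H4R2": [("12090100", 1.0)],
}


def allocate(postal_code, weights=WEIGHTS, rng=random):
    """Pick one candidate area with probability proportional to its weight."""
    candidates = weights[postal_code]
    areas = [area for area, _ in candidates]
    area_weights = [weight for _, weight in candidates]
    # random.choices draws with replacement according to the relative weights.
    return rng.choices(areas, weights=area_weights, k=1)[0]
```

A postal code with a single candidate area is always assigned to that area. A postal code spanning several areas is assigned to each in proportion to its weight, so any individual record may land in an area other than the one where the person actually lives.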

Despite the common use of the PCCF+ in Canadian research for postal code geocoding, the extent of locational misclassification and the potential impacts on research have not been well evaluated.6 The effect on research may vary depending on the geographic scope of research and how area data are used. For example, in rural and low-density suburban areas, where one postal code can encompass a large geographic area, inaccuracies may be large. In studies of variation in area-level counts, incidence rates or prevalence rates (e.g., raw or age-standardized incidence of injury),7 locational misallocations may cancel each other out, limiting bias. On the other hand, in studies assessing area-level effects on health status or outcomes (e.g., regression analysis of association between neighbourhood characteristics and chronic diseases),8,9 locational inaccuracies may incorrectly match attributes between data sources. The analysis may be at an area level only,10,11 or may be multilevel, involving both area- and individual-level variables.9,12,13

Using highly accurate locational data for all buildings in Nova Scotia, this paper estimated the extent of locational misclassification error, by level of rurality, resulting from the use of PCCF+ to geocode six-digit postal codes to census geographic areas. …
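As a rough illustration of what a misclassification estimate by rurality involves, the following Python sketch compares a reference area with a geocoded area for each record and reports the proportion that disagree within each rurality group. It is not the paper's method, which is not given in this excerpt. The record layout, the rurality labels, the sample values and the function name `misclassification_by_rurality` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical records: (rurality level, reference area, geocoded area).
# All values are invented for illustration.
SAMPLE = [
    ("urban", "DA1", "DA1"),
    ("urban", "DA2", "DA2"),
    ("urban", "DA3", "DA4"),
    ("urban", "DA5", "DA5"),
    ("rural", "DA6", "DA7"),
    ("rural", "DA8", "DA8"),
]


def misclassification_by_rurality(records):
    """Return, per rurality level, the share of records whose geocoded
    area differs from the reference area."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for rurality, reference_area, geocoded_area in records:
        totals[rurality] += 1
        if reference_area != geocoded_area:
            errors[rurality] += 1
    return {level: errors[level] / totals[level] for level in totals}
```

With the invented sample above, one of four urban records and one of two rural records are misclassified, giving proportions of 0.25 and 0.5.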
