The Relationship between the Length of the Base Period and Population Forecast Errors
Smith, Stanley K., Sincich, Terry, Journal of the American Statistical Association
The base period of a population forecast may be defined as the time period from which historical data are collected and statistical relationships are formulated to provide forecasts of future population values. A recent discussion of population forecast errors raised the question of how changes in the length of the base period might affect the accuracy and bias of population forecasts (Beaumont and Isserman 1987; Smith 1987). Although many studies have investigated the effects of differences in forecasting models, sources of data, size of place, and length of forecast horizon on population forecast errors, few have considered the potential impact of changes in the length of the base period. This study analyzes the relationship between the length of the base period and population forecast errors, using three simple forecasting techniques and data from 1900 to 1980 for states in the United States.
In this article, a population forecast is defined as the future population value produced by a particular forecasting technique and set of base data, and forecast error refers to the percentage difference between a forecast and the actual population for the same year. The following terminology is used to describe population forecasts.
1. Base year: the year of the earliest observed population size used to make a forecast.
2. Launch year: the year of the latest observed population size used to make a forecast.
3. Target year: the year for which population size is forecasted.
4. Base period: the interval between base year and launch year.
5. Forecast horizon: the interval between launch year and target year.
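The terminology above can be made concrete with a minimal sketch. The names `ForecastSetup` and `percent_error` are hypothetical, chosen only for illustration. The sketch also assumes that the percentage error is taken relative to the actual population, with a positive value meaning the forecast was too high; the text specifies only "percentage difference," so this sign and denominator convention is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForecastSetup:
    """Hypothetical container for the three years that define a forecast."""
    base_year: int    # year of the earliest observed population used
    launch_year: int  # year of the latest observed population used
    target_year: int  # year for which population is forecast

    @property
    def base_period(self) -> int:
        # Interval between base year and launch year, in years.
        return self.launch_year - self.base_year

    @property
    def forecast_horizon(self) -> int:
        # Interval between launch year and target year, in years.
        return self.target_year - self.launch_year


def percent_error(forecast: float, actual: float) -> float:
    # Assumed convention: error relative to the actual population,
    # positive when the forecast exceeds the actual value.
    return 100.0 * (forecast - actual) / actual


# Illustrative values only, not data from the study.
setup = ForecastSetup(base_year=1950, launch_year=1960, target_year=1970)
print(setup.base_period, setup.forecast_horizon)  # 10 10
print(percent_error(1_050_000, 1_000_000))        # 5.0
```

With a base year of 1950, a launch year of 1960, and a target year of 1970, both the base period and the forecast horizon are 10 years.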
2. DATA AND TECHNIQUES
The data used in this study were taken from U.S. Census Bureau reports showing decennial census counts and annual intercensal estimates for states in the United States from 1900 to 1980 (U.S. Bureau of the Census 1956, 1965, 1971, 1976, 1982, 1984). These reports covered all states and the District of Columbia from 1950 onward, and all except Alaska and Hawaii from 1900 to 1949. The data refer to total population only; no analysis was performed on age, sex, race, or other characteristics of the population.
The intercensal estimates made by the Census Bureau were based on statistical series that reflect changes in population size. For all decades, estimates were based on annual data on births, deaths, and school enrollment. For some decades, five-year migration data from the decennial census were used as well. In a few instances, data from special censuses were included. In recent years, annual data from federal income-tax returns and Medicare records were included. All intercensal estimates were controlled to ensure that they were consistent with decennial census counts. Although they certainly contain some errors (especially for years prior to 1930), we believe these estimates are quite reliable and that they provide a useful basis for investigating the effects of the length of the base period on population forecast errors.
Three simple extrapolation techniques were used to produce population forecasts. First was linear extrapolation (LINE), which assumes that a population will increase (decrease) by the same number of persons in each future year as the average annual increase (decrease) during the base period:
$$\hat{P}_t = P_0 + \frac{x}{y}\,(P_0 - P_b), \tag{1}$$
where $\hat{P}_t$ is the state population forecast for the target year, $P_0$ is the state population in the launch year, $P_b$ is the state population in the base year, $x$ is the number of years in the forecast horizon, and $y$ is the number of years in the base period.
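As a worked illustration of Equation (1), the following sketch computes a LINE forecast. The function name `line_forecast` and the population figures are hypothetical, not taken from the study's data.

```python
def line_forecast(p_base: float, p_launch: float,
                  base_period: int, horizon: int) -> float:
    """Linear extrapolation (LINE) as in Equation (1).

    p_base      -- population in the base year (P_b)
    p_launch    -- population in the launch year (P_0)
    base_period -- number of years in the base period (y)
    horizon     -- number of years in the forecast horizon (x)
    """
    if base_period <= 0:
        raise ValueError("base_period must be a positive number of years")
    # Average annual change in persons over the base period.
    avg_annual_change = (p_launch - p_base) / base_period
    # Carry that same annual change forward over the forecast horizon.
    return p_launch + horizon * avg_annual_change


# Illustrative values only: a state that grew from 1,000,000 to 1,200,000
# over a 10-year base period, forecast 5 years past the launch year.
forecast = line_forecast(1_000_000, 1_200_000, base_period=10, horizon=5)
print(forecast)  # 1300000.0
```

Here the average annual increase is 20,000 persons, so a 5-year horizon adds 100,000 to the launch-year population. A declining population is handled the same way, since the average annual change is then negative.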
The second technique was exponential extrapolation (EXPO), which assumes that a population will increase (decrease) at the same annual percentage rate in each future year as it increased (decreased) during the base period: