Robust Principal Component Analysis and Geographically Weighted Regression: Urbanization in the Twin Cities Metropolitan Area of Minnesota

INTRODUCTION

Humans have long altered the land by clearing forests, farming, and building settlements. This land change has serious social and environmental impacts, many of which are increasingly evident in urban areas, which now host the majority of the world's population. In the United States, urbanization is driven primarily by suburbanization (decentralized, low-density residential land use) and by exurbanization (the creation of far-flung suburbs). While suburbanization offers important benefits such as affordable housing, it also has negative impacts on systems ranging from transportation and natural habitat to infrastructure efficiency and inner-city economies (Burchell et al. 1998, Daniels 1999, EPA 2001).

The magnitude and nature of urbanization impacts are tied not only to the amount of land converted to urban use but also to its spatial configuration and pattern (IGBP-IHDP 1995). Dispersed urbanization, for example, creates infrastructure inefficiency by spreading out roads or sewer networks. Despite the importance of spatial patterning in determining impacts of urbanization, a good deal of urban research focuses on aggregate measures such as commute time or population density (Galster et al. 2001). Though this synoptic view is a critical avenue for research, it may not capture the temporal and fine-scaled spatial patterns and processes of urbanization (Hasse and Lathrop 2003).

A variety of approaches meet the need to examine and model land use at fine spatial scales, and to these we add a new one. Methodologies range from simple mathematical formulas and gravity models to sophisticated spatiotemporal simulations (Kaimowitz and Angelsen 1998, Lambin 1994, Parker et al. 2003). In this paper, we present a hybrid approach--robust principal component geographically weighted regression (RPCGWR)--to examine both the location of urban land use and the relative influence of socioeconomic, demographic, policy, and environmental factors. We integrate two different methods, robust principal component analysis (RPCA) and geographically weighted regression (GWR), to create a novel alternative to standard statistical approaches. First, to reduce the dimensionality and the number of primary regressors, we applied principal component analysis (PCA) to the explanatory variables. Because standard PCA is sensitive to outliers, we used a robust variant, RPCA, based on a projection pursuit approach. Second, to capture spatial heterogeneity in the urban landscape, we conducted GWR on the robust principal components (RPCs). We compared the results of the RPCGWR with those of its global counterpart, a standard global regression on the robust principal components (RPCGR), and used a series of visual and statistical comparisons to better understand how RPCGWR lends insight into the complex dynamics of urban land use.
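The two-step procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices in it are assumptions made only for illustration: the projection pursuit step searches candidate directions that pass through the centered observations and scores them with the median absolute deviation (MAD), the GWR step uses a Gaussian kernel with a fixed bandwidth, and the function names (`pp_robust_pca`, `gwr`) and the synthetic data are hypothetical.

```python
import numpy as np


def pp_robust_pca(X, n_components):
    """Projection-pursuit robust PCA (illustrative variant).

    Candidate directions are the normalized, median-centered observations;
    the direction whose projections have the largest MAD is kept, the data
    are deflated, and the search repeats.
    """
    Xc = X - np.median(X, axis=0)              # robust centering
    residual = Xc.copy()
    directions = []
    for _ in range(n_components):
        norms = np.linalg.norm(residual, axis=1)
        keep = norms > 1e-12                   # skip points with no direction
        candidates = residual[keep] / norms[keep, None]
        proj = residual @ candidates.T         # shape (n_obs, n_candidates)
        # MAD of the projections along each candidate direction
        mad = 1.4826 * np.median(np.abs(proj - np.median(proj, axis=0)), axis=0)
        best = candidates[np.argmax(mad)]
        directions.append(best)
        # Deflate: remove the component along the chosen direction
        residual = residual - np.outer(residual @ best, best)
    V = np.array(directions).T                 # loadings, shape (n_vars, k)
    return Xc @ V, V                           # robust component scores, loadings


def gwr(coords, Z, y, bandwidth):
    """Geographically weighted regression with a fixed Gaussian kernel.

    Fits one weighted least-squares model per location and returns an array
    of local coefficients (intercept first), shape (n_obs, n_regressors + 1).
    """
    n = len(y)
    A = np.column_stack([np.ones(n), Z])
    betas = np.empty((n, A.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        sw = np.sqrt(np.exp(-0.5 * (d / bandwidth) ** 2))  # sqrt of kernel weights
        betas[i] = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return betas


# Hypothetical synthetic data: 60 locations, 5 explanatory variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
coords = rng.uniform(size=(60, 2))

scores, V = pp_robust_pca(X, n_components=2)
# A response that is exactly linear in the robust components, so every
# local fit should return the same coefficients.
y = 1.0 + 2.0 * scores[:, 0] - 0.5 * scores[:, 1]
betas = gwr(coords, scores, y, bandwidth=0.5)
```

Fitting the same design matrix once with unit weights gives the global regression that the local fits are compared against. In practice the bandwidth would be chosen by cross-validation or an information criterion rather than fixed by hand.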

STUDY AREA AND BACKGROUND

Urbanization has profound implications for the environmental and socioeconomic sustainability of communities such as the Twin Cities Metropolitan Area (TCMA) of Minnesota (see Figure 1). This 7,700 km² seven-county area is the economic hub of a multistate region. Home to 2.8 million people, it is forecast to top 3.5 million by 2020. It is also a major center of sprawl, the rapid expansion of low-density suburbs into formerly rural areas and the creation of urban, suburban, and exurban agglomerations buffered from one another by undeveloped land. The region has seen a marked increase in sprawl and in associated problems such as traffic congestion (CEE 1999, Schrank and Lomax 2004).

The TCMA is an ideal setting for examining land use. The region exemplifies the spatial and temporal dynamics of urbanization in the United States. It serves as the hub for a large geographic area and stands in relative isolation from other large urban agglomerations, making it easier to extract land-use dynamics at the metropolitan scale. …