Analyzing Real Estate Data Problems Using the Gibbs Sampler


Real estate data are messy and difficult to obtain. While the value of U.S. real estate assets is greater than that of stocks and fixed-income securities combined, real estate lags behind mainstream finance and economics in empirical research. This disparity no doubt is driven largely by differences in data availability and quality. Difficulty in obtaining timely and reliable data remains a major obstacle to examining the relationships among key real estate variables, a point neatly summarized with a single statement in Fogler, Granito and Smith (1985): "It is well known that statistical analysis using publicly available real estate data is tenuous at best."

Nevertheless, real estate research continues to expand in a number of directions. Examples include methodological advances in estimating residential and commercial real estate values; comparisons of rates of return and risk profiles for alternative real estate investments; the pricing of a variety of real estate options, such as the implicit default option in mortgages or the option to wait to develop land; and the identification of the effects of institutional practices. One common thread linking these and other diverse areas of real estate empirical research is messy data.

The nature of the real estate asset seriously impedes the ability to accurately estimate real estate relationships. Properties are heterogeneous products that trade infrequently in markets that are highly localized. Moreover, data are often difficult to obtain, transaction records frequently contain empty data fields, and variables thought to influence relationships are routinely measured with error. Even when the researcher is able to overcome these problems, the ability to generalize estimates is hampered by the fact that transactions are infrequent, so that typically only a small portion of the population is represented in any cross section of data. Such data irregularities confound inference regarding real estate relationships using customary methods.

In this paper we introduce the Gibbs sampler, a simulation technique particularly well suited for fitting models in the presence of the above data irregularities. We demonstrate the effectiveness of the technique for missing data, a problem common to many real estate studies, and also describe its adaptation to several other frequently encountered situations.

The Gibbs sampler is a Monte Carlo sampling method that produces sample vectors, each drawn approximately from a joint distribution of interest. It is implemented iteratively by making random draws from suitable conditional distributions. The values of any one component across these vectors form a sample that is approximately from the marginal distribution of that component. From this description, the connection of the Gibbs sampler to model fitting may not be apparent. We clarify this connection in later sections, noting here that the sampler has been applied in a wide variety of settings outside of real estate and has been useful both in fitting complex models and in handling the challenges posed by data gathered outside of a laboratory environment. In the latter case the Gibbs sampler offers an alternative to other, sometimes ad hoc, methods of dealing with data inadequacies.
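To make the iterative scheme concrete, the following is a minimal sketch in Python. It is an illustration rather than the model used in this paper. The target distribution (a standard bivariate normal with correlation rho), the function name gibbs_bivariate_normal, and the settings for the number of draws and the burn-in period are all assumptions chosen for simplicity. For this target each full conditional is itself normal, x given y being N(rho*y, 1 - rho^2) and symmetrically for y given x, so the sampler simply alternates between the two conditional draws.

```python
import numpy as np


def gibbs_bivariate_normal(rho, n_draws=5000, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Illustrative assumption: the target is chosen because both full
    conditionals are known in closed form,
        x | y ~ N(rho * y, 1 - rho**2)
        y | x ~ N(rho * x, 1 - rho**2)
    """
    rng = np.random.default_rng(seed)
    cond_sd = np.sqrt(1.0 - rho**2)  # standard deviation of each conditional
    x, y = 0.0, 0.0                  # arbitrary starting values
    draws = np.empty((n_draws, 2))
    for t in range(burn_in + n_draws):
        x = rng.normal(rho * y, cond_sd)  # draw x from its conditional given y
        y = rng.normal(rho * x, cond_sd)  # draw y from its conditional given the new x
        if t >= burn_in:                  # discard the early, start-dependent draws
            draws[t - burn_in] = (x, y)
    return draws


draws = gibbs_bivariate_normal(rho=0.8)

# Each row is approximately a draw from the joint distribution; a single
# column is approximately a sample from that component's marginal.
x_marginal = draws[:, 0]
print("sample correlation:", np.corrcoef(draws.T)[0, 1])
print("marginal mean and sd of x:", x_marginal.mean(), x_marginal.std())
```

The rows of draws play the role of the sample vectors described above, and the column x_marginal is the sample from a marginal distribution. Successive draws are correlated, so summaries based on them are less precise than the same number of independent draws would give.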

This paper is organized as follows. In the next section, we discuss a variety of data problems that confront real estate researchers and for which our technique is especially useful. Then, to ease the reader into the ensuing technical details, we provide an overview of simulation-based model fitting and the role of the Gibbs sampler. We next formalize details of the Gibbs sampler and its implementation in missing-data problems, and follow that with its application to a specific problem in real estate research: missing data in the hedonic estimation of house value. After demonstrating its effectiveness for this specific problem, we explain how the Gibbs sampler could be adapted to a number of other common real estate data inadequacies. …