Academic journal article Journal of Real Estate Literature

Latent Semantic Analysis and Real Estate Research: Methods and Applications

Article excerpt

We introduce the analytic process in this paper and provide an overview of the applications and challenges of using latent semantic analysis (LSA), a form of content analysis, as a way to analyze unstructured text data in the context of the real estate discipline. LSA is a statistical tool that processes data in a manner similar to factor analysis, allowing for the examination of large bodies of textual data by identifying patterns within text that represent themes or concepts that convey meaning. In recent years, there has been an explosion in the use of social media, the consumption of electronic media on multiple platforms, and the availability of digitized information across industries and disciplines. News stories are no longer merely consumed by readers; they are co-created with the contributions of active communities that post comments, images, and videos, forward emails, and retweet messages. Repositories of industry reports emerge and proliferate at an increasing pace.

The real estate discipline is also witnessing this change. From property descriptions, to location characteristics, to market trends, to investor reports, real estate professionals and researchers interact with increasing volumes of digital content that reaches them in an increasing variety of formats and at an increasing velocity of data flow. As a result, there is an increasing demand for analytic approaches that can process and take advantage of all the "big data" generated, stored, and moved around by this content-creating and communication activity. To address this challenge, a number of content analytic algorithms have been introduced in the big data analytics arena. These include visualizations through word clouds, sentiment analysis (Montoyo, Martínez-Barco, and Balahur, 2012), text clustering (Larsen and Monarchi, 2004), and topic modeling based on latent Dirichlet allocation (Blei, 2012). For most ordinary data sets, alternative methods for content analysis will produce compatible results, given a data set's underlying conceptual content, which all these methods are likely to capture to some extent. What separates LSA from a number of alternative text-mining and text-analytic methods is the vast literature in psychology, linguistics, cognitive science, neuroscience, education, and related fields where LSA is used to provide a mathematical model for various cognitive functions.

The focus of this paper is on the proper implementation and use of LSA as a tool in real estate research. We rely on the application and results of prior studies to emphasize the importance of the method to the discipline. With researchers in real estate journals utilizing LSA as a research method, a primer on the use of LSA is due and is our contribution to the literature (Winson-Geideman and Evangelopoulos, 2013a, 2013b, 2013c).

The remainder of the paper is as follows: we begin with the background of LSA and then proceed to a discussion of the technical aspects of the method. Our paper concludes with three applications in real estate and an introduction to LSA software.


Latent semantic analysis was introduced in 1990 in an article in the Journal of the American Society for Information Science (Deerwester et al., 1990). Patented in 1988, it was described as "a new method for automatic indexing and retrieval" that takes "advantage of implicit higher-order structure in the association of terms with documents ('semantic structure') in order to improve the detection of relevant documents on the basis of terms found in the queries." In layman's terms, LSA is a way to extract meaning from large bodies of text by identifying patterns in the use of terms within the documents where they appear. It is based on the premise that words close in meaning will appear in similar text and that themes or topics can be derived from the text.
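
The premise can be illustrated with a minimal sketch in Python. It is not the authors' implementation: the four property descriptions are hypothetical, raw term counts stand in for the term weighting that LSA studies typically apply, and the choice of two latent dimensions (`k = 2`) is an assumption made for this toy example. The sketch builds a term-by-document matrix, truncates its singular value decomposition, and compares terms and documents by cosine similarity in the reduced space.

```python
import numpy as np

# Hypothetical property descriptions (illustrative only, not from the paper).
docs = [
    "spacious house with large garden",
    "large home with spacious yard",
    "office tower lease downtown",
    "downtown office space for lease",
]

# Term-by-document count matrix: rows are terms, columns are documents.
tokenized = [d.split() for d in docs]
vocab = sorted({t for doc in tokenized for t in doc})
index = {t: i for i, t in enumerate(vocab)}
X = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(tokenized):
    for t in doc:
        X[index[t], j] += 1

# Singular value decomposition, truncated to k latent dimensions.
k = 2  # assumed for this toy corpus; real studies choose k empirically
U, s, Vt = np.linalg.svd(X, full_matrices=False)
term_vecs = U[:, :k] * s[:k]   # one row per term in the latent space
doc_vecs = Vt[:k, :].T * s[:k]  # one row per document in the latent space


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# "house" and "home" never appear in the same document...
raw_sim = cosine(X[index["house"]], X[index["home"]])
# ...but they appear in similar contexts, which the latent space captures.
latent_sim = cosine(term_vecs[index["house"]], term_vecs[index["home"]])

print("house vs. home, raw counts:  ", round(raw_sim, 3))
print("house vs. home, latent space:", round(latent_sim, 3))
print("doc 0 vs. doc 1, latent space:", round(cosine(doc_vecs[0], doc_vecs[1]), 3))
print("doc 0 vs. doc 2, latent space:", round(cosine(doc_vecs[0], doc_vecs[2]), 3))
```

In this toy corpus, "house" and "home" share no document, so their raw similarity is zero, yet both load on the same latent dimension because they co-occur with the same neighboring terms. The two residential descriptions likewise end up close together and far from the office listings, which is the kind of theme extraction the paragraph above describes.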

The psychologists who pioneered LSA theorized that it mathematically describes the cognitive functions of the human mind (Landauer, 2007). …
