Academic journal article British and American Studies

Exploring Distributional Semantics with the Spacexplorer Tool

1. Introduction

Distributional Semantics is a computational, practice-oriented, data-driven approach to representing meaning. Statistical information about co-occurrence supplies empirical evidence about a word's general potential for replacing another word, which makes it possible to measure word similarity. According to the distributional hypothesis, this similarity is a semantic phenomenon.

Distributional Semantics is a very active research program in Cognitive Science. It is based on a structuralist view of meaning (with roots that can be traced back to Saussure and Harris, cf. Sahlgren 2008): it focuses on what is internal to language and assumes that other aspects of meaning (e.g. reference) will either be reflected by language-internal phenomena or remain irrelevant for description. The meaning of words is approximated by assessing their distributional properties as manifested in corpora.

A geometric procedure is commonly employed in Distributional Semantics to represent and compare meanings. Co-occurrence events between words are usually collected as numerical features in feature vectors that stand for words in a vector space. Meaning differences and similarities can then be conveniently represented and calculated in this vector space by working with the feature vectors. More details about this process will be provided in section 2.
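As an illustration of how similarity can be calculated from feature vectors, the following minimal sketch compares three toy vectors. It assumes cosine similarity as the measure, which is a common choice but is not named in the text above. The words, context words and counts are hypothetical, and `cosine_similarity` is a helper defined only for this example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

# Hypothetical co-occurrence counts with the context words (tea, water, road).
drink = [23, 15, 0]
sip = [12, 9, 1]
drive = [0, 2, 30]

print(cosine_similarity(drink, sip))    # close to 1: similar distributions
print(cosine_similarity(drink, drive))  # close to 0: dissimilar distributions
```

Cosine similarity depends only on the direction of the vectors, so a frequent word and a rare word with proportionally similar contexts still come out as similar. Other measures (e.g. Euclidean distance) could be substituted in the same place.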

As shown above, Distributional Semantics has strong ties to Linguistics and Geometry. Computational linguists have also found the distributional methodology an efficient yet powerful way of acquiring semantic information about words. As far as language technology is concerned, some of the first vector-space applications included the task of finding relevant documents in Information Retrieval (Salton 1971). Question answering (e.g. Tellex et al. 2003) and document clustering (e.g. Manning et al. 2008) may be implemented in a similar way. Comparable systems have been developed for word sense disambiguation (Schütze 1998), thesaurus generation through automated discovery and clustering of word senses (Crouch 1988, Pantel and Lin 2002) and named-entity recognition (Vyas and Pantel 2009). Pennacchiotti et al. (2008) use Distributional Semantics in a cognitive semantic context: they propose a method for extending FrameNet's scope by covering more (potentially frame-evoking) lexical items through distributional lexical unit induction.

Psycholinguistics also plays a major role in Distributional Semantics, as corpus-derived data correlate with psycholinguistic data (gained from human similarity judgements, cf. e.g. Miller and Charles 1991, and from semantic priming experiments, e.g. Padó and Lapata 2007).

Distributional Semantics is a powerful model that has been used in many scientific disciplines, but its empirical side can only be investigated with tools capable of processing large corpora and finding co-occurrence events between words.

2. How to build a Vector Space Model (VSM)?

Systems designed to collect distributional information about words rely on a geometrical interpretation of the empirical data (Widdows 2004). Each target word is represented in a multi-dimensional space by a feature vector. Each position of the feature vector signals or counts the number of co-occurrences of the given target word with one of the context words we use for describing target items. For example, if the word drink is a target word, the word tea is among the context words and tea occurs 23 times in the close vicinity (in the "context window") of drink, then the vector element corresponding to the word tea (in the context vector describing the word drink) will be set to 23:

(ProQuest: ... denotes formula omitted.)

where v is a feature vector that represents a target word in an n-dimensional space; n stands for the total number of context words.
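The counting procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy corpus, the symmetric window of two tokens on each side, whitespace tokenization and the function name `cooccurrence_vector` are all hypothetical choices made for this example, not details taken from the article.

```python
from collections import Counter

def cooccurrence_vector(tokens, target, context_words, window=2):
    """Count how often each context word occurs within `window` tokens
    of any occurrence of `target`; return counts in context_words order."""
    counts = Counter()
    context_set = set(context_words)
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in context_set:
                counts[tokens[j]] += 1
    # One vector position per context word, so the vector length
    # equals the number of context words.
    return [counts[w] for w in context_words]

# Hypothetical toy corpus; real studies use corpora of millions of words.
corpus = "we drink tea and they drink green tea while others drink coffee".split()
context_words = ["tea", "coffee", "green"]
vector = cooccurrence_vector(corpus, "drink", context_words, window=2)
print(dict(zip(context_words, vector)))  # {'tea': 2, 'coffee': 1, 'green': 1}
```

In this toy corpus tea falls inside the context window of drink twice, so the position corresponding to tea holds 2, in the same way that it would hold 23 in the example given in the text.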

Large corpora (20-50-100 million words or even more) are necessary for this type of investigation. …
