Stimulus Set Size and Statistical Coverage of the Grammar in Artificial Grammar Learning
Poletiek, Fenna H., van Schijndel, Tessa J. P., Psychonomic Bulletin & Review
Adults and children acquire knowledge of the structure of their environment on the basis of repeated exposure to samples of structured stimuli. In the study of inductive learning, a straightforward issue is how much sample information is needed to learn the structure. The present study distinguishes between two measures for the amount of information in the sample: set size and the extent to which the set of exemplars statistically covers the underlying structure. In an artificial grammar learning experiment, learning was affected by the sample's statistical coverage of the grammar, but not by its mere size. Our result suggests an alternative explanation of the set size effects on learning found in previous studies (McAndrews & Moscovitch, 1985; Meulemans & Van der Linden, 1997), because, as we argue, set size was confounded with statistical coverage in these studies.
Learning the complex rules of language from linguistic stimuli is a striking example of inductive learning. Some researchers have emphasized the role of an innate human predisposition in natural grammar acquisition, with the environment playing a minor role (Chomsky, 1980; Pinker, 1994). Indeed, a major problem for cognitive learning explanations of grammar induction is that the stimulus sample contains too few exemplars to explain complete learning. However, a growing number of recent studies have suggested that structure induction relies, at least partly, on sensitivity to statistical properties of the structured environment (Chater & Manning, 2006; Kuhl, 2004; Poletiek, 2006; Poletiek & Chater, 2006; Poletiek & Wolters, 2009; Redington, Chater, & Finch, 1998). Here, we explore the hypothesis that statistical properties of the sample may be important for learning and, hence, may compensate for limitations in the number of exemplars.
Although the problem of learning a structure with limited input is a core issue in linguistic theory and natural language acquisition research (Gold, 1967; Marcus, 1993), it has received little attention in statistical-learning accounts and experimental paradigms of grammar induction (Jamieson & Mewhort, 2005). For example, only a few studies with the artificial grammar learning (AGL) paradigm, the experimental paradigm most often used to study implicit sequential structure learning, have addressed the question of how the number of exemplars in an input affects learning (McAndrews & Moscovitch, 1985; Meulemans & Van der Linden, 1997). Obviously, the most straightforward way to interpret the limited input problem is to look at the input sample's size. However, besides discrete sample information, statistical sample information about the grammar may be relevant to a learner as well. The purpose of the present study is to explore the influence of the amount of sample information about an underlying grammar, both discrete and statistical, on learning in an AGL task. We will first discuss previous experimental research on the effects of discrete input information. Next, we will introduce statistical coverage (SC) as a new statistical measure for the amount of sample information. Finally, we will report the effect of both measures when tested in an experiment.
In the AGL paradigm, participants are exposed to a number of exemplars (e.g., strings of letters) of a finite state structure during a training phase, without being informed about the structure. Next, in the test phase, the participants categorize new strings, half of which are grammatical and half of which are ungrammatical. Performance in the test phase (the proportion of correct categorizations) is taken as an indication of how much has been learned about the grammar during training. Moreover, response analyses can show whether categorizations were based on the actual grammaticality of the test items, fragment recognition (i.e., bigrams or trigrams), or memory for full exemplars (see Pothos, 2007, for a review).
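The logic of the paradigm can be illustrated with a minimal sketch. The grammar below is hypothetical and chosen only for illustration; it is not the grammar used in this or any cited study, and the names `GRAMMAR`, `generate`, `is_grammatical`, and `proportion_correct` are assumptions introduced here. The sketch generates exemplars by a random walk through a small finite state structure, checks the grammaticality of test strings, and scores a set of categorizations as the proportion correct.

```python
import random

# Hypothetical finite state grammar (an assumption for illustration only).
# Each state maps to a list of (letter, next_state); next_state None marks an exit.
GRAMMAR = {
    0: [("M", 1), ("V", 2)],
    1: [("S", 1), ("V", 2)],
    2: [("X", 1), ("R", None)],
}


def generate(rng, max_len=8):
    """Random walk from the start state; retry until a string exits within max_len."""
    while True:
        state, letters = 0, []
        while state is not None and len(letters) < max_len:
            letter, state = rng.choice(GRAMMAR[state])
            letters.append(letter)
        if state is None:
            return "".join(letters)


def is_grammatical(string):
    """True if the string traces a complete path from the start state to an exit."""
    state = 0
    for letter in string:
        if state is None:  # letters remain after the grammar has already exited
            return False
        next_states = [s for l, s in GRAMMAR[state] if l == letter]
        if not next_states:  # no transition with this letter from this state
            return False
        state = next_states[0]
    return state is None


def proportion_correct(responses, truth):
    """Share of categorizations (True = judged grammatical) that match the truth."""
    matches = sum(r == t for r, t in zip(responses, truth))
    return matches / len(truth)


if __name__ == "__main__":
    rng = random.Random(0)
    training_set = [generate(rng) for _ in range(10)]
    test_items = ["VR", "MVR", "MR", "VRX"]
    truth = [is_grammatical(s) for s in test_items]
    # A simulated participant who endorses every test item as grammatical.
    responses = [True] * len(test_items)
    print(training_set)
    print(proportion_correct(responses, truth))
```

In this sketch, a participant who endorses every test item scores at chance when half of the items are grammatical, which is why performance above 0.5 is usually taken as evidence of learning in such designs.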
McAndrews and Moscovitch (1985) used the AGL paradigm to investigate whether responses were based on memory for exemplars or knowledge of the grammar, after training with small and large input sets. …