Academic journal article Educational Technology & Society

The Beast of Aggregating Cognitive Load Measures in Technology-Based Learning

A growing share of cognitive load research, in technology-based learning and in general, includes a component of repeated measurements. In such studies, participants (frequently students, employees, managers, or clients; in statistical language more commonly referred to as subjects) are measured repeatedly (i.e., at least two times) on the same variable(s) of interest. Whether the variable is performance, mental effort invested in a cognitive activity, or something else, repeated measurements data have one attractive feature: they enable researchers to separate so-called within-subjects variance from between-subjects variance. When a sample of individual students takes an exam only once, we can distinguish between participants, because some students perform better than others. In such a context, however, we cannot account for variation within participants, since we have only one measurement per student. In educational research, we are usually not interested exclusively in distinguishing between participants; we are also interested in within-participants changes related to learning and development, and for the latter we need studies that include a component of repeated measurements.
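The separation of within-subjects from between-subjects variance can be made concrete with a small sketch. The ratings below are invented purely for illustration (they are not data from any study cited here), and the function name `decompose` is hypothetical. Assuming every participant is rated on the same set of tasks, the total sum of squares splits into a between-participants part and a within-participants part:

```python
from statistics import mean

# Hypothetical mental-effort ratings (1-9 scale): 3 participants x 4 tasks.
# These numbers are invented for illustration only.
ratings = {
    "p1": [2, 3, 4, 5],
    "p2": [4, 5, 6, 7],
    "p3": [6, 6, 7, 9],
}

def decompose(data):
    """Split the total sum of squares into between- and within-participant parts."""
    all_scores = [x for scores in data.values() for x in scores]
    grand_mean = mean(all_scores)
    # Total: every rating against the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    # Between: each participant's mean against the grand mean, weighted by
    # the number of ratings that participant contributed.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in data.values())
    # Within: every rating against its own participant's mean.
    ss_within = sum((x - mean(s)) ** 2 for s in data.values() for x in s)
    return ss_total, ss_between, ss_within

ss_total, ss_between, ss_within = decompose(ratings)
print(f"total={ss_total:.2f} between={ss_between:.2f} within={ss_within:.2f}")
```

With only one measurement per participant, each rating equals its participant's mean, so the within-participants part is zero by construction. That is the single-exam situation described above.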

Recent calls for repeated measurements of cognitive load

Recently, two series of well-designed randomized controlled experiments (Schmeck, Opfermann, Van Gog, Paas, & Leutner, 2015; Van Gog, Kirschner, Kester, & Paas, 2012) provided evidence that, in studies where learners perform a series of tasks, it is better to measure a characteristic of interest, for instance mental effort invested in a cognitive activity (Paas, 1992), after each task (i.e., repeatedly) than once retrospectively. A main finding of both series of experiments was that single retrospective (i.e., delayed) ratings were significantly higher than the average of ratings obtained after each task. This probably happens because delayed ratings are mainly influenced by the relatively more complex problems (Schmeck et al., 2015).

In another experiment (Leppink, Paas, Van der Vleuten, Van Gog, & Van Merrienboer, 2013), students solved problems on conditional and joint probabilities presented in two formats: an explanation of six lines of text, and formula notation. In one condition, students first studied the textual explanation and then the formula explanation, while in the other condition the order was reversed. Various cognitive load measures were included in this experiment (Ayres, 2006; Cierniak, Scheiter, & Gerjets, 2009; Leppink et al., 2013; Paas, 1992; Salomon, 1984). However, instead of being administered once at the end of the full sequence of two formats, the measures were administered after each format. This enabled the researchers to test not only for differences between formats but also for format order effects. One finding with implications for instructional design is that students who are confronted with the formula format before the textual explanation tend to experience higher extraneous cognitive load from the formula format than their peers who study exactly the same materials on joint and conditional probabilities in the reverse order. This finding could not have emerged had extraneous cognitive load scores been aggregated across formats.

Repeated measurements data enable researchers to separate between-participants and within-participants variance. Unfortunately, many researchers have not taken advantage of this feature and have aggregated scores from repeated measurements into one sum or average score per participant (e.g., Ayres, 2006; Corbalan, Kester, & Van Merrienboer, 2008; Hoffman, 2012; Koriat, Nussinson, & Ackerman, 2014; Kostons, Van Gog, & Paas, 2012; Van Loon, De Bruin, Van Gog, & Van Merrienboer, 2013). By aggregating repeated measurements, we wipe out all within-participants variance, and this can seriously distort our view of the effects and relations of interest (Leppink, 2015). …
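What aggregation discards can be illustrated with a minimal sketch. The three rating profiles below are invented for illustration and do not come from any of the cited studies; the helper names `aggregate` and `within_spread` are hypothetical. The three hypothetical participants show very different patterns across four tasks, yet their average scores are identical:

```python
from statistics import mean

# Hypothetical effort ratings over four tasks; values invented for illustration.
ratings = {
    "rising":  [2, 4, 6, 8],   # effort increases from task to task
    "falling": [8, 6, 4, 2],   # effort decreases from task to task
    "flat":    [5, 5, 5, 5],   # effort stays constant
}

def aggregate(data):
    """Collapse each participant's repeated ratings into one average score."""
    return {pid: mean(scores) for pid, scores in data.items()}

def within_spread(data):
    """Sum of squared deviations of each rating from its participant's own mean."""
    return {pid: sum((x - mean(s)) ** 2 for x in s) for pid, s in data.items()}

averages = aggregate(ratings)
spread = within_spread(ratings)
for pid in ratings:
    print(pid, "average:", averages[pid], "within-participant SS:", spread[pid])
```

Once only the averages are kept, the three participants are indistinguishable, and neither the direction nor the size of the change across tasks can be recovered from the aggregated scores.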
