Collaborative Testing and Achievement: Are Two Heads Really Better Than One?
April Haberyan and Jerrold Barnett, Journal of Instructional Psychology
Two studies examined the impact of collaborative testing on exam scores for psychology students at a moderately selective Midwestern university. The first study was a replication of previous classroom research in which students could choose to test with a partner or alone. No significant differences were found between those taking tests alone and those taking them with a partner. Students who scored high on extraversion and excitement seeking were more likely to choose collaborative testing. However, no significant differences were found between the students in each condition in anxiety, trust, or achievement striving. Following the classroom study, a laboratory study was conducted to tease apart the effects of studying with a partner and of testing with a partner. In Experiment 2, a strong testing effect was found: students testing with a partner benefited, regardless of whether they had studied with a partner.
Although individual work is stressed in schools, the ability to collaborate with others is considered an important skill in many aspects of life, such as business, team sports, and scientific research. While cooperative learning in school settings has been popular for some time and has been the subject of extensive research, collaborative test taking is a relatively new phenomenon. Recent studies have found that collaborative test taking improves exam performance and promotes positive student attitudes (Lambiotte, Dansereau, Rocklin, Fletcher, Hythecker, Larson, et al., 1987; Lusk & Conklin, 2003; Mitchell & Melton, 2003; Slusser & Erickson, 2006; Zimbardo, Butler, & Wolfe, 2003). In addition, some researchers maintain that collaborative testing encourages positive problem-solving, improves long-term retention of information, and reduces anxiety (Cassini, 1994; Helmericks, 1993; Mitchell & Melton, 2003; Zimbardo et al., 2003). One purpose of the present studies was to replicate these collaborative testing effects with a different sample and in different subject areas.
A second issue addressed in this paper is the role of personality traits in collaborative testing. Traits such as extraversion/introversion may play a major role in students' preference for testing alone or with a partner and in determining whether they benefit from collaboration. To our knowledge, no such studies have been conducted to date. However, a recent critique of the cooperative education literature may prove helpful (Genovese, 2005). Commenting on why so many educational innovations (including cooperative education) cycle in and out of favor with the education community, Genovese argued that the lack of attention to individual differences dooms any innovation that promises universal success. For example, high achieving students often prefer to work alone rather than with a partner (Cano-Garcia & Hughes, 2000). Hutchinson and Gul (1997) found that the trait of introversion/extraversion played a role in student preferences for working alone or with a group, and Onwuegbuzie (2001) found that peer-oriented students fared better than students with other learning styles in a research methods course requiring cooperative learning. A second purpose of this research project was to test the impact of personality traits on collaborative testing.
While it seems intuitively obvious (at least to the students who participated in the first study) that "two heads are better than one" and that collaborative testing would improve grades, the group process literature is quite mixed (Kerr & Tindale, 2004). Many times, groups fail to live up to their potential. For example, productive groups tend to be cohesive, and collaborative testing appears to work best when students have the opportunity to learn about one another's competencies (Zimbardo et al., 2003). However, two students choosing to take a test together may not spend enough time together to realize the benefits that partners offer. Another pitfall of group performance that might apply to testing with a partner is social loafing, where one partner chooses to take a free ride and trusts the other to be prepared for the examination. A third possibility is that stressful conditions, such as time pressure and the relative importance of classroom test grades for college students, can lead to "groupthink," where the group fixes on one idea and closes its collective mind to other possibilities. In short, the social psychology of groups does not predict that all, or even most, students will increase their exam performance by testing with a peer.
The purpose of this first experiment was to test the efficacy of collaborative testing. Specifically, the first goal was to replicate the promising research summarized above using a different population. The second goal was to extend this research by examining the relationships among personality traits and collaborative testing.
Design. Several instructors cooperated in this classroom study. While the general procedures were similar across the study, details varied from course to course and instructor to instructor. The general rules across all sections were (1) all participating instructors informed students about the option of collaborative test-taking at the beginning of the semester, (2) all test data was collected from regularly scheduled classroom examinations that counted towards students' final grades, (3) all students had the option of taking the test with a partner or in the traditional format, (4) students were allowed to choose their partners, and (5) during the tests, students working collaboratively remained in the regular classroom, sat with their partner, and were allowed to quietly discuss their answers. Only one answer sheet or test booklet was turned in, and both students received the same grade. This room was supervised by the regular instructor. The students working alone were moved to a smaller classroom and were supervised by a graduate assistant, following all the normal procedures of a typical college test.
Participants. Volunteers were recruited from two Psychology courses: 26 from a Personality course and 138 from four sections of Educational Psychology (taught by two different instructors). Of these students, about 75% were female, which is representative of Psychology and Education majors at this university. Most were sophomores, juniors, or seniors. The vast majority were White and had never engaged in any type of collaborative or group testing prior to this experiment.
Procedure and Materials. This study was completed in several phases. In the first phase, students signed informed consent forms, completed a demographic questionnaire, and took the NEO-PI-R. The demographic survey asked students their year in school, major, gender, race, grade point average, and whether they had ever participated in group or cooperative testing prior to this study. The NEO-PI-R (Costa & McCrae, 1992) is a 240-item personality inventory designed to measure the Big Five personality factors (Neuroticism, Extraversion, Openness to Experience, Agreeableness, and Conscientiousness) and the traits associated with each factor. NEO-PI-R scores have been shown to be internally consistent and reliable over time.
The second phase of this research project varied slightly from section to section. In one section of Educational Psychology, students completed the first and final examinations in the traditional individual format, with the option of collaborative testing available on the second and third tests. In three sections of Educational Psychology, students were given the option of collaborative testing only on the third of four exams. In the Personality class, students had the option to test with a peer only on the fourth and final test.
In the final phase of this study, conducted at the end of the semester, students completed a follow-up survey about their level of test anxiety, their motivation, and the strategies they used in preparing for the examinations.
Does testing with a peer improve exam performance? The first analysis was a series of t-tests comparing the performance of those tested alone and those working with a partner. Due to the wide variability in testing procedures, these comparisons were conducted by instructor. In all comparisons, the independent variable was how students chose to take the test when they had the option. The dependent variable was the teacher-assigned examination score.
The results from the three sections of Educational Psychology with one instructor are presented in Table 1. Significant differences were found only on Test 4, which is a test all students took as individuals. No significant differences were found on Test 3, the test where students were allowed to test collaboratively. It is interesting to note that students choosing to test with a partner scored higher than those testing alone, contrary to the findings of Cano-Garcia and Hughes (2000), who found that high achieving students prefer to work alone.
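As a rough check on comparisons like those in Table 1, a t statistic can be recomputed from the reported means, standard deviations, and group sizes. The sketch below assumes a pooled-variance (equal variances) independent-samples t-test and takes the group sizes from the table header; the article does not state which variant was used, and the function name `pooled_t` is purely illustrative. Under those assumptions, the Test 3 row comes out close to the reported t of 1.48.

```python
import math

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t from summary statistics (equal variances assumed)."""
    df = n1 + n2 - 2
    # Pool the two group variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two group means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m2 - m1) / se, df

# Test 3 row of Table 1: testing alone (n = 25) vs. testing with a partner (n = 37).
t_test3, df_test3 = pooled_t(43.00, 5.75, 25, 44.67, 3.13, 37)
print(f"t({df_test3}) = {t_test3:.2f}")
```

Other rows need not reproduce exactly if the number of scores differed from test to test, which the table does not report.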
Data from the remaining section of Educational Psychology were complicated because students were allowed to work together on two tests (the second and third exams), and some students switched conditions between Tests 2 and 3. To create fair comparisons, a series of t-tests was conducted. The first two compared students who tested alone with those who tested with a partner on Test 2, with scores from Tests 1 and 2 as the dependent measures. A second pair of t-tests used testing alone versus testing with a partner on Test 3 as the independent measure and scores on Tests 3 and 4 as the dependent measures. Descriptive statistics are presented in Table 2. As shown there, no differences approached significance on any of the four tests.
In the Personality course, students were given the option of taking only their fourth test with a partner. Descriptive statistics are presented in Table 3. As with the previous sections, none of the t-tests revealed a significant difference.
One problem with the findings presented thus far is that the relatively small number of students (especially in the test alone group) limits statistical power. To overcome this problem, all scores were converted to z-scores (to allow comparisons across diverse sections and tests). Students were divided into two groups based upon whether they took any test with a partner. There were no significant differences on the control tests, where all students took tests in the traditional format, t(126) = .29, p = .78. There were also no significant differences when some students took their test collaboratively, t(126) = 1.58, p = .12.
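The z-score conversion described above can be sketched as follows. This is a minimal illustration, assuming scores are standardized within each section and test using the sample standard deviation; the article does not specify these details, and the scores shown are hypothetical, not data from the study.

```python
import statistics

def to_z_scores(scores):
    """Standardize a list of scores to mean 0 and SD 1 (sample SD assumed)."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # assumption: sample (n - 1) standard deviation
    return [(score - mean) / sd for score in scores]

# Hypothetical scores from two sections graded on different scales.
section_a_points = [40, 45, 50]    # out of 50 points
section_b_percent = [70, 80, 90]   # percentages

z_a = to_z_scores(section_a_points)
z_b = to_z_scores(section_b_percent)
print(z_a, z_b)
```

After standardization, a score one standard deviation above its own section mean has the same value regardless of the original scale, which is what allows the sections to be pooled.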
Who chooses to test collaboratively with a peer? To determine who chooses to participate in collaborative testing, scores on the NEO-PI-R and on several demographic variables were compared via independent t-tests. Of the Big Five, those choosing to work in pairs differed from those working alone only in Extraversion. Details can be found in Table 4. A second set of comparisons examined five subscales chosen a priori from the NEO-PI-R that seemed to relate to choosing collaborative test-taking. The details for these comparisons can be found in Table 5. As shown there, only Excitement Seeking was statistically significant. Gregariousness was marginally significant, and Anxiety, Trust, and Achievement Striving failed to reach significance. Finally, two demographic variables were tested to determine their influence on the test-taking decision. There was no significant difference in self-reported GPA between those choosing to test alone (M = 3.05, SD = .46) and those choosing to test with a partner (M = 3.08, SD = .49), t(155) = .33, p = .74. A chi-square test revealed that gender played no role in the decision to take the test alone or with a peer (chi-square = .67, df = 2, p = .72).
All students who tested with a partner were surveyed about their experiences. These results can briefly be summarized as showing that students had a positive emotional experience, claiming to be more motivated and less anxious because of peer testing. Their estimates of how much they actually learned were positive, but less so than the emotional response. Finally, most students claimed they studied alone (63%), some studied as a pair (25%), and only 3% split the work and studied only half the material.
The primary outcome of this study was a failure to replicate the facilitative effects of collaborative testing upon classroom achievement. There are important differences between this study and previous studies; we focus here upon the Zimbardo et al. (2003) study. First is the sample. The Zimbardo study was conducted at Stanford, an elite university, while our study was conducted at a moderately selective, regional university. The students may have differed in any number of ways, the most important of which are academic ability, competitiveness, and sophistication. The last of these may be reflected in test preparation across the two studies. Many of the students in the Zimbardo et al. study reported using sophisticated, cooperative study strategies. The students in the present study typically studied alone, then tried to take the test in pairs. It is quite possible that the benefits of collaborative testing lie in the preparation, not the taking of the test. A second major difference between the two studies is the tests taken by students. In four of the five sections examined, the tests were half multiple choice and half essay, whereas in the Zimbardo et al. study the tests were all multiple choice. It is possible that test format plays a role in the benefits of working with a peer.
One problem with the research on collaborative testing to date is that it has been conducted in classroom settings, making it difficult to tease apart factors that might determine why collaborative testing facilitates (or fails to facilitate) test performance. Students who take their opportunity for collaborative testing seriously will also prepare for the exam together. An alternative explanation for the collaborative testing effect is that the benefits of collaboration occur at the time of learning (see Johnson & Johnson, 1991 and O'Donnell, 2006 for reviews). The second study reported is a laboratory experiment designed to tease apart the influences of collaboration at the time of studying, testing, and the interaction between the two.
Participants. One hundred and four volunteers, enrolled in General Psychology courses at a Midwestern university, participated in this study. The sample was predominantly Caucasian, with roughly equal numbers of males and females.
Materials. A text of approximately 2,000 words was taken from an Industrial Psychology textbook (Landy & Conte, 2004) and adapted for use in this study. The passage described violence in the workplace.
A test consisting of four factual multiple choice items and six short essay questions was constructed to assess comprehension of the passage. Three of the essays required recall of factual information ("list the factors involved in...") and three required students to apply principles ("explain why there has been more violence in post offices than there has been in libraries or bakeries"). The essays were scored by two trained assistants, with an inter-rater reliability of .90. Alpha for the entire test was .82.
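For readers unfamiliar with the alpha coefficient reported above, the sketch below shows the standard Cronbach's alpha formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The item scores are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from this study, and the function name is illustrative.

```python
import statistics

def cronbach_alpha(item_scores):
    """item_scores: one list per test item, each holding one score per respondent."""
    k = len(item_scores)
    # Sum of the variances of the individual items.
    item_var_sum = sum(statistics.variance(item) for item in item_scores)
    # Variance of each respondent's total score across all items.
    totals = [sum(respondent) for respondent in zip(*item_scores)]
    total_var = statistics.variance(totals)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical data: three items, four respondents, items rank respondents identically.
hypothetical_items = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [1, 2, 3, 4],
]
alpha = cronbach_alpha(hypothetical_items)
print(round(alpha, 2))
```

Because the hypothetical items are perfectly consistent with one another, alpha reaches its upper bound of 1; less consistent items would yield lower values such as the .82 reported here.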
Procedure. This study was conducted with approximately 25 to 30 students participating at any one time. Students were first given an overview of the experiment. They then signed consent forms. Half of the students were randomly assigned to study the text alone. These students were led to another room by a graduate assistant and given brief instructions about studying. They were told that they had 20 minutes to study and that students could normally read the text in 10 minutes (as determined in the pilot study). They were told to take their time and study the material carefully, using any strategy they preferred.
The remaining students were randomly assigned to pairs. The pairs were told they were to study as a team. They were also told that they had 20 minutes to study and that the text could be read in about 10 minutes. They were encouraged to take turns quizzing their partner as a way to study. This suggestion was made because quizzing each other is a relatively simple, effective method for studying with a partner (O'Donnell, 2006) and ensured that the two would actually work together.
At the end of the study period, students were again randomly divided into groups. In the study alone condition, 16 students were sent to another room to take the test alone (study alone/test alone group). The remaining 34 in the study alone group were moved to a different room, randomly assigned to pairs, and told to take the test with their partner (the study alone/test with a partner group). These students were given no instructions on how to take the test, except that they were given only one copy of the test and told that they must turn in one set of answers for the pair. Thirty-four students participated in this condition, resulting in 17 tests for scoring and analysis.
In the study with a partner group, eighteen students were randomly assigned to take the test alone (study with a partner/test alone condition). The remaining thirty-six students were assigned to stay with their partner and take the test as a pair (the study with a partner/test with a partner group), yielding an n of 18.
Using total scores as the dependent variable, there was a significant test effect, F(1, 65) = 6.29, p = .015. Those who took the test with a partner scored significantly higher (M = 9.63, SD = 2.20) than those testing alone (M = 8.15, SD = 2.60). There was no main effect for studying, F(1, 65) = .09, p = .77, nor was there a studying by testing interaction, F(1, 65) = .39, p = .54. Descriptive statistics for these analyses are presented in Table 6.
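The F ratios above can be approximately reconstructed from the cell means and standard deviations in the "Total" rows of Table 6, together with the number of scored tests per cell described in the Procedure (16, 17, 18, and 18, which gives the 65 error degrees of freedom). The sketch below assumes an unweighted-means contrast for each effect with a pooled within-cell error term; the article does not state how the unequal cell sizes were handled, so this is one plausible reading rather than the authors' documented method, and all names in the code are illustrative.

```python
# Cell summaries for total score: (mean, SD, number of scored tests).
SA_TA = ("study alone", "test alone")
SA_TP = ("study alone", "test with partner")
SP_TA = ("study with partner", "test alone")
SP_TP = ("study with partner", "test with partner")

cells = {
    SA_TA: (8.25, 2.72, 16),
    SA_TP: (9.35, 2.15, 17),
    SP_TA: (8.05, 2.55, 18),
    SP_TP: (9.89, 2.27, 18),
}

# Pooled within-cell variance serves as the error mean square.
df_error = sum(n - 1 for _, _, n in cells.values())
ms_error = sum((n - 1) * sd ** 2 for _, sd, n in cells.values()) / df_error

def contrast_f(weights):
    """F(1, df_error) for a single-degree-of-freedom contrast on the cell means."""
    estimate = sum(w * cells[cell][0] for cell, w in weights.items())
    variance = ms_error * sum(w ** 2 / cells[cell][2] for cell, w in weights.items())
    return estimate ** 2 / variance

# Assumption: each main effect compares unweighted averages of the cell means.
f_testing = contrast_f({SA_TA: -0.5, SA_TP: 0.5, SP_TA: -0.5, SP_TP: 0.5})
f_studying = contrast_f({SA_TA: -0.5, SA_TP: -0.5, SP_TA: 0.5, SP_TP: 0.5})
f_interaction = contrast_f({SA_TA: 1, SA_TP: -1, SP_TA: -1, SP_TP: 1})

print(f"error df = {df_error}")
print(f"testing F = {f_testing:.2f}, studying F = {f_studying:.2f}, "
      f"interaction F = {f_interaction:.2f}")
```

Under these assumptions the three values land close to the reported 6.29, .09, and .39; small discrepancies would be expected from rounding in the published means and standard deviations.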
A second set of analyses examined the three outcome measures separately. On the multiple choice portion of the test, there were no significant differences. On the factual essays, there was a significant test effect, F(1, 65) = 6.82, p = .01. Those who tested with a partner scored higher (M = 4.14, SD = 1.46) than those testing alone (M = 3.15, SD = 1.64). There was no effect for studying condition and no significant interaction. On the transfer essays, there were no significant differences.
In contrast to Experiment 1 (a classroom study), this laboratory study found that collaboration while testing improved performance. This testing effect was independent of study condition. In fact, while half of our collaborative group studied and tested with the same partner, half were testing in a pair with a partner randomly assigned just prior to taking the test.
Cooperative learning has a long history, both in the research laboratory and in the classroom (O'Donnell, 2006). Allowing students to work together during testing is a recent and less researched idea. The Zimbardo et al. (2003) classroom study showed remarkable results: reduced anxiety, improved attitudes, and increased performance. The most important outcome of Experiment 1 was a failure to replicate the facilitative effects of collaborative testing upon classroom achievement reported in previous studies (Zimbardo et al., 2003). Experiment 2 was designed to tease apart some of the variables that might contribute to these conflicting results. The study produced a strong testing effect, independent of any benefits that might derive from studying with a partner. This outcome is consistent with a recent qualitative study (Ewald, 2005) investigating what students actually do as they collaborate while taking a test.
There are important differences between the Zimbardo et al. (2003) study and the classroom experiment reported here. First is the sample. The Zimbardo study was conducted at Stanford, an elite university, while our study was conducted at a regional state university. The students may have differed in any number of ways, including academic ability, competitiveness, and sophistication. The last of these may be reflected in test preparation across the two studies. Many of the students in the Zimbardo et al. study reported using sophisticated, cooperative study strategies. They spent considerable time working together prior to the test, thus gaining the benefits of working with a partner while studying as well as during the test. The students in the present study typically studied alone, then tried to take the test in pairs. Although our laboratory study found a separate testing effect, in realistic situations, working together prior to the test is highly recommended.
The current study also differed from Zimbardo et al. (2003) in the measure of achievement. The tests in the Zimbardo et al. study were all multiple choice, whereas most of the sections in the present study used a combination of multiple choice and essay.
This might be an important variable to test in future studies. In the business world there is a greater emphasis upon project completion, so future studies will need to examine this outcome as well.
A problem with several of the collaborative testing studies is that students were allowed to choose whether they worked alone or with a partner, creating problems with the internal validity of the studies (the two groups may not be equal to begin with). While students benefited if they worked with a partner, it is not at all clear that students choosing not to work with a partner would benefit if they were required to do so. Our laboratory study allowed us to randomly assign students to experimental conditions and bypass student preferences.
Future research should consider the social dynamics between partners during the test. Laughlin et al. (2003) found that group performance was equal to the level of the group's most capable member. A lack of group cohesion (Everett, 1992), the potential cost of effortful task performance such as fatigue (Anshel, 1995), and no anticipated punishment for poor performance (Miles & Greenberg, 1993) could result in greater levels of social loafing. Social loafing would result in poor group performance.
Another issue, related to social loafing, that should be considered in future research is accountability. Studies of cooperative learning have consistently found that students must be graded individually and as a group (Slavin, 1985). In most of the collaborative testing research thus far, only group grades have been assigned. Collaborative testing effects might be strengthened if individual contributions were rewarded, along with the group performance.
With regard to the personality variables, the finding that extraverts were more likely to choose collaborative testing is hardly surprising. In general, extraverts are more social, assertive, optimistic, and talkative. It would appear that extraverts can use the collaborative testing situation to their advantage. Furthermore, the significant relationship with Excitement Seeking, which is a facet of Extraversion, might be attributed to the students' perception that collaborative testing was a riskier option. In other words, there was an element of novelty and risk that might have appealed to those high in Excitement Seeking. Thus, it would appear that a testing situation that allows students to engage in active problem-solving with another student and carries an element of risk would be a more appealing prospect for extraverted students.
There are several limitations to the study that need to be addressed. First, Experiment 2 was conducted in a laboratory setting. While the novelty of the situation and the fact that test performance had no real consequences may have led to facilitative effects, participants may behave differently when asked to collaborate on actual classroom tests. The relatively small, homogeneous sample for the laboratory experiment is a limitation as well. Future studies should use a more diverse group of subjects. Sampling was also an issue in our classroom study. Most of the students were from an Educational Psychology course. This course requires students to consider different teaching, learning, and testing strategies. The results might be different for students from different disciplines.
Cano-Garcia, F., & Hughes, E.H. (2000). Learning and thinking styles: An analysis of their interrelationship and influence on academic achievement. Educational Psychology, 20, 413-430.
Cassini, C. (1994). Collaborative testing, grading. The Teaching Professor, 8 (4), 4.
Costa, P., & McCrae, R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI): Professional Manual. Odessa, FL: Psychological Assessment Resources.
Ewald, J.D. (2005). Language-related episodes in an assessment context: A 'small-group quiz.' The Canadian Modern Language Review, 61, 565-586.
Genovese, J.E.C. (2005). Why educational innovations fail: An individual difference perspective. Social Behavior and Personality, 33, 569-578.
Helmericks, S.G. (1993). Collaborative testing in social statistics: Toward Gemein-stat. Teaching Sociology, 21, 287-297.
Hutchinson, D., & Gul, F.A. (1997). The interactive effects of extroversion/introversion traits and collectivism/individualism cultural beliefs on student group learning preferences. Journal of Accounting Education, 15, 97-107.
Johnson, D. W., & Johnson, R.T. (1991). Learning together and alone: Cooperative, competitive, and individualistic learning. Englewood Cliffs, NJ: Prentice Hall.
Kerr, N. L., & Tindale, R. S. (2004). Group performance and decision making. In S.T. Fiske, D.L. Schacter, & C. Zahn-Waxler (Eds.), Annual Review of Psychology (Vol. 55). (pp. 623-656) El Camino, CA: Annual Reviews.
King, A. R. (1998). Relations between MCMI-II personality variables and measures of academic performance. Journal of Personality Assessment, 71(2), 253-268.
Lambiotte, J. G., Dansereau, D. F., Rocklin, T. R., Fletcher, B., Hythecker, V.I., Larson, C.O., et al. (1987). Cooperative learning and test-taking: Transfer of skills. Contemporary Educational Psychology, 12, 52-61.
Landy, F. J., & Conte, J. M. (2004). Work in the 21st Century. New York, N.Y.: McGraw Hill.
Lusk, M., & Conklin, L. (2003). Collaborative Testing to Promote Learning. Journal of Nursing Education, 42 (3), 121 - 124.
Mitchell, N., & Melton, S. (2003). Collaborative Testing: An Innovative Approach to Test Taking. Nurse Educator, 28(2), 95-97.
O'Donnell, A.M. (2006). The role of peers and group learning. In P.A. Alexander & P.H. Winne (Eds.), Handbook of Educational Psychology (pp. 781-802), (2nd Ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Onwuegbuzie, A.J. (2001). Relationship between peer orientation and achievement in cooperative learning-based research methodology courses. Journal of Educational Research, 94, 164-169.
Paunonen, S. V. (2001). Big Five predictors of academic achievement. Journal of Research in Personality, 35(1), 78-90.
Slavin, R.E. (1990). Cooperative Learning: Theory, Research, and Practice. New York, NY: Longman.
Slusser, S.R., & Erickson, R.J. (2006). Group quizzes: An extension of the collaborative learning process. Teaching Sociology, 34, 1-14.
Zimbardo, P. G., Butler, L. D., & Wolfe, V. A. (2003). Cooperative College Examinations: More Gain, Less Pain When Students Share Information and Grades. The Journal of Experimental Education, 71(2), 101-125.
April Haberyan, Assistant Professor and Jerrold Barnett, Full Professor, Department of Psychology, Sociology and Counseling, Northwest Missouri State University.
Correspondence concerning this article should be addressed to April Haberyan at email@example.com.
Table 1
Descriptive Statistics from Three Sections of Educational Psychology *

          Testing Alone (n = 25)    Testing with Partner (n = 37)
Test      M        SD               M        SD                      t       p
1         41.60    5.45             40.68    5.71                    .56     .58
2         43.48    4.52             44.08    4.22                    .61     .55
3 **      43.00    5.75             44.67    3.13                    1.48    .14
4         41.60    4.49             43.66    3.94                    2.20    .03

Note. * Each test had fifty points possible. ** The third test is the only test these students were allowed to work with a partner. The groups were formed around how students elected to take Test 3. The mean for the testing with a partner group is based upon the results from 37 pairs.

Table 2
Descriptive Statistics from An Educational Psychology Section *

             Testing Alone            Testing with Partner
Test         n     M        SD        n     M        SD         t       p
Test 1       10    68.90    14.38     25    68.00    16.91      .15     .88
Test 2 **    10    77.90    8.67      13    82.38    9.31       1.18    .25
Test 3 **    9     76.00    8.54      13    79.15    9.60       .79     .44
Test 4       9     79.78    11.88     26    74.42    13.52      1.05    .30

Notes. * All scores are percentages. ** Students were allowed to test with a partner on Tests 2 and 3. Tests 1 and 4 were taken individually.

Table 3
Descriptive Statistics from Psychology of Personality *

             Testing Alone            Testing with Partner
Test         n     M        SD        n     M        SD         t       p
Test 1       5     82.80    13.27     21    73.95    10.92      1.57    .13
Test 2       5     86.20    9.60      21    80.52    9.56       1.19    .25
Test 3       5     80.80    15.16     21    78.95    12.64      .28     .78
Test 4 **    5     87.00    8.78      10    81.70    12.98      .82     .43

Notes. * All scores are percentages. ** Students were allowed to test with a partner only on Test 4.
Table 4
Comparisons on the NEO of those Choosing to Participate in Collaborative Testing

                          Test alone          Test with peer
                          M         SD        M         SD        t       p
Neuroticism               101.67    41.47     98.26     21.99     .47     .64
Extraversion              121.36    17.01     129.02    25.58     2.10    .04
Openness to Experience    116.83    19.01     117.67    30.79     .19     .84
Agreeableness             124.31    12.71     128.37    72.31     .59     .55
Conscientiousness         115.47    21.02     112.68    19.58     .71     .48

Table 5
Comparisons on Five Subscales from the NEO of those Choosing to Participate in Collaborative Testing

                          Test alone          Test with peer
                          M         SD        M         SD        t       p
Anxiety                   19.25     4.78      18.98     4.85      .29     .11
Gregariousness            18.92     5.32      22.20     18.78     1.72    .09
Excitement Seeking        20.03     4.18      22.15     4.60      2.62    .01
Trust                     20.19     3.58      19.95     4.80      .33     .74
Achievement Striving      18.67     4.92      18.94     4.58      .30     .76

Table 6
Means and Standard Deviations Across Three Types of Test Questions in Experiment 2

                          Test Alone          Test with a Partner
                          M        SD         M        SD
Study Alone
  MC                      3.56     .63        3.41     .62
  Factual essays          3.38     1.59       4.18     1.33
  Transfer essays         1.31     1.01       1.76     .97
  Total                   8.25     2.72       9.35     2.15
Study with a Partner
  MC                      3.39     .78        3.67     .49
  Factual essays          2.94     1.70       4.11     1.60
  Transfer essays         1.72     1.32       2.11     1.18
  Total                   8.05     2.55       9.89     2.27
…
Publication information: Article title: Collaborative Testing and Achievement: Are Two Heads Really Better Than One?. Contributors: Haberyan, April - Author, Barnett, Jerrold - Author. Journal title: Journal of Instructional Psychology. Volume: 37. Issue: 1 Publication date: March 2010. Page number: 32+. © 2009 George Uhlig Publisher. COPYRIGHT 2010 Gale Group.