Special Considerations When Using Statistical Analysis in Engineering Education Assessment and Evaluation
Larpkiataworn, Siripen; Muogboh, Obinna; Besterfield-Sacre, Mary; Shuman, Larry J.; Wolfe, Harvey. Journal of Engineering Education
This paper discusses two special considerations that frequently arise when conducting statistical analyses for the assessment and evaluation of engineering education programs. The first concerns the multiple comparison problem and Type I errors: specifically, when should the significance level be adjusted, and which adjustment procedure is most appropriate? Three scenarios are presented to illustrate three different applications of the classical Bonferroni procedure, one of the most extensively used adjustment procedures. A scenario is also presented for when an adjustment is not necessary. The second consideration is whether, when evaluating a predictive model, a tree diagram should be used as an alternative to a classification table. For example, how does one assess a model's predictions when some of its "recommendations" are not followed? In this type of case, a classification table may yield incomplete information. The use of a tree diagram to present more information on model performance is discussed.
Engineering education has witnessed an explosion of noteworthy research during the past decade, spurred on by two converging initiatives. The first is the current evolution of curricula in response to the changes in the engineering accreditation criteria by the Accreditation Board for Engineering and Technology (ABET). The second is the National Science Foundation's (NSF) bold move to fund the Engineering Education Coalitions in concert with a series of other NSF educational initiatives that have generated a number of the ideas currently being implemented in response to the new ABET criteria. The former has highlighted the critical need for assessment within the engineering education community, while the latter has triggered a call for wide-scale dissemination and implementation of many of these NSF-sponsored educational innovations. Indeed, research in engineering education is becoming a "respected" endeavor as more educators and administrators use such results to make educational and policy improvements.
That said, as educators turn to this type of research, they must also become more cognizant of the statistical ramifications of their results. This paper presents two special considerations that frequently arise when statistical analyses are employed as part of the assessment and evaluation of engineering education programs and courses. It has been our experience that researchers often overlook these considerations and consequently may produce faulty analyses at best or draw inaccurate conclusions at worst. Although these two issues are separate in nature, they both deal with reaching potentially erroneous conclusions with regard to Type I and Type II errors.
The first is associated with the multiple comparison problem, specifically adjusting the Type I error level (i.e., the probability of rejecting the null hypothesis when, in fact, it is true) and determining which adjustment procedure is appropriate for a particular multiple comparison situation. That is, when should the significance level (α) be changed to a more stringent family-wise Type I error level when multiple comparisons are conducted? Certain procedures, such as the classical Bonferroni multiple comparison, Scheffe's method, and Duncan's multiple range test, have been applied to adjust the family-wise Type I error to make it smaller or more conservative. However, such adjustments are not without controversy, as statisticians have debated whether these procedures are appropriate [4, 5]. Of particular concern is the question: whether or not the adjustment is applied, will one misinterpret the data? If multiple comparisons are to be made, would the adjusted level of significance be so small that truly significant data will be disregarded? Will the Type II error (i.e., the probability of accepting the null hypothesis when, in fact, it is false) be large if multiple comparisons are conducted? As noted, statisticians have provided several methods for adjusting the Type I error in an experiment, but have provided little guidance as to when the classical Bonferroni multiple comparison procedure should or should not be used. …
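The trade-off described above can be illustrated with a short sketch. It is not taken from the paper: the function names and the example p-values are hypothetical, and the family-wise error formula assumes the comparisons are independent, which may not hold for real assessment data. The classical Bonferroni adjustment is taken here to mean testing each of m comparisons at α/m.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one Type I error across m tests,
    each run at level alpha (assumes the tests are independent)."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(alpha_family: float, m: int) -> float:
    """Per-comparison significance level under the classical
    Bonferroni adjustment: the family-wise level divided by m."""
    return alpha_family / m


def bonferroni_reject(p_values, alpha_family=0.05):
    """Return one boolean per p-value: True if that null hypothesis
    is rejected at the Bonferroni-adjusted level."""
    per_test = bonferroni_alpha(alpha_family, len(p_values))
    return [p <= per_test for p in p_values]


if __name__ == "__main__":
    m = 10
    # Unadjusted: ten tests at 0.05 each inflate the family-wise rate.
    print(round(familywise_error_rate(0.05, m), 4))
    # Adjusted: each test is run at 0.05 / 10.
    print(bonferroni_alpha(0.05, m))
    # Hypothetical p-values from three comparisons.
    print(bonferroni_reject([0.001, 0.02, 0.04]))
```

In this hypothetical case, the second and third p-values would be declared significant at an unadjusted 0.05 level but not after the adjustment. This is the concern raised above: the adjustment lowers the family-wise Type I error at the cost of a potentially larger Type II error.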