Academic journal article Journal of Counseling and Development : JCD

"Statistical," "Practical," and "Clinical": How Many Kinds of Significance Do Counselors Need to Consider? (Research)

Article excerpt

Statistical significance tests have a long history dating back at least to the 1700s. In 1710 a Scottish physician, John Arbuthnot, published his statistical analysis of 82 years of London birth rates as regards gender (Hacking, 1965). Similar applications emerged sporadically over the course of the next two centuries.

But statistical testing did not become ubiquitous until the early 1900s. In 1900, Karl Pearson developed the chi-square goodness-of-fit test. In 1908, William S. Gosset published his t test under the pseudonym "Student" because of the employment restrictions of the Dublin-based Guinness brewery where he worked.

In 1918, Ronald Fisher first articulated the analysis of variance (ANOVA) logic. Snedecor (1934) subsequently proposed an ANOVA test statistic, which he named "F" in honor of Fisher, who of course subsequently became "Sir" Ronald Fisher. But it was with the 1925 first publication of Fisher's book Statistical Methods for Research Workers and the 1935 publication of his book The Design of Experiments that statistical testing was really popularized.

Huberty (1993; Huberty & Pike, 1999) provided authoritative details on this history. However, it is noteworthy that criticisms of statistical testing are virtually as old as the method itself (cf. Berkson, 1938). For example, in his critique of the mindless use of statistical tests titled "Mathematical vs. Scientific Significance," Boring (1919) argued some 80 years ago,

   The case is one of many where statistical ability, divorced from a
   scientific intimacy with the fundamental observations, leads nowhere. (p.

Statistical tests have been subjected to both intermittent (e.g., Carver, 1978; Meehl, 1978) and contemporary criticisms (cf. Cohen, 1994; Schmidt, 1996). For example, Tryon (1998) recently lamented,

   [T]he fact that statistical experts and investigators publishing in the
   best journals cannot consistently interpret the results of these analyses
   is extremely disturbing. Seventy-two years of education have resulted in
   minuscule, if any, progress toward correcting this situation. It is
   difficult to estimate the handicap that widespread, incorrect, and
   intractable use of a primary data analytic method has on a scientific
   discipline, but the deleterious effects are doubtless substantial. (p. 796)

Anderson, Burnham, and Thompson (2000) provided a chart summarizing the frequencies of publications of such criticisms across both decades and diverse disciplines.

Such criticism has stimulated defenders to articulate views that are also thoughtful. Noteworthy examples include Abelson (1997), Cortina and Dunlap (1997), and Frick (1996). The most balanced and comprehensive treatment of diverse perspectives is provided by Harlow, Mulaik, and Steiger (1997; for reviews of this book, see Levin, 1998; Thompson, 1998).


The purpose of the present review is not to argue whether statistical significance tests should be banned (cf. Schmidt & Hunter, 1997) or not banned (cf. Abelson, 1997). These various views have been repeatedly presented in the literature.

Instead, this article has three purposes. First, the article seeks to clarify the distinction among three "kinds" of significance: "statistical," "practical," and "clinical." Second, various indices of practical and clinical significance are briefly reviewed. Finally, it is argued that counselors should not consider only statistical significance when conducting inquiries or evaluating research reports.

Practical or clinical significance, or both, will be relevant in most counseling research projects and should be explicitly and directly addressed. Authors should always report one or more of the indices of "practical" or "clinical" significance, or both. Readers should expect them. And it is argued in this article that editors should require them. …
