The current method of hypothesis testing in the social sciences is under intense criticism, yet most political scientists are unaware of the important issues being raised. Criticisms focus on the construction and interpretation of a procedure that has dominated the reporting of empirical results for over fifty years. There is evidence that null hypothesis significance testing as practiced in political science is deeply flawed and widely misunderstood. This matters because most empirical work argues for the value of its findings through the null hypothesis significance test. In this article I review the history of the null hypothesis significance testing paradigm in the social sciences and discuss its major problems, some of which are logical inconsistencies while others are more interpretive in nature. I suggest alternative techniques that convey the importance of data-analytic findings more effectively. These recommendations are illustrated with examples drawn from published empirical work in political science.
The primary means of conveying the strength of empirical findings in political science is the null hypothesis significance test, yet we have generally failed to notice that this paradigm is under intense criticism in other disciplines. Led in the social sciences by psychology, many researchers are challenging the basic tenets of the way nearly all social scientists are trained to develop and test empirical hypotheses. The paradigm has been described as a "strangle-hold" (Rozeboom 1960), "deeply flawed or else ill-used by researchers" (Serlin and Lapsley 1993), "a terrible mistake, basically unsound, poor scientific strategy, and one of the worst things that ever happened in the history of psychology" (Meehl 1978), "an instance of the kind of essential mindlessness in the conduct of research" (Bakan 1960), and "badly misused for a long time" (Cohen 1994), and as having "systematically retarded the growth of cumulative knowledge" (Schmidt 1996). Or, more bluntly: "The significance test as it is currently used in the social sciences just does not work" (Hunter 1997).
Statisticians have long been aware of the limitations of null hypothesis significance testing as currently practiced in political science research. Jeffreys (1961) observed that using p-values as decision criteria is backward in its reasoning: "a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred." Another common criticism notes that this interpretation of hypothesis testing confuses inference with decision making, since it "does not allow for the costs of possible wrong actions to be taken into account in any precise way" (Barnett 1973). The perspective of many statisticians toward null hypothesis significance testing is typified by the statement that "a P-value of 0.05 essentially does not provide any evidence against the null hypothesis" (Berger, Boukai, and Wang 1997), and by the observation that the null versus research hypothesis is really an "artificial dichotomy" (Gelman et al. 1995). Berger and Sellke (1987) show that the evidence against the null obtained by correctly interpreting the posterior distribution or the corresponding likelihood function "can differ by an order of magnitude" from the p-value.
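To make the size of this gap concrete, the following is a minimal sketch of one standard way such a comparison can be set up. The assumptions are ours, not the article's: the test concerns a normal mean with known variance σ², the point null is H0: θ = 0, the null receives prior probability 1/2, and under the alternative θ ~ N(0, σ²). The function name `posterior_null_prob` is hypothetical. Under these assumptions the Bayes factor in favor of the null is √(1 + n)·exp(−(z²/2)·n/(n + 1)), where z is the two-sided test statistic implied by the p-value.

```python
from math import exp, sqrt
from statistics import NormalDist


def posterior_null_prob(p_value: float, n: int, prior_null: float = 0.5) -> float:
    """Posterior probability of the point null H0: theta = 0 for a normal mean.

    Illustrative assumptions (not taken from the article): known sampling
    variance sigma^2, a two-sided z test yielding the given p-value, and
    theta ~ N(0, sigma^2) under the alternative.
    """
    # Two-sided z statistic that produces the stated p-value.
    z = NormalDist().inv_cdf(1 - p_value / 2)
    # Bayes factor for H0 over H1: ratio of the marginal densities of the
    # sample mean, N(0, sigma^2/n) under H0 and N(0, sigma^2 (1 + n)/n) under H1.
    bf01 = sqrt(1 + n) * exp(-0.5 * z**2 * n / (n + 1))
    # Convert prior odds to posterior odds, then to a probability.
    prior_odds = prior_null / (1 - prior_null)
    posterior_odds = prior_odds * bf01
    return posterior_odds / (1 + posterior_odds)


# Hold the p-value fixed at 0.05 and vary the sample size.
for n in (1, 10, 50, 100, 1000):
    print(f"n = {n:4d}: p = 0.05, P(H0 | data) = {posterior_null_prob(0.05, n):.2f}")
```

Under these assumptions, a result that is "significant at 0.05" leaves the null with a posterior probability of about 0.52 when n = 50 and about 0.82 when n = 1000. The posterior probability never falls anywhere near 0.05, and as the sample size grows it can exceed the p-value by more than an order of magnitude. Different priors give different numbers, so the sketch illustrates the direction and rough size of the discrepancy rather than a single correct value.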
Political methodology has reemerged as an active and vibrant subfield that is developing important new methods and perspectives. However, its focus has not been on reviewing or reevaluating foundational issues, and despite the pervasiveness of the null hypothesis significance test, there has been no discussion of its validity as currently practiced in political science. Given this long-standing silence, I discuss the history of the current hypothesis-testing paradigm, document some very serious misinterpretations of the approach, and present alternative procedures and interpretations.
Since political science is an overwhelmingly empirical discipline, the interpretation of these empirical results affects the interpretation of substantive conclusions. …