More Powerful Tests from Confidence Interval P Values

Journal article by Roger L. Berger; The American Statistician, Vol. 50, 1996

1. INTRODUCTION

The problem of comparing two binomial proportions has been considered for many years. The most commonly used test is Fisher's Exact Test (Fisher 1935), a conditional test. Barnard (1945, 1947) proposed an unconditional test for this problem. Although unconditional tests are usually more powerful than conditional tests, they are computationally much more complex. But recent advances in computing have made unconditional tests practical, and they are beginning to appear in statistical software packages such as StatXact 3 for Windows. In this article it is shown that unconditional tests based on the confidence interval p value of Berger and Boos (1994) are often uniformly more powerful than the standard unconditional tests.

Let X and Y be independent binomial random variables. The sample size for X is m and the success probability is $p_1$. The sample size for Y is n and the success probability is $p_2$. The binomial probability mass function of X will be denoted by

$$b(x; m, p_1) = \binom{m}{x} p_1^{x} (1 - p_1)^{m - x}, \quad x = 0, \ldots, m.$$

Similarly, $b(y; n, p_2)$ will denote the binomial probability mass function of Y. The sample space of (X, Y) will be denoted by the set X = {0, ..., m} × {0, ..., n}, which contains (m + 1)(n + 1) points.
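As an illustration of the setup above, the following Python sketch evaluates the binomial probability mass function and enumerates the sample space of (X, Y). The function names and the values m = 5, n = 7, 0.3, and 0.6 are assumptions chosen for the example, not taken from the article.

```python
from itertools import product
from math import comb


def binom_pmf(k, size, p):
    """Binomial pmf b(k; size, p) for k = 0, ..., size."""
    return comb(size, k) * p**k * (1 - p) ** (size - k)


def sample_space(m, n):
    """All pairs (x, y) in {0, ..., m} x {0, ..., n}."""
    return list(product(range(m + 1), range(n + 1)))


def joint_pmf(x, y, m, n, p1, p2):
    """Joint pmf of independent X ~ Bin(m, p1) and Y ~ Bin(n, p2)."""
    return binom_pmf(x, m, p1) * binom_pmf(y, n, p2)


# Illustrative sample sizes and success probabilities (assumed values).
m, n = 5, 7
points = sample_space(m, n)
total = sum(joint_pmf(x, y, m, n, 0.3, 0.6) for x, y in points)

print(len(points))  # number of points, (m + 1) * (n + 1)
print(total)        # joint probabilities sum to 1 up to rounding
```

Because the sample space is finite, any exact probability under the model is a finite sum of such joint probabilities.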

This kind of data is often displayed in a 2 x 2 contingency table as follows:

 
                  yes          no       total
Population 1      X            m - X    m
Population 2      Y            n - Y    n
Total             R = X + Y    t - R    t = m + n

In this table uppercase letters denote random variables and lowercase letters denote known constants fixed by the sampling scheme. Hence t is the total sample size and R is the observed number of successes. Conditional inference is based on the conditional distribution of X and Y, given the observed marginal R = r = x + y. Consider the problem of testing

$$H_0\colon p_1 = p_2 \quad \text{versus} \quad H_a\colon p_1 < p_2. \tag{1}$$

Exact tests for this problem will be considered. The sizes of the tests are computed using the exact binomial distributions, not normal or chi-squared approximations. The standard Neyman-Pearson paradigm of restricting consideration to level-$\alpha$ tests and then comparing the powers of these tests will be followed. For a specified error probability $\alpha$, all tests considered are level-$\alpha$ tests. Liberal tests, whose type-I error probabilities sometimes exceed $\alpha$, are not considered. However, the tests do not have sizes exactly equal to the specified $\alpha$. Because of the discrete nature of these data, equality can (usually) be achieved only with a randomized test. Because randomized tests are not of any practical interest, this paper considers only nonrandomized tests.
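The size computation described above can be sketched in Python. The example below is not from the article: it takes Fisher's Exact Test, oriented toward the alternative $p_1 < p_2$, as the nonrandomized test, builds its rejection region, and evaluates the exact rejection probability under $H_0$ from the binomial model. The function names, the sample sizes m = n = 10, and the grid search over the common p are assumptions made for illustration, and a grid maximum only approximates the supremum from below.

```python
from math import comb


def binom_pmf(k, size, p):
    """Binomial pmf b(k; size, p)."""
    return comb(size, k) * p**k * (1 - p) ** (size - k)


def fisher_p_lower(x, y, m, n):
    """One-sided Fisher p value P(X <= x | R = x + y), for Ha: p1 < p2."""
    r = x + y
    lo = max(0, r - n)  # smallest X compatible with the margin R = r
    num = sum(comb(m, a) * comb(n, r - a) for a in range(lo, x + 1))
    return num / comb(m + n, r)


def rejection_region(m, n, alpha):
    """Points (x, y) at which the nonrandomized test rejects H0."""
    return [(x, y) for x in range(m + 1) for y in range(n + 1)
            if fisher_p_lower(x, y, m, n) <= alpha]


def rejection_prob(region, m, n, p):
    """Exact probability of the region when p1 = p2 = p."""
    return sum(binom_pmf(x, m, p) * binom_pmf(y, n, p) for x, y in region)


def approx_size(region, m, n, grid=1000):
    """Grid approximation (assumed resolution) to the sup over p in [0, 1]."""
    return max(rejection_prob(region, m, n, i / grid) for i in range(grid + 1))


# Illustrative values (assumptions, not from the article).
m, n, alpha = 10, 10, 0.05
region = rejection_region(m, n, alpha)
size = approx_size(region, m, n)
print(len(region), size)
```

For a discrete test of this kind the computed size falls below the nominal $\alpha$ rather than equaling it, which is the behavior the paragraph above attributes to nonrandomized tests.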

The analysis in this article is unconditional. That is, the size and power comparisons are based on the binomial distributions of the model. There is continuing debate as to whether conditional or unconditional calculations are more relevant for these problems. Little (1989) and Greenland (1991) provided good recent summaries of the issues in this debate. The purpose of this paper is not to continue this debate. Rather, suffice it to say that this paper is relevant to those situations in which the unconditional analysis is appropriate.

2. USUAL UNCONDITIONAL TEST

Barnard (1945, 1947) first proposed an unconditional test for this problem. Because of the computational difficulty of unconditional tests, they were not widely used until recently. Now, computing technology makes the use of unconditional tests feasible.

A commonly used unconditional test is the Z test proposed by Suissa and Shuster (1985) and Haber (1986). Define the Z-pooled statistic (score statistic) as

$$Z(x, y) = \frac{\hat{p}_2 - \hat{p}_1}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{m} + \frac{1}{n}\right)}},$$

where $\hat{p}_1 = x/m$, $\hat{p}_2 = y/n$, and $\hat{p} = (x + y)/(m + n)$, the pooled estimate of $p_1 = p_2 = p$ under $H_0$. Then the p value for testing (1), using the test statistic Z, is

$$p_Z(x, y) = \sup_{0 \le p \le 1} \sum_{(a, b) \in R_Z(x, y)} b(a; m, p)\, b(b; n, p), \tag{2}$$

where $R_Z(x, y) = \{(a, b) : (a, b) \in X \text{ and } Z(a, b) \ge Z(x, y)\}$. The p value is the maximum probability under $H_0$ of observing a value of the test statistic equal to or more extreme than the value observed in the data. This is a standard definition of a p value, such as is found in Bickel and Doksum (1977, Sec. 5.2.B). Rejection of $H_0$ if and only if $p_Z \le \alpha$ defines a level-$\alpha$ test of (1). The calculation of the supremum in (2) must be done numerically. Typically there ...
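A minimal Python sketch of the numerical calculation follows. It is an illustration under stated assumptions, not the article's algorithm: the pooled Z statistic is oriented as $(\hat{p}_2 - \hat{p}_1)$ over its pooled standard error so that large values favor $p_1 < p_2$, Z is set to 0 when the pooled estimate is 0 or 1, and the supremum over p is approximated by a maximum over an evenly spaced grid, which can understate the true supremum. All function names and the grid resolution are hypothetical.

```python
from math import comb, sqrt


def binom_pmf(k, size, p):
    """Binomial pmf b(k; size, p)."""
    return comb(size, k) * p**k * (1 - p) ** (size - k)


def z_pooled(x, y, m, n):
    """Pooled Z statistic; large values favor p1 < p2 (assumed orientation)."""
    p1_hat, p2_hat = x / m, y / n
    p_hat = (x + y) / (m + n)
    var = p_hat * (1 - p_hat) * (1 / m + 1 / n)
    if var == 0:
        return 0.0  # assumed convention when the pooled estimate is 0 or 1
    return (p2_hat - p1_hat) / sqrt(var)


def z_pooled_p_value(x, y, m, n, grid=1000, tol=1e-12):
    """Grid approximation to the unconditional p value based on Z."""
    z_obs = z_pooled(x, y, m, n)
    # Region of points at least as extreme as the observed one;
    # tol guards against floating-point ties.
    region = [(a, b) for a in range(m + 1) for b in range(n + 1)
              if z_pooled(a, b, m, n) >= z_obs - tol]
    return max(
        sum(binom_pmf(a, m, p) * binom_pmf(b, n, p) for a, b in region)
        for p in (i / grid for i in range(grid + 1))
    )


# Illustrative data (assumed): no successes in sample 1, all in sample 2.
m, n = 5, 5
p_extreme = z_pooled_p_value(0, 5, m, n)
print(p_extreme)
```

In this example the region contains only the observed point, so the maximized probability is $p^5 (1 - p)^5$ evaluated at p = 1/2. A finer grid or a dedicated optimizer would be needed before treating the result as an exact supremum.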
