P Values Maximized over a Confidence Set for the Nuisance Parameter

by Roger L. Berger and Dennis D. Boos

1. INTRODUCTION

Testing problems are often complicated by the presence of a nuisance parameter vector $\theta$. Consider first a model in which there is no nuisance parameter. Suppose that the data $X$ have a probability distribution $P_\nu$, defined in terms of a parameter $\nu$, and that we wish to test the simple hypothesis $H_0\colon \nu = \nu_0$. If the test statistic $T$ is used to test $H_0$ and if large values of $T$ give evidence against $H_0$, then for an observed value $T = t$, the p value is $p = P_{\nu_0}(T \ge t)$.

Now consider a model with a nuisance parameter $\theta$. The distribution of $X$ has two parameters, $\nu$ and $\theta$. We still wish to test $H_0\colon \nu = \nu_0$, but this hypothesis is no longer simple, because the value of $\theta$ is unspecified. Using a test statistic as before, the p value is now $p = \sup_\theta P_{\nu_0,\theta}(T \ge t)$ (see, for example, Bickel and Doksum 1977, pp. 171-172). Unfortunately, the need to calculate the $\sup_\theta$ complicates the problem. This complication is usually handled in one of three ways.

First, in some problems it can be shown that for all values of $t$, the $\sup_\theta$ is always attained at a particular value $\theta_0$. In this case the p value is simply $p = P_{\nu_0,\theta_0}(T \ge t)$, and the parameter $(\nu_0, \theta_0)$ is called the least favorable configuration. For example, in common one-sided testing problems, the boundary of the null hypothesis space is least favorable.

A second way to handle the unknown $\theta$ is to choose judiciously a test statistic $T$ (that usually depends on estimated values of $\theta$) whose distribution under $H_0$ does not depend on $\theta$. That is, $T$ is ancillary under $H_0$.
Then $P_{\nu_0,\theta}(T \ge t)$ is the same for all $\theta$, so calculation of the $\sup_\theta$ is avoided. For example, in normal means problems we replace unknown variances with sample variances and use $t$ or $F$ distributions to account for the estimated variances.

A third method to handle the unknown $\theta$ is to condition on the value of a statistic $S$ that is sufficient for $\theta$ under $H_0$. Then the conditional distribution of any statistic, given $S$, does not depend on $\theta$ (under $H_0$), and the p value is taken to be $p = P_{\nu_0}(T \ge t \mid S = s)$. For example, in a 2 × 2 contingency table with common "success" probability $\theta$ under $H_0$, one can condition on the marginals (a sufficient statistic for $\theta$ under $H_0$) and use Fisher's exact test.

All three methods replace the calculation of the $\sup_\theta$ by the calculation of a single probability, and each method can result in a valid p value; that is, a statistic $p$ such that, under the null hypothesis,

$$P(p \le \alpha) \le \alpha \quad \text{for each } \alpha \in [0, 1]. \qquad (1)$$

We call a statistic that satisfies (1) a valid p value because it can be used in the standard way to define a level-$\alpha$ test. Consider the test that rejects the null hypothesis if and only if $p \le \alpha$. Then under the null hypothesis, $P(\text{reject } H_0) = P(p \le \alpha) \le \alpha$; that is, the test so defined is a level-$\alpha$ test.

In many situations, however, none of the three methods is satisfactory. For example, the value of $\theta$ at which the $\sup_\theta$ occurs may depend on the value $t$ in a complicated way. Also, exact distributional results are often not available for statistics with estimated parameters. And, finally, it may not be possible to find an appropriate sufficient statistic on which to condition.
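Condition (1) can be checked approximately by simulation whenever the null distribution is fully specified. The sketch below is an illustration under stated assumptions, not part of the original analysis: it uses only Python's standard library, takes as its p value the two-sided one-sample z test with known $\sigma$ (the "no nuisance parameter" case described above), and its helper names are hypothetical. It estimates $P(p \le \alpha)$ from data generated under $H_0$; for a valid p value the estimate should not exceed $\alpha$ by more than Monte Carlo error.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p value 2*Phi(-|z|) for H0: mu = mu0 with sigma known."""
    n = len(sample)
    z = math.sqrt(n) * (sum(sample) / n - mu0) / sigma
    return 2.0 * PHI(-abs(z))


def null_rejection_rate(p_value, draw_null_sample, alpha, reps=20_000, seed=0):
    """Monte Carlo estimate of P(p <= alpha) when the data follow H0.

    Condition (1) requires this probability to be at most alpha; a
    simulation can only check that up to Monte Carlo error.
    """
    rng = random.Random(seed)
    hits = sum(p_value(draw_null_sample(rng)) <= alpha for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    n, mu0, sigma = 10, 0.0, 1.0

    def draw(rng):
        # one sample of size n generated under H0 (sigma known)
        return [rng.gauss(mu0, sigma) for _ in range(n)]

    def p_fn(sample):
        return z_test_p_value(sample, mu0, sigma)

    for alpha in (0.01, 0.05, 0.10):
        rate = null_rejection_rate(p_fn, draw, alpha)
        print(f"alpha={alpha:.2f}  estimated P(p <= alpha)={rate:.4f}")
```

Passing a different `p_value` function, for instance one that plugs an estimate in for an unknown nuisance parameter, gives a quick empirical look at whether that statistic appears to satisfy (1). A simulation of this kind can suggest, but never prove, validity.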
In this article we consider a different approach for obtaining valid p values. Suppose that a valid p value $p(\theta_0)$ may be calculated when the true value $\theta_0$ of the nuisance parameter vector $\theta$ is known. Note that the calculation of $p(\theta_0)$ need not be based on the same test statistic for different values of $\theta_0$. Indeed the test statistic may depend directly on the assumed known value $\theta_0$. All that is needed is that for each value of $\theta_0$, $p(\theta_0)$ be a statistic that satisfies (1). If $\theta_0$ is not known, then a valid p value may be obtained by maximizing $p(\theta)$ over the parameter space of $\theta$; that is, $p_{\sup} = \sup_\theta p(\theta)$ clearly satisfies (1), because $p_{\sup} \ge p(\theta_0)$ implies $P(p_{\sup} \le \alpha) \le P(p(\theta_0) \le \alpha) \le \alpha$.

The use of $p_{\sup}$ has two potential difficulties, one computational and the other statistical. If the parameter space for $\theta$ is unbounded and if the $\sup_\theta$ is calculated numerically (as it often will be), then it may be uncertain whether the numerical method did indeed find the overall maximum. Of course, there is always uncertainty about the result of a numerical maximization, but this uncertainty is worse if the set being maximized over is unbounded. Statistically, it seems a waste of information in the data to take the sup over all values of $\theta$. Having observed the data, we should be able to estimate $\theta$ and should not need to consider values of $\theta$ that are completely unsupported by the data. Storer and Kim (1990) and others have used this idea to propose as a p value $p(\hat\theta)$, where $\hat\theta$ is an estimate of $\theta$ (usually the maximum likelihood estimate). But p values defined in this way may not be valid; see the computations of Storer and Kim (1990).

A valid p value that addresses both of the aforementioned concerns is defined as follows.
Let $C_\beta$ be a $1 - \beta$ confidence set for the nuisance parameter when the null hypothesis is true. Intuition suggests that we might be able to restrict the maximization to the set $C_\beta$. Indeed we show in Section 2 that

$$p_\beta = \sup_{\theta \in C_\beta} p(\theta) + \beta \qquad (2)$$

is an alternative valid p value. This p value may be preferred to $p_{\sup}$ on computational grounds (maximization is over a bounded set) and on statistical principles (interest is restricted to likely regions of $\theta$). The value of $\beta$ and the confidence set $C_\beta$ should, of course, be specified before looking at the data. Note that $p_\beta$ is never smaller than $\beta$, so in practice $\beta$ will be chosen rather small, such as .001 or .0001. If $p_\beta$ is to be used to define a level-$\alpha$ test, then $\beta$ must be less than $\alpha$ to obtain a useful test. Because the largest possible value of $p_\beta$ is $1 + \beta$, $p_\beta$ could be replaced by $\min\{p_\beta, 1\}$, which is also a valid p value and always lies between 0 and 1. We give the theoretical justification for $p_\beta$ in the following lemma.

The rest of the article is a series of illustrative examples. The first, a pedagogical example, concerns tests about a normal mean when the variance is unknown. The remaining, more realistic examples concern 2 × 2 contingency tables, the Behrens-Fisher problem, and nonparametric testing for scale differences.

2. VALIDITY OF $p_\beta$

Lemma. Suppose that $p(\theta)$ satisfies (1) for any assumed known value $\theta$. Let $C_\beta$ satisfy $P(\theta \in C_\beta) \ge 1 - \beta$ if the null hypothesis is true. Let $p_\beta$ be given by (2). Then $p_\beta$ is a valid p value.

Proof. Suppose that the null hypothesis is true. Denote the true but unknown $\theta$ by $\theta_0$.
If $\beta > \alpha$, then, because $p_\beta$ is never smaller than $\beta$, $P(p_\beta \le \alpha) = 0 \le \alpha$. If $\beta \le \alpha$, then

$$\begin{aligned} P(p_\beta \le \alpha) &= P(p_\beta \le \alpha,\ \theta_0 \in C_\beta) + P(p_\beta \le \alpha,\ \theta_0 \notin C_\beta) \\ &\le P(p(\theta_0) + \beta \le \alpha,\ \theta_0 \in C_\beta) + P(\theta_0 \notin C_\beta) \\ &\le P(p(\theta_0) \le \alpha - \beta) + \beta \\ &\le \alpha - \beta + \beta = \alpha. \end{aligned}$$

The first inequality follows because $\sup_{\theta \in C_\beta} p(\theta) \ge p(\theta_0)$ when $\theta_0 \in C_\beta$; the second uses $P(\theta_0 \notin C_\beta) \le \beta$; and the last applies (1) to $p(\theta_0)$.

3. EXAMPLES

Example 1: Pedagogical Example About a Normal Mean. Let $X_1, \ldots, X_n$ be a random sample from a normal population with mean $\mu$ and variance $\sigma^2$. We consider testing $H_0\colon \mu = \mu_0$ versus $H_1\colon \mu \ne \mu_0$, where $\mu_0$ is a fixed value and $\sigma^2$ is the nuisance parameter. We consider this familiar example to illustrate our method, not to offer a serious contender to the usual t test. If $\sigma^2$ were known, then we could use the test statistic $Z = \sqrt{n}(\bar X - \mu_0)/\sigma$, where $\bar X$ is the sample mean. Then the two-sided p value would be $p(\sigma^2) = 2\Phi(-|z_{\text{obs}}|)$, where $z_{\text{obs}}$ is the value of the test statistic calculated from the data and $\Phi(z)$ is the standard normal cumulative distribution function. As a confidence set for $\sigma$ we will use the upper confidence bound given by

$$C_\beta = \left\{\sigma : 0 < \sigma \le \sqrt{(n-1)S^2/\chi^2_{n-1,\beta}}\right\},$$

where $S^2$ is the sample variance and $\chi^2_{n-1,\beta}$
is the 100$\beta$ percentile of a chi-squared distribution with $n - 1$ degrees of freedom. The valid p value we propose is

$$p_\beta = \sup_{\sigma \in C_\beta} p(\sigma^2) + \beta.$$

Because $|z_{\text{obs}}|$ is a decreasing function of $\sigma$, the $\sup_{C_\beta}$ occurs at the upper endpoint. (This is why we chose to use an upper confidence bound.) Thus $p_\beta = 2\Phi(-|z_{\max}|) + \beta$, where $z_{\max}$ is the test statistic calculated with $\sigma = \sqrt{(n-1)S^2/\chi^2_{n-1,\beta}}$. In this example the test statistic $Z$ depends on the value of the nuisance parameter, a possibility mentioned in Section 1. Also, in this example the p value $p_{\sup}$, although valid, is useless: it always has the value 1, because $|z_{\text{obs}}| \to 0$ as $\sigma \to \infty$. So restricting the maximization to $C_\beta$ when calculating $p_\beta$ is of critical importance in getting a reasonable answer.

This example is a bit unusual in that the $\sup_{C_\beta}$ can be calculated exactly; in many cases it will need to be calculated numerically. The example is also unusual in that the exact size of the test based on $p_\beta$ can be calculated. Suppose that we reject $H_0$ if $p_\beta \le \alpha$. Then the actual size of the test is

$$P\left(|T| \ge -z_{(\alpha-\beta)/2}\sqrt{\frac{n-1}{\chi^2_{n-1,\beta}}}\right),$$

where $T$ has a Student's $t$ distribution with $n - 1$ degrees of freedom and $z_\alpha$ is the 100$\alpha$ percentile of a standard normal distribution. It can be shown that $\chi^2_{n-1,\beta}/(n-1)$ converges to 1 as $n$ goes to infinity. So the actual size of the test, which is at most $\alpha$ because the p value is valid, converges to $\alpha - \beta$.

Example 2: 2 × 2 Contingency Table with Independent Binomial Sampling. Consider a 2 × 2 contingency table consisting of two independent binomial samples: 14 "successes" out of 47 trials for group 1 and 48 "successes" out of 283 trials for group 2.
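For readers who want to reproduce the computations of this example, the following stdlib-only Python sketch works with the two samples just given (14 of 47 and 48 of 283) and anticipates the definitions of $Z^2$, $p(\pi)$, and $p_\beta$ given in the rest of this example. It is a minimal illustration under stated assumptions, not the authors' program: the function names are hypothetical, the supremum is approximated on an even grid rather than by a careful numerical maximizer, tables with no variation in the pooled proportion are assigned $Z^2 = 0$, and the interval [.123, .267] is copied from the text rather than recomputed.

```python
import math


def z2_stat(x, n1, y, n2):
    """Pooled two-sample Z^2, i.e. the 2 x 2 chi-squared statistic.

    Tables with no successes (or no failures) in total carry no evidence
    of a difference; this sketch assigns them Z^2 = 0.
    """
    pooled = (x + y) / (n1 + n2)
    if pooled in (0.0, 1.0):
        return 0.0
    diff = x / n1 - y / n2
    return diff * diff / (pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))


def binom_pmf(n, p):
    """Return [b(0; n, p), ..., b(n; n, p)]."""
    return [math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]


def rejection_region(x_obs, n1, y_obs, n2):
    """All tables (x, y) whose Z^2 is at least the observed Z^2."""
    z2_obs = z2_stat(x_obs, n1, y_obs, n2)
    return [(x, y)
            for x in range(n1 + 1)
            for y in range(n2 + 1)
            if z2_stat(x, n1, y, n2) >= z2_obs - 1e-9]  # tolerance for ties


def p_of_pi(pi, region, n1, n2):
    """p(pi): null probability of the rejection region when pi1 = pi2 = pi."""
    b1, b2 = binom_pmf(n1, pi), binom_pmf(n2, pi)
    return sum(b1[x] * b2[y] for x, y in region)


def p_beta_grid(region, n1, n2, lo, hi, beta, steps=200):
    """Approximate sup of p(pi) over [lo, hi] on an even grid, then add beta."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(p_of_pi(pi, region, n1, n2) for pi in grid) + beta


if __name__ == "__main__":
    x_obs, n1, y_obs, n2 = 14, 47, 48, 283
    region = rejection_region(x_obs, n1, y_obs, n2)
    print("Z^2 =", round(z2_stat(x_obs, n1, y_obs, n2), 3))  # text reports 4.346
    print("p(pi_hat) =", round(p_of_pi(62 / 330, region, n1, n2), 4))
    # C_beta = [.123, .267] is taken from the text, not recomputed here
    print("p_.001 =", round(p_beta_grid(region, n1, n2, 0.123, 0.267, 0.001), 4))
```

Calling `p_beta_grid` with `lo=0.0, hi=1.0` gives only a rough approximation to $p_{\sup}$: an even grid can step over the narrow spikes of $p(\pi)$ (the text places the overall supremum at $\pi = .003$), so the grid would need refining near the boundary.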
These data appeared in Table 1 of Emerson and Moses (1985), who obtained them from Taylor et al. (1982). We consider here the usual 2 × 2 table chi-squared statistic

$$Z^2 = \frac{(\hat\pi_1 - \hat\pi_2)^2}{\hat\pi(1 - \hat\pi)\,(1/n_1 + 1/n_2)},$$

where $\hat\pi = (n_1\hat\pi_1 + n_2\hat\pi_2)/(n_1 + n_2)$ and $\hat\pi_1$ and $\hat\pi_2$ are the sample proportions in the two groups. Figure 1 shows the p value $p(\pi)$ for detecting a difference between the two binomial proportions $\pi_1$ and $\pi_2$, as a function of the unknown common $\pi$ under the null hypothesis $H_0\colon \pi_1 = \pi_2 = \pi$. For a fixed value of $\pi$, $p(\pi)$ is computed from the binomial distribution as

$$p(\pi) = \sum_{(x, y)} b(x; 47, \pi)\, b(y; 283, \pi),$$

where $b(x; n, \pi)$ is the binomial probability of $x$ successes in $n$ trials with success probability $\pi$, and the sum is over all pairs $(x, y)$ of $x$ successes from group 1 and $y$ successes from group 2 that give a $Z^2$ value greater than or equal to the value $Z^2 = 4.346$ calculated from these data. The usual unconditional p value for this problem is $p_{\sup} = \sup_{\pi \in [0,1]} p(\pi) = .061$. Suissa and Shuster (1985) discussed this p value and recommended it as an appropriate p value for this problem.

Looking at Figure 1, however, it would seem natural to restrict the region over which the maximization takes place to a region around the null maximum likelihood estimate $\hat\pi = (48 + 14)/(283 + 47) = .188$. A .999 confidence interval for $\pi$ under the null hypothesis is given by [.123, .267] (see, for example, Casella and Berger 1990, p. 499). Numerically calculating the sup of $p(\pi)$ over this interval yields the value .036. Thus the new p value is $p_{.001} = .036 + .001 = .037$. This improvement in the p value is not unusual; we have found similar improvement in numerous 2 × 2 contingency table examples.

This example illustrates two important points.
First, the supremum of $p(\pi)$ may occur at a $\pi$ value far from the null maximum likelihood estimate $\hat\pi$. One might question the relevance of $p(.003) = .061$, because the data indicate that $\pi$ is near .188 and not .003. Our method defines a set of $\pi$ values close to $\hat\pi$ that should be examined in defining a valid p value. Storer and Kim (1990) and others have taken this notion to the extreme and have evaluated only $p(\hat\pi)$. But $p(\hat\pi)$ is typically not a valid p value. Second, the function $p(\pi)$ may be quite variable. Unless it is performed carefully, numerical maximization can fail to find the spikes in Figure 1. The function $p(\pi)$ may be much more stable on the confidence set $C_\beta$, and maximization on this restricted set is then much easier. In this example $p(\pi)$ is nearly constant between .123 and .267.

Example 3: Behrens-Fisher Problem. The classical Behrens-Fisher problem has two independent samples, $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$, from normal distributions with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$. The null hypothesis is $H_0\colon \mu_1 = \mu_2$, where $\sigma_1^2$ is not assumed equal to $\sigma_2^2$. Best and Rayner (1987) reaffirmed the practical value of the Welch solution based on

$$t_w = \frac{\bar X - \bar Y}{\sqrt{S_1^2/m + S_2^2/n}},$$

where $\bar X$, $\bar Y$, $S_1^2$, and $S_2^2$ are the usual sample means and sample variances, and critical values are obtained from a $t$ distribution with estimated degrees of freedom. But numerous studies have shown that the Welch solution can be slightly liberal.
In other words, the corresponding p value does not satisfy (1) for certain combinations of $m$, $n$, and $\sigma_1^2/\sigma_2^2$. Here we can use our approach along with $t_w$ to get a valid p value, because under $H_0\colon \mu_1 = \mu_2$ the distribution of $t_w$ depends only on the ratio of variances $\rho = \sigma_2^2/\sigma_1^2$. Although the distribution of $t_w$ is not simple, we can easily simulate from normal distributions to get a p value for each value of $\rho$. Figure 2 shows the results for a data set with sample sizes $m = 9$ and $n = 13$, sample means 0 and 6.225, and sample variances 18 and 78 (an example taken from Barnard 1984). A .999 confidence interval for $\rho$ obtained from the F distribution of $(S_1^2/\sigma_1^2)/(S_2^2/\sigma_2^2)$ is (.32, 38.72). On this interval the maximum two-sided p value is .048, so that $p_{.001} = .048 + .001 = .049$. Because the p value was obtained from 1,000,000 Monte Carlo replications, the standard error of the estimate .049 is around .0002. For comparison, the Welch solution p value is .041, the pooled t p value is .065, and the Behrens-Fisher p value is .050.

Another way to use our approach in this problem follows from the quantity $t(\rho)$ given by Fisher (1939, p. 176). For a given value of $\rho$, $t(\rho)$ has a $t$ distribution with $m + n - 2$ degrees of freedom under $H_0$. Thus we might consider using our approach with $t(\rho)$ and this latter $t$ distribution. The appropriate $p(\rho)$ is easy to calculate and has intuitive appeal. Unfortunately, this $p(\rho)$ is much more sensitive to changes in $\rho$ than the simulation-based $p(\rho)$ built on $t_w$. We do not display the results for this $p(\rho)$, but note that $p_{.001} = .233 + .001 = .234$ and $p_{.01} = .15 + .01 = .16$. Clearly, the method based on $t_w$ is superior.
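The simulation behind $p(\rho)$ and $p_{.001}$ in this example can be sketched with Python's standard library. This is a hedged illustration, not the authors' program: the function names, the log-spaced grid over the interval, and the replication counts are assumptions, and $\rho$ is taken to be $\sigma_2^2/\sigma_1^2$ (with the X sample having variance $\sigma_1^2$), the convention that matches the interval (.32, 38.72) reported for these data. Because $t_w$ is unchanged when both samples are rescaled by the same factor, the sketch fixes $\sigma_1 = 1$ and reuses one set of standard normal draws for every $\rho$.

```python
import math
import random


def welch_t(xbar, ybar, s1sq, s2sq, m, n):
    """Welch statistic t_w = (xbar - ybar) / sqrt(s1^2/m + s2^2/n)."""
    return (xbar - ybar) / math.sqrt(s1sq / m + s2sq / n)


def null_summaries(m, n, reps, seed=0):
    """(mean, variance) of simulated N(0, 1) samples of sizes m and n.

    The same draws are reused for every rho (common random numbers), which
    makes the estimated p(rho) curve smooth in rho.
    """
    rng = random.Random(seed)

    def mean_var(k):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        zbar = sum(z) / k
        return zbar, sum((v - zbar) ** 2 for v in z) / (k - 1)

    # each entry is (xbar, s1sq, zbar, zsq)
    return [mean_var(m) + mean_var(n) for _ in range(reps)]


def p_of_rho(t_obs, rho, sims, m, n):
    """Monte Carlo estimate of P(|t_w| >= |t_obs|) under H0 for a given rho.

    Assumed convention: rho = sigma2^2 / sigma1^2 with sigma1 = 1, so
    Y = sqrt(rho) * Z: the Y mean scales by sqrt(rho), the variance by rho.
    """
    r = math.sqrt(rho)
    hits = 0
    for xbar, s1sq, zbar, zsq in sims:
        t = welch_t(xbar, r * zbar, s1sq, rho * zsq, m, n)
        hits += abs(t) >= abs(t_obs)
    return hits / len(sims)


def p_beta(t_obs, ci, beta, sims, m, n, grid_size=40):
    """Max of p(rho) over a log-spaced grid on the interval ci, plus beta."""
    lo, hi = math.log(ci[0]), math.log(ci[1])
    grid = [math.exp(lo + (hi - lo) * i / (grid_size - 1)) for i in range(grid_size)]
    return max(p_of_rho(t_obs, rho, sims, m, n) for rho in grid) + beta


if __name__ == "__main__":
    m, n = 9, 13
    t_obs = welch_t(0.0, 6.225, 18.0, 78.0, m, n)  # Barnard's (1984) summary data
    sims = null_summaries(m, n, reps=20_000, seed=1)
    print("t_w =", round(t_obs, 4))
    # the interval (.32, 38.72) for rho is taken from the text
    print("p_.001 approx", round(p_beta(t_obs, (0.32, 38.72), 0.001, sims, m, n), 4))
```

With far fewer than the 1,000,000 replications used in the text, the estimate carries visible Monte Carlo error, and taking a maximum over many noisy grid values tends to push $p_\beta$ slightly upward, which makes the resulting test, if anything, a little more conservative.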
In fact, we believe that there is a general principle here concerning our method: one should use statistics such as $t_w$, whose null distribution depends on the nuisance parameter, rather than pivotal quantities such as $t(\rho)$, which are functions of the nuisance parameter but whose null distributions do not depend on it. Our p value based on $t_w$ is a valid p value for the Behrens-Fisher problem. Barnard (1984, sec. 6), Robinson (1976), and Tsui and Weerahandi (1989) claimed that the Behrens-Fisher solution p value is also valid. It would be interesting to compare the power properties of these two competing procedures.

The three previous examples were parametric problems where the nuisance parameter $\theta$ was confined to $(0, \infty)$, $[0, 1]$, and $(0, \infty)$. Now we turn to a more ambitious semiparametric problem where $\theta$ is a location parameter belonging to $(-\infty, \infty)$ but there is also a second, infinite-dimensional nuisance parameter corresponding to an unknown distribution function. This is really not much more difficult than the previous examples, however, because we can handle this latter nuisance parameter using classical permutation test methods. That is, for each given value of $\theta$, we will obtain a permutation p value and then carry on as in Examples 2 and 3.

Example 4: Testing for Scale Differences in Two Populations with Unknown Locations. Consider two iid samples $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ with distribution functions $F((x - \mu_1)/\sigma_1)$ and $F((x - \mu_2)/\sigma_2)$. The null hypothesis is $H_0\colon \sigma_1 = \sigma_2$; $F$, $\mu_1$, and $\mu_2$ are unknown. This model is not identifiable, but an equivalent description in which all parameters are identifiable is for the X's and Y's to have distribution functions $F(x)$ and $F((x - \delta)/\rho)$.
The null hypothesis is then $H_0\colon \rho = 1$; $F$ and $\delta$ are unknown nuisance parameters. The literature contains numerous good test statistics for this problem but none accompanied by valid finite-sample p values. Actually, one can randomly pair the data in each sample and create differences $X_i - X_j$ and $Y_i - Y_j$, thereby eliminating the unknown locations. Rank and permutation tests on the differences then yield valid tests, but the loss in power due to the random pairing makes this approach unsuitable. A good review of test statistics and practical methods was provided by Conover et al. (1981).

If the difference in locations $\delta$ were known, then we could subtract $\delta$ from each of the Y's, pool the X's and the transformed Y's, and carry out the standard permutation approach. That is, we compute a statistic $T$ for each of the $\binom{m+n}{m}$ distinct permutation data sets, in which $m$ values drawn without replacement from $\{X_1, \ldots, X_m, Y_1 - \delta, \ldots, Y_n - \delta\}$ form the first sample and the remaining $n$ values form the second. The permutation p value is then the proportion of these values that are greater than or equal to the statistic calculated from the original data.

For illustration, we consider the weight gain of a group of $m = 30$ control rats and of a second group of $n = 20$ rats whose diet included calcium EDTA. The data are from Brownie et al. (1986). The observed values for the control group are 34, 22, 54, 33, 20, 32, 35, 24, 13, 22, 26, 38, 34, 30, 20, 30, 25, 32, 36, 22, 26, 28, 31, 28, 32, 31, 28, 28, 31, 31; those for the treated group are 9, 23, 16, 13, -13, 32, 10, 26, 14, -24, 8, 29, 24, 27, 22, 2, 19, 21, 27, -1. Figure 3 shows the estimated p values for $|\log(S_1/S_2)|$ and $|\log(g_1/g_2)|$, where $S_1^2$
and $S_2^2$ are the sample variances and the $g_i$ are robust scale estimators of the form [Mathematical Expression Omitted], where the $Z_{(i)}$ are the $M = m(m - 1)/2$ ordered values of $|X_j - X_k|$. These trimmed versions of Gini's mean difference were studied by Janssen, Serfling, and Veraverbeke (1987) and subsequently found to have good efficiency and robustness properties.

An exact $1 - \beta$ confidence interval for $\delta$ under $H_0$ may be obtained by inverting any two-sample rank test for location differences. Here we use the interval based on the Wilcoxon rank sum statistic, of the form $[D_{(k)}, D_{(mn+1-k)})$, where $D_{(1)}, \ldots, D_{(mn)}$ are the ordered differences of the form $Y_j - X_i$ (see Randles and Wolfe 1979, p. 180). The .999 confidence interval for these data is [-24, -3]. This leads to $p_{.001} = .062 + .001 = .063$ for the variance-based statistic of Figure 3a and to $p_{.001} = .022 + .001 = .023$ for the robust statistic of Figure 3b. The standard errors of these p values are about .002, due to using 10,000 random permutations. Asymptotic arguments given by Boos, Janssen, and Veraverbeke (1989) justify the use of $p(\hat\delta)$ in large samples, where $\hat\delta$ is estimated from the data. For example, $\bar Y - \bar X = 14.2 - 29.1 = -14.9$, leading to $p(-14.9) = .018$ and .006 from Figures 3a and 3b. Taking the ratios .063/.018 and .023/.006 suggests a "cost" factor of around 3 to 4 for getting a valid p value for these data in place of an asymptotic approximate p value. We also note that the p value for the nonrobust statistic based on sample variances is much more sensitive to changes in $\delta$; it ranges from .0012 to .062 over $\delta \in [-24, -3]$, whereas the p value for the robust statistic based on $g_1$ and $g_2$ ranges from .005 to .022.

4. SUMMARY

Nuisance parameters may be handled in various ways in testing problems.
In this article we have introduced a new method for modifying the standard definition of a p value, $p = \sup_\theta P_{\nu_0,\theta}(T \ge t)$, to allow for taking the supremum over a confidence set for $\theta$ instead of over the whole parameter space. The new method is not intended to supplant standard methods for handling nuisance parameters when those methods give tractable answers. But our examples suggest that the new method can indeed give improved procedures, as in the case of the 2 × 2 contingency table using the $Z^2$ statistic. In other situations the new method can give finite-sample level-$\alpha$ tests where none previously existed. Finally, we should like to reemphasize the principle mentioned at the end of Example 3: it is preferable to take suprema over the distribution of statistics such as $t_w$ rather than over pivotal quantities like $t(\rho)$, whose distribution does not depend on the nuisance parameter. In a heuristic sense, the supremum after averaging (or making probability calculations) tends to be smaller than averaging after taking the supremum.

[Received August 1992. Revised August 1993.]

[Figure 3: illustration omitted.]

REFERENCES

Barnard, G. (1984), "Comparing the Means of Two Independent Samples," Applied Statistics, 33, 266-271.
Best, D. J., and Rayner, J. C. W. (1987), "Welch's Approximate Solution for the Behrens-Fisher Problem," Technometrics, 29, 205-210.
Bickel, P. J., and Doksum, K. A. (1977), Mathematical Statistics: Basic Ideas and Selected Topics, San Francisco: Holden-Day.
Boos, D., Janssen, P., and Veraverbeke, N. (1989), "Resampling from Centered Data in the Two-Sample Problem," Journal of Statistical Planning and Inference, 21, 327-345.
Brownie, C. F., Brownie, C., Noden, D. S., Krook, L., Haluska, M., and Aronson, A. L. (1986), "Teratogenic Effect of Calcium Edetate (CaEDTA) in Rats and the Protective Effect of Zinc," Toxicology and Applied Pharmacology, 82, 426-443.
Casella, G., and Berger, R. L. (1990), Statistical Inference, Pacific Grove, CA: Wadsworth.
Conover, W. J., Johnson, M. E., and Johnson, M. M. (1981), "A Comparative Study of Tests for Homogeneity of Variances, With Applications to the Outer Continental Shelf Bidding Data," Technometrics, 23, 351-361.
Emerson, J. D., and Moses, E. M. (1985), "A Note on the Wilcoxon-Mann-Whitney Test for 2 × k Ordered Tables," Biometrics, 41, 303-309.
Fisher, R. A. (1939), "The Comparison of Samples With Possibly Unequal Variances," Annals of Eugenics, 9, 174-180.
Janssen, P., Serfling, R., and Veraverbeke, N. (1987), "Asymptotic Normality of U-Statistics Based on Trimmed Samples," Journal of Statistical Planning and Inference, 16, 63-74.
Randles, R. H., and Wolfe, D. A. (1979), Introduction to the Theory of Nonparametric Statistics, New York: John Wiley.
Robinson, G. K. (1976), "Properties of Student's t and of the Behrens-Fisher Solution to the Two Means Problem," The Annals of Statistics, 4, 963-971.
Storer, B. E., and Kim, C. (1990), "Exact Properties of Some Exact Test Statistics for Comparing Two Binomial Proportions," Journal of the American Statistical Association, 85, 146-155.
Suissa, S., and Shuster, J. (1985), "Exact Unconditional Sample Sizes for the 2 × 2 Binomial Trial," Journal of the Royal Statistical Society, Ser. A, 148, 317-327.
Tsui, K., and Weerahandi, S. (1989), "Generalized p-Values in Significance Testing of Hypotheses in the Presence of Nuisance Parameters," Journal of the American Statistical Association, 84, 602-607.
Taylor, D. N., Wachsmuth, I. K., Shangkuan, Y., Schmidt, E. V., Barrett, T. J., Schrader, J. S., Scherach, C. S., McGee, H. B., Feldman, R. A., and Brenner, D. J. (1982), "Salmonellosis Associated With Marijuana: A Multistate Outbreak Traced by Plasmid Fingerprinting," New England Journal of Medicine, 306, 1249-1253.

(*) Roger L. Berger is Professor and Dennis D. Boos is Professor, Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203.