# Efficient Estimators with Simple Variance in Unequal Probability Sampling

## Article excerpt

For unequal probability sampling designs, design-based variance estimation is cumbersome because it requires second-order inclusion probabilities. For most fixed sample size probability-proportional-to-size (πPS) schemes, these probabilities are difficult to compute, and variance estimation depends on them through a tedious double-sum calculation. We show how to replace the traditional πPS scenario with simpler design/estimator alternatives that preserve the high efficiency characteristic of πPS schemes. These alternatives use the generalized regression estimator, and their variance estimation entails only the calculation of a simple weighted squared residual sum.

KEY WORDS: Design-based variance; Generalized regression estimator; Poisson sampling; Probability proportional to size; Second-order inclusion probabilities.
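The abstract says only that the proposed alternatives pair the generalized regression (GREG) estimator with a variance estimate that is a weighted squared residual sum. The sketch below is one textbook instance of that idea, not necessarily the paper's exact proposal. It assumes Poisson sampling with $\pi_k = n z_k / Z$ (units enter independently, so $\pi_{kl} = \pi_k \pi_l$ for $k \neq l$ and the double sum collapses to a single sum), a ratio model $y_k \approx B z_k$ with error variance proportional to $z_k$, and the variance estimator $\sum_s (1 - \pi_k)(a_k e_k)^2$ written without g-weights. The function names and the toy data are hypothetical.

```python
import random

def poisson_sample(pi, rng):
    """Poisson sampling: unit k enters the sample independently with probability pi[k]."""
    return [k for k, p in enumerate(pi) if rng.random() < p]

def greg_ratio_poisson(s, y, z, pi, Z):
    """GREG estimate of Y under the ratio model y_k = B z_k + error (variance ~ z_k),
    with the residual-based variance estimate sum_s (1 - pi_k) (a_k e_k)^2,
    which is appropriate under Poisson sampling (pi_kl = pi_k pi_l for k != l)."""
    if not s:
        raise ValueError("empty Poisson sample; estimator undefined")
    a = {k: 1.0 / pi[k] for k in s}                     # sampling weights a_k = 1/pi_k
    B = sum(a[k] * y[k] for k in s) / sum(a[k] * z[k] for k in s)
    e = {k: y[k] - B * z[k] for k in s}                 # sample residuals e_k
    # sum_U B z_k + sum_s a_k e_k; for this model the weighted residual sum is ~0
    estimate = B * Z + sum(a[k] * e[k] for k in s)
    variance = sum((1.0 - pi[k]) * (a[k] * e[k]) ** 2 for k in s)
    return estimate, variance

# Hypothetical population: size measures z_k = 1..10 and a toy study variable y
z = [float(k) for k in range(1, 11)]
Z = sum(z)                                              # known total of z, 55.0
n = 4                                                   # expected sample size
pi = [n * zk / Z for zk in z]                           # pi_k = n z_k / Z, all below 1 here
y = [2.0 * zk + (1.0 if k % 2 else -1.0) for k, zk in enumerate(z)]

rng = random.Random(2024)
s = poisson_sample(pi, rng)
if s:  # a Poisson sample can be empty
    est, var = greg_ratio_poisson(s, y, z, pi, Z)
    print(f"sample={s} estimate={est:.2f} se={var ** 0.5:.2f}")
```

The variance step needs only one pass over the sample, with no $\pi_{kl}$ at all; the price is a random sample size, which the GREG adjustment helps to absorb.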

### 1. INTRODUCTION

The calculation of the precision of survey estimates is important to statistical agencies and survey institutes. From a statistical science viewpoint, this activity is essential; however, it is not carried out on a routine basis in all surveys. One reason is that it is resource intensive. This is particularly true for fixed sample size probability-proportional-to-size (πPS) schemes, because of the key role played by the second-order inclusion probabilities.

Consider the finite population $U = \{1, \ldots, k, \ldots, N\}$. We wish to estimate the total $Y = \sum_U y_k$, where $y_k$ is the value of the variable of interest, $y$, for the $k$th unit. (If $M$ is any set of units, $M \subseteq U$, $\sum_M y_k$ will be our shorthand for $\sum_{k \in M} y_k$.) Let $s$ be a probability sample drawn from $U$ with a sampling design that assigns the inclusion probability $\pi_k = P(k \in s) > 0$ to unit $k$. Let $a_k = 1/\pi_k$ denote the sampling weight of $k$. We assume that the data $y_k$, $k \in s$, are observed. The Horvitz-Thompson (HT) estimator $\hat{Y}_{HT} = \sum_s a_k y_k$ is design unbiased for $Y$ with variance

$$V(\hat{Y}_{HT}) = \sum\sum_U \left( \frac{a_k a_l}{a_{kl}} - 1 \right) y_k y_l,$$

with $a_{kl} = 1/\pi_{kl}$, where $\pi_{kl}$ denotes the second-order inclusion probability $P(k \,\&\, l \in s)$, and $a_{kk} = a_k = 1/\pi_k$. (If $M$ is any set of units, $\sum\sum_M$ is our shorthand for $\sum_{k \in M} \sum_{l \in M}$.) The inference procedure is completed by calculating a variance estimate and a confidence interval for $Y$. An unbiased variance estimator is given by the HT formula,

$$\hat{V}(\hat{Y}_{HT}) = \sum\sum_s \left( a_k a_l - a_{kl} \right) y_k y_l,$$

or (for a fixed sample size design) by the Yates-Grundy alternative,

$$\hat{V}(\hat{Y}_{HT}) = -\frac{1}{2} \sum\sum_s \frac{a_k a_l - a_{kl}}{a_k a_l} \left( a_k y_k - a_l y_l \right)^2.$$
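The two variance estimators can be checked numerically on a toy design. This minimal sketch assumes the design is listed explicitly as (sample, probability) pairs, which is feasible only for tiny populations; the function names and the six-sample design on $N = 4$ units with $n = 2$ are hypothetical. Averaging each estimator over all samples, weighted by their probabilities, should reproduce the true $V(\hat{Y}_{HT})$, because every $\pi_{kl}$ here is positive and the sample size is fixed.

```python
from itertools import combinations

def inclusion_probabilities(design, N):
    """pi_k and pi_kl of a design given as (sample, probability) pairs.
    The diagonal pi2[k][k] equals pi1[k], matching a_kk = a_k."""
    pi1 = [0.0] * N
    pi2 = [[0.0] * N for _ in range(N)]
    for s, p in design:
        for k in s:
            pi1[k] += p
            for l in s:
                pi2[k][l] += p
    return pi1, pi2

def ht_estimate(s, y, pi1):
    # Y_HT = sum_s a_k y_k with a_k = 1/pi_k
    return sum(y[k] / pi1[k] for k in s)

def ht_variance_estimate(s, y, pi1, pi2):
    # HT form: sum_s sum_s (a_k a_l - a_kl) y_k y_l (requires every pi_kl > 0)
    return sum((1.0 / (pi1[k] * pi1[l]) - 1.0 / pi2[k][l]) * y[k] * y[l]
               for k in s for l in s)

def yg_variance_estimate(s, y, pi1, pi2):
    # Yates-Grundy form (fixed sample size):
    # -(1/2) sum_s sum_s (a_k a_l - a_kl)/(a_k a_l) * (a_k y_k - a_l y_l)^2
    total = 0.0
    for k in s:
        for l in s:
            if k != l:  # the k = l terms are zero
                a_k, a_l, a_kl = 1.0 / pi1[k], 1.0 / pi1[l], 1.0 / pi2[k][l]
                total += (a_k * a_l - a_kl) / (a_k * a_l) * (a_k * y[k] - a_l * y[l]) ** 2
    return -0.5 * total

# Hypothetical fixed-size design (n = 2, N = 4): each pair has its own probability
y = [3.0, 7.0, 4.0, 9.0]
probs = [0.10, 0.20, 0.15, 0.25, 0.10, 0.20]
design = list(zip(combinations(range(4), 2), probs))
pi1, pi2 = inclusion_probabilities(design, 4)

Y = sum(y)
true_var = sum(p * (ht_estimate(s, y, pi1) - Y) ** 2 for s, p in design)
mean_ht = sum(p * ht_variance_estimate(s, y, pi1, pi2) for s, p in design)
mean_yg = sum(p * yg_variance_estimate(s, y, pi1, pi2) for s, p in design)
print(true_var, mean_ht, mean_yg)  # all three agree up to rounding
```

Even in this toy case, each variance estimate touches every pair of sampled units and needs the matching $\pi_{kl}$.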

The classical πPS scenario is as follows. For $k = 1, \ldots, N$, let $z_k$ be a known positive size measure, believed to be strongly correlated with $y_k$. Construct a πPS sampling scheme of fixed sample size $n$ with $\pi_k = n z_k / Z$, where $Z = \sum_U z_k$. Use $\hat{Y}_{HT}$ to estimate $Y$. Compute the second-order inclusion probabilities $\pi_{kl}$ (which are generally all unequal), then use them to compute the variance estimate as indicated.
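Computing $\pi_k = n z_k / Z$ is trivial; the $\pi_{kl}$ are the hard part and depend on the particular scheme. As one concrete fixed-size πPS scheme, the sketch below assumes systematic πPS sampling (a random start $u$ in $[0, 1)$ and selection points $u, u + 1, \ldots, u + n - 1$ laid over the cumulated $\pi_k$), which the text does not prescribe. Under that scheme the selected sample is constant between consecutive fractional parts of the cumulative sums, so exact $\pi_{kl}$ follow by summing the lengths of those sub-intervals. The function names and sizes are hypothetical.

```python
import bisect

def pips_probabilities(z, n):
    """pi_k = n z_k / Z with Z = sum_U z_k; each value must not exceed 1."""
    Z = sum(z)
    pi = [n * zk / Z for zk in z]
    if max(pi) > 1.0:
        raise ValueError("n z_k / Z exceeds 1 for some k")
    return pi

def systematic_pips_second_order(pi):
    """Exact pi_kl (with pi_kk = pi_k) for systematic piPS sampling."""
    N = len(pi)
    cum = [0.0]
    for p in pi:
        cum.append(cum[-1] + p)
    n = round(cum[-1])
    cum[-1] = float(n)  # guard against rounding drift in the running sum
    # The sample changes only where u crosses a fractional part of a cumulative sum
    breaks = sorted({c % 1.0 for c in cum} | {0.0, 1.0})
    pi2 = [[0.0] * N for _ in range(N)]
    for lo, hi in zip(breaks, breaks[1:]):
        u = 0.5 * (lo + hi)  # any start in (lo, hi) gives the same sample
        # unit k is hit when cum[k] <= u + j < cum[k + 1]
        s = [bisect.bisect_right(cum, u + j) - 1 for j in range(n)]
        for k in s:
            for l in s:
                pi2[k][l] += hi - lo
    return pi2

z = [2.0, 3.0, 5.0, 4.0, 6.0]
pi = pips_probabilities(z, n=2)       # [0.2, 0.3, 0.5, 0.4, 0.6]
pi2 = systematic_pips_second_order(pi)
print(pi2[0][1], pi2[0][3])           # units 0 and 1 are never drawn together
```

Here $\pi_{01} = 0$, a common feature of systematic πPS; with a zero joint inclusion probability the HT variance estimator above cannot be applied.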

This scenario meets with two oft-cited difficulties associated with the $\pi_{kl}$:

1. When both the πPS requirement $\pi_k = n z_k / Z$ for all $k$ and the fixed sample size requirement are imposed on the scheme, it becomes tedious and often computationally difficult to calculate the $\pi_{kl}$, especially if other desirable features, such as $a_k a_l - a_{kl} < 0$ for all $k \neq l$, are also required.

2. The design-based variance estimation uses the $\pi_{kl}$ in a cumbersome double-sum calculation with $n(n - 1)/2$ terms (using the Yates-Grundy formula, the sample size being fixed). This very large number of terms effectively rules out correct variance calculation in many πPS surveys. (It is sometimes assumed that the simpler πPS with-replacement formula will be satisfactory, but then the correct πPS scenario has already been abandoned. …
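Both difficulties can be made concrete with two small helpers, whose names are hypothetical. The first checks the condition $a_k a_l - a_{kl} < 0$ for all $k \neq l$ (equivalently $\pi_{kl} < \pi_k \pi_l$) on a given $\pi_{kl}$ matrix; treating $\pi_{kl} = 0$ as a failure is an assumption of this sketch, since $a_{kl} = 1/\pi_{kl}$ is then undefined. The second counts the distinct pairs the Yates-Grundy double sum visits. The test design is simple random sampling without replacement, whose $\pi_{kl}$ are known in closed form.

```python
def syg_condition_holds(pi1, pi2):
    """True if a_k a_l - a_kl < 0 for every pair k != l, i.e. pi_kl < pi_k pi_l.
    Assumption: a pair with pi_kl = 0 counts as a failure (a_kl undefined)."""
    N = len(pi1)
    for k in range(N):
        for l in range(k + 1, N):
            if pi2[k][l] <= 0.0:
                return False
            a_k, a_l, a_kl = 1.0 / pi1[k], 1.0 / pi1[l], 1.0 / pi2[k][l]
            if a_k * a_l - a_kl >= 0.0:
                return False
    return True

def yates_grundy_terms(n):
    # Number of distinct pairs k < l in a fixed-size sample of n units
    return n * (n - 1) // 2

# Simple random sampling without replacement, N = 5, n = 2:
# pi_k = n/N = 0.4 and pi_kl = n(n-1)/(N(N-1)) = 0.1 < 0.16 = pi_k pi_l
N, n = 5, 2
pi1 = [n / N] * N
pi2 = [[n / N if k == l else n * (n - 1) / (N * (N - 1)) for l in range(N)]
       for k in range(N)]
print(syg_condition_holds(pi1, pi2))  # True
print(yates_grundy_terms(1000))       # 499500
```

For a sample of $n = 1000$ the Yates-Grundy sum already has 499,500 terms, each needing its own $\pi_{kl}$.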