Academic journal article Genetics

Perils of Parsimony: Properties of Reduced-Rank Estimates of Genetic Covariance Matrices

Article excerpt

ABSTRACT

Eigenvalues and eigenvectors of covariance matrices are important statistics for multivariate problems in many applications, including quantitative genetics. Estimates of these quantities are subject to different types of bias. This article reviews and extends the existing theory on these biases, considering a balanced one-way classification and restricted maximum-likelihood estimation. Biases are due to the spread of sample roots and arise from ignoring selected principal components when imposing constraints on the parameter space, to ensure positive semidefinite estimates or to estimate covariance matrices of chosen, reduced rank. In addition, it is shown that reduced-rank estimators that consider only the leading eigenvalues and -vectors of the "between-group" covariance matrix may be biased due to selecting the wrong subset of principal components. In a genetic context, with groups representing families, this bias is inversely proportional to the degree of genetic relationship among family members, but is independent of sample size. Theoretical results are supplemented by a simulation study, demonstrating close agreement between predicted and observed bias for large samples. It is emphasized that the rank of the genetic covariance matrix should be chosen sufficiently large to accommodate all important genetic principal components, even though, paradoxically, this may require including a number of components with negligible eigenvalues. A strategy for rank selection in practical analyses is outlined.

(ProQuest: ... denotes formulae omitted.)

TRAITS of interest in quantitative genetics are seldom independent of each other. Hence, in analyses of "complex" phenotypes it is desirable to consider all components simultaneously, in particular when considering the effects of selection and its impact on evolution (Blows and Walsh 2008). However, analyses to estimate genetic parameters are often limited to a few traits only. This can be attributed to the burden imposed by multivariate estimation, due both to computational requirements and limitations and to the need for sufficiently large data sets to support accurate estimation of the numerous parameters involved.

By and large, covariance matrices are considered to be unstructured; i.e., for q traits of interest we have q(q+1)/2 distinct variance and covariance components among them. In a genetic context, there are at least two covariance matrices to be determined, namely the covariance matrix due to additive genetic effects and the corresponding matrix due to residual effects. This yields q(q + 1) parameters to be estimated; i.e., the number of parameters increases quadratically with the number of traits considered. Recently, improvements in computing facilities together with advances in the implementation of modern inference procedures, such as residual or restricted maximum likelihood (REML), have made routine multivariate analyses involving numerous traits and large data sets feasible. In addition, availability of corresponding software, specialized toward quantitative genetic analyses fitting the so-called "animal model," has made analyses conceptually straightforward, even for scenarios with complex pedigrees, many fixed effects, additional random effects, or arbitrary patterns of missing observations.
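
To make the quadratic growth in the parameter count concrete, the following minimal sketch tabulates q(q + 1)/2 distinct elements per unstructured covariance matrix and q(q + 1) for the genetic and residual matrices together. The function names are hypothetical and serve illustration only; they are not taken from the article or from any software it mentions.

```python
def n_cov_params(q: int) -> int:
    """Distinct variances and covariances in an unstructured q x q covariance matrix."""
    return q * (q + 1) // 2


def n_total_params(q: int, n_matrices: int = 2) -> int:
    """Total parameters for n_matrices unstructured covariance matrices.

    The default of 2 corresponds to one additive genetic and one residual matrix.
    """
    return n_matrices * n_cov_params(q)


if __name__ == "__main__":
    # Print the counts for a few illustrative numbers of traits.
    for q in (2, 5, 10, 20):
        print(f"q = {q:2d}: per matrix = {n_cov_params(q):3d}, "
              f"genetic + residual = {n_total_params(q):3d}")
```

Doubling the number of traits roughly quadruples the number of parameters, which is the dimensionality problem discussed next.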

Yet, the "curse of dimensionality" remains. This has kindled interest in estimation imposing a structure, in particular for genetic covariance matrices; see Meyer (2007a) for a recent review. Principal component (PC) analysis is a widely used method to summarize multivariate information, dating back as far as Hotelling (1933) and Pearson (1901) (both reprinted in Bryant and Atchley 1975). Moreover, PCs are invaluable in reducing the dimension of analyses, i.e., the number of variables to be considered. For a set of q variables, the PCs are the q linear combinations of the variables that are independent of each other and successively explain a maximum amount of variation. …
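
The definition of PCs given above can be sketched numerically with an eigendecomposition. The example below is a minimal illustration, assuming NumPy is available and using a made-up 3 x 3 covariance matrix; the matrix values and the function name are assumptions for illustration, not data or methods from the article.

```python
import numpy as np

# Hypothetical covariance matrix for q = 3 traits (illustrative values only).
sigma = np.array([[4.0, 2.0, 0.6],
                  [2.0, 3.0, 0.4],
                  [0.6, 0.4, 1.0]])


def principal_components(cov):
    """Return eigenvalues in descending order with matching eigenvectors as columns."""
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order for symmetric input
    order = np.argsort(eigvals)[::-1]        # reorder so the leading PC comes first
    return eigvals[order], eigvecs[:, order]


eigvals, eigvecs = principal_components(sigma)

# Covariance matrix of the PCs: diagonal, so the linear combinations are uncorrelated,
# and the diagonal holds the variance explained by each PC.
pc_cov = eigvecs.T @ sigma @ eigvecs

# Cumulative proportion of total variance explained by the first m PCs.
explained = np.cumsum(eigvals) / eigvals.sum()

if __name__ == "__main__":
    np.set_printoptions(precision=3, suppress=True)
    print("eigenvalues:", eigvals)
    print("covariance of PCs:\n", pc_cov)
    print("cumulative proportion explained:", explained)
```

Each column of `eigvecs` holds the weights of one linear combination of the original variables. Retaining only the first m columns corresponds to considering m < q PCs, which is the kind of dimension reduction the text refers to.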
