Academic journal article Genetics

A Covariance Structure Model for the Admixture of Binary Genetic Variation

Article excerpt

ABSTRACT

I derive a covariance structure model for pairwise linkage disequilibrium (LD) between binary markers in a recently admixed population and use a generalized least-squares method to fit the model to two different data sets. Both linked and unlinked marker pairs are incorporated in the model. Under the model, a pairwise LD matrix is decomposed into two component matrices, one containing LD attributable to admixture, and another containing, in an aggregate form, LD specific to the populations forming the mixture. I use population genetics theory to show that the latter matrix has block-diagonal structure. For the data sets considered here, I show that the number of source populations can be determined by statistical inference on the canonical correlations of the sample LD matrix.

(ProQuest: ... denotes formulae omitted.)

ADMIXTURE, the mixing of genetically differentiated populations via migration and subsequent intermating, can create linkage disequilibrium (LD) between genes, even when the genes are not physically linked (see, e.g., Cavalli-Sforza and Bodmer 1971, p. 69; Prout 1973). In this work, I show that admixture contributions to LD can be statistically quantified and distinguished from LD attributable to the shared ancestry of linked alleles. Ohta (1982) used Wright's island model to decompose a squared coefficient of LD into within- and between-population terms, in analogy with Wright's (1940) decomposition of the inbreeding coefficient. These decompositions assume that populations connected by migration can be identified and sampled for genetic variation. The method I propose uses a pairwise LD matrix sampled from an admixed population of unknown composition; the number of source populations and the components of LD are inferred by use of a multivariate statistical model.
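The admixture effect on unlinked genes can be illustrated with a minimal sketch. It assumes two source populations, each in linkage equilibrium at two unlinked loci, sampled immediately after mixing and before any intermating. The function name `mixture_ld` and all allele frequencies are hypothetical, chosen only for illustration, and are not taken from the article. Under these assumptions the LD coefficient in the mixture reduces to m(1 − m)(p1 − p2)(q1 − q2), which is nonzero whenever the sources differ in allele frequency at both loci.

```python
def mixture_ld(m, p1, q1, p2, q2):
    """LD coefficient D between two loci in a mixture of two populations.

    m      : proportion of gametes drawn from source population 1
    p1, q1 : frequencies of the '1' allele at loci A and B in source 1
    p2, q2 : the same frequencies in source 2
    Assumption (illustrative): each source is in linkage equilibrium,
    so its '11' haplotype frequency is the product of allele frequencies.
    """
    # Frequency of the '11' haplotype in the pooled gametes
    f11 = m * p1 * q1 + (1 - m) * p2 * q2
    # Pooled allele frequencies at each locus
    p = m * p1 + (1 - m) * p2
    q = m * q1 + (1 - m) * q2
    # D is the excess of the haplotype frequency over the product
    return f11 - p * q


# Hypothetical frequencies: the sources differ strongly at both loci
m, p1, q1, p2, q2 = 0.5, 0.9, 0.8, 0.1, 0.2
d_direct = mixture_ld(m, p1, q1, p2, q2)
d_closed = m * (1 - m) * (p1 - p2) * (q1 - q2)
print(d_direct, d_closed)
```

With these made-up values both quantities come out at approximately 0.12, although neither source population carries any LD of its own. Setting p1 equal to p2 (or q1 equal to q2) drives the mixture LD to zero.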

The data: blocks of binary markers: It is convenient to develop the model using gametes as the basic units of observation. The data are then n random binary vectors of the form x = (x1, . . . , xL)′, with xl ∈ {0, 1}; l = 1, . . . , L. Each vector represents the single nucleotide polymorphism (SNP) variation on one sampled gamete, under an arbitrary binary coding scheme, with xl indicating the allele on gamete x at the lth marker locus. The markers are assumed to be selectively neutral and variable in the sample.
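As a minimal sketch of this data layout, the gametes can be held as a list of 0/1 vectors, and a sample pairwise LD matrix can be computed as the covariance of the allele indicators, D[l][m] = mean(xl·xm) − mean(xl)·mean(xm). The helper names `is_variable` and `ld_matrix` and the four toy gametes below are hypothetical. This plain moment estimator is an illustrative assumption and is not necessarily the estimator used in the article.

```python
def is_variable(gametes, l):
    """True if both alleles (0 and 1) are observed at marker l."""
    count = sum(g[l] for g in gametes)
    return 0 < count < len(gametes)


def ld_matrix(gametes):
    """Sample LD matrix: covariance of the 0/1 allele indicators."""
    n = len(gametes)
    L = len(gametes[0])
    # Sample frequency of the allele coded 1 at each marker
    p = [sum(g[l] for g in gametes) / n for l in range(L)]
    return [
        [sum(g[l] * g[m] for g in gametes) / n - p[l] * p[m] for m in range(L)]
        for l in range(L)
    ]


# Four hypothetical gametes scored at L = 3 markers
gametes = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
# The text requires every marker to be variable in the sample
assert all(is_variable(gametes, l) for l in range(3))

D = ld_matrix(gametes)
for row in D:
    print(row)
```

In this toy sample markers 1 and 2 always carry the same allele, so their off-diagonal entry is as large as the diagonal, while marker 3 shows no association with either. Each diagonal entry equals p(1 − p) for that marker.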

Before statistical analysis begins, the markers in x are to be grouped into blocks by the investigator, based on physical criteria (e.g., the markers in a block share a localized region on a physical map), along with empirical evidence (e.g., the markers in a block are known by a previous linkage-mapping study to form a linkage group) independent of the sample under consideration. Each marker belongs to exactly one block; however, a particular marker may be the only member of a block. Any two markers l, m within the same block are assumed to be linked, with recombination fraction clm ≪ ½. In contrast, markers l, j from different blocks are assumed to be unlinked, with recombination fraction clj ≃ ½.
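The grouping rule can be sketched as a block label per marker plus a pairwise indicator of whether two markers share a block. The names `block_of` and `same_block_mask` and the six-marker assignment are hypothetical. Reading the mask as the set of positions where population-specific LD may be nonzero is an interpretation of the block-diagonal structure stated in the abstract, not a construction given in the text.

```python
def same_block_mask(block_of):
    """mask[l][m] is True when markers l and m are assigned to the same block.

    Within-block pairs are treated as linked (recombination fraction well
    below 1/2); between-block pairs are treated as unlinked (approximately 1/2).
    """
    L = len(block_of)
    return [[block_of[l] == block_of[m] for m in range(L)] for l in range(L)]


# Hypothetical assignment of L = 6 markers to B = 3 blocks.
# Every marker has exactly one label; block 3 holds a single marker.
block_of = [1, 1, 1, 2, 2, 3]
mask = same_block_mask(block_of)

for row in mask:
    # 'x' marks a linked (within-block) pair, '.' an unlinked pair
    print("".join("x" if same else "." for same in row))
```

When the markers are ordered by block, as here, the within-block pairs form square blocks along the diagonal, and every between-block pair falls outside them.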

In the development below, block structure derives from physical and linkage relationships between markers, with blocks analogous to linkage groups; this is to be distinguished from empirical descriptions of "haplotype block structure" (see, e.g., Gabriel et al. 2002; Phillips et al. 2003). Recent work of the International HapMap Consortium (2005) and Myers et al. (2005) suggests that haplotype block structure results from variation in recombination rates over small physical distances. The fine-scale rate estimates obtained by Myers et al. (2005) demonstrate new tools for constructing blocks of tightly linked markers. In the data examples to follow, I form two blocks of markers on the arms of the human X chromosome, using simple physical criteria.

To fix notation, I assume that x can be partitioned into B blocks, labeled 1, . . . , B in any convenient order (though the same partitioning and labeling scheme is used for all gametes). …
