Academic journal article Genetics

Haplotype Inference in General Pedigrees Using the Cluster Variation Method

Article excerpt

ABSTRACT

We present CVMHAPLO, a probabilistic method for haplotyping in general pedigrees with many markers. CVMHAPLO reconstructs the haplotypes iteratively. In each iteration it assigns a fixed number of the ordered genotypes with the highest marginal probability, conditioned on the marker data and on the ordered genotypes assigned in previous iterations. CVMHAPLO uses the cluster variation method (CVM) to estimate these marginal probabilities efficiently. We focused on single-nucleotide polymorphism (SNP) markers when evaluating our approach. In simulated data sets where exact computation was feasible, the accuracy of CVMHAPLO was high and similar to that of maximum-likelihood methods. In simulated data sets where exact computation of the maximum-likelihood haplotype configuration was not feasible, the accuracy of CVMHAPLO was similar to that of state-of-the-art Markov chain Monte Carlo (MCMC) maximum-likelihood approximations when all ordered genotypes were assigned, and higher when only a subset of the ordered genotypes was assigned. CVMHAPLO was faster than the MCMC approach and provided more detailed information about the uncertainty in the inferred haplotypes. We conclude that CVMHAPLO is a practical tool for inferring haplotypes in large, complex pedigrees.
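The iterative assignment scheme described in the abstract can be illustrated with a small decimation loop. This is a minimal sketch under loudly stated assumptions, not the CVMHAPLO implementation: the unknowns are modeled as generic binary variables (standing in for hypothetical phase or inheritance indicators), the marginals are computed by brute-force enumeration instead of the cluster variation method, and `toy_weight`, `exact_marginals`, and `decimate` are hypothetical names chosen for illustration. The recorded marginal at assignment time mirrors the idea that the procedure reports how certain each assignment was.

```python
import itertools


def exact_marginals(weight, n, fixed):
    """Return P(x_i = 1 | fixed) for every unassigned binary variable x_i.

    Brute-force enumeration is a stand-in for the CVM estimate (assumption);
    it is only feasible for a handful of variables.
    """
    free = [i for i in range(n) if i not in fixed]
    total = 0.0
    ones = {i: 0.0 for i in free}
    for values in itertools.product((0, 1), repeat=len(free)):
        x = [0] * n
        for i, v in fixed.items():
            x[i] = v
        for i, v in zip(free, values):
            x[i] = v
        w = weight(tuple(x))
        total += w
        for i, v in zip(free, values):
            if v == 1:
                ones[i] += w
    return {i: ones[i] / total for i in free}


def decimate(weight, n, k):
    """Assign k variables per iteration, most confident first, conditioning on earlier assignments."""
    fixed = {}
    history = []  # (variable, assigned value, marginal probability at assignment time)
    while len(fixed) < n:
        marg = exact_marginals(weight, n, fixed)
        # Confidence = probability of the more likely value of each free variable.
        confidence = {i: max(p, 1.0 - p) for i, p in marg.items()}
        ranked = sorted(confidence, key=confidence.get, reverse=True)
        for i in ranked[:k]:
            fixed[i] = 1 if marg[i] >= 0.5 else 0
            history.append((i, fixed[i], confidence[i]))
    return fixed, history


def toy_weight(x):
    """Hypothetical unnormalized probability over three binary variables (not a pedigree likelihood)."""
    w = 2.0 if x[0] == 1 else 1.0      # evidence favouring x0 = 1
    w *= 3.0 if x[0] == x[1] else 1.0  # coupling between x0 and x1
    w *= 3.0 if x[1] == x[2] else 1.0  # coupling between x1 and x2
    return w


assignment, history = decimate(toy_weight, n=3, k=1)
```

With `k=1` the marginals are re-estimated after every single assignment; a larger `k` needs fewer marginal computations per run but conditions on less information when choosing each value.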


The problem of haplotyping is to infer, for each individual, the paternally and the maternally inherited alleles from the unordered genotype data. Haplotyping is an important tool for mapping disease-susceptibility genes, especially those of complex diseases, and it is an essential step in the analyses used to map quantitative trait loci (QTL) in animal pedigrees. As genotyping becomes cheaper, efficient and accurate algorithms for inferring haplotypes are increasingly desirable.
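To make the distinction between unordered and ordered genotypes concrete, the sketch below enumerates every (paternal, maternal) haplotype pair consistent with one individual's unordered multilocus genotype. It is an illustrative assumption-based example, not part of any cited program: the list-of-allele-pairs input format and the name `ordered_genotypes` are hypothetical. Without pedigree information, an individual heterozygous at h markers has 2^h candidate ordered genotypes; the pedigree and the recombination model are what make some candidates more probable than others.

```python
from itertools import product


def ordered_genotypes(genotype):
    """Enumerate all (paternal, maternal) haplotype pairs consistent with an unordered genotype.

    `genotype` is a hypothetical format: a list of unordered allele pairs,
    one per marker, e.g. [("A", "a"), ("B", "B")].
    """
    per_locus = []
    for a, b in genotype:
        # A homozygous locus has one ordering; a heterozygous locus has two.
        per_locus.append([(a, b)] if a == b else [(a, b), (b, a)])
    pairs = []
    for combo in product(*per_locus):
        paternal = tuple(p for p, _ in combo)
        maternal = tuple(m for _, m in combo)
        pairs.append((paternal, maternal))
    return pairs


genotype = [("A", "a"), ("B", "B"), ("C", "c")]
candidates = ordered_genotypes(genotype)
```

Here two heterozygous markers give four candidates, for example paternal `("A", "B", "C")` with maternal `("a", "B", "c")`, and the same pair with the parental origins swapped.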

Because the marker data are generally not informative enough to infer the ordered genotypes unambiguously, a probabilistic modeling approach can be used to handle the uncertainty. The computer programs MERLIN (Abecasis et al. 2002), GENEHUNTER (Kruglyak et al. 1996), and SUPERLINK (Fishelson and Geiger 2002; Fishelson et al. 2005) reconstruct exact maximum-likelihood haplotype configurations in general pedigrees. Because computation time and memory usage grow exponentially with pedigree size (MERLIN, GENEHUNTER) or with the tree width of the graphical model associated with the likelihood function (SUPERLINK), these programs may not be applicable to the large pedigrees and many markers typical of QTL-mapping studies, especially when some individuals have missing genotypes or no genotype information at all. Approximate statistical approaches based on Markov chain Monte Carlo (MCMC) sampling (Thompson 1994; Lange and Sobel 1996; Jensen and Kong 1999; Thompson and Heath 1999; Thomas et al. 2000; George and Thompson 2003) use the same likelihood function as the exact probabilistic approaches and can therefore achieve very high accuracy. MCMC methods are generally applicable and have modest memory requirements. Although in theory their computation time does not scale exponentially with problem size, in practice it can be very long, and convergence of the Markov chain can be difficult to assess. Gao et al. (2004) proposed an efficient statistical approach based on a heuristic approximation of conditional probabilities; however, it has been tested only on data sets with no missing genotypes.
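As a rough illustration of how MCMC approximates the marginal probabilities that exact methods compute, here is a minimal single-site Gibbs sampler over binary variables. It is a generic textbook sketch, not the sampler of any of the cited works. It assumes a strictly positive, hypothetical toy weight function `toy_weight` (the same illustrative model as an exact enumeration would use), and the names `gibbs_marginals`, `sweeps`, and `burn_in` are hypothetical. Real pedigree likelihoods contain hard Mendelian constraints under which such simple samplers can mix very slowly, which is one reason convergence is hard to assess.

```python
import random


def gibbs_marginals(weight, n, sweeps, burn_in=1000, seed=0):
    """Estimate P(x_i = 1) for n binary variables by single-site Gibbs sampling.

    Assumes `weight` is strictly positive, so every full conditional is well defined.
    """
    rng = random.Random(seed)
    x = [0] * n
    counts = [0] * n
    for sweep in range(burn_in + sweeps):
        for i in range(n):
            # Full conditional of x_i given all other variables.
            x[i] = 1
            w1 = weight(tuple(x))
            x[i] = 0
            w0 = weight(tuple(x))
            x[i] = 1 if rng.random() < w1 / (w0 + w1) else 0
        if sweep >= burn_in:  # discard burn-in sweeps before counting
            for i in range(n):
                counts[i] += x[i]
    return [c / sweeps for c in counts]


def toy_weight(x):
    """Hypothetical unnormalized probability over three binary variables (not a pedigree likelihood)."""
    w = 2.0 if x[0] == 1 else 1.0      # evidence favouring x0 = 1
    w *= 3.0 if x[0] == x[1] else 1.0  # coupling between x0 and x1
    w *= 3.0 if x[1] == x[2] else 1.0  # coupling between x1 and x2
    return w


estimates = gibbs_marginals(toy_weight, n=3, sweeps=50000)
```

For this toy model the exact marginals are 32/48, 28/48, and 26/48, and the sampled estimates approach them as the number of sweeps grows; how many sweeps are enough is exactly the convergence question raised above.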

To address these efficiency problems, several nonstatistical approaches have been developed. Wijsman (1987) proposed a zero-recombinant haplotyping method whose complexity is linear in the number of markers and individuals. More recently, efficient algorithms were described (Zhang et al. 2005; Baruch et al. 2006). These approaches can be applied only to data sets without forced recombination events. Qian and Beckmann (2002) presented a six-rule algorithm to reconstruct minimum-recombinant haplotypes. …
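Zero-recombinant methods accept only haplotype configurations in which no transmitted haplotype is recombinant, whereas minimum-recombinant methods prefer the configuration with the fewest recombinations overall. The sketch below shows only the scoring step for a single transmission, under the simplifying assumption that the parent's two haplotypes and the child's inherited haplotype are already phased. It is not the algorithm of Wijsman (1987) or of Qian and Beckmann (2002), and `min_recombinations` is a hypothetical name. A small dynamic program tracks which parental haplotype each locus could have been copied from, so markers at which the parent is homozygous are treated as uninformative.

```python
def min_recombinations(hap_a, hap_b, child_hap):
    """Fewest crossovers needed to build child_hap as a mosaic of a parent's two haplotypes.

    Returns None when some child allele appears in neither parental haplotype
    (a Mendelian inconsistency under this simplified model).
    """
    inf = float("inf")
    # cost[s] = fewest crossovers so far, given the current locus was copied from haplotype s.
    cost = [0.0, 0.0]
    for allele_a, allele_b, allele in zip(hap_a, hap_b, child_hap):
        new_cost = [inf, inf]
        for s, source_allele in enumerate((allele_a, allele_b)):
            if source_allele == allele:
                # Either stay on haplotype s, or switch from the other haplotype (one crossover).
                new_cost[s] = min(cost[s], cost[1 - s] + 1)
        cost = new_cost
    best = min(cost)
    return None if best == inf else int(best)
```

For example, `min_recombinations("0000", "1111", "0011")` requires one crossover, while a child haplotype `"0000"` is nonrecombinant and would be acceptable to a zero-recombinant method.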
