Academic journal article Genetics

Linkage Disequilibrium Grouping of Single Nucleotide Polymorphisms (SNPs) Reflecting Haplotype Phylogeny for Efficient Selection of Tag SNPs

Article excerpt

ABSTRACT

Single nucleotide polymorphisms (SNPs) have been proposed to be grouped into haplotype blocks, each harboring a limited number of haplotypes. Within each block, the haplotypes are expected to be tagged by a selected subset of SNPs; however, none of the proposed selection algorithms has proved definitive. To address this issue, we developed a tag SNP selection algorithm based on grouping SNPs by the linkage disequilibrium (LD) coefficient r² and examined five genes in three ethnic populations: Japanese, African Americans, and Caucasians. Additionally, we investigated ethnic diversity by characterizing 979 SNPs distributed throughout the genome. Our algorithm spared 60% of the SNPs from genotyping and limited the imprecision in allele-frequency estimation of nontag SNPs to 2% on average. We also found a mosaic pattern of LD plots within a conventionally inferred haplotype block. This pattern emerged because multiple groups of SNPs with strong intragroup LD were interspersed in their physical positions. The LD plots showed some similarity among the three populations, but the selected tag SNPs were not entirely concordant. Consequently, our algorithm, which uses LD grouping, allows selection of a more faithful set of tag SNPs than do previous algorithms based on haplotype blocks.
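
For reference, the LD coefficient r² is conventionally defined for two biallelic SNPs as shown below. This assumes the article uses the standard definition; the excerpt does not state its formula. Here p_A and p_B are the frequencies of allele A at the first SNP and allele B at the second, and p_AB is the frequency of haplotypes carrying both A and B:

```latex
D = p_{AB} - p_A\,p_B, \qquad
r^2 = \frac{D^2}{p_A(1 - p_A)\,p_B(1 - p_B)}
```

r² ranges from 0 (no correlation between the two SNPs) to 1 (complete LD, in which the allele at one SNP fully determines the allele at the other).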

SINGLE nucleotide polymorphisms (SNPs) are stably inherited, highly abundant, and distributed throughout the genome. These variations are associated not only with diversity within and among populations, but also with individual responses to medication and susceptibility to diseases (STRACHAN and READ 2004). In particular, positional cloning of genes for disease susceptibility depends on linkage disequilibrium (LD), the correlation among alleles at neighboring variant sites, which reflects "haplotypes" descended from a common ancestral chromosome. It has become clear that chromosomally mapped and ordered SNPs can be grouped into "haplotype blocks" harboring a limited number of distinct haplotypes (GABRIEL et al. 2002). Several studies have shown that the human genome is structured into such segments, within which there is strong LD among relatively common SNPs but between which recombination has left little LD (PATIL et al. 2001). When SNPs are in strong LD, the alleles of a few SNPs on a haplotype predict the alleles of the other SNPs, which therefore provide redundant information. Consequently, a modest number of common SNPs selected from each segment would suffice to define the relevant haplotypes in presumably any population. This hypothesis has led to the HapMap project (http://www.hapmap.org), which aims to develop a map of common haplotype patterns throughout the genome in several ethnic populations. Once each gene (or chromosomal fragment) is subdivided into haplotype blocks, the haplotypes can be "tagged" by a subset of all available SNPs, the so-called tag SNPs. The construction of a haplotype map of the human genome and the definition of tag SNPs are expected to facilitate association studies of common genetic variation, in particular the search for as-yet-unidentified disease-causing alleles.
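
The redundancy argument above can be illustrated with a small computation. The following Python sketch estimates pairwise r² from phased haplotypes. The '0'/'1' string encoding, the function name r_squared, and the eight toy haplotypes are hypothetical choices for illustration, not data or code from the article.

```python
from itertools import combinations


def r_squared(haplotypes, i, j):
    """LD coefficient r^2 between SNPs i and j.

    `haplotypes` is a list of phased haplotypes, each a string of '0'/'1'
    alleles (an assumed encoding, one character per SNP).
    """
    n = len(haplotypes)
    p_i = sum(h[i] == "1" for h in haplotypes) / n  # allele frequency at SNP i
    p_j = sum(h[j] == "1" for h in haplotypes) / n  # allele frequency at SNP j
    # Frequency of haplotypes carrying allele '1' at both SNPs.
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:  # a monomorphic SNP: r^2 is undefined, report 0
        return 0.0
    d = p_ij - p_i * p_j  # disequilibrium coefficient D
    return d * d / denom


# Hypothetical sample of eight phased haplotypes over four SNPs.
haps = ["1100", "1100", "1101", "0010", "0010", "0011", "1100", "0010"]

for i, j in combinations(range(len(haps[0])), 2):
    print(f"SNPs {i} and {j}: r^2 = {r_squared(haps, i, j):.2f}")
```

In this toy sample, SNPs 1 and 2 both have r² = 1 with SNP 0 (SNP 2 carries allele '1' exactly when SNP 0 carries '0', which is still complete LD), so genotyping SNP 0 alone would recover both. SNP 3 has r² = 0 with every other SNP and would need its own tag.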

However, in real data, LD among SNPs does not necessarily produce a clear segmental structure, and selection of tag SNPs is not straightforward. When a well-defined haplotype block contains only a single group of SNPs in almost complete LD, any SNP can serve as a tag SNP, and selection is simple. For two groups of SNPs with no intergroup LD, the genotypes of SNPs in one group cannot be used to deduce the genotypes of SNPs in the other, so tag SNPs can be selected independently from each group. In most cases, however, SNPs in strong LD and those in weak LD are intermingled within a chromosomal fragment, and selection of tag SNPs must take this complex pattern of LD relations into account. Moreover, as the number of SNPs under investigation increases, the LD relations among them become more complicated. …
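
Selecting tags group by group can be sketched as a simple greedy procedure. The Python code below is a generic greedy r² binning heuristic, not the algorithm developed in this article. It repeatedly chooses the SNP whose r² with the largest number of still-ungrouped SNPs meets a cutoff, makes it the tag of a new group, and removes that group from consideration. The 0.8 cutoff, the function names, and the toy haplotypes are hypothetical.

```python
def r_squared(haplotypes, i, j):
    """r^2 between SNPs i and j over phased '0'/'1' haplotype strings."""
    n = len(haplotypes)
    p_i = sum(h[i] == "1" for h in haplotypes) / n
    p_j = sum(h[j] == "1" for h in haplotypes) / n
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:  # monomorphic SNP: r^2 undefined, treat as 0
        return 0.0
    d = p_ij - p_i * p_j
    return d * d / denom


def greedy_ld_groups(haplotypes, threshold=0.8):
    """Return a list of (tag, members) pairs.

    Each group is a tag SNP plus every still-ungrouped SNP whose r^2 with that
    tag is >= threshold (a hypothetical cutoff). Members are only required to
    be in strong LD with the tag, not with each other.
    """
    ungrouped = set(range(len(haplotypes[0])))
    groups = []
    while ungrouped:
        best_tag, best_members = None, []
        for tag in sorted(ungrouped):
            members = [s for s in sorted(ungrouped)
                       if s == tag or r_squared(haplotypes, tag, s) >= threshold]
            # Keep the tag covering the most SNPs; ties go to the lowest index.
            if len(members) > len(best_members):
                best_tag, best_members = tag, members
        groups.append((best_tag, best_members))
        ungrouped -= set(best_members)
    return groups


# Hypothetical phased haplotypes for six SNPs: positions 0, 2, 4 share one
# allele pattern, positions 1 and 3 share another, and position 5 only
# partly follows positions 1 and 3.
haps = ["111111", "111111", "101010", "101010",
        "010101", "010100", "000000", "000000"]

groups = greedy_ld_groups(haps, threshold=0.8)
for tag, members in groups:
    print(f"tag SNP {tag} represents SNPs {members}")
spared = 1 - len(groups) / len(haps[0])
print(f"fraction of SNPs spared from genotyping: {spared:.0%}")
```

In the toy data, the SNPs at positions 0, 2, and 4 form one group and those at positions 1 and 3 another, even though the two groups are interleaved along the chromosome. This is a small analogue of the mosaic LD pattern described in the abstract. SNP 5 has r² = 0.6 with SNPs 1 and 3, below the cutoff, so it becomes its own tag, and three tags cover six SNPs. A greedy cover of this kind is not guaranteed to find the minimum number of tags.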
