Comparative Genomics and Diversifying Selection of the Clustered Vertebrate Protocadherin Genes

Article excerpt

ABSTRACT

To explain the mechanism for specifying diverse neuronal connections in the brain, Sperry proposed that individual cells carry chemoaffinity tags on their surfaces. The enormous complexity of these connections requires a tremendous diversity of cell-surface proteins. A large number of neural transmembrane protocadherin (Pcdh) proteins is encoded by three closely linked human and mouse gene clusters (α, β, and γ). To gain insight into Pcdh evolution, I performed comprehensive comparative cDNA and genomic DNA analyses for the three clusters in the chimpanzee, rat, and zebrafish genomes. I found that there are species-specific duplications in vertebrate Pcdh genes and that additional diversity is generated through alternative splicing within the zebrafish "variable" and "constant" regions. Moreover, different codons (sites) in the mammalian Pcdh ectodomains (ECs) are under diversifying selection, with some under diversity-enhancing positive Darwinian selection and others, including calcium-binding sites, under strong purifying selection. Interestingly, almost all positively selected codon positions are located on the surface of ECs 2 and 3. These diversified residues likely play an important role in combinatorial interactions of Pcdh proteins, which could provide the staggering diversity required for neuronal connections in the brain. These results also suggest that adaptive selection is an additional evolutionary factor for increasing Pcdh diversity.

An important mechanism for generating molecular diversity is alternative splicing. A special form of alternative splicing uses multiple distinct first exons. Mammalian genomes contain a large number of alternatively spliced genes that have multiple "variable" first exons (ZHANG et al. 2004). The clustered Pcdh genes exemplify this type of alternative splicing. About 60 similar human and mouse Pcdh genes are organized into three sequentially linked clusters, designated α, β, and γ (see Figure 1, A and C) (Wu et al. 2001). The α and γ clusters have a "variable" and "constant" genomic organization, similar to that of immunoglobulin (Ig) and T-cell receptor (Tcr) gene clusters (Wu and MANIATIS 1999). Specifically, the variable region of the α cluster contains 15 and 14 highly similar exons in humans and mice, respectively. These variable exons are unusually large (~2.5 kb each) and are organized in a tandem array, which is followed by the constant region of three small exons, in both humans and mice (Figure 1, A and C). Similarly, the variable region of both the human and mouse γ clusters contains a tandem array of 22 large similar exons, whereas the γ constant region contains three small downstream exons in both species (Figure 1, A and C). In contrast to the α and γ clusters, the human and mouse β clusters contain 16 and 22 variable exons, respectively, but do not contain a constant region. Thus, each member of the human and mouse β clusters is a single-exon gene (Figure 1, A and C).
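
As a quick arithmetic check on the "about 60" figure, the variable-exon counts quoted above can be tallied per species. The sketch below is illustrative only: the dictionary layout and function name are assumptions, and it assumes that each variable exon corresponds to one Pcdh gene.

```python
# Variable-exon counts per cluster, taken from the paragraph above.
# Assumption: one Pcdh gene per variable exon.
VARIABLE_EXON_COUNTS = {
    "human": {"alpha": 15, "beta": 16, "gamma": 22},
    "mouse": {"alpha": 14, "beta": 22, "gamma": 22},
}


def total_variable_exons(species: str) -> int:
    """Sum the variable exons across the alpha, beta, and gamma clusters."""
    return sum(VARIABLE_EXON_COUNTS[species].values())


if __name__ == "__main__":
    for species in VARIABLE_EXON_COUNTS:
        print(species, total_variable_exons(species))
```

Under that assumption the totals are 53 for human and 58 for mouse, in line with the stated figure of about 60 genes.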

Each Pcdh variable exon is preceded by a distinct promoter (TASIC et al. 2002), and these promoters share a highly conserved core motif (Wu et al. 2001; NOONAN et al. 2003, 2004). Specific promoter activation transcribes a high-molecular-weight precursor RNA that extends through all of the downstream variable and constant exons. However, only the 5'-most variable exon is cis-spliced to the first constant exon to generate functional mRNAs (TASIC et al. 2002; WANG et al. 2002a). Pcdh α and γ proteins are generally located at synaptic junctions in the central nervous system (CNS) (KOHMURA et al. 1998; WANG et al. 2002b; PHILLIPS et al. 2003), where they may form combinatorial hetero-cis-interactions (MURATA et al. 2004) and specific homophilic trans-interactions (OBATA et al. 1995). Because of their synaptic localization, unusual genomic organization, and characteristic cadherin domains, the Pcdh proteins have been proposed to provide molecular tags for the chemoaffinity hypothesis (SPERRY 1963; SHAPIRO and COLMAN 1999). …
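
The promoter-choice and splicing rule described above can be sketched as a small model. This is a simplified illustration, not the analysis pipeline used in the study: the `Cluster` class, the function names, and the exon labels (`v1`, `c1`, and so on) are hypothetical, and the model ignores any splicing within the constant region.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cluster:
    """Hypothetical model of a Pcdh cluster; exons are ordered 5' to 3'."""
    name: str
    variable_exons: List[str]
    constant_exons: List[str]  # empty for a cluster without a constant region


def precursor_rna(cluster: Cluster, promoter_index: int) -> List[str]:
    """Exons in the precursor RNA when the promoter of one variable exon fires.

    Transcription starts at the chosen variable exon and extends through all
    downstream variable exons and the constant exons.
    """
    if not 0 <= promoter_index < len(cluster.variable_exons):
        raise IndexError("promoter_index is outside the variable region")
    return cluster.variable_exons[promoter_index:] + cluster.constant_exons


def mature_mrna(cluster: Cluster, promoter_index: int) -> List[str]:
    """Keep only the 5'-most variable exon and join it to the constant exons."""
    precursor = precursor_rna(cluster, promoter_index)
    return [precursor[0]] + cluster.constant_exons


# Illustrative clusters sized like the human alpha and beta clusters.
alpha = Cluster("alpha", [f"v{i}" for i in range(1, 16)], ["c1", "c2", "c3"])
beta = Cluster("beta", [f"v{i}" for i in range(1, 17)], [])

if __name__ == "__main__":
    print(mature_mrna(alpha, 4))  # fifth alpha promoter activated
    print(mature_mrna(beta, 0))   # beta genes have no constant exons
```

In this model, activating the fifth α promoter yields a precursor that runs from `v5` through `c3`, while the mature mRNA contains only `v5` followed by the three constant exons. For the β cluster, which lacks a constant region, the mature product is the single variable exon.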