Sequencing and Comparative Analysis of a Conserved Syntenic Segment in the Solanaceae
Wang, Ying; Diehl, Adam; Wu, Feinan; Vrebalov, Julia; Giovannoni, James; Siepel, Adam; Tanksley, Steven D. Genetics
Comparative genomics is a powerful tool for gaining insight into genomic function and evolution. However, in plants, sequence data that would enable detailed comparisons of both coding and noncoding regions have been limited in availability. Here we report the generation and analysis of sequences for an unduplicated conserved syntenic segment (CSS) in the genomes of five members of the agriculturally important plant family Solanaceae. This CSS includes a 105-kb region of tomato chromosome 2 and orthologous regions of the potato, eggplant, pepper, and petunia genomes. With a total neutral divergence of 0.73-0.78 substitutions/site, these sequences are similar enough that most noncoding regions can be aligned, yet divergent enough to be informative about evolutionary dynamics and selective pressures. The CSS contains 17 distinct genes with generally conserved order and orientation, but with numerous small-scale differences between species. Our analysis indicates that the last common ancestor of these species lived ~27-36 million years ago, that more than one-third of short genomic segments (5-15 bp) are under selection, and that more than two-thirds of selected bases fall in noncoding regions. In addition, we identify genes under positive selection and analyze hundreds of conserved noncoding elements. This analysis provides a window into 30 million years of plant evolution in the absence of polyploidization.
Genome sequences are now rarely studied in isolation, but instead are examined alongside their neighbors on the tree of life. Most animal species of primary research importance in genetics, including human, mouse, Drosophila melanogaster, and Caenorhabditis elegans, now belong to whole "sequenced clades," consisting of at least half a dozen and in some cases more than two dozen sequenced species (e.g., Lindblad-Toh et al. 2005; Rhesus Macaque Genome Sequencing and Analysis Consortium 2007; Clark et al. 2007; Miller et al. 2007; Stark et al. 2007) (http://www.genome.gov/Pages/Research/Sequencing/SeqProposals/CaenorhabditisSEQ.pdf). The same is true of the model yeast Saccharomyces cerevisiae (Cliften et al. 2003; Kellis et al. 2003). The species within each of these clades are evolutionarily close enough that noncoding as well as coding sequences can be aligned, yet distant enough that genomic comparisons reveal clear signatures of natural selection. In addition, the generally similar physiology, behavior, and genetics of the organisms within each clade help to facilitate comparative analyses. Comparative genomic analyses of sequenced clades have, among other things, allowed for the identification of new genes, regulatory elements, noncoding RNAs, and conserved sequences of unknown function (e.g., Guigó et al. 2003; Kellis et al. 2003; Bejerano et al. 2004; Siepel et al. 2007; Stark et al. 2007); shed light on duplication and rearrangement histories (Murphy et al. 2005; Jiang et al. 2007); produced refined phylogenies (Thomas et al. 2003; Murphy et al. 2007); and enabled the detection of rapidly evolving coding and noncoding sequences (Clark et al. 2003; Pollard et al. 2006).
In plants, however, comparable sequenced clades have not yet emerged. The main embryophytic (land-plant) species that have been fully sequenced, namely Arabidopsis thaliana (Arabidopsis Genome Initiative 2000), Oryza sativa (Goff et al. 2002; Yu et al. 2002), Medicago truncatula (Cannon et al. 2006), and Populus trichocarpa (Tuskan et al. 2006), have been selected primarily for their individual importance as model species or agricultural crops, rather than for their value in comparative genomics. These genomes are sufficiently distant from one another that they generally do not align outside of coding regions. In addition, each genome has been considerably scrambled with respect to the others by millions of years of rearrangement, duplication, insertion, and deletion, further complicating comparative analyses. …