A Genomics Approach to the Detection of Positive Selection in Cattle: Adaptive Evolution of the T-Cell and Natural Killer Cell-Surface Protein CD2
David J. Lynn, Abigail R. Freeman, Caitriona Murray and Daniel G. Bradley, Genetics
The detection of adaptive evolution at the molecular level is of interest not only as an insight into the process of evolution but also because of its functional implications for genes of interest. Here, we present the first genomics approach to detecting positive selection operating on the Bos taurus lineage, an important domestic species. This analysis led to the identification of the T-cell and natural killer (NK) cell receptor cluster of differentiation 2 (CD2) as having a strong signal of selection. Further detailed investigation of CD2 revealed that this gene was subject to positive selection during the evolution of a number of mammalian lineages. Moreover, we show that selection has operated primarily on the extracellular domain of CD2 and discuss the implications of this for an important regulator of the adaptive immune response.
THE detection of positive selection in genes of interest is important not only for understanding the process of evolution, but also because signatures of selection can give real insight into the functional significance of the molecules concerned. Indeed, evidence for adaptive evolution has been reported for a number of important molecules involved in a range of processes, including reproduction (SWANSON et al. 2001), development (FARES et al. 2003), taste (SHI et al. 2003), and, particularly, the immune system (HUGHES and YEAGER 1998; FILIP and MUNDY 2004; LYNN et al. 2004). Although these and other studies have been informative about the evolution of individual genes, the abundance of sequence information now available for many species makes it possible not only to investigate the effects of natural selection gene by gene but also to take a systematic genomics approach. Whereas some recent studies have focused on human-primate divergence (CLARK et al. 2003; GIMELBRANT et al. 2004), here we fit models of evolution by maximum likelihood to >3000 orthologous genes from four mammalian species to search for evidence of positive selection in the bovine lineage.
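This section does not say which models or software were used for the maximum-likelihood fits. A common way to turn such fits into a test for positive selection is a likelihood ratio test between a null model that constrains the nonsynonymous-to-synonymous rate ratio (ω = dN/dS) to be at most 1 and an alternative model that allows ω > 1 on the lineage of interest. The minimal sketch below assumes the two log-likelihoods have already been obtained from a codon-model program and that twice their difference is compared with a χ² distribution with one degree of freedom. The function names and log-likelihood values are hypothetical, and the correct degrees of freedom or null distribution depend on the model pair actually used.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Twice the log-likelihood difference between two nested models."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df.

    For 1 degree of freedom this equals erfc(sqrt(x / 2)).
    """
    if x <= 0.0:
        return 1.0  # a non-positive statistic gives no evidence against the null
    return math.erfc(math.sqrt(x / 2.0))


def positive_selection_lrt(lnl_null: float, lnl_alt: float, alpha: float = 0.05):
    """Hypothetical helper: return (statistic, p-value, significant?).

    Assumes a chi-square null distribution with 1 df; other model pairs
    may require different degrees of freedom or a mixture distribution.
    """
    stat = lrt_statistic(lnl_null, lnl_alt)
    p_value = chi2_sf_df1(stat)
    return stat, p_value, p_value < alpha


if __name__ == "__main__":
    # Made-up log-likelihoods for illustration only.
    stat, p_value, significant = positive_selection_lrt(-2510.4, -2504.1)
    print(f"2*dlnL = {stat:.2f}, p = {p_value:.2g}, significant = {significant}")
```

Because the null model is a special case of the alternative, the alternative can never fit worse in theory. The guard in `chi2_sf_df1` simply treats a non-positive statistic (for example, from an optimizer stopping early) as no evidence for selection.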
The analysis presented here is the first reported systematic approach to studying positive selection at the molecular level in this important domestic species. In this article we use a bovine gene data set generated from clustered expressed sequence tags (ESTs), since a gene data set from the genome project is not yet available. Although this data set is not without errors, our approach has nevertheless proven powerful for detecting positive selection in this species. In particular, we have identified a strong signature of selection in cluster of differentiation 2 (CD2), a T-cell and natural killer (NK) cell-surface protein of considerable importance in the mammalian immune response (DAVIS et al. 2003). More detailed analysis of this molecule has confirmed the signature of selection identified by the genomics approach and has revealed that positive selection has operated primarily on the extracellular domain of this molecule, raising important questions regarding its function.
MATERIALS AND METHODS
Generating a bovine and porcine gene data set: As the complete cow and pig genome sequences have yet to be published, alternative sources of sequence information must be used to generate bovine and porcine gene data sets. Fortunately, large collections of ESTs are now available for these species. A total of 335,668 bovine and 272,260 porcine ESTs and mRNAs were downloaded from GenBank (http://www.ncbi.nlm.nih.gov). Both data sets were masked for repetitive sequences using RepeatMasker (http://repeatmasker.org) and cleaned of vector and poly(A) contaminants using SeqClean (http://www.tigr.org/tdb/tgi/software/). Following this process, 332,076 bovine and 270,260 porcine sequences were available for clustering. The ESTs were clustered and assembled into consensus sequences using the TIGR gene indices clustering tools (TGICL) (PERTEA et al. …
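Together, masking and cleaning removed 3,592 bovine (335,668 − 332,076) and 2,000 porcine (272,260 − 270,260) sequences before clustering. The sketch below illustrates the kind of filtering involved: trimming terminal poly(A)/poly(T) runs and discarding reads that end up too short or mostly masked. It is not SeqClean's algorithm. The thresholds, function names, and the assumption that RepeatMasker has replaced repeats with `N` (its default masking character) are all hypothetical choices made for illustration.

```python
import re

# Hypothetical thresholds for illustration; SeqClean's own rules differ.
MIN_LENGTH = 100            # discard reads shorter than this after trimming
MAX_MASKED_FRACTION = 0.5   # discard reads that are mostly masked repeat (N)
POLY_A_TAIL = re.compile(r"A{8,}$", re.IGNORECASE)  # poly(A) run at the 3' end
POLY_T_HEAD = re.compile(r"^T{8,}", re.IGNORECASE)  # poly(T) run at the 5' end


def trim_poly_a(seq: str) -> str:
    """Strip a terminal poly(A) tail and a leading poly(T) run."""
    return POLY_T_HEAD.sub("", POLY_A_TAIL.sub("", seq))


def clean_sequences(records: dict[str, str],
                    min_length: int = MIN_LENGTH,
                    max_masked: float = MAX_MASKED_FRACTION) -> dict[str, str]:
    """Return trimmed sequences that pass the length and masking filters."""
    kept = {}
    for name, seq in records.items():
        trimmed = trim_poly_a(seq)
        if len(trimmed) < min_length:
            continue  # too short to cluster reliably
        masked = trimmed.upper().count("N") / len(trimmed)
        if masked > max_masked:
            continue  # dominated by repeat-masked bases
        kept[name] = trimmed
    return kept


if __name__ == "__main__":
    reads = {
        "est1": "ACGT" * 30 + "A" * 20,   # kept after tail trimming (120 nt)
        "est2": "ACGT" * 10,              # dropped: too short
        "est3": "N" * 150 + "ACGT" * 10,  # dropped: mostly masked
    }
    cleaned = clean_sequences(reads)
    print(sorted(cleaned), len(cleaned["est1"]))
```

Running the example prints `['est1'] 120`. Filtering before clustering matters because untrimmed poly(A) tails and unmasked repeats can create spurious overlaps that join unrelated transcripts into one cluster.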