Academic journal article Genetics

Learning Natural Selection from the Site Frequency Spectrum

Article excerpt

ABSTRACT Genetic adaptation to external stimuli occurs through the combined action of mutation and selection. A central problem in genetics is to identify loci responsive to specific selective constraints. Many tests have been proposed to identify the genomic signatures of natural selection by quantifying the skew in the site frequency spectrum (SFS) under selection relative to neutrality. We build upon recent work that connects many of these tests under a common framework, by describing how selective sweeps affect the scaled SFS. We show that the specific skew depends on many attributes of the sweep, including the selection coefficient and the time under selection. Using supervised learning on extensive simulated data, we characterize the features of the scaled SFS that best separate different types of selective sweeps from neutrality. We develop a test, SFselect, that consistently outperforms many existing tests over a wide range of selective sweeps. We apply SFselect to polymorphism data from a laboratory evolution experiment of Drosophila melanogaster adapted to hypoxia and identify loci that strengthen the role of the Notch pathway in hypoxia tolerance, but were missed by previous approaches. We further apply our test to human data and identify regions that are in agreement with earlier studies, as well as many novel regions.

(ProQuest: ... denotes formulae omitted.)

NATURAL selection works by preferentially favoring carriers of beneficial (fit) alleles. At the genetic level, the increased fitness may stem from two sources: either a de novo mutation that is beneficial in the current environment or new environmental stress leading to increased relative fitness of an existing allele. Over time, haplotypes carrying such variants start to dominate the population, causing reduced genetic diversity. This process, known as a selective sweep, is mitigated by recombination and can therefore be observed mostly in the vicinity of the beneficial allele. Improving our ability to detect the genomic signatures of selection is crucial for shedding light on genes responsible for adaptation to environmental stress, including disease.

Many tests of neutrality have been proposed based on the site frequency spectrum (Tajima 1989; Fay and Wu 2000; Zeng et al. 2006; Chen et al. 2010; Udpa et al. 2011). We start by describing these tests in a common framework delineated by Achaz (2009). The data, namely genetic variants from a population sample, are typically represented as a matrix with m columns corresponding to segregating sites and n rows corresponding to individual chromosomes. The sample is chosen from a much larger population of N diploid individuals, where chromosomes are connected by a (hidden) genealogy and mutations occurring in a certain lineage are inherited by all of its descendants (Figure 1A). Thus, in the example shown in Figure 1A, the mutation at locus 4 appears in four chromosomes from the sample, that is, at frequency 0.5. Following Fu (1995), let ξ_i denote the number of polymorphic sites at frequency i/n in a sample of size n. The site frequency spectrum (SFS) vector ξ and the scaled SFS vector ξ′ are defined as

... (1)

Thus, in Figure 1A, we have

... (2)

In a constant-sized population evolving neutrally, the branch lengths of various lineages (Kingman 1982), the number of mutations on each lineage (Tajima 1989), and the observed SFS (Fu 1995) are all tightly connected to the population-scaled mutation rate θ (= 4Nμ) by coalescent theory. Specifically, ... This implies that each ξ′_i (= iξ_i) is an unbiased estimator of θ (Fu 1995) and that the scaled SFS ξ′ is uniform in expectation (as illustrated by the neutral curves in Figure 1).
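The definitions above can be illustrated with a minimal Python sketch that tallies the SFS from a 0/1 matrix (rows are chromosomes, columns are segregating sites, 1 marks the derived allele) and then multiplies each entry by its index to obtain the scaled SFS. The function names and the toy matrix are assumptions made for illustration; they are not taken from the article, from SFselect, or from Figure 1A.

```python
from collections import Counter

def site_frequency_spectrum(haplotypes):
    """Return [xi_1, ..., xi_(n-1)] for a 0/1 matrix.

    Rows are chromosomes, columns are segregating sites; xi_i counts the
    sites whose derived allele appears in exactly i of the n chromosomes.
    """
    n = len(haplotypes)
    # Derived-allele count at each site (column sums).
    counts = [sum(column) for column in zip(*haplotypes)]
    tally = Counter(counts)
    return [tally.get(i, 0) for i in range(1, n)]

def scaled_sfs(sfs):
    """Return [1*xi_1, 2*xi_2, ...], the scaled SFS."""
    return [i * xi for i, xi in enumerate(sfs, start=1)]

# Hypothetical sample: n = 4 chromosomes, m = 5 segregating sites.
toy = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1],
]

sfs = site_frequency_spectrum(toy)   # [3, 1, 1]
scaled = scaled_sfs(sfs)             # [3, 2, 3]
print(sfs, scaled)
```

In this toy sample three sites are singletons, one is a doubleton, and one appears in three chromosomes. Under neutrality each entry of the scaled vector would have the same expectation, so a flat scaled SFS is the baseline against which skew is measured.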

However, this is not the case for populations evolving under positive selection. We consider the case of a selective sweep, where a single (de novo) mutation confers increased fitness. Individuals carrying the mutation preferentially procreate with probability ∝ 1 + s, where s is the selection coefficient. …
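To give a rough sense of what a reproduction weight proportional to 1 + s implies, the sketch below iterates the expected frequency of the favored allele from one generation to the next. It assumes a deterministic, haploid, infinite-population approximation that ignores drift and recombination; this simplification and the function names are illustrative assumptions, not the simulation setup used in the article.

```python
def next_frequency(p, s):
    """Expected carrier frequency after one generation.

    Carriers reproduce with weight (1 + s), non-carriers with weight 1,
    so the new frequency is the carriers' share of the total weight.
    """
    return p * (1 + s) / (1 + p * s)

def sweep_trajectory(p0, s, generations):
    """Return the list of frequencies p_0, p_1, ..., p_generations."""
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(next_frequency(trajectory[-1], s))
    return trajectory

# Hypothetical parameters: starting frequency 0.001, s = 0.05.
trajectory = sweep_trajectory(0.001, 0.05, 300)
print(trajectory[0], trajectory[-1])
```

With s = 0 the frequency stays constant in this approximation, and with s > 0 it rises monotonically toward 1, which is the qualitative behavior of a sweep nearing fixation.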
