Wednesday, 14 December 2016

genetics - Infer gene frequency within a species over time


I was reading Karlsson et al. (2014) and I came across this:



A selected variant that increases rapidly in frequency in the past ~250,000 years can be detected as an unusual reduction in genetic diversity.



I realised that I do not know how to infer a specific allele frequency over time within a given species.


I tried googling some keywords but was flooded by other concepts. Could you please direct me to some appropriate documentation/keywords?



Answer



There are two parts to your post that I want to address. The first is the quote (because I want to make sure you understand it well), and the second is about general inference methods for estimating the genetic composition of ancestral populations.


The Quote: Selective Sweeps




A selected variant that increases rapidly in frequency in the past ~250,000 years can be detected as an unusual reduction in genetic diversity.



When an allele is selected for, it spreads through the population quickly relative to, for example, neutral alleles. If an allele becomes fixed in the population, the diversity in that gene is zero; there is no standing genetic variation in the gene. Selection will often reduce genetic variation but see this post too.
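To make the "spreads quickly, then diversity drops to zero" point concrete, here is a minimal sketch. It assumes a deterministic haploid model in which carriers of the favoured allele have relative fitness 1 + s; the function names and the values of p0 and s are my own illustrative choices, and drift is ignored entirely (so the neutral allele simply stays where it started, which is only true in expectation).

```python
def next_frequency(p, s):
    """One generation of haploid selection: carriers have fitness 1 + s."""
    return p * (1 + s) / (1 + p * s)


def trajectory(p0, s, generations):
    """Allele frequency in each generation, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


def heterozygosity(p):
    """Expected heterozygosity at a biallelic locus, 2p(1 - p)."""
    return 2 * p * (1 - p)


# Illustrative (assumed) values: a rare allele with a 5% fitness advantage.
selected = trajectory(p0=0.01, s=0.05, generations=400)
neutral = trajectory(p0=0.01, s=0.0, generations=400)
print(f"selected: frequency {selected[-1]:.4f}, "
      f"diversity {heterozygosity(selected[-1]):.6f}")
print(f"neutral (expectation only, no drift): frequency {neutral[-1]:.4f}")
```

As the selected allele approaches fixation, 2p(1 - p) at that locus goes to zero, which is the loss of diversity described above.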


However, selection doesn't just reduce genetic diversity at the selected locus, but also at nearby loci that are linked to it. The loss of genetic diversity at linked loci occurs by a process called a selective sweep. This is defined (somewhat poorly) in the web version of your linked paper:



Selective sweeps; Reductions in genetic variation caused by positive selection at particular loci.



Basically, a selective sweep occurs when strong selection causes one allele, together with the alleles at loci it is tightly linked to, to spread through a population. Genetic diversity will be lost from the linked loci at a rate determined by the strength of selection and the degree of linkage (tighter linkage and stronger selection both increase the rate of loss). This paper (see section 7 for selective sweeps) provides a good discussion of the factors affecting genetic variation in natural populations, and draws on the example of the Y chromosome:
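The dependence on linkage can be sketched with a deterministic two-locus haploid model. This is an illustration under stated assumptions, not a method from the paper: the favoured allele A starts rare on a single background (linked to neutral allele B), r is the recombination fraction between the two loci, and the function name and default parameter values are hypothetical. Drift and new mutation are ignored.

```python
def sweep_diversity(r, s=0.05, p0=0.001, q=0.5, generations=1000):
    """Neutral-locus heterozygosity after a sweep at a linked selected locus.

    Haplotypes are AB, Ab, aB, ab. A is favoured (fitness 1 + s); B and b
    are neutral. A starts at frequency p0, entirely on the B background,
    and B has frequency q among the a-carrying haplotypes.
    """
    x_AB, x_Ab = p0, 0.0
    x_aB, x_ab = (1 - p0) * q, (1 - p0) * (1 - q)
    for _ in range(generations):
        # Selection: A-carrying haplotypes are weighted by 1 + s.
        mean_w = 1 + s * (x_AB + x_Ab)
        x_AB, x_Ab = x_AB * (1 + s) / mean_w, x_Ab * (1 + s) / mean_w
        x_aB, x_ab = x_aB / mean_w, x_ab / mean_w
        # Recombination moves A onto the b background at a rate set by r
        # and the linkage disequilibrium d.
        d = x_AB * x_ab - x_Ab * x_aB
        x_AB, x_ab = x_AB - r * d, x_ab - r * d
        x_Ab, x_aB = x_Ab + r * d, x_aB + r * d
    freq_B = x_AB + x_aB
    return 2 * freq_B * (1 - freq_B)


# Compare complete linkage with progressively looser linkage.
for r in (0.0, 0.0001, 0.001, 0.01):
    print(f"r={r}: neutral diversity after sweep {sweep_diversity(r):.4f}")
```

With r = 0 the neutral allele B is dragged to fixation along with A, so diversity at the linked locus is wiped out; as r increases, more of the starting diversity (just under 0.5 here) survives the sweep.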




One or more selective sweeps will have left the Y chromosome with little or no variability



Inference of ancestral populations


You could sample DNA from the population at the time you are interested in. However, doing so is very tricky. DNA degrades over time, so you need to understand how it degrades before you can make inferences about the population. (I saw a talk a while back by a researcher, whose name I can't remember, who was taking samples from graves. I don't know how accurately I remember this, but much of the DNA they collected was bacterial: only a tiny fraction of the DNA sampled from human bones in graves just a few hundred years old was actually human. That is another difficulty of studying old DNA.) It's often difficult to find sources of DNA samples that are a) of high enough quality and b) from enough individuals to allow good inference about the ancestral population; a small sample will be prone to sampling bias.*


There is another approach, and it's commonly used now. If you are interested in finding out more, you should search for coalescent theory and methods. These methods work backwards from the current (or relatively more recent) genetic composition of populations, using population genetic theory. The coalescent is not really used to estimate specific allele frequencies, but rather to infer population size, migration rates, and recombination rates. It also infers when the most recent common ancestor (MRCA) lived. This paper reviews coalescent methods for phylogenetic trees and this is an introduction lecture on coalescence. Coalescent theory requires many considerations: Are some mutations more common than others (see Molecular Clock)? How does selection affect genetic variation (e.g. selective sweeps)? Does linkage vary across the genome? Do rates of mutation, drift, and selection vary across the genome? Are different parts of the genome differently affected by migration?
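To give a feel for what "inferring the MRCA" means, here is a small sketch of the simplest textbook case: the standard neutral coalescent in a constant-size diploid population, where the expected time back to the MRCA of n sampled gene copies is 4N(1 - 1/n) generations. The function names and the values of n and N are assumptions for illustration; real analyses have to deal with all of the complications listed above.

```python
import random


def expected_tmrca(n_samples, pop_size):
    """Expected generations back to the MRCA of n sampled gene copies.

    Assumes the standard neutral coalescent in a constant-size diploid
    population of pop_size individuals (2 * pop_size gene copies).
    """
    return 4 * pop_size * (1 - 1 / n_samples)


def simulate_tmrca(n_samples, pop_size, rng):
    """One random draw of the time to the MRCA, in generations."""
    total = 0.0
    for k in range(n_samples, 1, -1):
        # With k lineages, pairs coalesce at rate k(k-1)/2 per 2N generations.
        rate = k * (k - 1) / (4 * pop_size)
        total += rng.expovariate(rate)
    return total


# Illustrative (assumed) values: 10 sampled copies, 1000 diploid individuals.
rng = random.Random(1)
draws = [simulate_tmrca(10, 1000, rng) for _ in range(20000)]
mean_tmrca = sum(draws) / len(draws)
print(f"analytic expectation: {expected_tmrca(10, 1000):.0f} generations")
print(f"simulated mean:       {mean_tmrca:.0f} generations")
```

Individual draws vary a lot around the mean, which is one reason a single locus says little on its own and these methods combine information across many loci.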


Both approaches raise serious considerations which are at the forefront of evolutionary genetics right now. Research groups all over the world are developing lab methods, statistical methods, and mathematical models in an attempt to make inference more accurate. Right now the only way, in my opinion, to infer the frequency of specific alleles in ancestral populations is to use ancient DNA methods. Coalescent methods can be used to infer population genetic parameters, but for specific alleles there are so many factors to consider that (right now, but certainly less so in the future) we just don't understand thoroughly. In other words, the assumptions that need to be made for coalescent methods to estimate specific ancestral allele frequencies are rarely going to be satisfied. However, as long as this is properly discussed there is no problem with the method being used, and I am certain that the future is bright for such methods.




*On sampling bias: Imagine you want to work out the frequency of the number two on a six-sided die (we know the true frequency is 0.167). You roll the die four times and the die shows the side with two dots once. Your population frequency estimate is 0.25. Your friend rolls the same die 4000 times. They see the two dots 652 times, which is 0.163, much more representative of the true population frequency. The moral of the story: small samples can give misleading estimates of the truth.
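The die example can be put in numbers with the binomial standard error, sqrt(p(1 - p)/n), which is about 0.19 for 4 rolls and about 0.006 for 4000 rolls. The sketch below assumes a fair die and independent rolls; the function names are my own, and the simulated estimates depend on the random seed.

```python
import math
import random


def standard_error(p, n):
    """Standard error of a frequency estimated from n independent draws."""
    return math.sqrt(p * (1 - p) / n)


def estimate_frequency(n_rolls, rng, face=2):
    """Fraction of n_rolls of a fair six-sided die that show `face`."""
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == face)
    return hits / n_rolls


true_p = 1 / 6
rng = random.Random(0)
for n in (4, 4000):
    print(f"n={n}: estimate {estimate_frequency(n, rng):.3f}, "
          f"standard error {standard_error(true_p, n):.4f}")
```

The same logic applies to allele frequencies estimated from a handful of ancient samples: the estimate can be far from the true population frequency simply because n is small.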

