Sunday, 22 May 2016

evolution - Within and Between Allelic Class Diversity


I am reading Charlesworth et al. 1997. They talk about diversity within and between allelic classes.



Nucleotide diversities ($π$) at each neutral site were estimated from the mean of $2 \sum z_t (1-z_t)$, over replicated introductions at the site of single variants, where $z_t$ is the frequency of the neutral variant at time $t$, and summation is over all times until either fixation or loss occurs.


The total genetic diversity at the neutral sites ($π_T$) was also decomposed into that within and between allelic classes at the polymorphic locus. Diversity within allelic classes, which will be written here as $π_A$, was estimated from the mean of $2 \sum \left( x_t(1-x_t)+y_t)(1-y_t) \right)$ where $x_t$ and $y_t$ are the frequencies of the neutral variant within the first and second allelic classes, respectively. Diversity between allelic classes with respect to the polymorphic locus was calculated as the difference between the total diversity values and $π_A$




Note that the parentheses don't match up, but this is what is written in the paper!


Why am I confused about this text?


I am confused about the term allelic class. At first I thought there was nothing fancy here and that we could simply replace the term "allelic class" with "allele". But when I saw the equation for $\pi_A$, I realized that the frequencies of the two allelic classes do not necessarily add up to 1 (even though we consider only two allelic classes).


I was also a little confused about the difference between $\pi$ and $\pi_T$, but I think they just used two notations for the same thing ($\pi = \pi_T$).


In population genetics jargon, diversity just means expected heterozygosity. $\pi_T$ makes sense to me. It is just the average heterozygosity $\left(2 z(1-z)\right)$ calculated over all time steps. Maybe a more intuitive way to put it would be to integrate over continuous time rather than sum over discrete time steps.
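To make that reading concrete, here is a minimal sketch of how I understand the estimate. It assumes a neutral Wright-Fisher model with binomial resampling of the gene copies each generation, which is my assumption and not something stated in the quoted passage. The function names and parameter values are made up for illustration.

```python
import random


def summed_heterozygosity(two_n, rng):
    """Follow one new neutral variant until loss or fixation and
    return the sum over generations of 2 * z_t * (1 - z_t)."""
    count = 1  # a single new copy among two_n gene copies
    total = 0.0
    while 0 < count < two_n:
        z = count / two_n
        total += 2.0 * z * (1.0 - z)
        # Assumed Wright-Fisher step: each of the two_n copies in the next
        # generation carries the variant with probability z.
        count = sum(1 for _ in range(two_n) if rng.random() < z)
    return total


def mean_summed_heterozygosity(two_n, replicates, seed=1):
    """Average the summed heterozygosity over replicated introductions."""
    rng = random.Random(seed)
    runs = (summed_heterozygosity(two_n, rng) for _ in range(replicates))
    return sum(runs) / replicates


estimate = mean_summed_heterozygosity(two_n=40, replicates=10000)
print(estimate)
```

Under this particular sampling scheme the expected heterozygosity shrinks by a factor $1 - 1/(2N)$ per generation, so the mean should come out close to $2\left(1 - 1/(2N)\right)$, where $2N$ is the number of gene copies (`two_n` in the sketch).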


Question


I can read the equation for $\pi_A$ but I fail to get any intuition behind what it means. For example, I have no idea why it should be called within-allelic class diversity. Where does $2(x(1-x)+y)(1-y)$ come from? My whole issue might boil down to the definition of allelic class.


EDIT


The term allelic class is defined in Innan and Tajima (1997)




Suppose that there are two nucleotides, say A and T, in a particular site. Then, we can divide DNA sequences into two classes: one class includes sequences with A and the other includes sequences with T in this site. We call such a class an allelic class



(Slatkin 1996 might help as well).


I am still not quite sure what the within-allelic-class variance is. Maybe it is this: take the most common sequence in the considered allelic class. For each sequence, calculate the number of pairwise differences from the most common sequence and square this value. Sum over all sequences and divide by the number of sequences. In math form it would be $\frac{1}{2N}\sum_{i=1}^{2N} D_i^2$, where $N$ is the population size and $D_i$ is the number of pairwise differences between sequence $i$ and the most common sequence in the considered allelic class. Does that sound right to you?
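As a sketch of the calculation I have in mind (this is only my guess, not the definition from either paper), assuming all sequences in the class are aligned and of equal length. The function name and the toy sequences are made up for illustration.

```python
from collections import Counter


def mean_squared_distance_to_mode(sequences):
    """Mean squared number of differences between each sequence and
    the most common sequence in the same allelic class."""
    most_common, _ = Counter(sequences).most_common(1)[0]
    squared = []
    for seq in sequences:
        # Number of sites at which seq differs from the most common sequence.
        d = sum(a != b for a, b in zip(seq, most_common))
        squared.append(d * d)
    return sum(squared) / len(sequences)


# Toy allelic class: the distances to "AACG" are 0, 0, 1 and 2.
toy_class = ["AACG", "AACG", "AACT", "ATCT"]
print(mean_squared_distance_to_mode(toy_class))  # 1.25
```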



Answer



From the way I have read what you have written, $z(1-z)$ translated into a sentence would be: the frequency of the neutral variant ($z$) times the frequency of all other possible variants ($1 - z$) at a particular time $t$.


Nucleotide diversity is then the average of 2 times the sum of the frequency of the neutral variant ($z$) times the frequency of all other possible variants ($1-z$), taken over all time periods until either the variant is fixed (there is no longer any change in the sequence) or the allele is lost (which can happen over evolutionary time, especially if the allelic class is a deleterious variant, or the heterozygous allele provides enough expression to mask).


To me, this sounds like the result will be the probability of the neutral variant existing over time, which should be a number between 0 and 1. If $z$ were 1, that would imply that the neutral variant is always present, so the frequency of other variants is 0, making $2 \cdot 1 \cdot (1-1) = 0$. That makes sense to me, as it would mean there is no nucleotide diversity: the sequence is always the same, so there is no sequence diversity.


As this looks like it is dealing with frequency distributions, I think that total genetic diversity implies the probability of all of the different allelic classes that make up an allele. So if you have class one with frequency $x$ and class two with frequency $y$, it sounds like the overall diversity would be the probability of the neutral variant of $x$ and the probability of the neutral variant of $y$.


Generally, when you are looking at the probability of multiple events, you multiply the probability of one event by the probability of the other. As a result, I am inclined to say that the nucleotide diversity within classes, $\pi_A$, is 2 times the average of the sum of the frequency of $x$ times the frequency of $y$, or $2\sum \left[ x(1-x)(1-y) + y(1-y) \right]$, which factors to $2\sum \left( x(1-x) + y \right)(1-y)$.

In words, the within-class diversity ($\pi_A$) is 2 times the average of the sum of two terms. The first is the frequency of $x$ as the neutral variant, times the frequency of all other variants when $x$ is the neutral variant, times the frequency of all other variants when $y$ is the neutral variant. The second is the frequency of $y$ as the neutral variant times the frequency of all other variants when $y$ is the neutral variant.



I think the reason this might be done is selective pressure: $x$ might be favored, so at those times when the variant is $y$, some of those variants (possibly all of them) will be $x$. Multiplying the within-class diversity of $x$ by the frequency of all the other variants when the class is $y$ therefore implies that there will be less diversity within class than if you just added the frequency probabilities together.
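There is another reading worth checking numerically. Assume the misplaced parenthesis is a typo and the intended summand is $x_t(1-x_t) + y_t(1-y_t)$, i.e. one heterozygosity term per allelic class. The sketch below splits total heterozygosity at a single time point into a within-class and a between-class part. Weighting each class by its frequency `p` is my own assumption (the quoted formula shows no such weights), and the function name and example values are made up for illustration.

```python
def diversity_components(p, x, y):
    """Split heterozygosity at a neutral site into within-class and
    between-class parts.

    p    : frequency of the first allelic class (the second has 1 - p)
    x, y : frequency of the neutral variant within class 1 and class 2
    """
    q = 1.0 - p
    z = p * x + q * y  # overall frequency of the neutral variant
    total = 2.0 * z * (1.0 - z)
    # Assumed weighting: each class contributes in proportion to its frequency.
    within = p * 2.0 * x * (1.0 - x) + q * 2.0 * y * (1.0 - y)
    between = total - within  # "difference between the total and within"
    return total, within, between


total, within, between = diversity_components(p=0.5, x=0.2, y=0.8)
print(total, within, between)
```

With these weights the between-class part always equals $2p(1-p)(x-y)^2$, so it is zero when the neutral variant is equally common in both classes and grows as the two classes diverge.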


One thing I would do is search for a published correction to this article, since there appears to be a mistake in the formula. That might help to clarify. Also note that I could be wrong in my assessment, as I do not have access to the actual paper you have referenced.


Best of luck in working this out.

