This is a basic question but I couldn't find an answer through a web search; hopefully this is the right place to ask. Is the number of base pairs in a particular chromosome the same in all individuals? For example, if I take an X chromosome from two random humans, would I count exactly 155,270,560 base pairs in both cases? Or are there mutations that would make one longer than the other? If they're not exactly the same, what's the range in length variation?
Answer
Welcome to Biology.SE.
if I take an X-chromosome from two random humans would I count exactly 155,270,560 base pairs in both cases
No, you would probably not find exactly the same number of base pairs, because mutations do not only change one nucleotide into another (what we call a substitution) but sometimes add or delete a few (or sometimes many) nucleotides.
Note, by the way, that you don't need to take two different individuals: you can just consider the two X chromosomes of a female (or any other pair of homologous chromosomes in either sex) and find this difference in chromosome length.
what's the range in length variation?
Good question (+1)!
Telomere issue
Before I start, I want to make clear that I consider the length of those chromosomes at the moment of conception. Chromosomes vary in length during a lifetime due to telomere shortening, and I will not consider this in the following calculations. Also, some mutations directly introduce (or delete) a large number of nucleotides (transposable elements, for example). I am not considering those mutations here, assuming they are rare in comparison to single-nucleotide insertions and deletions (this assumption might not hold!). So please take the following with a grain of salt.
Let's make some messy calculations
In classical theoretical population genetics, we tend to consider mostly substitutions. But I can try to extrapolate from this work if you allow me to make some strong assumptions, use poor estimates of the true values and use some non-rigorous mathematics! This is going to be ugly and not extremely trustworthy $\ddot \smile $.
Without explaining why this is true (it is a result from coalescent theory), the expected number of pairwise differences between two neutral sequences in a diploid population is $E[\pi] = 4\cdot N\cdot \mu$ (quite an impressively simple result), where $N$ is the population size (assuming a panmictic population) and $\mu$ is the mutation rate for the whole sequence. Let's assume a constant per-site mutation rate of $\mu_s = 10^{-9}$. Knowing the length of the sequence of interest (the X chromosome), $L \approx 1.55 \cdot 10^8$, the mutation rate for the whole sequence is $\mu = L\cdot \mu_s = 0.155$. Let's consider a population size of $N = 5 \cdot 10^7$ (the equations assume a panmictic population, so I just took a value that felt more or less reasonable to me and is much smaller than the actual worldwide population size). Therefore the expected number of pairwise differences should be $E[\pi] = 4 \cdot 5 \cdot 10^7 \cdot 0.155 \approx 3 \cdot 10^7$, i.e. on the order of $10^7$.
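The arithmetic above can be reproduced in a few lines. This is only a sketch of the formula $E[\pi] = 4 N \mu$ with the rough values assumed in this answer; the function name and variable names are my own, and none of the inputs are measured quantities.

```python
# Back-of-the-envelope: expected pairwise differences E[pi] = 4 * N * mu.
# All inputs are the rough values assumed in the text, not measurements.

def expected_pairwise_differences(pop_size: float,
                                  per_site_rate: float,
                                  seq_length: float) -> float:
    """Return 4 * N * mu, where mu = seq_length * per_site_rate."""
    mu = seq_length * per_site_rate  # mutation rate for the whole sequence
    return 4 * pop_size * mu

N = 5e7        # assumed (panmictic) population size
MU_S = 1e-9    # assumed per-site mutation rate
L = 1.55e8     # approximate length of the X chromosome in base pairs

e_pi = expected_pairwise_differences(N, MU_S, L)
print(f"mu    = {L * MU_S:.3f}")
print(f"E[pi] = {e_pi:.2e}")
```

With these inputs the result is about $3.1 \cdot 10^7$; changing $N$ or $\mu_s$ by a factor of 10 changes the result by the same factor, which shows how sensitive the estimate is to the assumptions.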
Now, let's assume that only a fraction of $\frac{1}{100}$ of the mutations change the number of nucleotides. Taking $10^7$ as the order of magnitude of the number of differences, we might want to consider the value $\frac{1}{100} \cdot 10^7 = 10^5$. And because a mutation that deletes a nucleotide from the longer sequence (or inserts one into the shorter one) reduces the length difference rather than increasing it, let's say we divide this number by 10... so I'd say that two typical X chromosomes would differ in length by about 10,000 nucleotides.
I am sure that with some work one can come up with more rigorous calculations and a more accurate expectation. Intuitively, a result of about $10^4$ nucleotides doesn't sound totally crazy (it is neither 1 nor $10^6$, at least).
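For what it is worth, the last step can be written out as plain arithmetic. This is a sketch only: the indel fraction of 1/100 and the division by 10 are the guesses made above, the starting value of $10^7$ is an order of magnitude, and the names are my own.

```python
# Rough estimate of the net length difference between two X chromosomes.
# Every input below is a guess taken from the text, not an empirical value.

def expected_length_difference(pairwise_differences: float,
                               indel_fraction: float,
                               cancellation_factor: float) -> float:
    """Scale the number of differences down to a net length difference."""
    # Number of differences that add or remove a nucleotide
    indel_events = pairwise_differences * indel_fraction
    # Insertions and deletions partly cancel out in the net length
    return indel_events / cancellation_factor

PAIRWISE = 1e7          # order of magnitude of E[pi]
INDEL_FRACTION = 1 / 100  # guessed share of mutations that are indels
CANCELLATION = 10       # guessed factor for insertions offsetting deletions

diff = expected_length_difference(PAIRWISE, INDEL_FRACTION, CANCELLATION)
print(f"Estimated length difference: about {diff:.0e} nucleotides")
```

With these inputs the estimate comes out on the order of $10^4$ nucleotides. Since each of the three factors is uncertain by at least an order of magnitude, the output should be read the same way.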
Also, one could probably use available sequence data to estimate this value.
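As a minimal sketch of what such an estimate could look like, assume the differences between two aligned chromosomes are already available as (reference allele, alternative allele) pairs, in the style of a VCF file. The function name and the toy records below are made up for illustration and are not real data; reading an actual variant file is left out.

```python
from typing import Iterable, Tuple

def net_length_difference(variants: Iterable[Tuple[str, str]]) -> int:
    """Sum len(alt) - len(ref) over all variants.

    Substitutions contribute 0, insertions a positive value and
    deletions a negative value.
    """
    return sum(len(alt) - len(ref) for ref, alt in variants)

# Made-up toy records (ref allele, alt allele), NOT real data.
toy_variants = [
    ("A", "G"),     # substitution: 0
    ("AT", "A"),    # 1-bp deletion: -1
    ("C", "CGG"),   # 2-bp insertion: +2
    ("GTTT", "G"),  # 3-bp deletion: -3
]

print(net_length_difference(toy_variants))  # -2
```

Applied to the full set of variants between two sequenced X chromosomes, the absolute value of this sum would be the length difference asked about in the question.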