Sunday, 5 November 2017

evolution - How does GC-content evolve?



Background


GC-content is the fraction of base pairs in the genome that are G-C rather than A-T. In other words, it is the number of GC base pairs divided by the total number of GC and AT base pairs.


$$\text{GC content} = \frac{N_{GC}}{N_{AT}+N_{GC}}$$
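As a minimal sketch of the definition above, the fraction can be computed from a single-strand sequence by counting bases. The function name `gc_content` and the choice to ignore ambiguous symbols such as `N` are my own assumptions for illustration, not part of the question.

```python
def gc_content(sequence: str) -> float:
    """Return N_GC / (N_AT + N_GC) for a DNA sequence given as a string.

    Assumption: characters other than A, C, G, T (for example N) are ignored.
    """
    seq = sequence.upper()
    n_gc = seq.count("G") + seq.count("C")
    n_at = seq.count("A") + seq.count("T")
    total = n_gc + n_at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return n_gc / total


if __name__ == "__main__":
    # Four of the six bases are G or C.
    print(f"{gc_content('ATGCGC'):.3f}")
```

Counting G and C on one strand gives the same fraction as counting GC base pairs, since every G or C on one strand pairs with a C or G on the other.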


Question


How does the GC-content evolve and why does the GC-content differ between populations/species/lineages? Does it evolve under genetic drift only? Under selection? Intuitively, I'd say that the ratio of the probability of mutating from A or T to G or C to the probability of the reverse mutation should be an important factor driving the evolution of the GC-content. Is it? Does the overall mutation rate influence the GC-content? What other traits/forces influence the evolution of the GC-content?



Answer



I think the key word here is 'evolve'. Overall GC/AT ratios change through mutations, whose rate is assumed constant. Given a mutation event, the probability that one base will be substituted by another has been modeled in several ways, in which the probabilities of the different substitutions may or may not be the same.
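To make the idea of unequal substitution probabilities concrete, here is a rough sketch of the simplest model I can think of: each site is either AT or GC, with a per-generation AT->GC rate `u` and GC->AT rate `v`. This two-state model, the function names, and the exaggerated rates are assumptions for illustration only. It ignores selection, drift and everything else discussed here.

```python
def equilibrium_gc(u: float, v: float) -> float:
    """Equilibrium GC fraction of the two-state model: u / (u + v).

    u: AT->GC mutation rate per site per generation (hypothetical value)
    v: GC->AT mutation rate per site per generation (hypothetical value)
    """
    if u < 0 or v < 0 or u + v == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return u / (u + v)


def simulate_gc(gc0: float, u: float, v: float, generations: int) -> float:
    """Iterate the expected GC fraction forward, starting from gc0."""
    gc = gc0
    for _ in range(generations):
        # AT sites gained as GC minus GC sites lost to AT.
        gc = gc + u * (1.0 - gc) - v * gc
    return gc


if __name__ == "__main__":
    # Rates are deliberately far too high so that convergence is quick.
    print(f"{simulate_gc(0.9, 1e-3, 1e-3, 20000):.3f}")  # equal rates
    print(f"{simulate_gc(0.9, 1e-3, 2e-3, 20000):.3f}")  # GC->AT twice as likely
```

In this toy model equal rates settle at a GC content of 0.5 regardless of the starting value, and only the ratio of `u` to `v` sets the equilibrium. The overall magnitude of the rates only changes how fast it is reached.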


Overall, the GC content will tend toward 50%. What makes GC-rich genomes GC-rich (60-70%) is that mutations to GC base pairs have a selective advantage, either in particular regions or across the genome as a whole, which causes them to be retained. The mutation rate may be no different (or even lower) in GC-rich organisms, many of which live deep underground or deep underwater. GC-rich genomes occur because AT->GC mutations confer an advantage and they stick around.


The reasons that the GC content migrates away from 50% fall into two categories I will call entropic and selective.


By entropic I mean specifically that coding sequences for genes and other features of the DNA, such as binding sites or centromeres, cause the overall GC/AT ratio to deviate from 1 because the sequence is constrained by the information it contains. While coding regions have a ratio higher than 1, the GC content tends to hover around 54%. Eukaryotes have GC islands and so on, but these also do not change the overall GC content.



So gene-rich genomes and typical functional features of the genome do not really explain some of the spectacularly high GC contents found, which reach into the 70% range. While the link above looks at GC bias in coding regions, it's a given that any part of the genome that is merely a spacer between elements with specific functions will freely shift toward GC if that is useful.


Selective factors for high GC content include, for instance, high-pressure and high-temperature environments, which usually bias strongly toward high GC content by this mechanism. You can imagine how this works: high-GC genomes are thermodynamically more stable and can more readily survive the more energetic molecular collisions of those environments.
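One rough way to see the stability argument is the Wallace rule of thumb for short oligonucleotides, which estimates the melting temperature as 2 °C per A/T base plus 4 °C per G/C base. The sketch below is only an illustration of why GC pairs count for more. The rule is meant for short primers, not whole genomes, and the function name `wallace_tm` is my own.

```python
def wallace_tm(oligo: str) -> int:
    """Estimate melting temperature (deg C) of a short oligo via the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C). This is a rule of thumb for short
    oligonucleotides only and is used here purely as an illustration.
    """
    seq = oligo.upper()
    n_at = seq.count("A") + seq.count("T")
    n_gc = seq.count("G") + seq.count("C")
    return 2 * n_at + 4 * n_gc


if __name__ == "__main__":
    print(wallace_tm("ATATATATAT"))  # all A/T, 10 bases
    print(wallace_tm("GCGCGCGCGC"))  # all G/C, 10 bases
```

For two oligos of the same length, the all-GC one gets twice the estimated melting temperature of the all-AT one under this rule.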


GC-rich genomes are not simple adaptations to live with. All the genes for DNA-oriented processes, such as transcription, chromosome packing, and DNA replication, have to adjust a lot. As the organism adapts to hotter temperatures or higher pressures, each individual protein it produces also has to change to be stable and to function in the new conditions. As such, these changes happen only over long evolutionary times. This is probably a good part of the reason why the archaeal niches have not been taken over by eubacteria in all the 1+ billion years that life has been on Earth.

