Friday, 28 June 2019

dna - Why are the genomes of Humans 99.5% the same?


Human DNA sequences are said to be roughly 99.5% identical. As far as I understand, this means that if I walked up to you and compared our DNA, the sequence of base pairs would be 99.5% the same.



My question is: Why is there only 0.5% variation in our DNA? I can think of two reasons:




  1. For some reason, the other 99.5% (roughly) is locked in place. Changes to these sections would either make us not 'human' or simply cause some sort of genetic disease, and so aren't very good mutations.




  2. There hasn't been enough time. I remember hearing that all humans have a common ancestor some long time ago. If that's the case, perhaps over time only about 0.5% of our DNA has had time to randomly mutate, and if we wait a really really really long time, more and more of our DNA will mutate, and we will become 'less like each other'.




Or am I completely misunderstanding the situation? I didn't take biology in high school so answers should be simple please.




Answer



In a genome that is 3 billion base pairs, a difference of 0.5% works out to a difference of 15 million bases. When a single base change can change the amino acid sequence of a protein, that can add up to a huge amount of diversity, which is what we see over the nearly 8 billion humans on the planet, and the 99.5% sameness is why we are linked together so closely as a species.
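As a quick check of that arithmetic, here is a minimal Python sketch. The 3 billion base pair genome size and the 0.5% figure are the round numbers quoted above; the function and variable names are only illustrative.

```python
# Rough arithmetic only: both figures are the round numbers used in the answer.
GENOME_SIZE_BP = 3_000_000_000   # approximate size of the human genome
FRACTION_DIFFERENT = 0.005       # 0.5% difference between two people


def differing_bases(genome_size_bp, fraction_different):
    """Return the expected number of differing bases, rounded to a whole base."""
    return round(genome_size_bp * fraction_different)


print(f"{differing_bases(GENOME_SIZE_BP, FRACTION_DIFFERENT):,} differing bases")
```

Changing either constant shows how sensitive the absolute number of differing bases is to the headline percentage.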


How can the lack of diversity be explained?



Nothing in Biology Makes Sense Except in the Light of Evolution.



The above quote was made by the evolutionary biologist Theodosius Dobzhansky. What this basically means is that you need to take a look at your question from the perspective of evolution and natural selection.


The first thing you need to note is that there is evidence to suggest that modern humans went through a population bottleneck. What that means is that at a certain point in our history, there were very few individual members of our species alive and this can account for the "lack" of diversity that we see in our genome. The number that is quoted is as low as 10,000 individuals left alive after the supereruption of the Toba volcano about 75 thousand years ago.


That means that there were very few mating pairs left, and that significantly reduced the genetic diversity of our species. This is known as the founder effect.




Founder effects


A founder effect occurs when a new colony is started by a few members of the original population. This small population size means that the colony may have:



  • reduced genetic variation from the original population.

  • a non-random sample of the genes in the original population.
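The two points in the list above can be illustrated with a toy simulation. This is only a sketch under made-up assumptions: the allele labels, the population of 1,000, the 10 founders, and the helper names `allele_frequencies` and `found_colony` are all hypothetical and not taken from any study.

```python
import random
from collections import Counter


def allele_frequencies(individuals):
    """Return a dict mapping each allele to its share of the group."""
    counts = Counter(individuals)
    total = len(individuals)
    return {allele: n / total for allele, n in counts.items()}


def found_colony(population, n_founders, seed=None):
    """Pick n_founders individuals at random (without replacement)."""
    rng = random.Random(seed)
    return rng.sample(population, n_founders)


# Hypothetical original population: one gene with four alleles, two of them rare.
population = ["A"] * 700 + ["B"] * 250 + ["C"] * 45 + ["D"] * 5
colony = found_colony(population, n_founders=10, seed=1)

print("original:", allele_frequencies(population))
print("colony:  ", allele_frequencies(colony))
```

With so few founders, rare alleles are frequently missing from the colony altogether, and the frequencies of the alleles that do make it can differ noticeably from those in the original population.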



So if we use an average of 20 to 25 years per human generation, we are only looking at about 3,000 to 4,000 generations since there were maybe only a few thousand breeding pairs of humans. Contrast that with fast-growing bacteria, which under ideal conditions can go through three to four thousand generations in a matter of weeks to a couple of months, and you can begin to get a feel for why there is so much sequence homology (sameness) in the human genome. See Bottlenecks and founder effects at UC Berkeley's Understanding Evolution site for more information.
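The human generation count can be checked with a few lines of Python. This is a back-of-the-envelope sketch that uses the roughly 75,000 years since the Toba eruption and the 20 to 25 year generation time quoted above; the names are illustrative only.

```python
YEARS_SINCE_BOTTLENECK = 75_000  # approximate time since the Toba eruption


def generations(years, years_per_generation):
    """Return how many generations fit into the given number of years."""
    return years / years_per_generation


for generation_time in (20, 25):
    count = generations(YEARS_SINCE_BOTTLENECK, generation_time)
    print(f"{generation_time} years per generation: about {count:,.0f} generations")
```

The two generation times bracket the estimate at 3,750 and 3,000 generations respectively.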


You also should understand that any changes that are deleterious will be under strong selective pressure. This does not necessarily mean that the organism will fail to survive to reproduce; it may just mean that, on average, its lineage is less successful: fewer offspring, which in turn have fewer offspring, in comparison to the rest of the population. At some point that lineage may go extinct, either because the change makes it very difficult to survive, or because the offspring make less desirable mates and eventually don't get the opportunity to reproduce.


Any change that disrupts a vital gene will likely be fatal and lead to spontaneous abortion. If the mutation arose in a sperm cell, that sperm may be unable to survive long enough to fertilize an egg, so a heavily mutated sperm may simply never make it to the egg. The average number of sperm in a single human ejaculation is about 250 million, and yet only a single sperm fertilizes the egg, and that is assuming that a fertilization event takes place at all. There is an incredible amount of selection that goes on in the act of procreation, even before we get to development.



Why the 99.5% to 0.5% figure might be misrepresentative of the population as a whole.


Another thing that you will want to think about is how the 0.5% to 99.5% number was arrived at. The reference in the question links to the Wikipedia article Human Genetic Variation. The source cited there is a New York Times article about J. Craig Venter, In the Genome Race, the Sequel Is Personal - Wade, Nicholas. September 4, 2007. The quote that is referenced in Wikipedia is:



Biologists had estimated that two individuals would be identical in 99.9 percent of their DNA, but the true figure now emerges as much less, around 99.5 percent, Dr. Scherer said.



The 0.5% figure is provided as a quote from Stephen W. Scherer, who was a coauthor of this paper, The Diploid Genome Sequence of an Individual Human, Levy, et.al. PLoS Biology; September 4, 2007. DOI: 10.1371/journal.pbio.0050254. From that paper the following quote is found:



Inclusion of insertion and deletion genetic variation into our estimates of interchromosomal difference reveals that only 99.5% similarity exists between the two chromosomal copies of an individual and that genetic variation between two individuals is as much as five times higher than previously estimated.
- The Diploid Genome Sequence of an Individual Human, Levy, et.al. PLoS Biology; September 4, 2007.
DOI: 10.1371/journal.pbio.0050254




So at the very least, the statement was made in a peer-reviewed article. However, there are some problems with this.


The first is the method of the paper. It looks at a single human genome, that of Dr. Venter, and uses it to draw the conclusion of a 0.5% difference between any two individuals. The comparison is based on the analysis of differences between Dr. Venter's maternal and paternal chromosomes. Let's look at what the problems might be here:



  • It is a huge leap of statistical faith to assume that a single genome is representative of the nearly 8 billion genomes out there.

  • Dr. Venter's genealogy is English on both parental lineages. We have to remember that there was another population bottleneck in medieval Europe caused by epidemics such as the Black Death. This may imply that Dr. Venter's lineages are genetically close, and that if his ancestry were more diverse, say Indigenous Oceanian and English, the authors might have found a much higher percent difference between maternal and paternal chromosomes.



The donor's three-generation pedigree is shown in Figure 1A. The donor has three siblings and one biological son, his father died at age 59 of sudden cardiac arrest. There are documented cases of family members with chronic disease including hypertension and ovarian and skin cancer. According to the genealogical record, the donor's ancestors can be traced back to 1821 (paternal) and the 1700s (maternal) in England. Genotyping and cluster analysis of 750 unique SNP loci discovered through this project support that the donor is indeed 99.5% similar to individuals of European descent (Figure 1B), consistent with self-reporting.
- The Diploid Genome Sequence of an Individual Human, Levy, et.al. PLoS Biology; September 4, 2007.
DOI: 10.1371/journal.pbio.0050254




  • Dr. Venter is male. The Y-chromosome was problematic to sequence, and you cannot really do a comparison between it and the X-chromosome. How would the genome of a woman compare?



The Y chromosome is 59% covered by the one-to-one mapping due to difficulties when producing comparison between repeat rich chromosomes. In addition, the Y chromosome is more poorly covered because of the difficulties in assembling complex regions with sequencing depth of coverage only half that of the autosomal portion of the genome. The X chromosome coverage with HuRef scaffolds is at 95.2%, which is typical of the coverage level of autosomes (mean 98.3% using runs). However it is clear that the X chromosome has more gaps, as evidenced by the coverage with matches (89.4%) compared with the mean coverage of autosomes using matches (97.1%). The overall effects of lower sequence coverage on chromosomes X and Y are clearly evident as a sharp increase in number of gaps per unit length and shorter scaffolds compared to the autosomes (Figure 3). Similarity between the sex chromosomes is another source of assembly and mapping difficulties. For example, there is a 1.5-Mb scaffold that maps equally well to identical regions of the X and Y chromosomes and therefore cannot be uniquely mapped to either (see Materials and Methods and Figure 3). From our one-to-one mapping data, we are also able to detect the enrichment of large segmental duplications [10] on Chromosomes 9, 16, and 22, resulting in reduced coverage based on difficulties in assembly and mapping (Table S3).
- The Diploid Genome Sequence of an Individual Human, Levy, et.al. PLoS Biology; September 4, 2007.
DOI: 10.1371/journal.pbio.0050254




There is another thing that you need to remember, and that is the history of the Human Genome Project. Dr. Venter developed a new method of sequencing and became a commercial competitor to the International Human Genome Sequencing Consortium project. The Consortium's project beat out Dr. Venter's enterprise (8). Its approach was to use an amalgamation of the DNA of many donors, only some of whose samples were actually included, in order to preserve anonymity. Dr. Venter circumvented this by performing the sequencing on his own DNA. When his team was beaten by the Consortium, he doubled down, invested more time and effort, and came up with the sequence put forth in this paper, which had a higher degree of resolution than previous efforts. The paper is making an argument for the superiority of its technique over the techniques being used by the International Human Genome Sequencing Consortium. The data being presented are likely factual, but because of this history you have to look critically at the paper, the methods, and the motives behind it.


It is only 15.5 years since the International Human Genome Sequencing Consortium produced a full draft of the human genome. In order to sequence that genome, samples were taken from a few hundred individuals. Small stretches of the DNA were sequenced, and then all of those pieces were put together by computer. That means that a lot of averaging happened: by default, the most prevalent base among the volunteers at every position was chosen. The resulting sequence should therefore be broadly representative of the rest of the population as well.
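The idea of choosing the most prevalent base at every position can be sketched as a simple majority vote. This is a toy illustration only: it assumes short sequences that are already aligned and of equal length, which is far simpler than real genome assembly, and the donor sequences and the `consensus` helper are made up for the example.

```python
from collections import Counter


def consensus(aligned_sequences):
    """Return the most common base at each position of pre-aligned sequences."""
    if not aligned_sequences:
        return ""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all sequences must have the same length")
    result = []
    for column in zip(*aligned_sequences):
        # most_common(1) gives [(base, count)]; ties go to the base seen first.
        base, _count = Counter(column).most_common(1)[0]
        result.append(base)
    return "".join(result)


# Hypothetical donors: each differs from the others at no more than one position.
donors = ["ACGTAC", "ACGTTC", "ACCTAC", "ACGTAC"]
print(consensus(donors))  # prints ACGTAC
```

Note that the consensus hides the variants carried by the second and third donors, which is the kind of averaging described above.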


When we make statements about comparisons in datasets as large as the human genome, more often than not we are looking at known, small subsets of the genome, and we are extrapolating by statistical methods based on those subsets. One thing you learn as you read science papers, and not just science news articles, is that you always need to look at their materials and methods to determine how they arrived at their data and headline numbers. It is often quoted that we have only a 1.5% difference between humans and chimpanzees; however, when you look at those papers, the comparisons were done only on areas of the genomes that lined up well. That isn't to say that there isn't a remarkable degree of similarity between us and our nearest living evolutionary relative, it is just that there is more to the story than the headline.
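To see how much the counting method matters, here is a small sketch that computes percent identity for one pair of aligned sequences in two ways: ignoring positions where one sequence has a gap (an insertion or deletion), and counting those positions as differences. The sequences and the `percent_identity` helper are hypothetical, and "-" is assumed to mark a gap.

```python
def percent_identity(seq_a, seq_b, count_gaps=False):
    """Percent of compared positions that match in two aligned sequences.

    With count_gaps=False, positions where either sequence has a gap are skipped.
    With count_gaps=True, those positions are counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        has_gap = a == "-" or b == "-"
        if has_gap and not count_gaps:
            continue
        compared += 1
        if a == b and not has_gap:
            matches += 1
    if compared == 0:
        raise ValueError("no positions to compare")
    return 100.0 * matches / compared


# Made-up alignment: one substitution and three gap positions.
seq_a = "ACGTACGT--ACGT"
seq_b = "ACGTACCTTTAC-T"
print(f"ignoring gaps: {percent_identity(seq_a, seq_b):.1f}%")
print(f"counting gaps: {percent_identity(seq_a, seq_b, count_gaps=True):.1f}%")
```

The same pair of sequences looks about 91% identical when gaps are ignored and about 71% identical when they are counted, so a headline similarity figure depends heavily on what the methods section chose to compare.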


Also, we don't have that many full genomes sequenced to compare to one another, as a percentage of the 8 billion people alive today. While we have representation from the vast majority of the ethnic groups identified, we may be missing families with novel lineages.


Another thing that you need to think about is that many of the areas in the human genome are repeats and structural elements that do not encode genes or affect gene expression. These include telomeres, which protect the ends of chromosomes, and centromeres, which are the attachment points on chromosomes that make sure that they are distributed properly into newly divided cells. These elements tend to stay the same because the proteins that bind to them require that sequence specificity in order to function properly, so those are regions where almost everyone's DNA will be the same, and not just between humans. These elements often are very similar in most vertebrates.


We also need to consider structural RNAs. These molecules have functions within the cell that require that their sequences be highly conserved (unchanged). Ribosomal RNAs, transfer RNAs, and small nuclear RNAs all rely on their specific sequence to fold properly in order to perform their task in the cell. A single base change can change the folding of these elements, and as a result they will lose their function, which means those cells won't be able to produce proteins properly, which means that the cells will die. If this is happening in germline cells, then they are not going to be involved in fertilization, so any mutations there will not be passed on.


I could go on and on, but the takeaway is that it isn't surprising that there is very little sequence diversity on a percentage basis within the human species, but there are many caveats to that fact as well.


Edit: Significant edits to add relevant, referenced information to back up answer.

