Friday 30 October 2015

molecular biology - Terminology of the sequences of promoters in relation to DNA strands



I'm studying molecular biology and I'm trying to understand an experiment which shows the importance of promoters in the relative transcription level (RT). The image below comes from Rolf Knippers' book "Molekulare Genetik" (8th edition).


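[Figure: wild-type and deletion-mutant promoter sequences with their relative transcription (RT) levels, from Rolf Knippers, "Molekulare Genetik", 8th edition]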


The legend says (among other things):



The first line shows the normal "wild-type" sequence of the 5'-flanking region.




The column on the right gives the relative transcription (RT) level, 1.0 being the highest possible level of transcription. As we can see, the lines where some parts of the "5' region" have been deleted give quite low RT levels, since some regions of the promoter are missing.


My questions are the following:


1) According to this and this, I understand that RNA polymerase reads and uses both the coding and non-coding strands to synthesize RNA. So how did they manage to take the sequences in which regions of the strands had been deleted, get a polymerase to "read" them, and carry out transcription?


2) If the polymerase reads the template strand in the 3' → 5' direction, shouldn't we talk of an "ATAT box" or a "TAAC box" instead of "TATA" or "CAAT" boxes? Does this mean that the "promoter" regions shown in the figure above are actually on the coding strand?


Thanks a lot for your help.



Answer



It appears that this question is one of terminology, so I am answering it as such.


Convention for representing features in DNA sequences


The convention is that in indicating any sequence feature† in a protein-coding gene on double-stranded DNA, a single strand‡ is represented: the one from which the amino acid sequence could be read using the genetic code (conceptually, with T substituted for U). Like any other nucleic acid sequence§, it is always written in the 5ʹ to 3ʹ direction, in the same manner as the mRNA transcribed from it, without this being explicitly stated.


† An exception might be hemi-methylation, in which case both strands would be shown.



‡ I call this the sense strand. I discuss the nomenclature further below.


§ An exception is that tRNA anticodons are sometimes written in the 3ʹ to 5ʹ direction for ease of comparison with the codon, but in this case the directionality is indicated.
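The convention can be made concrete with a small sketch. It assumes a made-up sense strand and includes only a handful of codons from the standard genetic code (not the full table); the function names are my own and purely illustrative. It shows that the mRNA is simply the sense strand with U for T, that codons can be read straight off it 5ʹ to 3ʹ, and, for the anticodon footnote, what writing an anticodon 3ʹ to 5ʹ amounts to.

```python
# Minimal sketch of the strand-writing convention (illustrative sequence only).
# The sense strand is written 5'->3'; the mRNA has the same sequence with U for T.

# A small subset of the standard genetic code: RNA codon -> one-letter amino acid.
PARTIAL_CODE = {"AUG": "M", "GCU": "A", "AAA": "K", "UGG": "W", "UAA": "*"}
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def sense_to_mrna(sense_5to3: str) -> str:
    """mRNA (5'->3') = sense strand (5'->3') with U in place of T."""
    return sense_5to3.upper().replace("T", "U")


def translate(mrna_5to3: str) -> str:
    """Read codons 5'->3' from the first base, stopping at a stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna_5to3) - 2, 3):
        aa = PARTIAL_CODE[mrna_5to3[i:i + 3]]  # KeyError for codons outside the subset
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def anticodon_3to5(codon_5to3: str) -> str:
    """Anticodon written 3'->5': the base-by-base complement of the codon.
    Only Watson-Crick pairs are considered; wobble pairing is ignored."""
    return "".join(RNA_PAIR[b] for b in codon_5to3)


sense = "ATGGCTAAATGGTAA"  # hypothetical sense strand, 5'->3'
mrna = sense_to_mrna(sense)
print(mrna)                   # AUGGCUAAAUGGUAA
print(translate(mrna))        # MAKW
print(anticodon_3to5("AUG"))  # UAC, i.e. 3'-UAC-5', which is 5'-CAU-3'
```

Written 3ʹ to 5ʹ, the anticodon lines up base by base under the codon, which is the only reason for departing from the usual direction there.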


Origin and justification for this convention




  1. Historical. The amino acid sequence of the protein (the product of the gene) is central to this convention because knowledge of the genetic code, and hence representation of the region of the mRNA that encodes protein — and by extension the DNA — was the first sequence information to be known.




  2. Logical consistency. Later, other sequence features were identified (some of which may initially have been just genetic features), e.g. ribosome binding sites, polyadenylation signals, transcription start sites, promoters, and transcription factor recognition sites. It was logically consistent to represent them on the same strand as the coding sequence.





  3. Functional agnosticism. In many cases the function of a sequence followed its description, so there was no reason initially to place it on any particular strand. However, even if it were thought that the function of some sequence were to be recognized on the opposite strand (what I would call the anti-sense strand), it would be unwise scientifically to change the representation to indicate this. Science progresses and interpretation changes. Better to separate concrete descriptive features from conclusions about their function.




It couldn’t ever be an ATAT box


Even if you represented the TATA box on the anti-sense strand, it could never be called an ‘ATAT box’ (as suggested by the poster) because, according to the basic convention, ATAT is 5ʹ-ATAT-3ʹ, and on the antisense strand the sequence is 3ʹ-ATAT-5ʹ, i.e. TATA!
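The same reasoning can be checked mechanically. In this sketch (the function names are my own, purely illustrative), the bases directly opposite a motif read 3ʹ to 5ʹ, and writing that strand in the conventional 5ʹ to 3ʹ direction means taking the reverse complement. TATA turns out to be its own reverse complement. For the CAAT box, the anti-sense strand written conventionally reads ATTG; "TAAC" is just CAAT reversed without complementing, so it does not correspond to either strand.

```python
# Minimal sketch: the TATA and CAAT boxes as seen from the anti-sense strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def opposite_bases(seq_5to3: str) -> str:
    """Bases directly opposite, in the same left-to-right order (i.e. read 3'->5')."""
    return seq_5to3.translate(COMPLEMENT)


def reverse_complement(seq_5to3: str) -> str:
    """The opposite strand written the conventional way, 5'->3'."""
    return opposite_bases(seq_5to3)[::-1]


for box in ["TATA", "CAAT"]:
    print(f"sense 5'-{box}-3' | anti-sense 3'-{opposite_bases(box)}-5' "
          f"= 5'-{reverse_complement(box)}-3'")
# sense 5'-TATA-3' | anti-sense 3'-ATAT-5' = 5'-TATA-3'
# sense 5'-CAAT-3' | anti-sense 3'-GTTA-5' = 5'-ATTG-3'
```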


Terminology for referring to the two strands of dsDNA


Whereas the above is the convention followed universally, the following is just my opinion. In science, terminology is important for communicating ideas unambiguously, so I think it worthwhile explaining the ambiguity in some of the terms, the use of which I discourage.


Sense and anti-sense. This is my preferred terminology because, although not perfect, it avoids the pitfalls of the others. It seems clear to me that when you read the string of codons encoding the amino acid sequence from this strand, they ‘make sense’. (Anti-sense is used in preference to non-sense, as ‘nonsense’ was the term used historically for mutations that converted amino acid codons into stop codons.) It can also be extended to non-protein-coding genes (e.g. for tRNA), where ‘sense’ corresponds to the sequence of the gene product.


I shall use ‘sense’ and ‘anti-sense’ as reference terminology in discussing other terms.



Coding and non-coding. This has the disadvantage that it cannot be extended to non-protein-coding genes. However, my primary objection is that it can cause confusion, as ‘coding’ is unambiguous only when applied to mRNA. As the anti-sense strand is the template for RNA polymerase, one might associate it mentally with ‘coding’, whereas it is the sense strand that is meant in this (admittedly common) usage.


Template and non-template. ‘Template’ might be a more logical term for the anti-sense strand, as this is the template for transcription by RNA polymerase (although mRNA is also a template, for translation). However, it is used only infrequently.
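Why the template (anti-sense) strand yields an RNA matching the sense strand can be shown with a short sketch. It assumes the same made-up sequence as a simple illustration, with the template strand itself written 5ʹ to 3ʹ per the convention; `transcribe` is a hypothetical helper, not a library function. Walking the template from its 3ʹ end and adding complementary ribonucleotides builds the RNA 5ʹ to 3ʹ, and the result equals the sense strand with U for T.

```python
# Minimal sketch of transcription from the template (anti-sense) strand.
# The polymerase moves along the template 3'->5', adding complementary
# ribonucleotides, so the RNA grows 5'->3'.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3: str) -> str:
    """Return the RNA (5'->3') made from a template strand given 5'->3'."""
    rna = []
    # Walk the template from its 3' end towards its 5' end.
    for base in reversed(template_5to3):
        rna.append(DNA_TO_RNA_PAIR[base])
    return "".join(rna)


sense = "ATGGCTAAATGGTAA"     # hypothetical sense strand, 5'->3'
template = "TTACCATTTAGCCAT"  # its complementary (template) strand, 5'->3'
rna = transcribe(template)
print(rna)                             # AUGGCUAAAUGGUAA
print(rna == sense.replace("T", "U"))  # True
```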


Plus and minus. This terminology is used for single-stranded (especially RNA) viruses to represent the whole genome, where in plus-strand viruses the genome is also the mRNA, i.e. the sense strand. One problem here is that it can be confusing for the beginner, who by extrapolation may assume that in double-stranded DNA genomes one strand is the sense strand for all genes. It is not. Which brings me to my final point…


…whatever terminology you use, it is better to make sure it is clear that you are referring to the sense or anti-sense strand of a gene, not of the whole genome. You need to employ some other terminology to distinguish between the two strands of e.g. bacterial or plasmid DNA if it is necessary to distinguish them.
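To illustrate that the sense strand belongs to a gene rather than to a genome, here is a sketch using a short, entirely hypothetical stretch of double-stranded DNA carrying two genes on opposite strands. The labels `top` and `bottom` are illustrative names for the two genome strands (not a convention taken from the answer above), and the gene coordinates were chosen by hand.

```python
# Minimal sketch: one stretch of double-stranded DNA carrying two genes on
# opposite strands. Sequence and gene coordinates are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq_5to3: str) -> str:
    """The opposite strand, written 5'->3'."""
    return seq_5to3.translate(COMPLEMENT)[::-1]


top = "ATGAAATAGCCCTTATTTCAT"     # arbitrary reference strand, 5'->3'
bottom = reverse_complement(top)  # the other strand, 5'->3'

gene_a_sense = top[0:9]     # gene A: its sense strand is the top strand
gene_b_sense = bottom[0:9]  # gene B: its sense strand is the bottom strand

print(bottom)        # ATGAAATAAGGGCTATTTCAT
print(gene_a_sense)  # ATGAAATAG
print(gene_b_sense)  # ATGAAATAA
# On the top strand, gene B appears only as its anti-sense sequence:
print(top[12:21])    # TTATTTCAT
```

Neither `top` nor `bottom` is "the sense strand" in general; each is the sense strand for one gene and the anti-sense strand for the other.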

