We need to consider six reading frames when assessing the potential of DNA to encode protein (three frames for each strand). But the RNA transcript corresponds to only one strand, the so-called coding strand. It would therefore seem to me that there are actually only three reading frames to consider. Why, then, do people refer to six?
Another point concerning reading frames is the definition of an Open Reading Frame (ORF). One text defines an ORF as:
“An ORF is a continuous stretch of codons beginning with a start codon (usually AUG) and ending with a stop codon”
whereas another text defines it as
“An ORF is a continuous stretch of codons that do not contain a stop codon (usually UAA, UAG or UGA)”
It seems to me that the first definition is correct. Which is the generally accepted definition of an ORF?
Answer
In my opinion this question reflects two things:
- The difficulty students have in appreciating the historical experimental concerns of researchers in an area that is now well understood, and hence how those concerns influenced the coining of new technical terms.
- The way that the use of terms has changed with time as old concerns disappear and new ones arise. Thus, a term originally used in one sense may have subsequently been adopted to mean something else, even if this does not appear strictly logical.
What is a coding strand?
This is the crux of the first question, and the answer is that the term ‘coding strand’ is meaningless (or at least ambiguous) without context. I think the poster is implicitly assuming a whole-genome context, and this is the flaw in her argument.
If one talks (or thinks) about the ‘coding strand’ of a double-stranded DNA genome, one is assuming that if you separated the two strands (e.g. of a small DNA virus) and performed a conceptual translation (decoding the DNA into amino acids using a genetic code with T rather than U), one strand would contain all the information for the genes and the other strand none. Another way of saying this is that you are assuming that all the genes in the genome have the same orientation (arrow direction on a genome diagram, such as that of E. coli, below). This is hardly ever the case. (The only examples I can think of are the genomes of single-stranded RNA viruses.)
So if you use the term ‘coding strand’, you must make clear that this is in the context of a single gene. Each strand of DNA in a genome (e.g. that of E. coli) contains some sections that are coding and some that are non-coding in terms of conceptual translation. (If you look at papers describing early isolations of genes, you will find the words “coding strand” generally qualified by “of the gene”.)
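For readers who want to try the ‘conceptual translation’ described above, here is a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1) written with T instead of U. The toy sequence is hypothetical: it was built by joining a short gene (ATG…TAA) to the reverse complement of a second short gene, so that each strand carries one of them.

```python
# Standard genetic code (NCBI table 1) with T in place of U; amino acids are
# listed in TCAG order of the first, second and third codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the other strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Conceptually translate seq from its first base.

    '*' marks a stop codon and 'X' a codon with a non-ACGT base; one or two
    trailing bases that do not form a full codon are ignored.
    """
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


# Hypothetical toy "genome": gene 1 (ATG AAA TGG TAA) followed by the
# reverse complement of gene 2 (ATG TTT GGG TAG).
genome = "ATGAAATGGTAA" + "CTACCCAAACAT"
print(translate(genome))                      # MKW*LPKH
print(translate(reverse_complement(genome)))  # MFG*LPFH
```

Reading one strand gives gene 1 (MKW, then a stop); reading the other gives gene 2 (MFG, then a stop). Neither strand is ‘the coding strand’ of this toy genome; each is the coding strand of one gene.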
But cDNA has just one coding strand…
Historically, one of the concerns was to sequence eukaryotic cDNAs, DNA copies of mRNAs. These would be monocistronic, i.e. encode a single protein, so here one strand would be coding and the other non-coding. Was it possible in this case to reduce the number of reading frames that had to be analysed to three? No! The fact that only one strand was coding was no help at all, as there was no way of knowing which strand this was in the cDNA being sequenced. The same applied to a fragment of any gene. You had sequenced a piece of DNA cloned into some plasmid vector, and there was no way of telling which strand the sequence you read out originated from. Hence you needed to translate it in all six reading frames to find potential amino acid sequences.
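The six-frame translation of a fragment of unknown orientation can be sketched in Python as follows. This is a minimal illustration assuming the standard genetic code; the frame labels +1 to +3 and -1 to -3 follow one common convention (some tools number the reverse frames differently), and the fragment sequence is hypothetical.

```python
# Standard genetic code (NCBI table 1) with T in place of U; amino acids are
# listed in TCAG order of the first, second and third codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons from the first base; '*' = stop, 'X' = unknown."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict[str, str]:
    """Translate a fragment of unknown orientation in all six reading frames.

    +1..+3 start at offsets 0..2 of the sequence as read; -1..-3 start at
    offsets 0..2 of its reverse complement.
    """
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]
    frames = {f"+{k + 1}": translate(seq[k:]) for k in range(3)}
    frames.update({f"-{k + 1}": translate(rev[k:]) for k in range(3)})
    return frames


# Hypothetical sequenced fragment whose orientation in the vector is unknown.
fragment = "ATGAAATGGTAACTACCCAAACAT"
for name, protein in six_frame_translation(fragment).items():
    print(name, protein)
```

Since nothing tells you which strand of the insert you have read, all six translations have to be inspected; in practice one would then look for frames with long runs free of '*' (stop) characters.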
So what is an Open Reading Frame?
Open Reading Frame is a term that is often used today in a manner distinct from the way it was used when it was coined. At that time one would be sequencing short stretches of cDNAs or of viral or bacterial genes, and there was a low likelihood that the sequence would extend as far as the region encoding the C-terminus, i.e. the stop codon of the gene. The concern was therefore to concentrate on reading frames that were not interrupted by stop codons. This original usage is reflected in the definition of Open Reading Frame in Wikipedia:
“An ORF is a continuous stretch of codons that do not contain a stop codon (usually UAA, UAG or UGA)”
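A minimal Python sketch of ORFs in this original sense: it reports maximal runs of codons that contain no stop codon in each of the three frames of one strand (run it on the reverse complement as well to cover all six). The function name and the minimum length `min_codons` are hypothetical, chosen for illustration. A run touching either end of the sequence is reported even though no flanking stop codon was sequenced, which matches the situation with short fragments described above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_free_stretches(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for every maximal run of at least min_codons
    codons containing no stop codon. start/end are 0-based, end-exclusive
    positions in seq; the flanking stop codons are not included."""
    seq = seq.upper()
    hits = []
    for frame in range(3):
        run_start, n_codons = frame, 0
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if n_codons >= min_codons:
                    hits.append((frame, run_start, pos))
                run_start, n_codons = pos + 3, 0  # a new run begins after the stop
            else:
                n_codons += 1
        # A run still open at the end of the fragment: its stop codon, if any,
        # lies beyond the sequenced region.
        if n_codons >= min_codons:
            hits.append((frame, run_start, run_start + 3 * n_codons))
    return hits


print(stop_free_stretches("ATGAAATGGTAACTACCCAAACAT", min_codons=3))
# [(0, 0, 9), (0, 12, 24), (1, 4, 22), (2, 2, 23)]
```

Note that nothing here requires a start codon: the stretch in frame 1 begins immediately after a TGA.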
However, as knowledge increased and technology improved, the focus switched to discovering genes in complete or long partial genome sequences. Now one was working with long DNA sequences containing many whole genes, and the focus became finding potential genes based on start and stop codons (and a minimum length cut-off). This is reflected in the documentation for the EMBOSS program getorf:
“An ORF may be defined as a region of a specified minimum size between two STOP codons, or between a START and a STOP codon.”
Note that even this last definition is ambiguous: it allows two different kinds of region, stop to stop or start to stop, to be called an ORF.
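The START-to-STOP alternative with a minimum size can be sketched as below. This is a hypothetical illustration, not how getorf itself is implemented: it assumes ATG is the only start codon, takes the first ATG after the previous stop (so nested ATGs do not yield extra, shorter ORFs), counts codons from the ATG up to but not including the stop, and ignores an ATG that has no downstream stop within the sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: alternative start codons (e.g. GTG) are ignored


def start_to_stop_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each ORF running from an ATG to the next
    in-frame stop codon on this strand. start/end are 0-based, end-exclusive,
    and include the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the open ATG, if any
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == START_CODON:
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((frame, start, pos + 3))
                start = None
        # An ATG still open here has no stop in the sequence and is not reported.
    return orfs


seq = "ATGAAATGGTAACTACCCAAACAT"
print(start_to_stop_orfs(seq, min_codons=3))  # [(0, 0, 12)]
```

On this sequence, frame 1 contains a six-codon stretch and frame 2 a seven-codon stretch with no stop codon, yet neither is an ORF under this definition: the first has no ATG and the second runs off the end without a stop.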
Which is correct? That is a concern of students. In the real world one must recognize that the meaning of expressions can change. If there is any ambiguity, as here, you must define the way in which you are using the expression.