Thursday, 28 July 2016

dna sequencing - What direction is a sequence in databases written?


In many databases, the DNA sequences for proteins are given as a string of a, t, g, c without specifying whether the sequence starts at the 5' or the 3' end. It is also not specified whether it is the coding or the non-coding strand.


Is this because all sequences are written from 5' to 3', and only for the coding strand?



Answer



Directionality


It is indeed the convention to represent nucleic acid sequences in the 5ʹ to 3ʹ direction.



This convention is implied, though not stated explicitly, in the IUPAC/IUB document on Abbreviations and Symbols for Nucleic Acids, Polynucleotides and their Constituents, presumably because that document was written in 1974, before the large nucleic acid databases were established.
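Because both strands are written 5ʹ to 3ʹ by this convention, getting the other strand of a stored sequence in the same convention means complementing every base and then reversing the order. A minimal Python sketch, assuming a plain string of a, c, g, t written 5ʹ to 3ʹ; the name `reverse_complement` is illustrative only:

```python
# Translation table mapping each base to its Watson-Crick partner,
# in both upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, also written 5' to 3'.

    Assumes `seq` is written 5' to 3'. Only a, c, g, t are complemented;
    any other character (e.g. an IUPAC ambiguity code) is passed through
    unchanged rather than validated.
    """
    # Complementing base by base gives the partner strand read 3' to 5';
    # reversing it restores the conventional 5' to 3' order.
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("atggcc"))  # ggccat
```

The result is not the original simply read backwards: "atggcc" reversed is "ccggta", which is a different molecule from its complementary strand "ggccat".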


Strand


In general you can assume nothing about which strand a particular feature is located on. You need to refer to the context or documentation for the particular database that you are using.


I prefer the term ‘sense strand’ to ‘coding strand’, as explained in another post. However, this term only has meaning in a restricted set of circumstances relating to mRNA, particularly cDNA copies of eukaryotic mRNAs. Only if the context indicates that this is the case can you assume that the strand presented is the ‘sense strand’.
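Read 5ʹ to 3ʹ, the sense strand has the same sequence as the RNA transcribed from it, with T in place of U, while the template (antisense) strand is its reverse complement. The sketch below assumes the caller already knows, from the database documentation or context, which strand the DNA is; the function name `mrna_from_dna` and its `is_sense` flag are hypothetical, and splicing is not modelled (for a cDNA copy of a mature mRNA it does not arise).

```python
# Base-pairing table for DNA, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def mrna_from_dna(seq: str, is_sense: bool) -> str:
    """Return the RNA (5' to 3') for a DNA strand written 5' to 3'.

    is_sense must come from context: True if `seq` is the sense strand,
    False if it is the template strand. The sequence alone cannot say.
    """
    if not is_sense:
        # The template strand pairs with the RNA, so its reverse
        # complement is the sense strand in 5' to 3' order.
        seq = seq.translate(_COMPLEMENT)[::-1]
    # The RNA carries the sense-strand sequence with U instead of T.
    return seq.upper().replace("T", "U")


print(mrna_from_dna("atgaaa", is_sense=True))   # AUGAAA
print(mrna_from_dna("tttcat", is_sense=False))  # AUGAAA
```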


The problem arises because, in all (or almost all) genomes, different genes are located on different strands of the DNA, so the chromosome has no single ‘sense strand’ or ‘coding strand’. Thus, for DNA sequences in a database such as GenBank, the following are possible:



  • The DNA sequence presented does not encode protein or structural RNA.

  • The DNA sequence presented contains genes on both strands.


An example of the latter is given in the Sample GenBank Record, which should be consulted to understand the feature annotation of DNA sequence entries in GenBank. This 5028 bp entry from a yeast chromosome encodes two genes. The first, AXL2, is annotated:




     gene            687..3158
                     /gene="AXL2"

The second, REV7, is annotated:



     gene            complement(3300..4037)
                     /gene="REV7"

This indicates that REV7 lies on the strand complementary to the one presented: read 5ʹ to 3ʹ, its sequence is the reverse complement of positions 3300 to 4037 of the sequence as presented.
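The two annotations can be applied to the presented sequence roughly as follows. This is a simplified sketch that understands only the two location forms shown above, a plain `start..end` range and `complement(start..end)`; real GenBank locations also use operators such as join(...) and the partial markers < and >, which it rejects rather than handles. GenBank coordinates are 1-based and inclusive, so they are converted to a Python slice. The name `extract_feature` and the toy sequence are illustrative, not taken from the sample record.

```python
import re

# Base-pairing table for DNA, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_feature(seq: str, location: str) -> str:
    """Return a feature's sequence, written 5' to 3', from `seq`.

    `seq` is the entry's sequence as presented (5' to 3'). Only
    "start..end" and "complement(start..end)" are supported.
    """
    loc = location.strip()
    on_complement = loc.startswith("complement(") and loc.endswith(")")
    if on_complement:
        loc = loc[len("complement("):-1]
    match = re.fullmatch(r"(\d+)\.\.(\d+)", loc)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"location out of range: {location!r}")
    # 1-based inclusive [start, end] -> 0-based half-open [start-1, end).
    region = seq[start - 1:end]
    if on_complement:
        # Feature lies on the other strand: reverse complement it so it
        # is also reported 5' to 3'.
        region = region.translate(_COMPLEMENT)[::-1]
    return region


toy = "gatgaaaccctttcatg"  # made-up 17 bp sequence
print(extract_feature(toy, "2..7"))                # atgaaa
print(extract_feature(toy, "complement(11..16)"))  # atgaaa
```

In the toy sequence the same gene is present once on each strand, which is exactly the situation the complement() annotation exists to describe.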


