I want to compute the optimal alignment of two amino acid sequences as per the following definition from a patent:
"The percentage of identity between two peptidic or nucleotidic sequences is a function of the number of amino acids or nucleotide residues that are identical in the two sequences when an alignment of these two sequences has been generated. Identical residues are defined as residues that are the same in the two sequences in a given position of the alignment. The percentage of sequence identity, as used herein, is calculated from the optimal alignment by taking the number of residues identical between two sequences dividing it by the total number of residues in the shortest sequence and multiplying by 100. The optimal alignment is the alignment in which the percentage of identity is the highest possible. Gaps may be introduced into one or both sequences in one or more positions of the alignment to obtain the optimal alignment. These gaps are then taken into account as non-identical residues for the calculation of the percentage of sequence identity."
The Needleman and Wunsch implementation at NCBI (https://blast.ncbi.nlm.nih.gov/Blast.cgi) mostly works but not exactly. Thanks to @David & @WYSIWIG from a related SE Thread that suggested this (Computing Percent Identity between DNA / Amino Acid Sequence)
I want to know if there is a way to fix the mismatch.
e.g. My test case is:
Seq1: ABDE
Seq2: AAAAAAAAAAABCDE
The NCBI implementation yields the following alignment which has only 3 identical residues:
But shouldn't an optimal alignment with 4 identical residues be possible like so:
Seq1: ----------AB-DE
Seq2: AAAAAAAAAAABCDE
Thoughts? Any way to tweak the implementation to give the result I want? Alternatively any other algorithm that can be coerced into getting this alignment? BLAST or a variant?
No comments:
Post a Comment