Wednesday, 16 October 2019

biochemistry - Why is ab initio protein secondary structure prediction less reliable than alternatives?


To predict the secondary structure of proteins, three types of algorithms are used: ab initio, homology-based, and neural networks. Among these, neural networks prove to be more accurate and give better results than ab initio; especially when multiple sequence alignments and neural networks are combined, the accuracy reaches almost 75%. My query is: what do ab initio methods actually lack that reduces their sensitivity and specificity and prevents them from predicting reliable secondary structures? (Some of the ab initio based programs that I have studied are Chou-Fasman and the GOR method.)




Answer




In the comments @CMosychuk linked the most comprehensive text on ab initio I have seen this side of the year 2000 (Lee et al., 2009). It agrees that you've raised a good point.



It is important to acknowledge that ab initio prediction methods solely based on the physicochemical principles of interaction are currently far behind, in terms of their modelling speed and accuracy, compared with the methods utilizing bioinformatics and knowledge-based information.



This is a good real-life example of the different philosophical approaches of inductive reasoning and deductive reasoning (AKA top-down logic) behaving as one might expect. Each specific piece of software has its own problems and advantages and may not fit neatly into the categories below. I'll grossly simplify each approach into crude categories and then broadly explain why errors crop up in them.


Below, we'll discuss ab initio and why it goes wrong, then the alternatives and why they're currently generally more reliable. We'll round up with how ab initio is likely to become realistically involved in reliable protein predictions.



Figure: flow charts comparing the comparative (homology) modelling and ab initio modelling pipelines. Source: Nara Institute of Science and Technology.


Ab initio often works on the premise that physical laws can be applied universally in biology. As you are most probably aware, biochemistry is far messier than chemistry, and those physical laws often appear bent if not completely broken. The goal of ab initio is therefore to combine physical properties with some fudge factors to account for a complex system. Because the full complexity of the system is never modelled, ab initio predictions are only as good as your input. There are also limitations pointed out in the 2009 article, in three key components (a toy sketch of how they fit together follows the list):



  • potential energy functions
  • conformational search engines
  • model selection schemes
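To make those three ingredients concrete, here is a deliberately toy Python sketch, not any real ab initio package: an invented "energy" that favours ideal helical backbone dihedrals, brute-force random sampling standing in for the conformational search engine, and lowest-energy selection as the model selection scheme.

```python
import random

def toy_energy(phi_psi):
    """Toy 'potential energy function': rewards backbone dihedrals near
    ideal alpha-helix values (-60, -45). A real force field would sum
    bonded, van der Waals, electrostatic and solvation terms."""
    return sum((phi + 60.0) ** 2 + (psi + 45.0) ** 2 for phi, psi in phi_psi)

def random_conformation(n_residues):
    """Draw one random backbone conformation as (phi, psi) pairs in degrees."""
    return [(random.uniform(-180, 180), random.uniform(-180, 180))
            for _ in range(n_residues)]

def conformational_search(n_residues, n_trials=5000):
    """Toy 'conformational search engine': brute-force random sampling."""
    return [random_conformation(n_residues) for _ in range(n_trials)]

def select_model(models):
    """Toy 'model selection scheme': pick the lowest-energy decoy."""
    return min(models, key=toy_energy)

random.seed(0)
best = select_model(conformational_search(n_residues=10))
print("best-energy model:", [(round(p), round(q)) for p, q in best])
```

Errors in any of the three components compound: a poor energy function ranks wrong conformations best, a weak search never visits the native fold, and a bad selection scheme discards it even when it is sampled.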


What you can see now is that ab initio is missing something the others have: biological experimental empiricism. Without that inductive reasoning, ab initio modelling is very vulnerable to assuming far too much and oversimplifying the system (this xkcd springs to mind).


Although perfect ab initio prediction is the "Holy Grail" of protein structure prediction, currently, if homology provides an experimentally known structure, that is often more reliable.


Ab initio is not completely without merit. For relatively short proteins (around 100 residues) the accuracy has improved over the last decade (Lee et al., 2009). From the above-mentioned article:




...the physics-based atomic potentials have proven to be useful in refining the detailed packing of the side chain atoms and the peptide backbones.




Homology modelling is a safe bet provided that the MSA is available and reliable. The premise here is that if two proteins look similar enough at the sequence level, and you know the structure of one, then there is probably structural conservation as well. The problem is that, obviously, those small changes in sequence, or other contextual differences like post-translational modifications, localised pH differences, etc., might mean that the protein folds differently.
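As a rough illustration of that premise, the sketch below transfers per-residue secondary structure labels from a solved template onto a query. Everything in it is invented for the example (the sequences, the H/C labels, the 30% identity cut-off), and it assumes the two sequences are already aligned with no gaps; a real pipeline would build the alignment or MSA first.

```python
def transfer_secondary_structure(query, template_seq, template_ss,
                                 min_identity=0.30):
    """Copy secondary structure labels from an aligned template onto the query.
    Assumes query and template are the same length (pre-aligned, no gaps);
    the 30% identity threshold is an illustrative cut-off, not a standard."""
    assert len(query) == len(template_seq) == len(template_ss)
    identical = sum(q == t for q, t in zip(query, template_seq))
    identity = identical / len(query)
    if identity < min_identity:
        raise ValueError(f"identity {identity:.0%} too low for reliable transfer")
    # Where the template has a known state, assume the query shares it.
    return identity, template_ss

# Hypothetical ten-residue example
query        = "MKVLATGERL"
template_seq = "MKVLSTGDRL"   # solved structure, 80% identical here
template_ss  = "CHHHHHHHCC"   # H = helix, C = coil, taken from the template

identity, predicted_ss = transfer_secondary_structure(query, template_seq, template_ss)
print(f"identity {identity:.0%}; predicted secondary structure: {predicted_ss}")
```

The failure mode described above is built into this logic: any residue where the fold actually differs from the template is simply copied wrong.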



Neural networks are a form of machine learning. Conceptually, these use the same assumptions as homology modelling, but on a larger, more sophisticated scale (not necessarily an advantage). A good example of "machine learning" winning out in terms of homology clustering and secondary structure prediction is HHpred. An algorithm builds subsets of "if" statements: if a protein has a certain combination of "ifs" and "nots", then it's safe to assume other proteins with the same "ifs" and "nots" have similar structures.
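Here is a minimal sketch of the classic sliding-window idea used by many neural-network predictors: each residue is represented by a one-hot encoded window of its neighbours, and a small feed-forward network predicts the state of the central residue. The training fragments and labels below are made up, and scikit-learn's MLPClassifier merely stands in for whatever architecture a real predictor uses.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode_windows(sequence, window=7):
    """One-hot encode a sliding window around each residue.
    Positions hanging off either end are left as all-zero columns."""
    half = window // 2
    features = []
    for i in range(len(sequence)):
        vec = np.zeros(window * len(AMINO_ACIDS))
        for w in range(-half, half + 1):
            j = i + w
            if 0 <= j < len(sequence):
                vec[(w + half) * len(AMINO_ACIDS) + AA_INDEX[sequence[j]]] = 1.0
        features.append(vec)
    return np.array(features)

# Invented toy training data: one helical and one coil-rich fragment
train_seqs   = ["MKVLATGERLAELLK", "GGSGGPNGGSGGT"]
train_labels = ["CHHHHHHHHHHHHHC", "CCCCCCCCCCCCC"]

X = np.vstack([encode_windows(s) for s in train_seqs])
y = np.array([lab for labs in train_labels for lab in labs])

clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
clf.fit(X, y)

print("".join(clf.predict(encode_windows("MKALAEGGRL"))))  # per-residue state guess
```

Real predictors feed in MSA-derived profiles rather than single sequences, which is where the roughly 75% accuracy mentioned in the question comes from.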


You need to know a lot about the dataset that you put in, and you also need to know which of that information is relevant. If these conditions aren't met, the results will just be amplified errors, arbitrary categorisations, missing categories, or a combination of the three.


It's worth pointing out at this point that most modern secondary structure predictors use a combination of homology and machine learning.




More recently, iterative optimisation based on vague experimental data has become standard practice, shifting ab initio from a top-down approach to something more inductive. Kulic et al., 2012 is an interesting example of such a modern holistic approach: it involves iterative optimisation of ab initio protein structures against experimentally derived structures.
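As a loose, purely illustrative sketch of that composite idea (not the method of Kulic et al. or anything from the 2009 review): a toy physics-style term and a toy experimental-restraint term are combined, and a model is iteratively perturbed toward the joint minimum. All coordinates, weights and "restraints" here are invented.

```python
import random

def physics_energy(coords):
    """Toy physics-style term: penalise consecutive 'atoms' (here just points
    on a line) that stray from an ideal 3.8 Angstrom virtual bond length."""
    return sum((abs(b - a) - 3.8) ** 2 for a, b in zip(coords, coords[1:]))

def restraint_energy(coords, restraints):
    """Toy knowledge-based term: penalise violations of experimentally
    derived pairwise distances (e.g. cross-link or NOE-style restraints)."""
    return sum((abs(coords[j] - coords[i]) - d) ** 2 for i, j, d in restraints)

def refine(coords, restraints, weight=1.0, steps=20000, step_size=0.1):
    """Iterative refinement: accept random perturbations that lower the
    combined (physics + weighted restraint) energy."""
    def total(c):
        return physics_energy(c) + weight * restraint_energy(c, restraints)
    current = list(coords)
    for _ in range(steps):
        trial = list(current)
        k = random.randrange(len(trial))
        trial[k] += random.uniform(-step_size, step_size)
        if total(trial) < total(current):
            current = trial
    return current

random.seed(0)
start = [i * 4.5 for i in range(6)]   # deliberately over-stretched toy chain
restraints = [(0, 5, 19.0)]           # invented end-to-end distance restraint
print([round(x, 1) for x in refine(start, restraints)])
```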


Another excerpt from the 2009 article:



Thus, developing composite methods using both knowledge-based and physics-based energy terms may represent a promising approach to the problem of ab initio modelling.


