Ranking Protein-Protein Binding Using Evolutionary Information and Machine Learning
Discriminating native-like complexes from false positives with high accuracy is one of the biggest challenges in protein-protein docking. It is commonly agreed that favorable intermolecular interactions (e.g., van der Waals, electrostatic, and desolvation forces) are related to how similar a conformation is to its native structure, although the precise nature of this relationship is not well understood. Existing protein-protein docking methods typically formulate this relationship as a weighted sum of selected terms, tune the weights on a training set, and then use the resulting function to evaluate and rank candidate complexes. Despite recent improvements, docking methods still produce a large number of false positives, which often leads to incorrect predictions of complex binding. Using machine learning, we implemented an approach that not only ranks candidate complexes relative to each other but also predicts how similar each candidate is to the native conformation. We built a Support Vector Regressor (SVR) using physico-chemical features and evolutionary conservation, and we trained and tested the model on extensive datasets of complexes generated by three state-of-the-art docking methods. The set of docked complexes was generated from 79 different protein-protein complexes in the rigid-body and medium-difficulty categories of the Protein-Protein Docking Benchmark v5. Our approach generally outperformed the built-in scoring functions of the docking programs used to generate the complexes, attesting to its potential for predicting the correct binding of protein-protein complexes.
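The core idea of regressing a similarity-to-native score from per-pose features, then ranking candidates by the predicted score, can be illustrated with a small sketch. It makes several assumptions that go beyond the abstract. The abstract does not state the kernel, solver, feature encoding, or similarity target. This sketch therefore uses a *linear* epsilon-insensitive SVR trained by stochastic subgradient descent in pure Python. The feature vector (for example standardized van der Waals, electrostatic, desolvation, and conservation terms) and the target (for example a fraction-of-native-contacts score) are hypothetical. In practice an established implementation such as scikit-learn's `sklearn.svm.SVR` would be the more usual choice.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    """A docked pose with a hypothetical, already-standardized feature vector."""
    name: str
    features: list[float]  # assumed layout, e.g. [vdw, elec, desolv, conservation]


def predict(x: list[float], w: list[float], b: float) -> float:
    """Linear SVR prediction: w . x + b (the predicted similarity to native)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svr(X, y, epsilon=0.05, lam=1e-3, lr=0.01, epochs=500, seed=0):
    """Fit (w, b) minimizing lam/2 * ||w||^2 + mean(max(0, |y - (w.x + b)| - epsilon))
    by stochastic subgradient descent. This is a simplified stand-in, not the
    authors' training procedure."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            residual = y[i] - predict(X[i], w, b)
            # Subgradient of the epsilon-insensitive loss with respect to the prediction:
            # zero inside the tube, -1 above it, +1 below it.
            if residual > epsilon:
                g = -1.0
            elif residual < -epsilon:
                g = 1.0
            else:
                g = 0.0
            # Gradient step on the L2 regularizer plus the loss subgradient.
            w = [wj - lr * (lam * wj + g * xj) for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b


def rank_candidates(candidates, w, b):
    """Return (name, predicted score) pairs, most native-like first."""
    scored = [(predict(c.features, w, b), c.name) for c in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(name, score) for score, name in scored]
```

Because the regressor outputs an estimated similarity rather than only a relative order, the same predictions serve both purposes described above: sorting gives the ranking, while the raw values indicate how close each pose is expected to be to the native complex.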