Gradient descent optimization and deep reinforcement learning for protein-protein interaction
Abstract
Reconstructing the 3D structure of protein dimers is a crucial and challenging task. Although inter-protein contacts have proven useful in modeling protein complexes, few methods have been introduced that use inter-chain contacts to tackle the challenging quaternary structure prediction problem. We propose an optimization method based on the gradient descent algorithm, called GD, to reconstruct the quaternary structures of protein complexes from inter-protein contacts. We test the performance of GD on both homodimers and heterodimers using both true and predicted inter-protein contacts. GD outperforms both a Markov Chain Monte Carlo (MC) method and a method based on the Crystallography and NMR System (CNS). When native inter-chain contacts are provided as input, GD builds high-quality models with TM-scores above 0.92 and interface RMSDs (I_RMSDs) below 1.64 Å for both homodimers and heterodimers. Given predicted inter-chain contacts as restraints, GD generates models with a mean TM-score of 0.76 for 115 homodimers. Furthermore, for nearly half of the homodimers, GD reconstructs high-quality models with TM-scores above 0.9 using only the predicted inter-chain contacts to guide the modeling process. We also develop a self-learning algorithm based on reinforcement learning, named DRLComplex, to reconstruct protein dimers from true or predicted inter-protein contacts. We evaluate DRLComplex on two standard datasets: the CASP-CAPRI dataset (28 homodimers) and Std32 (32 heterodimers). When native inter-chain contacts are provided, DRLComplex generates models with a mean TM-score of 0.9895 and a mean I_RMSD of 0.2197 for the CASP-CAPRI dataset, and a mean TM-score of 0.9881 and a mean I_RMSD of 0.92 for Std32. Using predicted inter-chain contacts as restraints, DRLComplex builds models with average TM-scores of 0.73 and 0.76 for CASP-CAPRI and Std32, respectively.
Moreover, using predicted contacts, DRLComplex improves the mean I_RMSD of the reconstructed models for the Std32 dataset by 0.29 percent, 1.01 percent, 13.47 percent, and 8.69 percent over GD, MC, CNS, and Equidock (an end-to-end quaternary structure prediction method), respectively. In addition, with predicted contacts, the mean I_RMSD of the models built by DRLComplex for the CASP-CAPRI dataset is 0.04, 3.94, and 4.07 lower than that of MC, CNS, and Equidock, respectively.
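
As a rough illustration of how gradient descent can pull two chains together under inter-chain contact restraints, the following is a minimal sketch and not the thesis implementation. It assumes that chain B is rigid and may only translate, that restraint violations are scored with a squared-hinge penalty, and that a contact means a distance of at most 8 Å. The function names, the cutoff, the learning rate, and the toy coordinates are all hypothetical choices made for this example.

```python
import numpy as np


def contact_loss_and_grad(coords_a, coords_b, contacts, shift, cutoff=8.0):
    """Mean squared-hinge penalty over contact pairs, plus its gradient.

    coords_a, coords_b: (n, 3) arrays of residue coordinates (hypothetical input).
    contacts: (m, 2) integer array of (index in A, index in B) pairs.
    shift: (3,) translation currently applied to chain B.
    cutoff: assumed contact distance threshold in angstroms.
    """
    i, j = contacts[:, 0], contacts[:, 1]
    diff = (coords_b[j] + shift) - coords_a[i]
    dist = np.linalg.norm(diff, axis=1)
    # Only pairs farther apart than the cutoff are penalized.
    excess = np.maximum(dist - cutoff, 0.0)
    loss = np.mean(excess ** 2)
    # d(excess^2)/d(shift) = 2 * excess * diff / dist; guard against dist == 0.
    safe_dist = np.where(dist > 0.0, dist, 1.0)
    grad = np.mean((2.0 * excess / safe_dist)[:, None] * diff, axis=0)
    return loss, grad


def dock_by_translation(coords_a, coords_b, contacts, lr=0.1, steps=200):
    """Plain gradient descent on the translation of chain B."""
    shift = np.zeros(3)
    for _ in range(steps):
        _, grad = contact_loss_and_grad(coords_a, coords_b, contacts, shift)
        shift = shift - lr * grad
    return shift


# Toy data: four residues per chain, with chain B displaced from a docked pose.
coords_a = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0],
                     [7.6, 0.0, 0.0], [11.4, 0.0, 0.0]])
coords_b = coords_a + np.array([0.0, 5.0, 0.0]) + np.array([30.0, -20.0, 10.0])
contacts = np.array([[0, 0], [1, 1], [2, 2], [3, 3]])

loss_start, _ = contact_loss_and_grad(coords_a, coords_b, contacts, np.zeros(3))
final_shift = dock_by_translation(coords_a, coords_b, contacts)
loss_end, _ = contact_loss_and_grad(coords_a, coords_b, contacts, final_shift)
final_dists = np.linalg.norm(
    (coords_b[contacts[:, 1]] + final_shift) - coords_a[contacts[:, 0]], axis=1
)
```

In this toy setting the descent moves chain B until every restrained pair sits at the assumed cutoff. A realistic reconstruction would also have to optimize the rotation of the chain and cope with noisy predicted contacts, which this sketch leaves out.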
Degree
M.S.