DenseCPD: Improving the Accuracy of Neural-Network-Based Computational Protein Sequence Design with DenseNet

2020-01-24T21:37:43Z (GMT) by Yifei Qi John Z.H. Zhang

Computational protein design remains a challenging task despite its remarkable success in the past few decades. With the rapid progress of deep-learning techniques and the accumulation of three-dimensional protein structures, using deep neural networks to learn the relationship between protein sequences and structures and then automatically design a protein sequence for a given protein backbone structure is becoming increasingly feasible. In this study, we developed a deep neural network named DenseCPD that considers the three-dimensional density distribution of protein backbone atoms and predicts the probability of 20 natural amino acids for each residue in a protein. The accuracy of DenseCPD was 51.56±0.20% in a 5-fold cross validation on the training set and 54.45% and 50.06% on two independent test sets, which is more than 10% higher than those of previous state-of-the-art methods. Two approaches for using DenseCPD predictions in computational protein design were analyzed. The approach using the cutoff of accumulative probability had a smaller sequence search space compared to that of the approach that simply uses the top-k predictions and therefore enables higher sequence identity in redesigning three proteins with Rosetta. The network and the data sets are available on a web server at http://protein.org.cn/densecpd.html. The results of this study may benefit the further development of computational protein design methods.