SS-Pro: a simplified Siamese contrastive learning approach for protein surface representation
Ao Shen, Mingzhi Yuan, Yingfan Ma, Manning Wang†
Frontiers of Computer Science (IF=4.2)
Abstract
The protein surface is an important representation of protein structure, and it determines the protein's biological functions. With the rise of deep learning, various protein surface descriptors have been proposed, delivering promising results in tasks such as protein design and interaction prediction. However, these data-driven methods suffer from label scarcity, since labeled data are typically obtained through wet-lab experiments. Motivated by the success of self-supervised learning in natural language processing and computer vision, we apply self-supervised learning to mitigate this scarcity. This paper introduces SS-Pro, a simple and efficient contrastive self-supervised learning framework that can be adapted to various protein surface networks. We pre-train on a large dataset of unlabeled protein surfaces and then fine-tune the downstream network starting from the pre-trained weights. To validate the approach, we conduct experiments on two downstream tasks: protein surface binding site recognition and protein-protein interaction prediction. The results show performance improvements across four different protein surface networks, indicating that the approach generalizes across both architectures and tasks.
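To make the pre-training idea concrete, the following is a minimal sketch of the kind of objective a simple Siamese framework might optimize: a symmetric negative cosine similarity between embeddings of two augmented views of the same surface patch. The abstract does not specify the loss, so this form is an assumption, and the names `cosine_similarity`, `siamese_loss`, `p1`, `z1`, `p2`, and `z2` are hypothetical. Only the forward computation is shown; training-time details such as stopping gradients through the target branch are omitted.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def siamese_loss(p1, z1, p2, z2):
    """Symmetric negative cosine similarity (assumed objective, not confirmed by the source).

    p1, p2: predictor outputs for view 1 and view 2 (hypothetical names).
    z1, z2: encoder outputs for view 1 and view 2, treated as fixed targets.
    Each view's prediction is compared against the other view's target.
    """
    return -0.5 * (cosine_similarity(p1, z2) + cosine_similarity(p2, z1))


# Two views with identical embeddings reach the minimum of the loss.
aligned = siamese_loss([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0])

# Two views with orthogonal embeddings give a loss of zero.
orthogonal = siamese_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
```

Under this assumed objective, the loss ranges from -1 (perfectly aligned views) to 1 (opposite views), so pre-training pushes the two views of one surface toward the same embedding without requiring labels.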