Shanghai Key Laboratory of Medical Image Processing and Computer Assisted Surgery

    Outstanding Papers

    [Briefings in Bioinformatics] Complementary multi-modality molecular self-supervised learning via non-overlapping masking for property prediction

    Published: 2024-06-20


    Ao Shen*, Mingzhi Yuan*, Yingfan Ma, Jie Du, Manning Wang†


    Briefings in Bioinformatics (IF:9.5)


    Abstract

    Self-supervised learning plays an important role in molecular representation learning because labeled molecular data are usually limited in many tasks, such as chemical property prediction and virtual screening. However, most existing molecular pre-training methods focus on a single modality of molecular data, and the complementary information in two important modalities, SMILES and graphs, has not been fully explored. In this study, we propose an effective multi-modality self-supervised learning framework for molecular SMILES and graphs. Specifically, SMILES data and graph data are first tokenized so that they can be processed by a unified Transformer-based backbone network, which is trained with a masked reconstruction strategy. In addition, we introduce a specialized non-overlapping masking strategy to encourage fine-grained interaction between the two modalities. Experimental results show that our framework achieves state-of-the-art performance on a series of molecular property prediction tasks, and a detailed ablation study demonstrates the efficacy of both the multi-modality framework and the masking strategy.
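    The abstract does not spell out how the non-overlapping masking is implemented, so the following is only a minimal Python sketch of the general idea, not the authors' code. It rests on assumptions stated here and in the comments: each atom token in the SMILES string corresponds one-to-one, in order of appearance, to a node of the molecular graph; only atom tokens are masked; and the two views receive disjoint, equal-sized random sets of masked atoms, so every atom hidden in one view stays visible in the other. The simplified tokenizer regex, the atom symbols used as stand-in node features, and the names `tokenize_smiles`, `atom_index_per_token`, `non_overlapping_masks`, `mask_views` and `mask_ratio` are all hypothetical.

```python
import random
import re
from typing import List, Optional, Set, Tuple

# Hypothetical, simplified SMILES tokenizer (not the paper's tokenizer).
# Two-letter halogens are listed before single letters so "Cl" is not split.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|\(|\)|=|#|-|\+|/|\\|@|%\d{2}|\d"
)
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]")
MASK = "[MASK]"


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; reject characters the regex skips."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in {smiles!r}")
    return tokens


def atom_index_per_token(tokens: List[str]) -> List[Optional[int]]:
    """Give each atom token its atom index (order of appearance).

    Bonds, branches and ring-closure digits get None. ASSUMPTION: this
    order matches the node order of the molecular graph.
    """
    out: List[Optional[int]] = []
    next_atom = 0
    for tok in tokens:
        if ATOM_TOKEN.fullmatch(tok):
            out.append(next_atom)
            next_atom += 1
        else:
            out.append(None)
    return out


def non_overlapping_masks(
    num_atoms: int, mask_ratio: float, rng: random.Random
) -> Tuple[Set[int], Set[int]]:
    """Draw two disjoint, equal-sized sets of atom indices to mask."""
    if not 0.0 <= mask_ratio <= 0.5:
        raise ValueError("mask_ratio must be in [0, 0.5] for disjoint masks")
    k = int(num_atoms * mask_ratio)
    order = list(range(num_atoms))
    rng.shuffle(order)
    # First k shuffled atoms are hidden in SMILES, next k in the graph.
    return set(order[:k]), set(order[k:2 * k])


def mask_views(smiles: str, mask_ratio: float = 0.3, seed: int = 0):
    """Return masked SMILES tokens, masked graph nodes and both mask sets."""
    tokens = tokenize_smiles(smiles)
    atom_ids = atom_index_per_token(tokens)
    num_atoms = sum(i is not None for i in atom_ids)
    rng = random.Random(seed)
    smiles_mask, graph_mask = non_overlapping_masks(num_atoms, mask_ratio, rng)

    # SMILES view: replace only atom tokens whose index is in smiles_mask.
    masked_tokens = [
        MASK if i in smiles_mask else tok for tok, i in zip(tokens, atom_ids)
    ]
    # Graph view: atom symbols stand in for real node features here.
    atom_symbols = [tok for tok, i in zip(tokens, atom_ids) if i is not None]
    masked_nodes = [
        MASK if i in graph_mask else sym for i, sym in enumerate(atom_symbols)
    ]
    return masked_tokens, masked_nodes, smiles_mask, graph_mask
```

    For example, `mask_views("CC(=O)Oc1ccccc1C(=O)O", mask_ratio=0.3, seed=0)` hides 3 of aspirin's 13 heavy atoms in each view with no atom hidden in both, so a model reconstructing a masked atom in one view could in principle draw on the other view. Capping `mask_ratio` at 0.5 is what guarantees that two equal-sized disjoint masks always exist.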