Document Type

Article

Publication Date

2025

DOI

10.1038/s41467-025-61745-7

Publication Title

Nature Communications

Volume

16

Issue

1

Pages

6436 (1-22)

Abstract

Predicting compound-protein interactions (CPIs) plays a crucial role in drug discovery. Traditional methods, based on the key-lock theory and rigid docking, often fail with novel compounds and proteins due to their inability to account for molecular flexibility and the high sparsity of CPI data. Here, we introduce ColdstartCPI, a framework inspired by induced-fit theory, which leverages unsupervised pre-training features and a Transformer module to learn both compound and protein characteristics. ColdstartCPI treats proteins and compounds as flexible molecules during inference, aligning with biological insights. It outperforms state-of-the-art sequence-based models, particularly for unseen compounds and proteins, and shows strong generalization capability compared to structure-based methods in virtual screening. ColdstartCPI also excels in sparse and low-similarity data conditions, demonstrating its potential in data-limited settings. Our results are validated through literature search, molecular docking, and binding free energy calculations. Overall, ColdstartCPI offers a perspective on sequence-based drug design, presenting a promising tool for drug discovery.

Rights

This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original authors and the source, provide a link to the Creative Commons license, and indicate if you modified the licensed material. You do not have permission under this license to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Data Availability

Article states: "Data availability is as follows. The data generated and analyzed in the study have been deposited on Zenodo [https://zenodo.org/records/15622018]. The source data of the three benchmarks (the BindingDB, BioSNAP and BindingDB_AIBind datasets) are available on the GitHub pages for DrugBAN [https://github.com/peizhenbai/DrugBAN], BioSNAP [https://github.com/kexinhuang12345/MolTrans/tree/master/dataset/BIOSNAP], and Zenodo [https://zenodo.org/record/7226641], respectively. The publicly available datasets used in this study can be found on their official websites: DrugBank [https://www.drugbank.com/], BindingDB [https://www.bindingdb.org], Drug Target Commons [http://drugtargetcommons.fimm.fi/], DUD-E [https://dude.docking.org/], LIT_PCBA [https://drugdesign.unistra.fr/LIT-PCBA/], Antibiotics [https://doi.org/10.15252/msb.202211081], PDBbind [http://pdbbind.org.cn/], Uniprot [https://www.uniprot.org/], Protein Data Bank [https://www.rcsb.org/], and PubChem [https://pubchem.ncbi.nlm.nih.gov/]. The top 100 candidates predicted by ColdstartCPI are available within Supplementary Information. Unless otherwise stated, all data supporting the results of this study can be found in the article, supplementary, and source data files. Source data are provided with this paper."

Original Publication Citation

Zhao, Q., Zhao, H., Guo, L., Zheng, K., Li, Y., Ling, Q., Tang, J., Li, Y., & Wang, J. (2025). ColdstartCPI: Induced-fit theory-guided DTI predictive model with improved generalization performance. Nature Communications, 16(1), 1-22, Article 6436. https://doi.org/10.1038/s41467-025-61745-7

ORCID

0000-0003-0178-1876 (Li)

41467_2025_61745_MOESM1_ESM.pdf (6138 kB)
Supplementary Information

41467_2025_61745_MOESM2_ESM.pdf (240 kB)
Reporting Summary

41467_2025_61745_MOESM3_ESM.pdf (3261 kB)
Transparent Peer Review File
