Recep Adiyaman
Daily Signal January 29, 2026 · 10 min read

Issue #35: PepScorer::RMSD: An Improved Machine Learning Scoring Function for Protein-Peptide Docking.



Signal of the Day

PepScorer::RMSD: An Improved Machine Learning Scoring Function for Protein-Peptide Docking.

Over the past two decades, pharmaceutical peptides have emerged as a powerful alternative to traditional small molecules, offering high potency, specificity, and low toxicity. However, most computational drug discovery tools remain optimized for small molecules and need to be entirely adapted to peptide-based compounds. Molecular docking algorithms, commonly employed to rank drug candidates in early-stage drug discovery, often fail to accurately predict peptide binding poses due to their high conformational flexibility and scoring functions not being tailored to peptides. To address these limitations, we present PepScorer::RMSD, a novel machine learning-based scoring function specifically designed for pose selection and enhancement of docking power (DP) in virtual screening campaigns targeting peptide libraries. The model predicts the root-mean-squared deviation (RMSD) of a peptide pose relative to its native conformation using a curated dataset of protein-peptide complexes (3-10 amino acids). PepScorer::RMSD outperformed conventional, ML-based, and peptide-specific scoring functions, achieving a Pearson correlation of 0.70, a mean absolute error of 1.77 Å, and top-1 DP values of 92% on the evaluation set and 81% on an external test set. Our PLANTS-based workflow was benchmarked against AlphaFold-Multimer predictions, confirming its robustness for virtual screening. PepScorer::RMSD and the curated dataset are freely available in Zenodo.

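Why this matters: A scoring function trained specifically on short peptides (3-10 residues) could make pose selection in peptide virtual screening more reliable, which may help prioritize high-affinity peptide binders earlier in a campaign.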
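The metrics quoted in the abstract (Pearson correlation, mean absolute error, top-1 docking power) can be illustrated with a minimal sketch. This is not the PepScorer::RMSD code: the function names are hypothetical, the toy numbers are invented for illustration, the RMSD is computed without re-superposition on the assumption that poses share the receptor frame, and the 2 Å success threshold is a common docking convention that the abstract does not state.

```python
import numpy as np


def pose_rmsd(pose_xyz, native_xyz):
    """RMSD in angstroms between matched atoms, without re-superposition.

    Assumes both arrays have shape (n_atoms, 3), share the receptor
    frame, and list atoms in the same order.
    """
    pose = np.asarray(pose_xyz, dtype=float)
    native = np.asarray(native_xyz, dtype=float)
    if pose.shape != native.shape:
        raise ValueError("pose and native must have the same shape")
    return float(np.sqrt(np.mean(np.sum((pose - native) ** 2, axis=1))))


def pearson_r(predicted, observed):
    """Pearson correlation between predicted and observed RMSD values."""
    p = np.asarray(predicted, dtype=float)
    o = np.asarray(observed, dtype=float)
    pc, oc = p - p.mean(), o - o.mean()
    return float((pc @ oc) / np.sqrt((pc @ pc) * (oc @ oc)))


def mean_absolute_error(predicted, observed):
    """Mean absolute difference between predicted and observed RMSD."""
    p = np.asarray(predicted, dtype=float)
    o = np.asarray(observed, dtype=float)
    return float(np.mean(np.abs(p - o)))


def top1_docking_power(poses_by_complex, threshold=2.0):
    """Fraction of complexes whose best-scored pose is near-native.

    poses_by_complex maps a complex ID to (predicted_rmsd, true_rmsd)
    pairs. The pose with the lowest predicted RMSD is the top-1 pick;
    it counts as a success if its true RMSD is <= threshold (assumed).
    """
    hits = 0
    for poses in poses_by_complex.values():
        best = min(poses, key=lambda pair: pair[0])
        if best[1] <= threshold:
            hits += 1
    return hits / len(poses_by_complex)


# Toy data, invented for illustration only (not from the paper).
toy_poses = {
    "complex_a": [(1.2, 1.0), (3.5, 4.0), (6.1, 7.5)],
    "complex_b": [(2.8, 5.0), (3.0, 1.5), (7.0, 8.0)],
}
predicted = [p for poses in toy_poses.values() for p, _ in poses]
observed = [t for poses in toy_poses.values() for _, t in poses]

print("Pearson r:", round(pearson_r(predicted, observed), 3))
print("MAE (A):", round(mean_absolute_error(predicted, observed), 3))
print("Top-1 docking power:", top1_docking_power(toy_poses))
```

In the toy data the second complex is a miss: the pose the model ranks first is 5.0 Å from the native, even though a 1.5 Å pose exists in the pool. Top-1 docking power penalizes exactly that ranking error.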


Also Worth Reading

Decrypting potential mechanisms linking ochratoxin A to hepatocellular carcinoma: an integrated approach combining toxicology, machine learning, molecular docking, and molecular dynamics simulation.

Background: Ochratoxin A (OTA), a common food-borne mycotoxin, is a potential human carcinogen, yet the specific molecular mechanisms linking it to hepatocellular carcinoma (HCC) remain unclear. Methods: We integrated network toxicology to predict OTA targets and intersected them with HCC transcriptomic data to identify key candidate genes. Functional enrichment analysis was then conducted. Multiple machine learning algorithms were applied to screen and validate core genes. Furthermore, molecular docking and molecular dynamics (MD) simulations were employed to evaluate the binding stability between OTA and key target proteins. Results: A total of 50 key genes were identified as candidate targets in OTA-associated hepatocarcinogenesis. Enrichment analysis revealed their significant involvement in critical processes such as xenobiotic metabolism and oxidative stress response. Machine learning analysis prioritized eight core genes (AURKA, GABARAPL1, CA2, PARP1, LMNA, SLC27A5, EPHX2, and GSTP1), and a combined diagnostic model demonstrated outstanding performance (AUC = 0.986). Structural analyses via molecular docking and MD simulations confirmed stable binding interactions between OTA and these core targets. Conclusions: This integrated computational study identifies a set of candidate genes through which OTA may interact with HCC-associated molecular networks. The robust binding predicted between OTA and the core targets provides a structural basis for these interactions. These findings offer a prioritized list of targets and a theoretical framework for subsequent experimental validation and investigation into OTA’s toxicological role in HCC.
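For readers less familiar with the AUC figure reported above, here is a minimal sketch of how an area under the ROC curve can be computed from class labels and model scores. It uses the pairwise (Mann-Whitney) formulation; the function name and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: iterable of 0/1 class labels (1 = positive, e.g. tumor sample).
    scores: iterable of model scores, higher meaning "more likely positive".
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example, invented for illustration only.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))  # 3 of 4 positive/negative pairs are ordered correctly
```

An AUC close to 1, such as the 0.986 reported, means nearly every positive sample is scored above nearly every negative one. On small or single-cohort datasets a value like that is worth checking against an independent validation set.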

Artificial Intelligence Driven Virtual Screening and Molecular Docking Approaches Identified LIFR, BTG2, EPHX2, and PAK3 as Targets and BI-2536, AP-24534, and AZ-628 as Repurposed Drugs for PDAC.

Pancreatic ductal adenocarcinoma (PDAC) is one of the most aggressive and lethal tumors worldwide, with limited effective treatments. Globally, the incidence of pancreatic cancer is expected to rise to 18.6 per 100,000 by 2050, with an average annual growth rate of 1.1%, implying that PDAC would represent a considerable public health burden. Identifying prognostic markers is critical for making therapy decisions and improving patient outcomes. In this study, the microarray gene expression data of PDAC were analyzed using artificial intelligence (AI) algorithms and molecular docking to identify differentially expressed genes (DEGs) and candidates for drug repurposing. The GSE183795 dataset used in this study was obtained from the National Center for Biotechnology Information. The data were analyzed using GEO2R tools, and genes were selected based on logFC values > 2. These genes were then ranked using AI algorithms such as support vector machine (SVM), logistic regression, random forest, extreme gradient boosting (XGB), and a one-dimensional convolutional neural network to identify the DEGs. The performance of the models was evaluated using stratified 10-fold cross-validation and different classification metrics. A drug library corresponding to the identified DEGs was prepared using DepMap, and molecular docking and pharmacokinetics analysis were subsequently performed. The logFC > 2 filter yielded 107 upregulated genes in PDAC. SVM and XGB achieved average 10-fold accuracy, sensitivity, specificity, precision, and F-score of 79.25%, 78.37%, 78.37%, 79.33%, and 78.35%, respectively. Our results revealed that LIFR, BTG2, EPHX2, and PAK3 are within the top three and commonly ranked by AI models. Further, we identified three drugs, BI-2536, ponatinib (AP-24534), and AZ-628, which showed the best efficacy based on the binding energies from the molecular docking analysis. The pharmacokinetics analysis supported these results: the identified drugs obey Lipinski’s rule, suggesting they could be used as therapeutics for PDAC. In conclusion, the identified genes can act as prognostic markers, and the drugs could be used as potential therapeutics for PDAC.
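The Lipinski check mentioned in the abstract can be sketched in a few lines. This is an illustration of the standard rule-of-five thresholds, not the authors' pharmacokinetics pipeline: the class and function names are hypothetical, the descriptor values are invented placeholders rather than data for any named drug, and allowing at most one violation is the usual convention, assumed here.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors for one compound (values supplied by the user)."""
    mol_weight: float   # molecular weight in daltons
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d):
    """Return a list describing each rule-of-five threshold the compound exceeds."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if d.logp > 5:
        violations.append("logP > 5")
    if d.h_donors > 5:
        violations.append("more than 5 H-bond donors")
    if d.h_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations


def passes_rule_of_five(d, max_violations=1):
    """True if the compound has no more than max_violations violations (assumed convention)."""
    return len(lipinski_violations(d)) <= max_violations


# Placeholder compounds with invented descriptor values.
compound_1 = Descriptors(mol_weight=450.0, logp=3.2, h_donors=2, h_acceptors=6)
compound_2 = Descriptors(mol_weight=620.0, logp=5.6, h_donors=2, h_acceptors=6)

for name, cpd in [("compound_1", compound_1), ("compound_2", compound_2)]:
    print(name, passes_rule_of_five(cpd), lipinski_violations(cpd))
```

Passing the rule of five is a rough oral drug-likeness filter. It says nothing about binding or efficacy, so it complements the docking results rather than confirming them.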

Study on the Mechanism of Ku Diding in the Treatment of Diabetes based on Network Pharmacology, Molecular Docking Technology, and Molecular Dynamics.

Introduction: To explore how Ku Diding (KDD) works in managing Diabetes Mellitus (DM), researchers utilized network pharmacology, molecular docking, and molecular dynamics methodologies. Methods: Key active components of KDD were identified using the Traditional Chinese Medicine Systematic Pharmacology Database and Analysis Platform (TCMSP). Data for diabetes-related targets were retrieved from the Human Genetic Comprehensive Databases (GeneCards) and the Online Mendelian Inheritance in Man (OMIM) database. The intersection of these targets was analyzed to determine potential therapeutic targets for diabetes treatment. Protein-protein interaction (PPI) networks were constructed using the STRING database and Cytoscape software, followed by Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. Molecular docking between the components and key targets was performed using the AutoDock Vina platform. Results: This study identified that Dihydrosanguinarine, (S)-Scoulerine, among others, are the main active ingredients of KDD for treating DM, showing high affinity for critical targets like PTGS2 and PRKACA, through multiple pathways including vascular regulation, neuromodulation, metabolic regulation, and endocrine regulation. The molecular docking results showed that there are interactions between the active ingredients and the key targets, with the majority of the effective components exhibiting a stronger binding affinity than Metformin. Among them, (S)-Scoulerine and Dihydrosanguinarine demonstrated high docking affinity with the key target proteins PTGS2 and PRKACA. Discussion: DM is closely linked to oxidative stress, chronic inflammation, and insulin signaling dysregulation. This study reveals that KDD exerts anti-diabetic effects via a multi-target network involving proteins such as PRKACA, PTGS2, ESR1, FOS, and DRD2. These targets are associated with glucose metabolism, inflammation, oxidative stress, and neural regulation. Modulation of these pathways likely enhances insulin sensitivity, lowers blood glucose, suppresses inflammation, and protects against oxidative damage. GO and KEGG analyses further indicate involvement in MAPK signaling, synaptic transmission, and vascular regulation, forming a multidimensional “metabolism-inflammation-neural” regulatory network. Compared to Metformin, most KDD-derived compounds showed stronger binding, highlighting their therapeutic potential. Molecular dynamics simulations support the stability of the observed binding conformations, suggesting their potential as therapeutic targets. These findings underscore KDD’s ability to simultaneously target multiple pathological mechanisms, offering a holistic treatment strategy for DM. Conclusion: This study provides preliminary evidence that KDD is characterized by a multicomponent, multi-target, and multi-pathway approach in the treatment of diabetes mellitus (DM), thereby establishing a scientific foundation for further in-depth exploration of KDD’s molecular mechanisms.
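The target-intersection and PPI-ranking steps common to network pharmacology studies like this one can be sketched as follows. This is a simplified stand-in, not the STRING/Cytoscape workflow itself: the function names are hypothetical, the target IDs and edges are placeholders rather than real genes, and ranking hub targets by node degree is one common choice assumed here.

```python
from collections import Counter


def shared_targets(compound_targets, disease_targets):
    """Targets predicted for the compound that are also linked to the disease."""
    return set(compound_targets) & set(disease_targets)


def rank_by_degree(edges, nodes):
    """Rank nodes by number of distinct interaction partners within `nodes`.

    edges: iterable of (a, b) pairs from a PPI network.
    Self-loops, duplicate edges, and edges leaving `nodes` are ignored.
    Returns (node, degree) pairs, highest degree first, ties by name.
    """
    nodes = set(nodes)
    unique_edges = {
        frozenset((a, b))
        for a, b in edges
        if a != b and a in nodes and b in nodes
    }
    degree = Counter({n: 0 for n in nodes})
    for pair in unique_edges:
        for node in pair:
            degree[node] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Placeholder identifiers, invented for illustration only.
compound = {"T1", "T2", "T3", "T4"}
disease = {"T2", "T3", "T4", "T5"}
ppi_edges = [("T2", "T3"), ("T3", "T4"), ("T3", "T2"), ("T4", "T5")]

common = shared_targets(compound, disease)
print(sorted(common))
print(rank_by_degree(ppi_edges, common))
```

Degree-based ranking favors well-studied, highly connected proteins, so the "key targets" it surfaces are best treated as hypotheses for docking and experimental follow-up.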



Quick Reads

Investigating the impact of aspartame on Alzheimer’s disease through network toxicology and molecular docking.

Introduction: Alzheimer’s disease (AD) is a prevalent neurodegenerative disorder, and the relationship between its pathogenesis and environmental factors has garnered increasing scholarly interest.

Integrative molecular simulations reveal NeuroAid II mechanisms in ischemic stroke through network pharmacology, molecular dynamics, and pharmacophore modeling.

Ischemic stroke remains a major health challenge with limited treatment options.

Molecular Investigation of Product Nkabinde in HIV Therapy: A Network Pharmacology and Molecular Docking Approach.

HIV/AIDS continues to pose a significant global public health concern, with Sub-Saharan Africa having the highest number of people living with HIV (PLHIV).

Discontinued BACE1 Inhibitors in Phase II/III Clinical Trials and AM-6494 (Preclinical) Towards Alzheimer’s Disease Therapy: Repurposing Through Network Pharmacology and Molecular Docking Approach.

Background: β-site amyloid precursor protein cleaving enzyme 1 (BACE1) inhibitors demonstrated amyloid-lowering efficacy but failed in phase II/III clinical trials due to adverse effects and limited disease-modifying outcomes.

Study on the Molecular Mechanism of Interaction Between Perfluoroalkyl Acids and PPAR by Molecular Docking.

Per- and polyfluoroalkyl substances (PFASs), as a class of “permanent chemicals” with high environmental persistence and bioaccumulation, have attracted much attention.

Baricitinib in chronic kidney disease: an exploratory analysis integrating network toxicology, molecular docking and pharmacovigilance.

Background: Chronic kidney disease (CKD) presents a major global health challenge due to ineffective therapies against progressive renal fibrosis.

Anticancer Activity of Picolinamide and Sulfur Chelated Pt(II) Complexes Against Breast Cancer: In Vitro Interaction Studies Through Molecular Docking With Bio-Receptors.

Herein, picolinamide (pica) and sulfur-chelated Pt(II) complexes were investigated for their bioactivity and cytotoxic properties.

Exploring Antibiotic Degradation Mechanisms: Molecular Docking Analysis of Beta-Lactamase Enzymes from Pseudomonas songnenensis.

This study investigates the potential of Pseudomonas songnenensis (P.

Pipeline Tip

Verify FASTA headers for special characters that break Rosetta pipelines.
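A small pre-flight check along these lines might look like the sketch below. The allowed character set (letters, digits, underscore, dot, hyphen) is a conservative assumption rather than a documented Rosetta requirement, and the function names are hypothetical; adjust the pattern to whatever your pipeline actually tolerates.

```python
import re

# Assumed "safe" alphabet for header text; not an official Rosetta specification.
UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9_.\-]")


def find_suspect_headers(fasta_text):
    """Return (line_number, header, offending_chars) for each risky FASTA header."""
    problems = []
    for lineno, line in enumerate(fasta_text.splitlines(), start=1):
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].strip()
        offending = sorted(set(UNSAFE_CHARS.findall(header)))
        if not header or offending:
            problems.append((lineno, header, offending))
    return problems


def sanitize_header(header):
    """Replace each run of unsafe characters with a single underscore."""
    cleaned = re.sub(r"[^A-Za-z0-9_.\-]+", "_", header.strip()).strip("_")
    return cleaned or "unnamed"


example = ">sp|P12345|TEST protein (human)\nMKV\n>clean_id.1\nAAA\n"
for lineno, header, chars in find_suspect_headers(example):
    print(f"line {lineno}: {header!r} contains {chars}")
    print("  suggested:", sanitize_header(header))
```

Pipes, spaces, and parentheses are typical of UniProt-style headers, and they tend to end up in output file names or shell arguments downstream. Renaming sequences before a run is usually cheaper than debugging a failed job afterwards.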



Deep learning is not a magic wand, but a powerful lens for structural biology. — Recep Adiyaman
