Protein Design Digest - Issue #69 - 2026-03-17 - Molecular embedding-based algorithm selection in protein-ligand docking.

Daily curated signals from arXiv, PubMed, and bioRxiv.
Signal of the Day
Molecular embedding-based algorithm selection in protein-ligand docking.
Selecting an effective docking algorithm is highly context-dependent, and no single method performs reliably across structural, chemical, and protocol regimes. MolAS is a lightweight algorithm-selection model that predicts per-algorithm performance from pretrained protein and ligand embeddings using attentional pooling and a shallow residual decoder. With hundreds to a few thousand labelled complexes, MolAS achieves up to a 15 percentage-point absolute improvement over the single best solver (SBS) and closes 17-66% of the gap between the SBS and the virtual best solver (VBS) across five docking benchmarks. Analyses of selection frequencies, margin-conditioned reliability, and benchmark-level oracle structure indicate that MolAS is most effective when the workflow-defined oracle landscape has low winner entropy and a reasonably separable top-solver region, but degrades under protocol mismatch that shifts solver rankings and changes the induced labels. These results suggest that, in the evaluated regime, robustness is limited less by representational capacity than by workflow- and protocol-induced instability in solver hierarchies, positioning MolAS as an in-domain selector for fixed pipelines and as a diagnostic tool for assessing when docking algorithm selection is well-posed.
Scientific Contribution: MolAS introduces a controlled, embedding-based selector that reduces dependence on heavy graph encoders, enabling a cleaner separation between representational choices and workflow-defined label structure. A cross-benchmark and cross-protocol analysis links selection success and failure to oracle entropy, near-ties among top solvers, and protocol-induced ranking shifts, providing an evidence-backed diagnostic account of when docking algorithm selection is likely to yield gains. The findings differentiate this work from prior docking AS studies that report in-domain improvements under a single fixed workflow by explicitly characterising protocol dependence and motivating protocol-aware modelling as a route to stronger generalisation.
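Why this matters: For teams running a fixed docking pipeline, a lightweight selector can recover part of the gap to the per-complex best solver, and the same analysis indicates when algorithm selection is unlikely to help.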
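To make the SBS and VBS figures above concrete, here is a minimal Python sketch of how "fraction of the gap closed" is commonly computed in algorithm selection. The abstract does not spell out the exact metric, so the performance matrix layout, the function name `selection_metrics`, and the toy data are all assumptions for illustration, not the authors' implementation.

```python
def selection_metrics(perf, chosen):
    """Summarise a selector against SBS and VBS.

    perf[i][j] -- performance of solver j on complex i (higher is better,
                  e.g. 1 if the docked pose counts as a success, else 0).
    chosen[i]  -- index of the solver the selector picked for complex i.
    """
    n = len(perf)
    n_solvers = len(perf[0])
    # SBS: the one solver with the best average over all complexes.
    sbs = max(sum(row[j] for row in perf) / n for j in range(n_solvers))
    # VBS: an oracle that picks the best solver separately for each complex.
    vbs = sum(max(row) for row in perf) / n
    # Selector: performance of whichever solver was actually chosen.
    selector = sum(row[c] for row, c in zip(perf, chosen)) / n
    gap = vbs - sbs
    # Fraction of the SBS-to-VBS gap recovered by the selector.
    gap_closed = (selector - sbs) / gap if gap > 0 else float("nan")
    return {"sbs": sbs, "vbs": vbs, "selector": selector, "gap_closed": gap_closed}


# Toy data (hypothetical): 6 complexes x 3 solvers, 1 = success.
toy_perf = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
]
toy_chosen = [0, 0, 1, 0, 0, 0]  # hypothetical selector decisions

metrics = selection_metrics(toy_perf, toy_chosen)
print(metrics)
```

On this toy set the selector recovers about half of the SBS-to-VBS gap. The last complex, where no solver succeeds, caps the VBS below 1.0, which is one reason gap closure is reported alongside absolute improvement.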
Also Worth Reading
Mechanisms of Okanin against wound healing based on network pharmacology, molecular docking and molecular dynamics simulation.
Wound healing is a critical aspect of modern medicine, impacting patient health, quality of life, and healthcare resource allocation. Okanin, a flavonoid from the Asteraceae family, has shown potential in promoting wound healing. This study investigates okanin’s key molecular targets, binding affinity, and mechanisms of action using network pharmacology, molecular docking, molecular dynamics simulations, and in vivo experimental validation. Okanin’s potential targets were identified using the Comparative Toxicogenomics Database (CTD) and SwissTargetPrediction, while wound healing-related targets were sourced from GeneCards and DrugBank. Overlap analysis of these datasets revealed common targets. Key target proteins were filtered through protein-protein interaction (PPI) analysis using the STRING database. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted using Metascape to build a drug-target-pathway-disease network. Molecular docking was performed with AutoDockTools, and binding affinity was evaluated through energy scores, particularly with AURKA and HDAC1. Molecular dynamics simulations with GROMACS confirmed the stability of okanin-target complexes. ADME/T properties were assessed using SwissADME and ProTox-3.0 to evaluate pharmacokinetics and toxicity. In vivo quantitative real-time PCR (qRT-PCR) was performed to assess the expression of selected target genes in a mouse wound model following topical okanin treatment.
A total of 72 common targets were identified between okanin and wound healing. PPI network analysis highlighted 17 key targets, with molecular docking revealing the highest binding affinity for AURKA and HDAC1 (ΔG = -8.8 kcal/mol for both). GROMACS simulations were then run on the top complexes. Target-ligand stability was quantified by convergence of RMSD/Rg, sustained hydrogen-bond counts, and MM/GBSA binding free energies (AURKA, -24.27 ± 3.65 kcal/mol; HDAC1, -47.7 ± 1.60 kcal/mol), confirming robust interactions. SwissADME predicted good drug-likeness (MW = 288.25 g/mol; logP = 1.69; high GI absorption and moderate skin permeability) and no P-gp liability, while ProTox-3.0 indicated low systemic toxicity (LD₅₀ = 2500 mg/kg). qRT-PCR results demonstrated that okanin treatment significantly downregulated AURKA and PIK3R1, while upregulating HDAC1, in wounded skin, supporting the predicted molecular interactions and regulatory functions. Okanin promotes wound healing through multiple molecular targets and pathways, including antioxidant, anti-inflammatory, and cell proliferation mechanisms. Its high binding affinity for AURKA and HDAC1, along with modulation of the IL-17 and AMPK signaling pathways, underscores its therapeutic potential. This study provides a comprehensive theoretical and experimental framework for the development of okanin as a topical agent for wound healing, with future research focusing on formulation development and translational applications.
IRIS: A Machine Learning-Based Pose Reranking Tool for RNA-Ligand Docking.
Given their fundamental roles in cellular processes and disease pathogenesis, RNA molecules are promising therapeutic targets. Predicting the 3D structure of RNA-ligand complexes using computational docking is a key element of rational, structure-based inhibitor design. However, RNA-ligand docking remains challenging, due in part to intrinsic properties of RNA such as structural flexibility and a highly charged phosphate backbone. rDock, a widely used RNA docking program, can generate ligand poses close to the experimental structure, but its scoring function frequently fails to rank these poses above less accurate alternatives. To supplement rDock, here we introduce the Intelligent RNA Interaction Scorer (IRIS), a regression model leveraging physicochemical and interaction-based features and trained on the largest data set of experimental nucleic acid-ligand complexes compiled to date for any ML-based tool designed for RNA docking (608 structures). IRIS improves rDock RNA-ligand pose ranking relative to the use of rDock scores alone. Using the best-performing rDock protocol on the RNA portion of the data set, we find that at least one of the 100 top generated poses for any given complex is within 2.0 Å RMSD of the native pose in 86.3% of test complexes. Of these 86.3%, the default rDock scoring function ranks the correct pose first in 42.7% of cases. IRIS improves this latter fraction to 59.8% and increases the success rate for selecting a near-native pose among the top five ranked poses from 64.6% to 78.0%. IRIS thus significantly enhances pose ranking accuracy and can be seamlessly integrated into docking pipelines to rerank ligand poses in RNA-targeted drug discovery.
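The top-1 and top-5 figures above are top-k success rates: the fraction of complexes where at least one of the k best-ranked poses lies within the RMSD cutoff of the native pose. Below is a minimal Python sketch of that calculation. The function name `top_k_success`, the `(score, rmsd)` tuple layout, and the toy poses are assumptions for illustration and do not reflect the IRIS or rDock interfaces.

```python
def top_k_success(complexes, k, rmsd_cutoff=2.0, higher_is_better=True):
    """Fraction of complexes with a near-native pose among the top k.

    complexes -- list of pose lists, one per complex; each pose is a
                 (score, rmsd_to_native) tuple (hypothetical layout).
    higher_is_better -- set to False for energy-like scores where
                        lower values rank first.
    """
    hits = 0
    for poses in complexes:
        ranked = sorted(poses, key=lambda p: p[0], reverse=higher_is_better)
        if any(rmsd <= rmsd_cutoff for _, rmsd in ranked[:k]):
            hits += 1
    return hits / len(complexes)


# Toy data (hypothetical): 4 complexes, 3 poses each, as (score, RMSD in Å).
toy_complexes = [
    [(0.9, 1.2), (0.5, 4.0), (0.1, 6.3)],  # near-native pose ranked 1st
    [(0.8, 5.1), (0.7, 1.8), (0.2, 3.3)],  # near-native pose ranked 2nd
    [(0.6, 7.0), (0.4, 8.2), (0.3, 1.5)],  # near-native pose ranked 3rd
    [(0.9, 3.9), (0.2, 4.4), (0.1, 5.0)],  # no near-native pose sampled
]

top1 = top_k_success(toy_complexes, k=1)
top2 = top_k_success(toy_complexes, k=2)
top3 = top_k_success(toy_complexes, k=3)
print(top1, top2, top3)
```

Note that the abstract reports the top-1 rates as a share of the 86.3% of complexes where a near-native pose was sampled at all. Expressed over all test complexes, 42.7% and 59.8% correspond to roughly 36.9% and 51.6% (0.863 × 0.427 and 0.863 × 0.598).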
Multi-omics investigation of benzo[a]pyrene in gastric cancer: comprehensive network toxicology, machine learning and molecular docking approaches.
Gastric cancer (GC) risk is shaped by environmental exposures such as benzo[a]pyrene (BaP). Here, we systematically identified BaP-toxicological targets and dissected their contribution to GC development. BaP-related targets were independently predicted with stringent filters from ChEMBL, Similarity Ensemble Approach (SEA) and PharmMapper databases, while GC-related targets were mined from the Comparative Toxicogenomics Database (CTD), GeneCards and OMIM databases. Overlapping targets were subjected to protein-protein interaction (PPI) network construction, functional enrichment analysis and molecular docking. We then integrated multi-omics data using ten clustering algorithms to identify the consensus GC subtypes, which were subsequently used with 101 machine learning combinations to develop a consensus benzo[a]pyrene-related signature (CBRS) for GC patients. As a result, we identified seven hub toxicological targets: ALB, HSP90AA1, ESR1, INS, TP53, TNF, and EGFR, underscoring their potential central roles in BaP-driven GC pathogenesis. These targets are enriched in the MAPK, lipid and atherosclerosis, and PI3K-Akt signaling pathways. The BaP-toxicological classifiers and the CBRS prognostic model could provide useful support for risk stratification and inform personalized therapeutic strategies for GC patients. Molecular docking results suggest that BaP exhibits relatively strong binding affinity with these key toxicological targets, potentially implicating their involvement in BaP-induced gastric cancer toxicity. Therefore, this study integrates multi-dimensional omics data with advanced machine learning algorithms to establish a comprehensive analytical framework for the toxicological relationship between BaP and GC, which transcends the limitations of traditional analyses and offers unprecedented insights and evidence chains for elucidating the pathogenesis of GC.
Research & AI Updates
- AlphaFold Database Expands with Protein Complex Insights - Mirage News
- AI tools helped him do what doctors couldn’t: Australian techie uses ChatGPT & Google’s Alphafold to develo… - Bhaskar English
- Man uses ChatGPT and AlphaFold to build DIY mRNA cancer vaccine, saves dog - MSN
- ChatGPT and AlphaFold help techie develop DIY mRNA cancer vaccine, saving his dog - The Financial Express
- Australian techie uses ChatGPT, AlphaFold to create personalised cancer vaccine for his ill dog - Moneycontrol.com
From the Industry
- UK biotech Ternary raises £3.6m to scale AI platform for next-generation drugs - Business Matters
- Greenwich LifeSciences Switched Drug Product Mid-Pivotal Trial. The FDA Has Rules About That. - The Clinical Trial Vanguard
- Recognizing the Right Time to Start Biologics in HS - Dermatology Times
- Voyager Acquisition Corp. Secures Shareholder Approval for VERAXA Biotech Merger - citybiz
- Cancer drug developer Theriva inks SYN-020 deal up to $38M - Stock Titan
- Theriva™ Biologics Reports Full-Year 2025 Operational Highlights and Financial Results - GlobeNewswire
- Six biotechs to know in Barcelona - Labiotech.eu
Quick Reads
Protein language model-guided generative design of affinity peptides for chromatographic purification of lentiviral vectors.
Lentiviral vectors (LVs) have emerged as the most promising tool for cell and gene therapy.
De Novo Drug Design, Synthesis, Biological Evaluation, and Structural Examination of Novel Coumarin-Based Pyrimidine Co-Drugs Accompanied by Molecular Docking and DFT Studies.
In this research, we synthesized novel coumarin-pyrimidine hybrid molecules and studied their antioxidant properties by using the DPPH radical scavenging assay.
Mechanisms of topical Mahoniae Caulis against acute mastitis: Integrating SMR, network pharmacology, molecular docking, and experimental validation.
Mahoniae Caulis (MC) has long been documented in Chinese medical texts for the treatment of “Ru Yong” (acute mastitis).
Computer-aided structural modeling and drug discovery for G-protein-coupled receptors in the age of artificial intelligence.
G-protein-coupled receptors (GPCRs) are a large family of membrane proteins that mediate cellular responses to diverse stimuli and serve as targets for ∼35 % of Food and Drug Administration-approved drugs.
Antiproliferative activity, phytochemistry, network pharmacology, molecular docking and gene expression analysis of Maerua edulis extracts against human cervical cancer cell line.
Maerua edulis (Gilg & Gilg-Ben.) DeWolf was traditionally used in the treatment and/or management of various diseases, including cancer.
Uncovering the Potential Mechanisms of Ergothioneine in Neuroinflammation Through Network Pharmacology, Molecular Docking, Molecular Dynamics Simulation, and In Vitro Validation
Vitamin D Modulates Humoral Responses to SARS-CoV-2 Vaccination in Autoimmune Thyroiditis: An Endocrine–Immune Perspective Supported by Network Pharmacology, Molecular Docking, and Molecular Dynamics Simulations
Computational Innovations in Cancer Research and How Computing is Transforming Drug Discovery and Development: A Review.
Cancer is a major global health concern, causing millions of deaths each year due to the uncontrolled growth and spread of abnormal cells.
Pipeline Tip
Always validate pLDDT scores before using AlphaFold models for docking.
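One way to act on this tip: AlphaFold PDB files store the per-residue pLDDT in the B-factor column, so a quick check can flag binding-site residues that fall below a confidence cutoff before docking. The sketch below is a minimal, standard-library-only illustration. The function names, the demo record builder `_ca_line`, the pocket residue list, and the cutoff of 70 (the usual boundary of the "confident" band) are assumptions to adapt to your own pipeline.

```python
def read_plddt(pdb_text):
    """Return {(chain, resnum): pLDDT} read from CA atoms.

    Relies on the fixed-width PDB ATOM record layout, with pLDDT taken
    from the B-factor field (columns 61-66) as in AlphaFold models.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def low_confidence(scores, residues, cutoff=70.0):
    """List the residues whose pLDDT is below the cutoff.

    Residues missing from the model are treated as pLDDT 0 and flagged.
    """
    return [r for r in residues if scores.get(r, 0.0) < cutoff]


def _ca_line(serial, resname, chain, resnum, plddt):
    # Hypothetical helper: builds a fixed-width CA ATOM record for the demo.
    return (
        f"ATOM  {serial:5d}  CA  {resname:>3s} {chain}{resnum:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}          {'C':>2s}"
    )


# Demo model (hypothetical values): three residues with differing confidence.
demo_pdb = "\n".join([
    _ca_line(2, "MET", "A", 1, 91.5),
    _ca_line(10, "LYS", "A", 2, 64.2),
    _ca_line(19, "GLY", "A", 3, 45.0),
])

plddt = read_plddt(demo_pdb)
pocket = [("A", 1), ("A", 2), ("A", 3)]  # hypothetical binding-site residues
flagged = low_confidence(plddt, pocket)
print(flagged)
```

For a real model, read the file contents into `pdb_text` and pass the residues lining your docking box. If several pocket residues are flagged, the predicted pocket geometry may be unreliable and an experimental structure or a refined model is worth considering.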
Resources & Tools
- Dataset: UniProt Knowledgebase - The world’s most comprehensive resource for protein sequence and annotation.
- Dataset: PDB-REDO - Optimized protein structure database with refined models.
- Tool: MAFFT - Multiple sequence alignment with high speed and accuracy.
- Tool: Clustal Omega - Scalable multiple sequence alignment for protein families.
- Event: Protein Design Hub (LinkedIn Group) (Ongoing)
- Event: Structural Biology Events (Open)
- Job: Omics & Bioinformatics Scientist (Wet Lab + Data Pipelines) at Workable
The protein structure is the language of life; design is its poetry. — Recep Adiyaman