Issue #42: DeepFold-PLM: accelerating protein structure prediction via efficient homology search using protein language models.

🧬 Protein Design Digest
Curated protein signals by Recep Adiyaman
🚀 Today’s Top Signal
DeepFold-PLM: accelerating protein structure prediction via efficient homology search using protein language models.
🧬 Abstract
Motivation: Protein structure prediction has been revolutionized and generalized by cutting-edge AI methods such as AlphaFold, but reliance on computationally intensive multiple sequence alignments (MSAs) remains a major limitation. Results: We introduce DeepFold-PLM, a novel framework that integrates advanced protein language models with vector embedding databases to enable ultra-fast MSA construction, remote homology detection, and protein structure prediction. DeepFold-PLM uses high-dimensional embeddings and contrastive learning to accelerate MSA generation significantly, running 47 times faster than standard methods while maintaining prediction accuracy comparable to AlphaFold. In addition, it extends modeling capabilities to multimeric protein complexes and provides a scalable PyTorch-based implementation for efficient large-scale prediction. Our method also increases sequence diversity (Neff = 8.65 versus 4.83 with JackHMMER), enriching the coevolutionary information critical for accurate structure prediction. DeepFold-PLM thus represents a versatile and practical resource that enables high-throughput applications in computational structural biology. Availability and implementation: Source code and a user-friendly Python API for all DeepFold-PLM modules are publicly available at https://github.com/DeepFoldProtein/DeepFold-PLM.
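Why it matters: MSA construction is a major bottleneck in structure prediction, so a 47-fold speed-up at comparable accuracy makes it practical to check fold accuracy and structural uncertainty across many candidates in de novo design.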
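To make the idea of homology search over a vector embedding database concrete, here is a minimal sketch of nearest-neighbour retrieval by cosine similarity. It is not DeepFold-PLM's API or its actual search method: the function names (`cosine_similarity`, `top_k_hits`), the toy database, and the 3-dimensional vectors are all assumptions made for illustration, and a real pipeline would use embeddings from a protein language model and an indexed vector store.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_hits(query_vec, database, k=2):
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy, made-up embeddings; real PLM embeddings have far more dimensions.
toy_db = {
    "seq_A": [1.0, 0.0, 0.0],
    "seq_B": [0.9, 0.1, 0.0],
    "seq_C": [0.0, 1.0, 0.0],
}
hits = top_k_hits([1.0, 0.05, 0.0], toy_db, k=2)
for name, score in hits:
    print(f"{name}: {score:.3f}")
```

The brute-force scan is linear in database size; it only illustrates the ranking step. The retrieved neighbours would then be candidates for alignment into an MSA.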
⭐ Additional Signals
A New Insight into the Study of Neural Cell Adhesion Molecule (NCAM) Polysialylation Inhibition Incorporated the Molecular Docking Models into the NMR Spectroscopy of a Crucial Peptide-Ligand Interaction.
The expression of polysialic acid (polySia) on the neural cell adhesion molecule (NCAM) is called NCAM polysialylation, which is strongly related to the migration and invasion of tumor cells and to aggressive clinical status. During NCAM polysialylation, polysialyltransferases (polySTs), such as polysialyltransferase IV (ST8Sia4) or polysialyltransferase II (ST8Sia2), catalyze the addition of CMP-sialic acid (CMP-Sia) to NCAM to form polySia. In this study, docking models of the ST8Sia4 protein with different ligands were predicted using the AlphaFold 3 and DiffDock servers, and the prediction accuracy was further verified using experimental NMR spectra of the interactions between the polysialyltransferase domain (PSTD), a crucial peptide domain in ST8Sia4, and a different ligand. This combination strategy provides new insights into quick and effective screening for inhibitors of tumor cell migration.
Highly accurate protein structure prediction-based virtual docking pipeline accelerating the identification of anti-schistosomal compounds.
Schistosomiasis is a major neglected tropical disease that lacks an effective vaccine and faces increasing challenges from praziquantel resistance, underscoring the urgent need for novel therapeutics. Target-based drug discovery (TBDD) is a powerful strategy for drug development. In this study, we utilized AlphaFold to predict the structures of target proteins from Schistosoma mansoni and S. japonicum, followed by virtual molecular screening to identify potential inhibitors. Among 202 potential therapeutic targets, we identified 37 proteins with high-accuracy structural predictions suitable for molecular docking with 14,600 compounds. This screening yielded 268 candidate compounds, which were further evaluated ex vivo for activity against both adult and juvenile S. mansoni and S. japonicum. Seven compounds exhibited strong anti-schistosomal activity, with HY-B2171A (Carubicin hydrochloride, CH) emerging as the most potent. CH was predicted to target the splicing factor U2AF65, and knockdown of its coding gene Smp_019690 resulted in a phenotype similar to CH treatment. RNA sequencing revealed that both CH treatment and Smp_019690 RNA interference (RNAi) disrupted splicing events in the parasites. Further studies demonstrated that CH impairs parasite viability by inhibiting U2AF65 function in mRNA splicing regulation. By integrating RNAi-based target identification with structure-based virtual screening, alongside ex vivo phenotypic and molecular analyses of compound-treated schistosomes, our study provides a comprehensive framework for anti-schistosomal drug discovery and identifies promising candidates for further preclinical development.
Protein Structural Model Selection Informed by Comparison of Predicted Ligand Binding Poses.
Recent advances in protein structure prediction have highlighted the importance of a longstanding problem: given multiple structural models of a protein, how does one select the best model to use when predicting interactions between that protein and candidate drug molecules? Here we demonstrate the value of a previously unutilized source of information in addressing this problem. We show that given multiple ligands known to bind the protein, one can perform effective model selection by comparing the predicted binding poses of multiple ligands at each model. We introduce a method, RevBind, that exploits this information, leveraging the statistical tendency of different ligands to form similar chemical interactions with a protein’s binding pocket. RevBind can be used, for example, to select among variants of AlphaFold models, identifying those that are most useful for molecular docking. Our findings pave the way for the development of even better model selection methods that draw simultaneously on the information used by RevBind and the information used by previous methods.
🧪 AI & Research News
- ConvGeM-Next: A deep learning framework for plant disease detection - Frontiers
🏢 Industry Insight & Applications
- Indian Gov’t To Invest $1.1 Bn To Support Biologics & Biosimilars Hub - DCAT Value Chain Insights
- A cut above: Veradermics locks in $256M IPO and shares spike - Fierce Pharma
- Eisai strikes Japan licensing deal with Shanghai Henlius Biotech - WKZO
- Eisai strikes Japan licensing deal with Shanghai Henlius Biotech - Reuters
- Eikon Therapeutics nets a $381M IPO amid burst of biotech offerings - BioPharma Dive
- From Biologics to Bio-Machines: Top Takeaways From Maui Derm 2026 - American Journal of Managed Care
- Trial tests VCN-01 before eye removal in hard-to-treat retinoblastoma - stocktitan.net
⚡ Quick Reads
Innovative Approaches in Molecular Docking for the Discovery of Novel Inhibitors Against Alzheimer’s Disease.
Introduction: Alzheimer’s disease (AD) is a debilitating neurodegenerative condition marked by progressive cognitive decline and memory impairment, affecting millions worldwide. Despite extensive research, no definitive cure exists, underscoring the need for innovative approaches to drug discovery and development. Methods: This review focuses on the application of molecular docking techniques in the context of AD drug discovery. The methodology involves the use of computational modeling tools to predict and analyze the interactions between small drug-like molecules and key protein targets implicated in AD pathogenesis, particularly amyloid-beta (Aβ) and tau proteins. Results: Molecular docking has enabled the virtual screening of large chemical libraries to identify potential inhibitors of Aβ aggregation and tau hyperphosphorylation. Numerous studies have validated docking-predicted interactions with in vitro and in vivo experiments, resulting in the discovery of novel compounds with promising pharmacological profiles. Docking has also aided in the optimization of ligand binding affinity and selectivity toward AD-relevant targets. Discussion: The integration of molecular docking with experimental techniques enhances the reliability and efficiency of the drug discovery process. Docking allows for the early identification of bioactive molecules, reducing time and cost compared to traditional methods. However, limitations such as rigid receptor assumptions and scoring function inaccuracies require further refinement. Conclusion: Molecular docking stands out as a powerful computational tool in the quest for effective AD therapies. Simulating protein-ligand interactions accelerates the identification of potential drug candidates and supports the rational design of targeted interventions, paving the way for future clinical applications in combating Alzheimer’s disease.
scDock: Streamlining drug discovery targeting cell-cell communication via scRNA-seq analysis and molecular docking
Summary: Identifying drugs that target intercellular communication networks represents a promising therapeutic strategy, yet linking single-cell RNA sequencing (scRNA-seq) analysis to structure-based drug screening remains technically challenging and requires substantial bioinformatics expertise. We present scDock, an integrated and user-friendly pipeline that seamlessly connects scRNA-seq data processing, cell–cell communication inference, and molecular docking-based drug discovery. Through a single configuration file, users can execute the complete workflow, from raw scRNA-seq data to ranked drug candidates, without programming skills. scDock automates the identification of disease-relevant ligand–receptor interactions from scRNA-seq data and performs structure-based virtual screening against these communication targets using Protein Data Bank (PDB) or AlphaFold-predicted protein structures. The pipeline generates comprehensive outputs at each stage, enabling users to explore intercellular signaling alterations and discover therapeutic compounds targeting specific cell–cell communications. scDock addresses a critical gap by providing an accessible end-to-end solution for communication-targeted drug discovery from single-cell data. Availability and implementation: scDock is freely available at https://github.com/Andrewneteye4343/scDock. It is implemented in R, Python, and shell scripts and supports Linux systems, including Ubuntu and Debian.
Systematic evaluation of computational tools to predict the effects of mutations on protein-ligand binding affinity in the absence of experimental structures.
Drug resistance caused by mutations is a significant global health concern. One way to better understand this phenomenon is by studying changes in protein-ligand binding affinity upon mutation. While recent advances in protein modelling, such as AlphaFold2 and AlphaFold3, have transformed structural assessments, their utility in predicting mutation-induced binding affinity changes remains underexplored. We evaluated various mutation-based methods and scoring functions using computer-generated protein-ligand complexes. Compared to a baseline using experimental structures, we observed a performance drop ranging from 5% to 30% across different computational models. Specifically, using experimental receptors with docked ligands resulted in a ~5% drop, similar to that observed with AlphaFold3 models (~5%), despite the latter offering lower ligand root mean square deviation. However, using AlphaFold2 receptors with docking led to a greater performance loss (10%-20%), comparable to homology models with high sequence identity. Homology models based on low-identity templates showed over 30% decline. These performance differences were most pronounced for interface mutations and low molecular weight ligands. While AlphaFold models offer accurate protein and interaction predictions, they lack mutation-specific information, such as dynamic changes, highlighting the need for complementary mutation-aware methods for reliable analysis. Our findings provide insights into interpreting mutation effects on ligand binding using predicted structures and can guide more robust assessments of drug resistance mechanisms in silico.
Exploring the Mechanism of Action of Chicoric Acid Against Influenza Virus Infection Based on Network Pharmacology, Molecular Docking, and Molecular Dynamics Simulation.
This study theoretically explores the mechanism of action of Chicoric acid against influenza virus based on network pharmacology, molecular docking, and molecular dynamics simulation techniques, aiming to provide insights for the development of new veterinary drugs for influenza. Potential targets for influenza virus action were identified using the PharmMapper server (version 2017) and disease databases including GeneCards and OMIM. The STRING online analysis platform and Cytoscape 3.9.1 software were employed to construct a protein-protein interaction (PPI) network of the target proteins, followed by topological analysis to screen for key targets. Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were performed on the intersecting targets using the DAVID database. A “drug-target-pathway” network diagram was constructed using Cytoscape 3.9.1 software. Molecular docking was carried out with AutoDock 1.5.6 and PyMOL 2.5 software to identify dominant binding targets, followed by molecular dynamics simulation analysis. Network analysis identified 31 potential targets of Chicoric acid, and the protein interaction network suggested that UBC, UBA52, RPS27A, HCK, and CDKN1B may be its core targets. GO enrichment analysis yielded 55 cell biological processes, and KEGG pathway enrichment analysis yielded 15 related signaling pathways. Molecular docking showed that UBC and UBA52 had good affinity for Chicoric acid and may be the dominant targets through which it exerts its effect. Chicoric acid may therefore exert antiviral activity by acting on UBC and UBA52, thus achieving an anti-influenza virus effect.
The therapeutic potential of Zuogui Wan in oligoasthenozoospermia: insights from network pharmacology, molecular docking, molecular dynamics simulation, and experimental validation.
Oligoasthenozoospermia (OAS) is a major cause of male infertility, with limited effective treatments. The Chinese patent medicine Zuogui Wan (ZGW) has been traditionally used to improve sperm quality, but its molecular mechanisms remain unclear. This study integrates network pharmacology, molecular docking, molecular dynamics (MD) simulation, and in vivo and in vitro experiments to explore ZGW’s therapeutic effects in OAS. Active compounds and targets of ZGW were identified using network pharmacology, and intersecting OAS-related targets underwent enrichment and protein-protein interaction (PPI) analysis. Molecular docking and MD simulations assessed compound-target binding affinity and stability. In vitro, CCK-8 assays measured cell proliferation, while qPCR and Western blot analyzed key gene and protein expression. In vivo, a rat OAS model was used to evaluate ZGW’s therapeutic effects through transmission electron microscopy (TEM), hematoxylin & eosin (HE) staining, and TUNEL assays. The expression of key molecular targets was further validated by qPCR and Western blot. A total of 182 potential targets were identified, with TP53, NF-κB1, and PKC as key hub genes. KEGG pathway analysis highlighted the involvement of the PI3K-AKT and MAPK signaling pathways. Four core bioactive compounds (Cyasterone, Betavulgarin, Kaempferol, and Quercetin) were identified, with Cyasterone exhibiting the strongest binding affinity and highest stability. In vitro experiments demonstrated that ZGW significantly promoted cell proliferation and regulated apoptosis-related gene expression, indicating its potential in enhancing sperm function. In vivo, ZGW improved testicular structure, enhanced sperm quality, and reduced spermatogenic cell apoptosis, as evidenced by TEM, HE, and TUNEL assays. Molecular validation further confirmed ZGW’s modulation of key signaling pathways involved in OAS.
ZGW modulates apoptosis, oxidative stress, and key pathways (PI3K-AKT, MAPK) while regulating TP53, NF-κB, and PKC expression. Cyasterone exhibits strong binding and stability with core targets. This study supports ZGW as a potential treatment for male infertility.
Protein Engineering and Drug Discovery: Importance, Methodologies, Challenges, and Prospects.
Protein engineering is a rapidly evolving field that plays a critical role in transforming drug discovery and development. This innovative field harnesses the unique structural and functional properties of engineered proteins, such as monoclonal antibodies, nanobodies, therapeutic enzymes, and cytokines, to address complex diseases more effectively than traditional small-molecule drugs. These biologics not only enhance therapeutic specificity but also minimize adverse effects, marking a significant advancement in patient care. However, the journey of protein engineering is not without challenges. Issues related to protein folding, stability, and potential immunogenicity pose significant complications. Additionally, navigating the complex regulatory landscape can delay the transition from laboratory to clinical application. Addressing these hurdles requires the integration of cutting-edge technologies, including phage and yeast display technology, CRISPR, and advanced computational modeling, which enhance the predictability and efficiency of protein design. In this review, we explore the multifaceted impact of protein engineering on modern medicine, highlighting its potential to transform treatment paradigms, methodologies, challenges, and the successful development and approval of recombinant protein-based therapies. By navigating the complexities and leveraging technological advancements, the field is poised to unlock new therapeutic possibilities, ultimately improving patient outcomes and transforming healthcare.
Demonstrating the Absence of Correlation Between Molecular Docking and in vitro Cytotoxicity in Anti-Breast Cancer Research: Root Causes and Practical Resolutions.
Introduction: In silico methods have significantly transformed the landscape of drug discovery by enabling rapid and cost-effective screening of prospective therapeutic compounds. However, these computational techniques remain limited in their ability to fully predict complex biological behavior, particularly within the constraints of quantum-level interactions and simplified receptor-ligand models. As such, validation through experimental data remains critical. Purpose: This review aims to critically evaluate the correlation between molecular docking predictions, specifically Gibbs free energy (ΔG), and in vitro cytotoxicity data (IC50 values) obtained from MCF-7 breast cancer cell studies. Methodology: A structured methodology was employed, applying predefined inclusion and exclusion criteria to identify studies reporting both in silico molecular docking results and in vitro cytotoxicity data on the MCF-7 cell line, with a focus on compounds targeting breast cancer-related proteins. Results: Findings demonstrated that, contrary to theoretical expectations, no consistent linear correlation was observed between ΔG values and IC50 across the analyzed compounds and targets. This discrepancy arises from several intertwined factors, including variability in protein expression within cell-based systems, compound-specific characteristics such as permeability and metabolic stability, and methodological limitations of docking approaches that rely on rigid receptor conformations and simplified scoring functions. In addition, the chemical diversity of the evaluated compounds further contributes to the inconsistency of cytotoxic outcomes. Nevertheless, when experimental and computational systems are uniformly controlled, a measurable and meaningful correlation between ΔG and IC50 can be demonstrated.
Conclusion: This review underscores the need to move beyond single-parameter docking predictions and adopt integrated strategies that combine computational models with empirical validations. Future studies should emphasize the use of standardized in vitro conditions, rational target selection, and complementary techniques such as molecular dynamics simulations, intracellular exposure assessment, and target engagement validation. These integrative approaches will enhance the predictive power of in silico methods and foster a more reliable foundation for anti-breast cancer drug development.
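For readers who want to run this kind of check on their own series, the sketch below computes a Pearson correlation between docking ΔG values and pIC50 (the negative base-10 logarithm of IC50 in molar). It is not the review's analysis: the helper names and the four data points are invented for illustration, and the toy series is deliberately idealised, which is exactly the situation the review says real multi-target data rarely matches.

```python
import math


def pic50_from_ic50_um(ic50_um):
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um * 1e-6)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Made-up, idealised values: docking scores (kcal/mol) and IC50 (micromolar).
docking_dg = [-9.0, -8.0, -7.0, -6.0]
ic50_um = [0.1, 1.0, 10.0, 100.0]

pic50 = [pic50_from_ic50_um(v) for v in ic50_um]
r = pearson_r(docking_dg, pic50)
print(f"Pearson r between docking score and pIC50: {r:.3f}")
```

Working in pIC50 rather than raw IC50 puts potency on a logarithmic scale, the same scale on which free energies relate to binding constants. A strongly negative r means more negative docking scores go with higher potency; values near zero correspond to the lack of correlation the review reports across heterogeneous studies.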
From MM-PBSA to H-MMGB: Multiscale Modeling for Biomolecular Structure and Drug Discovery.
From early efforts to predict protein structure from simplified models, computational biophysics has progressed toward increasingly physics-based approaches for evaluating biomolecular structure, molecular interactions, and energetics. The molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) method provided one of the first broadly accessible ways to evaluate binding and folding energetics from molecular dynamics (MD) trajectories, with applications ranging from protein structure prediction benchmarks to protein-ligand affinity ranking. Building on this foundation, the hierarchical Molecular Mechanics Generalized Born (H-MMGB) approach was developed to provide MMGB-based binding free energy estimates more efficiently, employing the Generalized Born model in contrast to the Poisson-Boltzmann framework of MM-PBSA and thereby enabling prospective applications to ligand design. Case studies illustrate how these methods, ranging from protein folding assessment to intact-ligand modeling and to a deconstruction-reconstruction strategy using picofragments, enable hypothesis generation in the absence of experimental structures and in challenging protein-protein interaction targets. Together, these developments support a guiding principle: gradual incorporation of more physics into modeling workflows increases the probability of successfully meeting objectives across diverse computational simulation problems.
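As a rough illustration of the end-point idea behind MM-PBSA-style estimates, the sketch below averages per-frame energy differences between a complex and its separated receptor and ligand. It is not the H-MMGB method described in the abstract, and it assumes the per-frame energies (molecular mechanics plus solvation terms) have already been computed by some other tool. The function name and the numbers are hypothetical.

```python
from statistics import mean


def endpoint_binding_energy(complex_e, receptor_e, ligand_e):
    """Mean over frames of E(complex) - E(receptor) - E(ligand), in kcal/mol.

    Each argument is a list of per-frame energies taken from the same
    trajectory frames (a single-trajectory setup is assumed here).
    """
    if not complex_e or not (len(complex_e) == len(receptor_e) == len(ligand_e)):
        raise ValueError("need equal-length, non-empty per-frame energy lists")
    per_frame = [c - r - l for c, r, l in zip(complex_e, receptor_e, ligand_e)]
    return mean(per_frame)


# Made-up per-frame energies for three frames (kcal/mol).
complex_energy = [-1050.0, -1048.0, -1052.0]
receptor_energy = [-1000.0, -999.0, -1001.0]
ligand_energy = [-20.0, -21.0, -19.0]

dg_bind = endpoint_binding_energy(complex_energy, receptor_energy, ligand_energy)
print(f"Estimated binding energy: {dg_bind:.1f} kcal/mol")
```

Estimates of this kind are generally more useful for ranking related ligands than as absolute affinities, particularly when the entropy contribution is left out, as it is in this sketch.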
💡 Pipeline Tip
Normalise thermal B-factors when comparing different crystal structures.
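One common way to do this is a per-structure z-score, so that each structure's B-factors have zero mean and unit standard deviation before comparison. The sketch below assumes the B-factors have already been extracted (for example, one value per Cα atom). The function name and the toy values are illustrative, and the choice of the population standard deviation is an assumption, not a fixed convention.

```python
from statistics import mean, pstdev


def normalise_b_factors(b_factors):
    """Return z-scores (B - mean) / std for one structure's B-factors."""
    if len(b_factors) < 2:
        raise ValueError("need at least two B-factors")
    mu = mean(b_factors)
    sigma = pstdev(b_factors)  # population standard deviation
    if sigma == 0:
        raise ValueError("all B-factors are identical; z-score is undefined")
    return [(b - mu) / sigma for b in b_factors]


# Made-up B-factors (in square angstroms) for three atoms of one structure.
z_scores = normalise_b_factors([10.0, 20.0, 30.0])
print([round(z, 3) for z in z_scores])
```

Normalising each structure separately removes global offsets caused by resolution, refinement protocol, or crystal quality, so the remaining differences reflect relative flexibility within each structure.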
🛠️ Resources
- Dataset: SCOPe - Curated structural classification of proteins for fold analysis.
- Dataset: Pfam - Protein families database with curated multiple sequence alignments.
- Tool: ProteinSolver - Graph-based neural network for protein sequence design.
- Tool: RFdiffusion - State-of-the-art generative model for de novo protein design.
- Event: Structural Biology Events (Open)
- Event: Protein Design Hub (LinkedIn Group) (Ongoing)
- Job: Research Fellow - Bioinformatics – Scientist at UCL - Jobs.ac.uk
- Job: Computational Biology Data Analyst - (Grade 6) at University of Liverpool - Jobs.ac.uk
Deep learning is not a magic wand, but a powerful lens for structural biology. — Recep Adiyaman