recep.adiyaman
Daily Signal May 01, 2026 · 10 min read

Issue #99: The past, present and future of de novo protein design.


Protein Design Daily


Daily curated signals from arXiv, PubMed, and bioRxiv.

Signal of the Day

The past, present and future of de novo protein design.

With deep-learning-powered advances in protein design methods, there is an ongoing paradigm shift in protein engineering from random selection to intentional computational design methods. Here we describe the current state of de novo protein design. While there is still room for improvement in success rates and activities, the long-standing challenges of designing new protein structures, assemblies and protein binders are close to being solved. The key current questions in these areas are not how to design, but what to design, and open-source design methodology such as RFdiffusion and ProteinMPNN together with protein structure prediction tools enable biochemists and molecular biologists to broadly explore possible applications. There has also been considerable progress in the de novo design of small-molecule target binders, enzymes and multistate protein systems. Current challenges for methods development include design of catalysts for reactions with high energy barriers and, more generally, design of switches and nanomachines that integrate binding, conformational change and catalysis. Over the next five to ten years, we anticipate the design of sophisticated protein nanomachines and materials with functionality ranging far beyond that generated during natural evolution for a wide range of applications in medicine, technology and sustainability.

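Why this matters: The review argues that designing new protein structures, assemblies and binders is close to solved, so the open question shifts from how to design to what to design. Open-source tools such as RFdiffusion and ProteinMPNN, paired with structure prediction, put that exploration within reach of biochemists and molecular biologists.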


Also Worth Reading

Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling in AI-Driven Molecular Property and Activity Prediction

The rapid growth of molecular foundation models and general-purpose large language models has encouraged a scale-centric view of artificial intelligence in drug discovery, in which larger pretrained models are expected to supersede compact cheminformatics models and task-specific graph neural networks (GNNs). We test this assumption on 22 molecular property and activity endpoints, including public ADMET and Tox21 benchmarks and two internal anti-infective activity datasets. Across 167,056 held-out task–molecule evaluations under structure-similarity-separated five-fold cross-validation (37,756 ADMET, 77,946 Tox21, 49,266 anti-TB and 2,088 antimalaria), classical machine-learning (ML) models such as RF(ECFP4) and ExtraTrees(RDKit descriptors) win ten primary-metric tasks, GNNs such as GIN and Ligandformer win nine, and pretrained molecular sequence models such as MoLFormer and ChemBERTa2 win three. Rule-based SAR reasoning baselines, represented by GPT5.5-SAR and Opus4.7-SAR, do not win under the prespecified primary metrics, although train-fold-derived SAR knowledge provides measurable but uneven gains for SAR reasoning and interpretation. These results indicate that compact, specialized models remain highly effective for molecular property and activity prediction. The performance differences among classical ML, GNN and pretrained sequence models are often modest and endpoint-dependent, whereas larger or more general models do not provide a universal predictive advantage. Large models may still add value for zero-shot reasoning, SAR interpretation and hypothesis generation, but the results suggest that predictive performance depends on the alignment among molecular representation, inductive bias, data regime, endpoint biology and validation protocol.
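The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the dictionary keys are informal labels chosen here, and the zero entry for the SAR baselines reflects the statement that they win no task under the primary metrics.

```python
# Held-out task-molecule evaluations per benchmark group, as quoted in the abstract.
evaluations = {
    "ADMET": 37_756,
    "Tox21": 77_946,
    "anti-TB": 49_266,
    "antimalaria": 2_088,
}

# Primary-metric task wins per model family, as quoted in the abstract.
# The labels are informal shorthand, not names used by the authors.
wins = {
    "classical ML": 10,
    "GNN": 9,
    "pretrained sequence models": 3,
    "LLM SAR baselines": 0,
}

total_evaluations = sum(evaluations.values())
total_tasks = sum(wins.values())

# Share of the endpoints won by each family.
win_share = {family: count / total_tasks for family, count in wins.items()}

print(total_evaluations)  # 167056
print(total_tasks)        # 22
for family, share in win_share.items():
    print(f"{family}: {share:.0%}")
```

The totals match the 167,056 evaluations and 22 endpoints reported, and classical ML plus GNNs account for 19 of the 22 wins, which is the basis for the claim that compact, specialized models remain competitive.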

Identification of paucinervin D as a natural sphingosine-1-phosphate receptor 1 agonist: Insights from pharmacophore modeling, docking, molecular dynamics simulations, and density functional theory.

Sphingosine-1-phosphate receptor 1 (S1PR1), a member of the G protein-coupled receptor (GPCR) family, is a crucial therapeutic target for various diseases. Activation of S1PR1 has been recognized as an effective therapeutic strategy for multiple sclerosis (MS), inflammatory bowel disease (IBD), and psoriasis. Natural products (NPs) serve as a rich source of bioactive compounds for drug discovery. Here, we aimed to discover novel S1PR1 agonists from NPs via multi-level virtual screening (VS). Using a validated HipHop pharmacophore model, we screened a database containing 54,642 NPs, followed by molecular docking. Based on binding mode analysis, four candidate S1PR1 agonists (NPC323626, NPC264112, NPC469907, and NPC22192) were selected. Subsequent molecular dynamics (MD) simulations and binding free energy calculations confirmed the stability of the receptor-ligand complexes and their binding affinities. Among the four candidates, NPC469907 exhibited the strongest binding affinity for S1PR1, with a value of -58.08 ± 0.13 kJ/mol. Furthermore, hydrogen bonds formed between NPC469907 and Glu121 of S1PR1 were found to be essential for receptor activation. Quantum mechanical calculations further revealed that the phenyl-ring-attached hydrogen site in NPC469907 could be modified without compromising its ability to activate S1PR1. The analysis of Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) indicated that NPC469907 possessed favorable pharmacokinetic properties and low toxicity. In conclusion, our study identified NPC469907 as a promising natural S1PR1 agonist and established an effective VS strategy for the discovery of novel S1PR1 agonists.
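For readers who think in kcal/mol, the reported binding free energy can be converted with the thermochemical calorie (1 kcal = 4.184 kJ). This is a minimal unit-conversion sketch; the function and variable names are illustrative, and calculated binding free energies of this kind are typically more useful for ranking candidates than as absolute affinities.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie: 1 kcal = 4.184 kJ


def kj_to_kcal(value_kj_per_mol: float) -> float:
    """Convert an energy from kJ/mol to kcal/mol."""
    return value_kj_per_mol / KJ_PER_KCAL


# Values quoted in the abstract for NPC469907.
binding_energy_kj = -58.08
uncertainty_kj = 0.13

binding_energy_kcal = kj_to_kcal(binding_energy_kj)
uncertainty_kcal = kj_to_kcal(uncertainty_kj)

print(f"{binding_energy_kcal:.2f} ± {uncertainty_kcal:.2f} kcal/mol")
# -13.88 ± 0.03 kcal/mol
```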

Molecular docking approaches in mycetoma: Toward improved patient management.

Mycetoma is a neglected tropical disease characterised by chronic, granulomatous inflammation of the subcutaneous tissues, often leading to disfigurement, disability, and significant socioeconomic burdens. It is caused by a diverse array of bacterial and fungal pathogens; the fungal form, eumycetoma, is predominantly driven by Madurella mycetomatis, and current treatment strategies are limited and often ineffective. Conventional antifungal therapies, such as itraconazole, require prolonged administration, frequently combined with surgical interventions, yet cure rates remain suboptimal, and recurrence is common. The formidable protective grain, comprising microbial material, melanin, and host-derived substances, acts as a physical and biochemical barrier, impeding the penetration and efficacy of drugs. Additionally, issues such as toxicity, resistance, and high costs further complicate management, underscoring the urgent need for novel therapeutic strategies. Recent advancements in computational drug discovery, particularly molecular docking, offer promising avenues to accelerate the identification of effective anti-mycetoma agents. Molecular docking simulates the interaction between small molecules and target proteins, enabling rapid virtual screening of large compound libraries, including natural products, existing drugs, and synthetic molecules, against key pathogenic targets. This structure-based approach helps prioritise candidates with high binding affinity, guiding subsequent experimental validation and reducing both time and financial costs associated with traditional drug development. When integrated with artificial intelligence (AI) and machine learning (ML), these methods can enhance predictive accuracy, uncover novel bioactive scaffolds, and facilitate the repurposing of FDA-approved drugs such as montelukast and vilanterol. Key molecular targets in M. mycetomatis include enzymes and pathways critical for pathogen survival and virulence, notably cytochrome P450 (CYP51), dihydrofolate reductase (DHFR), chitin synthase, melanin biosynthesis pathways, and metal ion acquisition systems. Melanin production, via DHN-melanin, DOPA-melanin, and pyomelanin pathways, contributes to grain pigmentation and structural integrity, while metal ions such as iron and zinc are vital for enzymatic activities, grain formation, and fungal virulence. Disrupting metal ion homeostasis through targeting zincophores, siderophores, and zinc-binding proteins represents a promising therapeutic strategy to weaken grain robustness and enhance drug penetration. Despite the potential of molecular docking, limitations such as reliance on homology models, static protein structures, and the absence of cellular context necessitate complementary approaches, including molecular dynamics simulations and in vitro validation. These combined efforts can refine candidate compounds, optimise binding affinities, and predict pharmacokinetic properties. Furthermore, integrating docking results with clinical data and global collaboration platforms can accelerate the discovery of affordable, effective treatments tailored to endemic regions. In conclusion, leveraging molecular docking and computational methods to target essential M. mycetomatis pathways offers a promising frontier in mycetoma research. By identifying novel inhibitors and understanding pathogen biology at a molecular level, these approaches can inform targeted therapies, reduce treatment durations, and improve patient outcomes. Future research should focus on validating computational predictions experimentally and translating these findings into clinical practice, with an emphasis on accessible, cost-effective interventions for vulnerable populations affected by this neglected disease.


Research & AI Updates



Quick Reads

Folding Tensor and Sequence Parallelism for Memory-Efficient Transformer Training & Inference

We present tensor and sequence parallelism (TSP), a parallel execution strategy that folds tensor parallelism and sequence parallelism onto a single device axis.

Protein folding on a 64 qubit trapped-ion hardware via counterdiabatic quantum optimization

We report the largest trapped-ion hardware demonstration of lattice protein-folding optimization to date, using bias-field digitized counterdiabatic quantum optimization (BF-DCQO) on a fully connected 64-qubit Barium development system similar to the forthcoming IonQ Tempo line.

High-Dimensional Noise to Low-Dimensional Manifolds: A Manifold-Space Diffusion Framework for Degraded Hyperspectral Image Classification

Recently, Hyperspectral Image (HSI) classification has attracted increasing attention in remote sensing.

Identification of Glycosyltransferase AcbI in the Last-Step Biosynthesis of Herbicolin A for Improving Antifungal Activity.

Glycosyltransferase AcbI from Pantoea agglomerans ZJU23 was first expressed and biochemically elucidated to catalyze the last-step glycosylation of herbicolin A (1) with antifungal activity.

Unveiling AKT1 as a key target of β-asarone in Alzheimer’s disease through network pharmacology and molecular dynamics simulations.

Background: Alzheimer’s disease (AD) is a progressive neurodegenerative disorder marked by cognitive decline and complex, multi-factorial pathology.

Atomoxetine attenuates methotrexate-induced lung injury in rats implicating TLR4/NF-κB and Bax/Bcl-2/caspase-3 signaling cascades: a study based on molecular docking and experimental validation.

Methotrexate (MTX) is frequently used to treat a variety of autoimmune diseases and malignancies, but its use is restricted due to a number of side effects, including lung damage.

Dual-target antioxidant potential of benzimidazole-based thiazine and N-tosylated analogues: computational insights into human 11β-HSD1 and Leishmania GSK-3α inhibition.

Benzimidazole scaffolds are widely recognized as important structural motifs in medicinal chemistry; however, their potential to function as dual-acting agents against both oxidative stress and parasitic infections has not been extensively explored.

Anti-typhoid-like salmonellosis activity of Phyllanthus amarus Schum. & Thonn. (Phyllanthaceae) leaves extract: UHPLC-ESI-DAD-MS profiling, in vitro and in vivo efficacy assessment, and molecular dynamics simulations targeting SseK3.


Pipeline Tip

Use Snakemake for reproducible end-to-end protein design workflows.
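As a rough illustration of the tip, the sketch below writes a minimal Snakefile that chains three stages (backbone generation, sequence design, structure prediction) through file dependencies. The stage names, design identifiers, and `results/` layout are assumptions made for this example, and the shell commands are placeholders that only create files; they are not real RFdiffusion, ProteinMPNN, or structure-prediction invocations and would need to be replaced with the commands your installation documents.

```python
from pathlib import Path

# Minimal Snakefile sketch. Every rule name, path, and design ID here is
# hypothetical, and each shell command is a stand-in for a real tool call.
SNAKEFILE = '''\
DESIGNS = ["design_001", "design_002"]  # hypothetical design identifiers

# Final targets: one predicted structure per design.
rule all:
    input:
        expand("results/{design}/predicted.pdb", design=DESIGNS)

# Stage 1 placeholder: swap in your backbone-generation command.
rule generate_backbone:
    output:
        "results/{design}/backbone.pdb"
    shell:
        "echo 'backbone placeholder for {wildcards.design}' > {output}"

# Stage 2 placeholder: swap in your sequence-design command.
rule design_sequence:
    input:
        "results/{design}/backbone.pdb"
    output:
        "results/{design}/sequence.fasta"
    shell:
        "echo '>{wildcards.design}' > {output}"

# Stage 3 placeholder: swap in your structure-prediction command.
rule predict_structure:
    input:
        "results/{design}/sequence.fasta"
    output:
        "results/{design}/predicted.pdb"
    shell:
        "cp {input} {output}"
'''


def write_snakefile(directory: str = ".") -> Path:
    """Write the sketch Snakefile into `directory` and return its path."""
    path = Path(directory) / "Snakefile"
    path.write_text(SNAKEFILE)
    return path


if __name__ == "__main__":
    print(f"Wrote {write_snakefile()}")
```

With Snakemake installed, running `snakemake -n` in the same directory previews the jobs and `snakemake --cores 1` executes them. Because each rule declares its inputs and outputs, Snakemake reruns only the stages whose files are missing or out of date, which is where the reproducibility benefit comes from.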


Resources & Tools

Deep learning is not a magic wand, but a powerful lens for structural biology. — Recep Adiyaman
