recep.adiyaman
Daily Signal April 29, 2026 · 8 min read

Issue #97: Enhancing CYP450-Ligand Binding Predictions: A Comparative Analysis of Ligand-Based and Hybrid Machine Learning Models.


Signal of the Day

Enhancing CYP450-Ligand Binding Predictions: A Comparative Analysis of Ligand-Based and Hybrid Machine Learning Models.

Predicting cytochrome P450 (CYP450) ligand binding is critical in early-stage drug discovery as CYP450-mediated metabolism profoundly influences drug efficacy, safety, and adverse reaction risks. However, experimental determination of CYP450-ligand interactions remains resource- and time-intensive, underscoring the need for robust computational alternatives. While ligand-based methods are commonly employed, they often fail to fully account for structural intricacies governing protein-ligand interactions. To address this gap, we developed a hybrid machine learning framework integrating ligand descriptors, protein descriptors, and protein-ligand interaction descriptors that include molecular docking-derived parameters, rescoring function components from multiple algorithms, and structural interaction fingerprints (SIFt). Evaluated on CYP1A2 and CYP17A1 isoforms, our model demonstrated superior predictive accuracy in cross-validation compared with stand-alone molecular docking and ligand-based approaches. Furthermore, benchmarking against state-of-the-art tools (SwissADME and ADMETlab 3.0) revealed enhanced performance in binding prediction. This work establishes a versatile framework for advancing computational tools to prioritize CYP450 binding assessments during drug discovery.
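To make the "hybrid" idea concrete, here is a minimal sketch of how ligand, protein, docking, and interaction-fingerprint blocks could be joined into one feature vector for a downstream classifier. Everything in it is an assumption for illustration: the interaction types, the residue labels (`R1` to `R3`), the numeric values, and the function names are hypothetical, and the abstract does not specify the paper's actual SIFt bit definitions or feature layout.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical interaction types; the paper's real SIFt bits are not given in the abstract.
INTERACTION_TYPES = ("contact", "hbond", "hydrophobic")


def interaction_fingerprint(
    site_residues: Sequence[str],
    observed: Set[Tuple[str, str]],
) -> List[int]:
    """Binary fingerprint: one bit per (residue, interaction type), in a fixed order."""
    return [
        1 if (res, kind) in observed else 0
        for res in site_residues
        for kind in INTERACTION_TYPES
    ]


def hybrid_features(
    ligand_desc: Sequence[float],
    protein_desc: Sequence[float],
    docking_scores: Sequence[float],
    fingerprint: Sequence[int],
) -> List[float]:
    """Concatenate the descriptor blocks into a single flat feature vector."""
    return [*ligand_desc, *protein_desc, *docking_scores, *map(float, fingerprint)]


# Placeholder inputs (made up, not taken from the paper).
site = ["R1", "R2", "R3"]
observed = {("R1", "contact"), ("R1", "hydrophobic"), ("R2", "hbond")}
fp = interaction_fingerprint(site, observed)
x = hybrid_features([310.4, 2.1], [0.42], [-8.3, -7.9], fp)
```

The fixed residue order matters: it keeps each fingerprint bit in the same column across ligands, which is what lets a standard tabular model learn from it.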

Why this matters: CYP450-ligand binding benchmarks of this kind are useful reference points for validating next-generation foundation models such as Boltz or Chai.


Also Worth Reading

Identification of paucinervin D as a natural sphingosine-1-phosphate receptor 1 agonist: Insights from pharmacophore modeling, docking, molecular dynamics simulations, and density functional theory.

Sphingosine-1-phosphate receptor 1 (S1PR1), a member of the G protein-coupled receptor (GPCR) family, is a crucial therapeutic target for various diseases. Activation of S1PR1 has been recognized as an effective therapeutic strategy for multiple sclerosis (MS), inflammatory bowel disease (IBD), and psoriasis. Natural products (NPs) serve as a rich source of bioactive compounds for drug discovery. Here, we aimed to discover novel S1PR1 agonists from NPs via multi-level virtual screening (VS). Using a validated HipHop pharmacophore model, we screened a database containing 54,642 NPs, followed by molecular docking. Based on binding mode analysis, four candidate S1PR1 agonists (NPC323626, NPC264112, NPC469907, and NPC22192) were selected. Subsequent molecular dynamics (MD) simulations and binding free energy calculations confirmed the stability of the receptor-ligand complexes and their binding affinities. Among the four candidates, NPC469907 exhibited the strongest binding affinity for S1PR1, with a value of -58.08 ± 0.13 kJ/mol. Furthermore, hydrogen bonds formed between NPC469907 and Glu121 of S1PR1 were found to be essential for receptor activation. Quantum mechanical calculations further revealed that the phenyl-ring-attached hydrogen site in NPC469907 could be modified without compromising its ability to activate S1PR1. The analysis of Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) indicated that NPC469907 possessed favorable pharmacokinetic properties and low toxicity. In conclusion, our study identified NPC469907 as a promising natural S1PR1 agonist and established an effective VS strategy for the discovery of novel S1PR1 agonists.

MsgaBpred: A B-cell epitope predictor integrating AlphaFold3-predicted structures with multi-scale GCNs and pre-trained language model ESM-C.

Accurate prediction of B-cell epitopes plays a key role in facilitating advancements in vaccines, therapeutics, and diagnostics. In contrast to labor-intensive experimental approaches, computational strategies provide a more economical and efficient means of identifying potential epitopes. Existing methods are often limited by their reliance on experimentally resolved protein structures or by the use of lower-accuracy predicted structures. Sequence-based approaches, while fast, largely fail to capture the 3D spatial context essential for conformational epitopes. With the breakthroughs achieved by AlphaFold3 in predicting protein structures, we present MsgaBpred, a model that applies AlphaFold3-derived structures to B-cell epitope identification. Given only a protein sequence, our model employs a multi-scale graph convolutional network and additive attention to capture complex structural dependencies without relying on experimentally determined structures. The multi-scale design allows for effective modeling of both local and global contexts by aggregating information across different neighborhood ranges. Additionally, we leverage ESM-C, a more expressive protein language model than ESM-2, to enhance feature representation for B-cell epitope prediction. Extensive evaluations across multiple benchmark datasets demonstrate that MsgaBpred achieves competitive and robust performance; notably, it yields a statistically significant improvement in AUC compared to existing state-of-the-art methods. Moreover, the modular and scalable architecture of MsgaBpred holds promise for broader applications, including the structural analysis of other biomolecular entities such as nucleic acids and carbohydrates.
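One plausible reading of "aggregating information across different neighborhood ranges" is pooling each residue's features over 1-hop, 2-hop, and wider neighborhoods of a residue contact graph, then concatenating the results. The sketch below shows only that pooling step, with plain means and no learned weights or attention, so it is not MsgaBpred's architecture. The toy graph, feature values, and function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Sequence


def within_k_hops(adj: Dict[int, List[int]], start: int, k: int) -> List[int]:
    """Nodes reachable from `start` in at most k edges (including `start`), via BFS."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond k hops
        for nb in adj[node]:
            if nb not in depth:
                depth[nb] = depth[node] + 1
                queue.append(nb)
    return sorted(depth)


def multi_scale_mean(
    adj: Dict[int, List[int]],
    feats: Sequence[Sequence[float]],
    scales: Sequence[int] = (1, 2),
) -> List[List[float]]:
    """For each node, concatenate mean-pooled features over each k-hop neighborhood."""
    pooled = []
    for node in range(len(feats)):
        row: List[float] = []
        dim = len(feats[node])
        for k in scales:
            hood = within_k_hops(adj, node, k)
            row.extend(sum(feats[n][d] for n in hood) / len(hood) for d in range(dim))
        pooled.append(row)
    return pooled


# Toy 4-residue chain graph 0-1-2-3 with one feature per residue (made-up values).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
feats = [[1.0], [2.0], [3.0], [4.0]]
pooled = multi_scale_mean(adj, feats)
```

In a real model each scale would pass through its own learned graph-convolution layer before being combined; the mean here just makes the local-versus-global contrast visible.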

FoldDelay web server: an online tool to quantify translation-driven delays in protein native contact formation.

Co-translational protein folding is shaped by the vectorial nature of translation, which causes residues to emerge sequentially from the ribosome. As a result, residues whose native interaction partners lie downstream in sequence cannot immediately form their native contacts and remain transiently unsatisfied until those partners are synthesized. These unsatisfied residues are vulnerable to non-native interactions and often require the engagement of co-translational chaperones. We previously developed the Native Fold Delay (NFD) metric to quantify the time lag between the synthesis of a residue and the point at which it can form all its native contacts. Here, we present the FoldDelay web server, a freely accessible platform that extends the NFD concept into a more comprehensive framework for analyzing native residue-residue contact formation during translation. Starting from user-submitted AlphaFold or PDB structures, the site identifies all N- to C-terminal residue-residue contacts, estimates their earliest possible formation times, and integrates domain annotations to distinguish between intra- and inter-domain contacts. The server provides a suite of linked interactive visualizations that allows users to explore native contact formation dynamics and detect transiently unsatisfied regions. The FoldDelay web server is freely accessible at https://folddelay.switchlab.org.
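The delay idea described above can be sketched in a few lines: for each residue, find its furthest downstream native partner and convert the sequence gap into time. This is a simplified illustration and not the server's implementation. It assumes 0-based residue indices, a contact list supplied by the user, and a constant translation rate (`rate_aa_per_s`, a placeholder value), and it ignores effects such as the ribosome exit tunnel.

```python
from typing import Iterable, List, Tuple


def native_fold_delay(
    n_residues: int,
    contacts: Iterable[Tuple[int, int]],
    rate_aa_per_s: float = 5.0,  # placeholder rate, an assumption
) -> List[float]:
    """Per-residue wait (seconds) until its last native partner has been synthesized."""
    # Start with each residue as its own "last partner", which means zero delay.
    last_partner = list(range(n_residues))
    for i, j in contacts:
        upstream, downstream = min(i, j), max(i, j)
        # Only the upstream residue has to wait for the downstream one.
        last_partner[upstream] = max(last_partner[upstream], downstream)
    return [(last_partner[i] - i) / rate_aa_per_s for i in range(n_residues)]


# Toy 6-residue chain with three native contacts (made-up example).
delays = native_fold_delay(6, [(0, 4), (1, 2), (2, 5)])
```

Residue 0 waits longest here because its partner sits four positions downstream; residues whose partners are all upstream get a delay of zero.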


Research & AI Updates

Quick Reads

Exploring the mechanisms of luteolin in treating polycystic ovary syndrome and endometriosis via network pharmacology, molecular docking, and molecular dynamics simulation.

This study aims to elucidate the molecular mechanisms underlying luteolin’s therapeutic effects on polycystic ovary syndrome (PCOS) and endometriosis (EM), thereby providing a theoretical foundation for developing novel treatment strategies.

ICFinder: ion channel identification and ion permeation residue prediction using protein language models

Ion channel dysfunction underlies many diseases (e.g., arrhythmias, epilepsy, cystic fibrosis), and uncharacterized channels may also contribute to pathology.

Development of potential CDK9 inhibitors through pharmacophore-based virtual screening, 3D-QSAR, molecular docking, MD simulation, and in vitro anticancer evaluation.

Cyclin-dependent kinase 9 (CDK9) is a transcription-regulating serine/threonine kinase, and its dysregulation drives tumour initiation, thereby establishing CDK9 inhibition as a mechanistically validated and therapeutically attractive strategy for treating diverse malignancies.

Phytochemical-mediated green synthesis of selenium nanoparticles using Catharanthus roseus and their physicochemical characterization, biological evaluation, and molecular docking analysis.

This study reports the green biosynthesis of bioactive selenium nanoparticles using Catharanthus roseus extract as a reducing and stabilizing agent.

Synthesis, spectroscopic, characterization, antimicrobial, DNA interaction, DFT and molecular docking studies of a new Cu(II)-Schiff base complex.

Targeting the noncatalytic activity of GSK3β modulates neuronal excitability in medium spiny neurons via Nav1.6 interactions.

Kinases phosphorylate ion channels, but their noncatalytic roles via protein-protein interactions (PPI) are less understood.

Mechanistic Mutational Scanning to Uncover the Secret Life of Proteins.

Deep mutational scanning (DMS) has emerged as a transformative tool for dissecting individual protein function and broader cell biology.

Decoding the catalytic potential of a Cycloclasticus zancles ring-hydroxylating dioxygenase through computational analysis for enhanced PAH biodegradation.

Polycyclic aromatic hydrocarbons (PAHs) are widespread, toxic, and recalcitrant pollutants in marine ecosystems.

Pipeline Tip

Verify FASTA headers for special characters that break Rosetta pipelines.
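A quick way to act on this tip is to scan headers against a conservative whitelist before launching a run. The sketch below is a minimal example, not a Rosetta utility. The whitelist (letters, digits, underscore, dot, hyphen) is an assumption chosen to be strict, since the tip does not say which characters cause trouble, and the function names and sample records are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed-safe characters; adjust to whatever your own pipeline tolerates.
SAFE_HEADER = re.compile(r"^[A-Za-z0-9_.\-]+$")
UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9_.\-]")


def find_risky_headers(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line_number, header, offending_characters) for headers outside the whitelist."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].rstrip("\r\n")
        if not SAFE_HEADER.match(header):
            bad = "".join(sorted(set(UNSAFE_CHARS.findall(header))))
            problems.append((lineno, header, bad))
    return problems


def sanitize_header(header: str) -> str:
    """Replace each run of non-whitelisted characters with a single underscore."""
    return re.sub(r"[^A-Za-z0-9_.\-]+", "_", header).strip("_")


# Made-up records for illustration.
fasta = [">my design #1 (v2)\n", "MKTAYIAK\n", ">design_002\n", "MKV\n"]
report = find_risky_headers(fasta)
clean = sanitize_header("my design #1 (v2)")
```

To check a real file, pass an open file handle, for example `find_risky_headers(open("input.fasta"))`. Note that sanitizing can make two distinct headers collide, so check for duplicates afterwards.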


Deep learning is not a magic wand, but a powerful lens for structural biology. — Recep Adiyaman
