Issue #15: SMARTDock: A Toolkit for the Automated Development of Target-Specific Scoring Functions Using Bioactivity Data.

🧬 Protein Design Digest
Curated protein signals by Recep Adiyaman
🚀 Today’s Top Signal
SMARTDock: A Toolkit for the Automated Development of Target-Specific Scoring Functions Using Bioactivity Data.
🧬 Abstract
Molecular docking has become an essential tool in the early stages of structure-based drug discovery, enabling rapid virtual screening of large compound libraries against biological targets. However, the accuracy of binder selection is often limited by the available scoring functions. Here, we present a novel workflow, SMARTDock (Scoring with Machine learning and Activity for Ranking Targeted Docking), that enhances the virtual screening capabilities of GOLD docking by integrating publicly available bioactivity data, a protein-ligand interaction fingerprint (PADIF), and machine learning classification models within a user-friendly Docker environment. This platform-independent approach enables seamless use on different operating systems and is accessible to both computational and medicinal chemists. With only a ChEMBL target ID, a protein structure file, and a SMILES list of test compounds, users can build and apply target-specific scoring models to improve the enrichment of active compounds in the top ranks. SMARTDock implements the PADIF-based ML methodology to assist in virtual screening. Previous validation of this underlying methodology demonstrated its capacity to enhance screening performance across multiple targets. Finally, we show the advantages and disadvantages of bioactivity classification in virtual screening tasks.
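The "enrichment of active compounds in the top ranks" mentioned in the abstract is commonly quantified with an enrichment factor (EF). The following is a minimal sketch of that metric, not code from SMARTDock: the function name, the toy scores, and the 20% cutoff are illustrative assumptions, and it assumes that a higher score means a compound is more likely to be active.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    is_active: Sequence[bool],
    top_fraction: float = 0.01,
) -> float:
    """Return EF = (hit rate in the top fraction) / (hit rate in the whole library).

    Assumption: higher score = more likely active. For docking scores where
    lower is better, negate the scores before calling this function.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and at least one active")

    # Size of the top-ranked subset; always inspect at least one compound.
    n_top = max(1, round(n_total * top_fraction))

    # Rank compounds from best to worst score and count actives in the top subset.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(active for _, active in ranked[:n_top])

    return (hits_in_top / n_top) / (n_actives / n_total)


# Toy library (hypothetical values): 10 compounds, 2 actives, both ranked first.
toy_scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.05]
toy_labels = [True, True, False, False, False, False, False, False, False, False]
ef_top20 = enrichment_factor(toy_scores, toy_labels, top_fraction=0.2)
```

An EF of 1 corresponds to random selection; values above 1 mean the scoring model places more actives in the top ranks than chance would.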
⭐ Additional Signals
AlphaFold for Docking Screens.
AlphaFold is an AI system developed by Google DeepMind to generate three-dimensional structures of proteins without experimental data. The models created with AlphaFold are available on the AlphaFold Protein Structure Database (AlphaFoldDB) (https://alphafold.ebi.ac.uk/). The AlphaFold database is searchable by sequence and protein identification. This chapter focuses on an AlphaFold model and its use for docking screens using Molegro Virtual Docker. We rely on Jupyter Notebooks to integrate docking simulations and build regression models based on the atomic coordinates of protein-pose complexes. Our study focuses on constructing a neural network regression model to predict the inhibition of cyclin-dependent kinase 19 (CDK19). This enzyme is a target for anticancer drugs and does not have experimental data for its atomic coordinates. We utilize the Molegro Data Modeller to construct a regression model based on docking results of inhibitors for which binding affinity data is available. All CDK19 datasets and Jupyter Notebooks discussed in this work are available at GitHub: https://github.com/azevedolab/docking#readme.
Geometric deep learning assists protein engineering. Opportunities and Challenges.
Protein engineering is experiencing a paradigmatic transformation through the integration of geometric deep learning (GDL) into computational design workflows. While traditional approaches such as rational design and directed evolution have achieved significant progress, they remain constrained by the vastness of sequence space and the cost of experimental validation. GDL overcomes these limitations by operating on non-Euclidean domains and by capturing the spatial, topological, and physicochemical features that govern protein function. This perspective provides a comprehensive and critical overview of GDL applications in stability prediction, functional annotation, molecular interaction modeling, and de novo protein design. It consolidates methodological principles, architectural diversity, and performance trends across representative studies, emphasizing how GDL enhances interpretability and generalization in protein science. Aimed at both computational method developers and experimental protein engineers, the review bridges algorithmic concepts with practical design considerations, offering guidance on data representation, model selection, and evaluation strategies. By integrating explainable artificial intelligence and structure-based validation within a unified conceptual framework, this work highlights how GDL can serve as a foundation for transparent, interpretable, and autonomous protein design. As GDL converges with generative modeling, molecular simulation, and high-throughput experimentation, it is poised to become a cornerstone technology for next-generation protein engineering and synthetic biology.
Modeling Protein-Protein Complexes by Combining pyDock and AlphaFold.
The lack of experimental structures for the majority of protein-protein complexes has motivated the development of a variety of strategies for the structural modeling of protein complexes, such as computational docking, in active development for the last decades, and the more recent artificial intelligence (AI)-based ground-breaking methodologies. Among the existing computational docking methods, Python docking (pyDock) has shown competitive predictive rates and high robustness over the years. However, the field has dramatically changed with the appearance of artificial intelligence (AI)-based methods, like AlphaFold. While structure prediction of individual proteins is virtually solved by this program, the focus is now on how to improve the prediction of challenging cases like antibody-antigen complexes, multiprotein complexes, weak interactions, or highly flexible interacting proteins. Successful strategies are based on the generation of more diverse sets of models and the integration with other “classical” approaches that facilitate the identification of the correct models. Here, we will show in practical terms how to combine the structural modeling capabilities of AlphaFold with the energy-based scoring function in pyDock to improve structural predictions in challenging protein-protein complexes.
🧪 AI & Research News
- Monte Rosa Therapeutics to Present Interim MRT-8102 Phase 1 Study Results - The Manila Times
🏢 Industry Insight & Applications
- Evidence Supports Safe, Effective Switching to Etanercept Biosimilars - Center for Biosimilars
- ProBioGen and Zag Bio™ Forge Strategic CMC Partnership to Advance Fc-Fusion Autoimmune Therapy - Biotech Newswire
- The new science and business of oral biologics - The Pharma Letter
- Our Human-Centric Approach to Partnership Amidst an Evolving Biotech Landscape - Sanofi
- French biotech TheraVectys weighs Hong Kong IPO - Bloomberg - Investing.com
- Piper Sandler: Biotech Funding Seems To Be Recovering (NYSE:PIPR) - Seeking Alpha
- Aktis aims for $209M windfall from 1st biotech IPO of 2026 - Fierce Biotech
⚡ Quick Reads
Exploring the Anti-Inflammatory Molecular Mechanism of Gentiana szechenyii Kanitz. Based on UPLC-MS/MS Combined With Network Pharmacology, Molecular Docking, and Molecular Dynamics Simulation.
This study explored the anti-inflammatory mechanisms of Gentiana szechenyii Kanitz. (GS), a Tibetan medicinal herb, by combining UPLC-MS/MS, network pharmacology, molecular docking, and molecular dynamics (MD) simulation. Using the lipopolysaccharide (LPS)-induced RAW264.7 cell inflammation model, the anti-inflammatory effect of GS was confirmed by measuring nitric oxide (NO) release and the levels of the inflammatory factors tumor necrosis factor (TNF) and interleukin-6 (IL-6). UPLC-MS/MS identified 40 constituents, whereas network analysis predicted five core compounds (isovitexin 4’,7-diglucoside, loganin, isoorientin-2″-O-glucoside, gentiopicroside, sweroside), five key targets (TNF, IL-6, GAPDH, epidermal growth factor receptor [EGFR], HSP90AA1), and three critical pathways (PI3K-Akt, hypoxia inducible factor-1 [HIF-1], IL-17). Molecular docking showed strong binding between core compounds and targets; the binding energies were all lower than -5 kcal mol⁻¹, among which isovitexin 4’,7-diglucoside had the lowest binding energy to EGFR (-9.4 kcal mol⁻¹). MD simulation confirmed stable binding of TNF with the five core compounds. This study comprehensively clarifies the pharmacodynamic material basis and mechanism of action of GS in anti-inflammation, providing an experimental basis for further development and utilization. It is expected to be applied to the adjuvant treatment of inflammation-related diseases such as chronic bronchitis and pharyngitis in the future, thereby promoting the modernization of Tibetan medicine.
Targeting spermidine synthase in Leishmania donovani: molecular docking and molecular dynamics simulation-based evaluation of Indian medicinal plant phytochemicals.
Visceral leishmaniasis, caused by Leishmania donovani, remains a critical global health challenge due to limited, toxic, and costly treatment options and rising drug resistance. Targeting spermidine synthase (LdSpdS), an essential enzyme for parasite growth, we explored plant-derived phytochemicals as potential inhibitors. A curated phytochemical-inhibitor library was screened against a homology-modeled LdSpdS structure using molecular docking, identifying anaferine, asparagamine A, and isozeylanone as top candidates with strong binding affinities. Drug-likeness evaluation supported their favorable physicochemical properties. Toxicity profiling revealed asparagamine A as the safest candidate, whereas anaferine and isozeylanone exhibited neurotoxic, immunotoxic, and genotoxic liabilities, emphasizing the need for experimental validation. Molecular dynamics simulations and g-MMPBSA binding energy analyses confirmed the conformational stability and robust interactions of the LdSpdS-ligand complexes. Collectively, these findings highlight anaferine, asparagamine A, and isozeylanone as promising lead molecules for LdSpdS-targeted antileishmanial therapy, providing a foundation for future pharmacological development and in vitro/in vivo evaluation.
Convolutional neural network-assisted screening of natural product inhibitors against Naja naja venom: insights from molecular docking, molecular dynamics simulations and ADMET profiling.
Snakebite envenomation remains a major public health issue, particularly in tropical regions such as India, where Naja naja is the main cause of snakebite-related death and disease. Traditional antivenoms have notable shortcomings, the most significant being poor effectiveness against local tissue injuries and the variability of snake venom. This study investigates the antivenom potential of phytochemicals from Canthium coromandelicum, a traditionally used medicinal plant, through a comprehensive in silico pipeline. Methanolic leaf extracts were subjected to HRLC-MS profiling, identifying 69 bioactive compounds. A machine learning framework (GraphDTA with GINConvNet) was employed for virtual screening of these phytochemicals against key N. naja venom proteins (1CVO, 1MF4, 1NTN, 2CTX, and 7QHI), predicting binding affinities based on graph-based molecular representations. Top candidates were further evaluated via molecular docking, molecular dynamics (MD) simulations, and density functional theory (DFT) analyses to elucidate their binding stability, conformational dynamics, and electronic reactivity. Key phytocompounds, including 8-C-Galactosylluteolin, Araliasaponin V, Saponin D, Quinic acid, and Quercetin 3,7-dirhamnoside, demonstrated strong binding affinity (docking scores: -5.484 kcal/mol to -9.777 kcal/mol), stability (RMSD … Supplementary information: the online version contains supplementary material available at 10.1007/s40203-025-00527-x.
Establishing FDA-approved oncology drugs as GPR176 inhibitor through homology modelling, molecular docking, MMGBSA, DFT, and molecular dynamics simulation.
Unraveling the mechanism of curcumin in coronary slow flow phenomenon through network pharmacology and molecular docking.
The coronary slow flow phenomenon (CSFP) is associated with an increased risk of adverse cardiovascular events, yet standardized treatment is lacking. Curcumin, a natural compound, has shown potential in alleviating angina and improving metabolic risk factors in CSFP, but its underlying molecular mechanisms remain unclear. This study employed an integrated computational strategy. Network pharmacology was used to identify potential targets of curcumin and CSFP from public databases, and common targets were identified. Functional enrichment analysis was performed on the common targets, and a protein-protein interaction network was constructed. Core targets were identified using MCODE and CytoHubba plugins in Cytoscape. Molecular docking evaluated the binding modes and affinities of curcumin with the core targets, while molecular dynamics simulations and Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) calculations validated the stability and binding free energies of the complexes. A total of 120 predicted targets of curcumin and 435 CSFP-related targets were identified, yielding 19 common targets. Functional enrichment analysis revealed that curcumin may treat CSFP by modulating inflammatory response, vascular function, cell migration, proliferation, apoptosis, and oxidative stress. These targets were associated with key signaling pathways, including NF-κB, TNF, and HIF-1. Network analysis and topological algorithms identified five core targets: EGFR, ICAM1, NFKB1, PTGS2, and STAT3. Molecular docking results demonstrated that curcumin exhibited excellent binding affinity with all core targets. Molecular dynamics simulations confirmed that the curcumin-core target complexes remained structurally stable during the 100 ns simulation, and MM/GBSA calculations indicated significantly negative binding free energies, suggesting strong binding driving forces. 
Curcumin may exert therapeutic effects on CSFP through a multi-target mechanism, primarily by interacting with key proteins including EGFR, ICAM1, NFKB1, PTGS2, and STAT3, thereby regulating the NF-κB, TNF, and HIF-1 signaling pathways. This study provides a theoretical foundation for the application of curcumin in CSFP treatment, though further experimental validation is required.
Molecular docking and dynamic simulation of Escherichia coli K-12 Elements as a Biosensor for Detecting 2,4,6-Trinitrotoluene (TNT).
Trinitrotoluene (TNT) is widely used in military and industrial fields due to its strong explosive properties and chemical stability. However, its persistence in the environment and harmful effects on living organisms make it important to develop sensitive and selective detection methods. Previous research has identified the Escherichia coli genes yadG and aspC as promising components for TNT biosensors, based on their increased gene expression in response to TNT exposure. Although these findings are promising, it is still unclear whether the proteins produced from these genes directly interact with TNT at the molecular level. This study focuses on analyzing the binding interactions between TNT and the protein products of yadG and aspC using computational methods. Molecular docking showed that TNT binds more strongly to yadG (-6.81 ± 0.02 kcal/mol) than to aspC (-6.23 ± 0.00 kcal/mol). Further analysis using molecular dynamics simulations with MM-GBSA calculations confirmed that the yadG-TNT complex is more stable, with a binding free energy (ΔG) of -23.58 kJ/mol, in line with fluorescence data that also indicated stronger binding to yadG. TNT binding to yadG involves aromatic residues (Tyr-106, His-153) and hydrophobic contacts (Ala-150), which may promote π-π stacking and suggest reduced water occupancy. These features highlight key principles for protein engineering and suggest a clear route from computational findings to biosensor development.
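Note that the abstract reports docking scores in kcal/mol but the MM-GBSA binding free energy in kJ/mol. A small sketch of the unit conversion, assuming the thermochemical calorie (1 kcal = 4.184 kJ); the function name is illustrative. Docking scores and MM-GBSA energies come from different models, so putting them in the same unit does not make them directly comparable.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie: 1 kcal = 4.184 kJ


def kj_to_kcal(energy_kj_per_mol: float) -> float:
    """Convert an energy from kJ/mol to kcal/mol."""
    return energy_kj_per_mol / KJ_PER_KCAL


# MM-GBSA binding free energy of the yadG-TNT complex quoted in the abstract.
dg_yadg_kcal = kj_to_kcal(-23.58)
print(f"{dg_yadg_kcal:.2f} kcal/mol")
```

Expressed in kcal/mol, the reported ΔG is roughly -5.64 kcal/mol.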
Assessing the validity of leucine zipper constructs predicted by AlphaFold.
AP-1 transcription factors are a network of cellular regulators that combine in different dimer pairs to control a range of pathways involved in differentiation, growth, and cell death. They dimerize via leucine zipper coiled-coil domains that are preceded by a basic DNA binding domain. Depending on which AP-1 transcription factors dimerize, different DNA sequences will be recognized resulting in differential gene expression. The affinity of AP-1 transcription factors for each other dictates which dimers form. The relative concentration of AP-1 transcription factors varies with tissue type and environment, adding another layer of control to this integral network of cellular regulation. The development of artificial intelligence (AI)-based protein structure prediction methods gives us a new technique to investigate or predict how dimerization affects combinatorial control. All versions of AlphaFold2 and AlphaFold3 are AI/deep learning programs that predict 3D structures of proteins from an amino acid sequence and multiple sequence alignments of homologous proteins. To fully realize the potential of AI for structural biology, it is essential to understand its current capabilities and limitations. In this study, we used the classical example of an AP-1 dimer: Fos and Jun, and an array of over 2000 experimentally tested human leucine zippers to interrogate how AlphaFold models leucine zipper domains and if AlphaFold can be used to differentiate between probable and improbable dimer interfaces. We found that AlphaFold predicts highly confident leucine zipper dimers, even for dimer pairs such as the FosB homodimer, for which electrostatics are known to prevent their formation in vivo. This is an important case study concerning high-confidence but low-accuracy protein structure prediction.
💡 Pipeline Tip
Normalise thermal B-factors when comparing different crystal structures.
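One common way to do this is a per-structure z-score, which removes differences in overall B-factor scale caused by resolution or refinement protocol. The sketch below is one possible approach and not a prescribed method: it assumes the B-factors (for example, of Cα atoms) have already been extracted into a list for each structure, and the function name and toy values are illustrative.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def normalise_b_factors(b_factors: Sequence[float]) -> List[float]:
    """Return per-structure z-scores: (B - mean) / standard deviation."""
    if len(b_factors) < 2:
        raise ValueError("need at least two B-factors to normalise")
    mu = mean(b_factors)
    sigma = pstdev(b_factors)  # population standard deviation over the structure
    if sigma == 0:
        raise ValueError("all B-factors are identical; z-score is undefined")
    return [(b - mu) / sigma for b in b_factors]


# Toy example (hypothetical values): the same relative flexibility profile,
# but the second structure has B-factors on twice the absolute scale.
structure_a = [10.0, 20.0, 30.0]
structure_b = [20.0, 40.0, 60.0]

z_a = normalise_b_factors(structure_a)
z_b = normalise_b_factors(structure_b)
```

After normalisation the two profiles coincide, so residues can be compared by how flexible they are relative to the rest of their own structure.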
🛠️ Resources
- Dataset: PDB-REDO - Optimized protein structure database with refined models.
- Dataset: CATH - Hierarchical protein domain classification for structure and function.
- Tool: MultiFOLD/IntFOLD - High-performance protein structure prediction and quality assessment server.
- Tool: PyMOL - Gold standard for molecular visualization and publication-quality imaging.
- Event: Structural Biology Events (Open)
- Event: Protein Design Hub (LinkedIn Group) (Ongoing)
- Job: Mercor hiring Bioinformatics Data-Science Specialist in Greater Montreal Metropolitan Area - LinkedIn at Bioinformatics Careers
- Job: European Bioinformatics Institute | EMBL-EBI hiring Research Management Office Lead in England, United Kingdom - LinkedIn at Bioinformatics Careers
Deep learning is not a magic wand, but a powerful lens for structural biology. — Recep Adiyaman