Recep Adiyaman

Weekly Digest: Jan 26 - Jan 30, 2026

January 30, 2026 Daily Intelligence
Protein Design Daily

🧬 Protein Design Digest

Curated protein signals by Recep Adiyaman

🧬 Weekly Recap

Jan 26 - Jan 30, 2026

Missed a day? Here are the top research signals and tools from Monday to Friday, summarized in one place.


🏆 Top Signals of the Week

🗓️ Monday, Jan 26

Energy-Driven Innovations in Computational De Novo Protein Engineering.

🧬 Abstract

Energy models play a crucial role in the advancement of computational de novo protein engineering, enabling the design of novel proteins with tailored functionalities. Proteins serve as the foundation of biochemical processes, making their precise engineering essential for applications in biotechnology, medicine, and synthetic biology. Unlike traditional approaches that focus on modifying existing proteins, de novo engineering introduces entirely new constructs, a paradigm shift driven by energy-based strategies that guide protein folding, stability, and functionality through comprehensive simulations of energy landscapes. Computational techniques such as molecular dynamics (MD), thermodynamic integration, and Monte Carlo sampling are fundamental in evaluating designed proteins’ stability and dynamic behavior. Widely used tools such as CHARMM, Amber, and Rosetta leverage advanced energy functions to optimize protein structures, facilitating accurate predictions of folding pathways and binding affinities. Additionally, the integration of machine learning (ML) and deep learning (DL) has significantly improved the speed and precision of energy-based modeling, enhancing the design and optimization process. This review systematically analyzes recent studies, provides quantitative benchmarking of major computational platforms, and presents a decision framework for method selection based on accuracy-cost-throughput trade-offs. By integrating classical force fields, quantum mechanical approaches, and AI-driven predictions with experimental validation, this work outlines a roadmap for advancing therapeutic and industrial protein design through synergistic physics-based and data-driven strategies.

Why it matters: Benchmarks the major energy-based platforms and offers a decision framework for choosing among physics-based and ML-driven methods by accuracy, cost, and throughput in de novo design.
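
To make the Monte Carlo sampling mentioned in the abstract concrete, here is a minimal sketch of the Metropolis acceptance rule on a toy one-dimensional energy landscape. The double-well function, the step size, and the temperature value are all illustrative assumptions; none of them comes from the review, and a real design workflow would use a force field such as those in Rosetta, CHARMM, or Amber.

```python
import math
import random


def toy_energy(x):
    """Hypothetical 1D double-well energy with minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


def metropolis(energy, x0, n_steps, step_size=0.5, kT=0.2, seed=0):
    """Sample one coordinate with the Metropolis acceptance rule."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        # Downhill moves are always accepted; uphill moves are accepted
        # with Boltzmann probability exp(-dE / kT).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
        samples.append(x)
    return samples


# Start far from either minimum and discard the first 1000 steps as burn-in.
samples = metropolis(toy_energy, x0=3.0, n_steps=5000)
kept = samples[1000:]
mean_energy = sum(toy_energy(x) for x in kept) / len(kept)
print(f"mean energy after burn-in: {mean_energy:.3f}")
```

The sampler starts at a high-energy point and relaxes into one of the wells, which is the same idea, at toy scale, as using an energy function to drive a designed sequence or structure toward low-energy states.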

🗓️ Tuesday, Jan 27

ModelCIF update: Supporting Emerging Classes of Computational Macromolecular Models.

🧬 Abstract

The recent development of highly accurate protein structure prediction tools has led to a rapid expansion in the scope of computational structural biology, enabling a much wider range of modelling studies than ever before. These new in silico opportunities help life science researchers understand how proteins interact with their environment and support design of new molecules with desired properties. Ultimately, they have broad applications, e.g. in medicine, drug discovery or engineering. To ensure reproducibility and to facilitate data exchange and reuse, predicted structures or computed structure models can be stored using ModelCIF, a rich data representation designed to include the atomic coordinates/metadata. The previously published version of ModelCIF (1.4.4; 2022-12-21) mainly covered protein structure predictions generated by homology and ab initio modelling. In this work, we present an extension of the ModelCIF (https://github.com/ihmwg/ModelCIF) data standard and its associated tools. This extension supports important new use cases, including modelling protein-ligand and protein-protein interactions, sampling multiple conformational states and designing proteins de novo. We define guidelines for storage and validation of modelling results for those use cases by applying new and existing ModelCIF categories to capture protocols, inputs and outputs. Additionally, we outline updates to the software tools and resources that implement these new standards and provide functionality for model generation, validation, archiving, and visualisation. By enabling consistent metadata capture across different modelling workflows, this framework aims to support the FAIR dissemination of computational models, thereby promoting reproducibility and reusability in downstream applications.

Why it matters: Gives protein-ligand, protein-protein, multi-state, and de novo design models a standard format for storage and validation, supporting FAIR sharing and reproducibility.

🗓️ Wednesday, Jan 28

Tailored pyrrole-based imidazothiazole scaffolds: Synthetic elaboration, enzyme kinetic profiling and DFT-guided molecular docking toward Antidiabetic therapeutics.

🧬 Abstract

The current research study highlights the successful biological evaluation of novel imidazo-thiadiazole based pyrrole derivatives, with the aim of targeting diabetes mellitus through alpha-amylase and alpha-glucosidase inhibition. These compounds exhibited promising anti-diabetic activity; notably, compound 8 emerged as a leading candidate (3.50 ± 0.20 and 4.10 ± 0.10 µM), outperforming the reference drug acarbose (6.20 ± 0.10 and 6.70 ± 0.20 µM). The enhanced biological potential of compound 8 is likely due to the incorporation of hydroxyl substituents, which may strengthen its binding affinity and selectivity towards the targeted enzymes. Molecular docking revealed stable interactions with key amino acid residues of the targeted enzymes, providing a mechanistic basis for its potent inhibitory activity. To further establish their therapeutic relevance, an enzyme kinetic study was conducted, which confirmed their mode of inhibition, while ADMET analysis indicated favorable pharmacokinetics and safety profiles. Moreover, pharmacophore modeling and molecular dynamics simulations reinforced the stability and binding efficiency of the lead compounds under dynamic biological conditions. Together, the experimental results and in silico validations demonstrate that the potent compounds possess a significant anti-diabetic activity profile. Their ability to outperform an existing reference drug while maintaining a favorable safety profile suggests that these compounds have the potential to be developed and optimized further against diabetes mellitus.

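Why it matters: Reports small-molecule inhibitors of alpha-amylase and alpha-glucosidase that outperformed acarbose, with docking, enzyme kinetics, and molecular dynamics supporting the proposed binding mode.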

🗓️ Thursday, Jan 29

PepScorer::RMSD: An Improved Machine Learning Scoring Function for Protein-Peptide Docking.

🧬 Abstract

Over the past two decades, pharmaceutical peptides have emerged as a powerful alternative to traditional small molecules, offering high potency, specificity, and low toxicity. However, most computational drug discovery tools remain optimized for small molecules and need to be entirely adapted to peptide-based compounds. Molecular docking algorithms, commonly employed to rank drug candidates in early-stage drug discovery, often fail to accurately predict peptide binding poses due to their high conformational flexibility and scoring functions not being tailored to peptides. To address these limitations, we present PepScorer::RMSD, a novel machine learning-based scoring function specifically designed for pose selection and enhancement of docking power (DP) in virtual screening campaigns targeting peptide libraries. The model predicts the root-mean-squared deviation (RMSD) of a peptide pose relative to its native conformation using a curated dataset of protein-peptide complexes (3-10 amino acids). PepScorer::RMSD outperformed conventional, ML-based, and peptide-specific scoring functions, achieving a Pearson correlation of 0.70, a mean absolute error of 1.77 Å, and top-1 DP values of 92% on the evaluation set and 81% on an external test set. Our PLANTS-based workflow was benchmarked against AlphaFold-Multimer predictions, confirming its robustness for virtual screening. PepScorer::RMSD and the curated dataset are freely available in Zenodo.

Why it matters: Improves pose selection in protein-peptide docking, which could make virtual screening of short peptide libraries (3-10 amino acids) more reliable.
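
The metrics quoted in the abstract (RMSD, mean absolute error, Pearson correlation, and top-1 docking power) can be illustrated with a short sketch. This is not the PepScorer::RMSD code: the coordinates and scores below are made up, the RMSD is computed without superposition, and the 2.0 Å success cutoff is an assumed convention that the abstract does not state.

```python
import math


def rmsd(pose, native):
    """RMSD between two equally long lists of (x, y, z) tuples, no superposition."""
    if len(pose) != len(native) or not pose:
        raise ValueError("coordinate lists must be non-empty and equally long")
    sq = sum((a - b) ** 2 for p, q in zip(pose, native) for a, b in zip(p, q))
    return math.sqrt(sq / len(pose))


def mean_absolute_error(pred, true):
    """Average absolute difference between predicted and true values."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def top1_success(predicted_rmsds, true_rmsds, cutoff=2.0):
    """True if the pose ranked best by predicted RMSD is within cutoff of native."""
    best = min(range(len(predicted_rmsds)), key=predicted_rmsds.__getitem__)
    return true_rmsds[best] <= cutoff


# Made-up example: a three-atom pose shifted by 1 Å along y.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
pose = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
pose_rmsd = rmsd(pose, native)

# Made-up true and predicted RMSDs (in Å) for four poses of one complex.
true_rmsds = [1.0, 3.0, 5.0, 7.0]
predicted_rmsds = [1.5, 2.5, 5.5, 6.0]
mae = mean_absolute_error(predicted_rmsds, true_rmsds)
r = pearson(predicted_rmsds, true_rmsds)
hit = top1_success(predicted_rmsds, true_rmsds)
```

Docking power over a benchmark would then be the fraction of complexes for which `top1_success` returns true.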

🗓️ Friday, Jan 30

Scalable embedding fusion with protein language models: insights from benchmarking text-integrated representations.

🧬 Abstract

Protein language models (pLMs) have become essential tools in computational biology, powering diverse applications from variant effect prediction to protein engineering. Central to their success is the use of pretrained embeddings (contextualized representations of amino acid sequences), which enable effective transfer learning, especially in data-scarce settings. However, recent studies have revealed that standard masked language modeling objectives used to train these models often produce representations that are misaligned with the needs of downstream tasks. While scaling up model size improves performance in some cases, it does not universally yield better representations. In this study, we investigate two complementary strategies for improving pLM representations: (i) integrating text annotations through contrastive learning, and (ii) combining multiple embeddings via embedding fusion. We benchmark six text-integrated pLMs (tpLMs) and three large-scale pLMs across six biologically diverse tasks, showing that no single model dominates across settings. Fusion of multiple tpLM embeddings improves performance on most tasks but presents a computational bottleneck due to the combinatorial number of possible combinations. To overcome this, we propose greedier forward selection, a linear-time algorithm that efficiently identifies near-optimal embedding subsets. We validate its utility through two case studies, homologous sequence recovery and protein-protein interaction prediction, demonstrating new state-of-the-art results in both. Our work highlights embedding fusion as a practical and scalable strategy for improving protein representations.

Why it matters: Shows that fusing embeddings from several text-integrated pLMs improves performance on most tasks, and that a linear-time selection step makes choosing the subset tractable.
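
For readers unfamiliar with subset selection, the sketch below shows classic greedy forward selection over a set of candidate embeddings. It is not the paper's "greedier" linear-time variant, whose details the abstract does not give; the classic form re-scores every remaining candidate in each round. The embedding names, gains, and cost term are hypothetical stand-ins for a real validation score on fused embeddings.

```python
def greedy_forward_selection(candidates, score):
    """Repeatedly add the candidate that most improves score; stop when none does."""
    selected = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining:
        # Score each remaining candidate when added to the current subset.
        scored = [(score(selected + [c]), c) for c in remaining]
        top_score, top_candidate = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no candidate improves the current subset
        selected.append(top_candidate)
        remaining.remove(top_candidate)
        best_score = top_score
    return selected, best_score


# Hypothetical per-embedding gains; a real score would be validation
# performance of a downstream model trained on the fused embeddings.
GAINS = {"tpLM_A": 0.60, "tpLM_B": 0.30, "tpLM_C": 0.05, "tpLM_D": 0.02}
COST_PER_EMBEDDING = 0.04  # assumed penalty for each extra embedding


def toy_score(subset):
    return sum(GAINS[name] for name in subset) - COST_PER_EMBEDDING * len(subset)


selected, best = greedy_forward_selection(GAINS, toy_score)
print(selected, round(best, 2))
```

With k candidates, this classic procedure can call the scoring function on the order of k squared times, which is the cost a linear-time variant is meant to avoid.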


🛠️ Tools & Datasets

  • 🛠 Tool: Rosetta - Protein modeling, docking, and design suite.
  • 🛠 Tool: AutoDock Vina - Molecular docking for ligand screening and scoring.
  • 💾 Dataset: UniRef - Clustered protein sequence sets for fast similarity searches.
  • 💾 Dataset: BFD - Big Fantastic Database for deep learning protein modeling.
  • 🛠 Tool: GROMACS - High-performance molecular dynamics engine.
  • 💾 Dataset: MGnify - Metagenomics resource for microbiome sequence data.
  • 🛠 Tool: OpenMM - GPU-accelerated molecular simulation toolkit.
  • 💾 Dataset: PDBbind - Binding affinity data with 3D structures of protein-ligand complexes.
  • 🛠 Tool: AlphaFill - Ligand and cofactor transfer into AlphaFold models.
  • 💾 Dataset: BioLiP - Verified biologically relevant ligand-protein interactions.
  • 🛠 Tool: ReFOLD4 - Sophisticated protein structure refinement tool for improving model quality.
  • 💾 Dataset: SIFTS - Residue-level mapping between PDB, UniProt, and other resources.
