Recep Adiyaman
Daily Signal March 13, 2026 · 9 min read

Issue #67: How to make the most of your masked language model for protein engineering

Protein Design Digest - 2026-03-13 - Discovery of a Hematopoietic Manifold in scGPT Yields a Method for Extracting Performant Algorithms from Biological Foundation Model Internals

Protein Design Daily


Daily curated signals from arXiv, PubMed, and bioRxiv.

Signal of the Day

How to make the most of your masked language model for protein engineering

A plethora of protein language models have been released in recent years. Yet comparatively little work has addressed how to best sample from them to optimize desired biological properties. We fill this gap by proposing a flexible, effective sampling method for masked language models (MLMs), and by systematically evaluating models and methods both in silico and in vitro on actual antibody therapeutics campaigns. Firstly, we propose sampling with stochastic beam search, exploiting the fact that MLMs are remarkably efficient at evaluating the pseudo-perplexity of the entire 1-edit neighborhood of a sequence. Reframing generation in terms of entire-sequence evaluation enables flexible guidance with multiple optimization objectives. Secondly, we report results from our extensive in vitro head-to-head evaluation for the antibody engineering setting. This reveals that choice of sampling method is at least as impactful as the model used, motivating future research into this under-explored area.
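The sampling idea in this abstract can be illustrated with a minimal sketch. The abstract does not spell out the authors' procedure, so everything below is an assumption made for illustration: `toy_score` is a hypothetical stand-in for an MLM pseudo-log-likelihood, the neighborhood is restricted to single substitutions, and the beam is drawn with a Gumbel-top-k perturbation, which is one common way to sample a beam without replacement.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(seq, preferred="ACDKLMWY"):
    """Hypothetical stand-in for an MLM pseudo-log-likelihood (higher is better)."""
    return -sum(a != b for a, b in zip(seq, preferred))


def one_edit_neighbors(seq):
    """Yield every single-substitution variant of seq."""
    for i, old in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != old:
                yield seq[:i] + aa + seq[i + 1:]


def gumbel(rng):
    """Draw one standard Gumbel(0, 1) sample."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def stochastic_beam_search(start, score_fn, beam_size=4, rounds=5,
                           temperature=1.0, seed=0):
    """Return the best-scoring sequence seen during a stochastic beam search."""
    rng = random.Random(seed)
    beam = [start]
    best = start
    for _ in range(rounds):
        # Pool the 1-edit neighborhoods of all beam members (sorted for reproducibility).
        candidates = sorted({n for s in beam for n in one_edit_neighbors(s)})
        # Gumbel-top-k: perturb each scaled score, then keep the k largest.
        keyed = [(score_fn(c) / temperature + gumbel(rng), c) for c in candidates]
        keyed.sort(reverse=True)
        beam = [c for _, c in keyed[:beam_size]]
        best = max(beam + [best], key=score_fn)
    return best


best = stochastic_beam_search("GGGGGGGG", toy_score)
print(best, toy_score(best))
```

Because the search only needs a whole-sequence score, `score_fn` could in principle be a weighted sum of several objectives, which is how this sketch would accommodate the multi-objective guidance the abstract mentions.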


Also Worth Reading

Binding interactions of Trametes villosa and Trametes lactinea laccases with 4-nonylphenol and its intermediates: molecular docking and molecular dynamics approaches.

Emerging pollutants such as 4-nonylphenol (4-NP) act as endocrine disruptors and have been associated with reproductive toxicity in humans and wildlife, as well as with physiological disturbances in aquatic, terrestrial, and plant organisms. Laccases are oxidoreductases with notable biotechnological relevance and the ability to oxidize phenolic pollutants, making them attractive candidates for biodegradation strategies. This study investigated the interactions between laccases from Trametes villosa and Trametes lactinea and 4-NP and its degradation intermediates via molecular docking and molecular dynamics simulations (MDS). Ligands were geometrically optimized using the PM7 semiempirical method, and their global reactivity descriptors were computed to explore correlations between electronic properties and laccase binding affinity. Docking revealed favorable binding energies (ΔG_bind ≈ -6 kcal·mol⁻¹) and recurrent interactions with key amino acid residues, including Ala, Glu, Leu, Phe, Pro, Ser, Val, and His, mainly through hydrogen bonding and hydrophobic contacts. The MDS confirmed the stability of the enzyme-ligand complexes, as indicated by low root mean square deviation (RMSD) and root mean square fluctuation (RMSF) values, along with consistent radius of gyration and solvent-accessible surface areas throughout the trajectories. Binding free energy calculations using the Molecular Mechanics/Poisson-Boltzmann Surface Area (MM/PBSA) method indicated stronger binding affinity under solvation, with ΔG_bind values of -26.45 and -17.73 kcal·mol⁻¹ for T. villosa and T. lactinea, respectively, highlighting hydrophobic and van der Waals contributions as the primary stabilizing forces. Overall, these results provide computational evidence that laccases from T. villosa and T. lactinea have potential for application in the oxidative biodegradation of 4-NP. These findings advance the molecular understanding of fungal laccase-pollutant interactions and support future in vitro validation and protein engineering strategies aimed at enhancing biodegradation efficiency.
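To give a feel for what a docking energy of about -6 kcal·mol⁻¹ means, the standard relation ΔG = RT ln(Kd), with a 1 M standard state, converts a binding free energy into an implied dissociation constant. The sketch below is an illustration only and is not part of the study: the function name and the 298.15 K temperature are assumptions, and docking scores are rough estimates, so the result should be read as an order of magnitude at best.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_delta_g(delta_g_kcal, temperature_k=298.15):
    """Return the dissociation constant (mol/L) implied by dG = RT ln(Kd)."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


kd = kd_from_delta_g(-6.0)  # docking-level energy quoted in the abstract
print(f"Implied Kd: {kd * 1e6:.1f} micromolar")
```

At room temperature a value near -6 kcal·mol⁻¹ corresponds to a Kd in the tens of micromolar. MM/PBSA totals such as those quoted above are generally not interpreted as absolute binding free energies, so the same conversion should not be applied to them.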

Artificial intelligence driven protein design and sustainable nanomedicine for advanced theranostics.

The integration of artificial intelligence, protein engineering, and sustainable nanomedicine is driving a paradigm shift in theranostics by enabling highly precise disease diagnosis and targeted therapy. AI-driven methodologies, including machine learning and deep learning, facilitate the rapid analysis of complex biological and chemical datasets, accelerating protein structure prediction, molecular docking, and structure-activity relationship modeling. These capabilities support the rational design of proteins and peptides with enhanced specificity, therapeutic efficacy, and safety, while enabling personalized treatment strategies tailored to individual molecular profiles. In parallel, sustainable nanomedicine focuses on the development of biodegradable, biocompatible, and environmentally benign nanomaterials to improve drug bioavailability, stability, and controlled release. AI-assisted optimization further refines nanocarrier design by balancing therapeutic performance with safety and environmental impact. Advanced intelligent nanocarriers capable of real-time monitoring, adaptive drug release, and degradation into non-toxic by-products represent a significant advancement over conventional static systems. The theranostic paradigm has become central to precision medicine, particularly in oncology, where AI-designed nanoplatforms enable targeted delivery of imaging agents and therapeutics to tumors while allowing continuous treatment monitoring and minimizing off-target effects. Emerging applications in neurological, infectious, and cardiovascular diseases further highlight the broad clinical potential of this approach. Accordingly, this review summarizes AI-driven protein design strategies, sustainable nanocarrier engineering, and their convergence in next-generation theranostic systems, critically discussing mechanistic insights, translational challenges, and design principles required for developing safe, scalable, and clinically adaptable intelligent nanomedicines.

In silico prediction, molecular docking and simulation of natural flavonoid apigenin and xanthoangelol E against human metapneumovirus.

Human metapneumovirus (hMPV) is a potential pandemic pathogen and a particular concern for elderly and immunocompromised patients. There is no vaccine or specific antiviral available for hMPV. We conducted an in silico study to predict initial antiviral candidates against human metapneumovirus. Our methodology included protein modeling, stability assessment, molecular docking, molecular simulation, analysis of non-covalent interactions, bioavailability, carcinogenicity, and pharmacokinetic profiling. We pinpointed four plant-derived bio-compounds as antiviral candidates. Among the compounds, apigenin showed the highest binding affinity, with values of -8.0 kcal/mol for the hMPV-F protein and -7.6 kcal/mol for the hMPV-N protein. Molecular dynamics simulations and further analyses confirmed that the protein-ligand docked complexes exhibited acceptable stability compared to two standard antiviral drugs. Additionally, these four compounds yielded satisfactory outcomes in bioavailability, drug-likeness, ADME-Tox (absorption, distribution, metabolism, excretion, and toxicity), and STopTox analyses. This study highlights the potential of apigenin and xanthoangelol E as initial antiviral candidates, underscoring the necessity for wet-lab evaluation and for preclinical and clinical trials against human metapneumovirus infection. Supplementary information: the online version contains supplementary material available at 10.1007/s40203-025-00539-7.



Quick Reads

Establishing FDA-approved oncology drugs as GPR176 inhibitor through homology modelling, molecular docking, MMGBSA, DFT, and molecular dynamics simulation.

Needle-in-a-haystack approach: rapid screening of PDE1C inhibitors through the combination of machine learning, molecular docking, molecular dynamics simulations and experimental validation.

Structure-based computational screening and molecular dynamics reveal potential inhibitors of Norovirus VP1 and RdRp Proteins: an in-silico study

Norovirus is recognized as a pathogen with pandemic potential, exhibiting a higher fatality rate in low-income countries, particularly affecting young children.

Protein Counterfactuals via Diffusion-Guided Latent Optimization

Deep learning models can predict protein properties with unprecedented accuracy but rarely offer mechanistic insight or actionable guidance for engineering improved variants.

aGPCR-HEK: A Stable High-Expression Inducible Mammalian Cell Expression System for Adhesion GPCR Structural Biology Applications.

ADGRL4 is an adhesion G protein-coupled receptor (aGPCR) implicated in multiple tumours.

Sulfonic Acid Group Docking Synthesis of Platinum Clusters in MOFs Cavity Enables Low-Temperature Stable Selective CO2 Hydrogenation to Methanol.

Platinum (Pt) nanoparticles favor the hydrogenation of CO2 to CO, presenting a significant challenge for value-added methanol synthesis.

A Modified Paraspinal Approach for Full-Endoscopic Discectomy for Far Lateral Disc Herniations: Docking at the Caudal Level Transverse Process.

The use of least invasive full-endoscopic spine systems has decreased the amount of tissue dissection, blood loss, and duration of post-operative recovery after intervention for far-lateral disc herniations (FLDH).

Pipeline Tip

Check for missing residues in PDB files using PDB-Fixer before simulation.
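As a quick first pass before reaching for PDBFixer, a gap in residue numbering along a chain is often the first visible hint of missing residues. The sketch below is a standard-library-only illustration and does not call PDBFixer: `find_numbering_gaps` and `ca_line` are hypothetical helper names, the parser assumes fixed-column PDB ATOM records, and it only sees jumps in C-alpha numbering, so it cannot detect missing termini or residues hidden by unusual numbering.

```python
def find_numbering_gaps(pdb_lines):
    """Return (chain, last_resnum, next_resnum) for each jump in CA numbering."""
    gaps = []
    last_seen = {}
    for line in pdb_lines:
        # Only consider C-alpha atoms from ATOM records (fixed PDB columns).
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain = line[21]
        resnum = int(line[22:26])
        if chain in last_seen and resnum - last_seen[chain] > 1:
            gaps.append((chain, last_seen[chain], resnum))
        last_seen[chain] = resnum
    return gaps


def ca_line(serial, resname, chain, resnum):
    """Build a minimal, column-aligned CA ATOM record for demonstration."""
    return (f"ATOM  {serial:5d} {'CA':<4} {resname:>3} {chain}{resnum:4d}"
            f"    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


# Toy chain A with residues 1, 2 and 5 present (3 and 4 absent).
demo = [ca_line(1, "ALA", "A", 1), ca_line(2, "GLY", "A", 2), ca_line(3, "SER", "A", 5)]
print(find_numbering_gaps(demo))
```

To run it on a real structure, pass the lines of the file, for example `find_numbering_gaps(open(path))` with your own `path`. Any gaps reported here are a cue to run the dedicated tool named in the tip before simulation.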



The protein structure is the language of life; design is its poetry. — Recep Adiyaman
