
De Novo Protein Design: How AI Creates New Proteins

Updated - 30 Jan 2026 25 min read
Yoanna Stefanova Technical Copywriter at XTATIC HEALTH
3D visualization of de novo protein structures above a transparent digital hand.

The pharmaceutical and biotechnology industries face a critical bottleneck. Traditional drug discovery relies on finding molecules that already exist in nature. This approach limits innovation and often results in treatments that work around biological constraints rather than solving them directly.

Our analysis of AI applications in pharmaceutical manufacturing shows how artificial intelligence transforms entire production pipelines. In 2023, researchers achieved what seemed impossible just decades ago: they designed entirely new proteins from scratch using artificial intelligence. According to a 2025 review article, those attempts succeeded in 80% of instances.

This article dives into how de novo protein design represents the next frontier in this digital transformation. It is written for anyone who wants to understand how this technology can be applied to binding specific targets, catalyzing reactions, or forming novel biomaterials.

Key takeaways:

  • De novo protein design = using AI to create entirely new proteins from scratch without natural templates
  • Reported success rates jumped from under 10% to above 80% in some applications with modern AI methods like RFdiffusion and ProteinMPNN
  • Applications span drug discovery, biomaterials, and enzyme engineering, with a wide range of commercial potential
  • AI models process massive Protein Data Bank (PDB) datasets to learn folding patterns and functional sites
  • Designer proteins solve “undruggable” targets and create novel biomaterials impossible through traditional methods
  • Success depends on integrated pipelines combining multiple learning methods for optimal results

What exactly is de novo protein design?

De novo protein design creates entirely new proteins from scratch. No natural template. No existing sequence to modify. Just pure computational “creativity” guided by physical laws.

Think of it as molecular architecture. Architects create blueprints for buildings that have never been constructed. Specialists who design proteins create molecular blueprints for structures that have never evolved.

The designer proteins must meet strict requirements:

  • fold into specific three-dimensional shapes
  • perform predetermined functions
  • remain stable under operating conditions
  • avoid toxic or unwanted side effects

The evolution from traditional methods

Early protein engineering in the 1980s focused on modifying existing proteins. Scientists would take a known protein structure and make incremental changes. This approach worked but remained constrained by the starting material.

You could only modify what nature had already created.

The field shifted dramatically with computational methods in the 2000s. The Protein Data Bank (PDB) grew to contain tens of thousands of solved protein structures. This provided the foundation for understanding protein folding patterns.

This wealth of structural data enabled researchers like David Baker at the Institute for Protein Design to pioneer early computational approaches that laid the groundwork for today’s AI-driven methods.

Core principles of de novo design

Infographic on the evolution of protein design: from the 1980s to today’s AI revolution.

Modern de novo protein design operates on fundamental principles:

  • Physical constraints: the protein must fold stably into its intended structure. This requires understanding how amino acid sequences translate into three-dimensional shapes.
  • Functional requirements: the protein must perform its intended task, whether that is binding to disease-related target proteins, catalyzing reactions, or forming materials.
  • Optimization challenges: success rates depend on balancing multiple competing factors. Stability, solubility, and specific binding interactions at functional sites must all work together.

The process typically involves two major steps:

  1. Generate a protein backbone that provides the overall shape
  2. Optimize the amino acid sequence to stabilize that structure and enable the desired function

Key Takeaway
Proteins can now be designed, not discovered. Instead of tweaking what nature gives us, scientists write fresh “blueprints” for brand-new molecules.

How AI changed protein design

Artificial intelligence solved protein design’s most fundamental challenge. It tackles the astronomical number of possible protein sequences that made traditional methods impractical.

A typical protein contains 100-300 amino acids. With 20 possible amino acids at each position, the theoretical sequence space exceeds the number of atoms in the observable universe.
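The arithmetic behind that comparison is easy to check. The sketch below assumes the commonly quoted rough figure of about 10^80 atoms in the observable universe; that number is an order-of-magnitude assumption used only for scale.

```python
import math

# Rough, commonly quoted order-of-magnitude estimate (an assumption, used only for scale).
ATOMS_IN_UNIVERSE_LOG10 = 80


def log10_sequence_space(length: int, alphabet_size: int = 20) -> float:
    """Return log10 of the number of distinct sequences of the given length."""
    return length * math.log10(alphabet_size)


for length in (100, 300):
    exponent = log10_sequence_space(length)
    print(f"{length} residues: about 10^{exponent:.0f} possible sequences")
```

Even at the short end of the range, 100 residues give roughly 10^130 possible sequences, far beyond the 10^80 scale.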

Traditional computational methods could explore only tiny fractions of this space. Rosetta software used physics-based energy functions to evaluate protein designs. But it required extensive computational resources and often produced mixed results.

Success rates for creating functional proteins remained frustratingly low. Typically below 10%.

The deep learning revolution

Machine learning methods changed this landscape dramatically starting around 2018. AlphaFold’s success in protein structure prediction demonstrated something crucial. Neural networks could capture complex relationships between protein sequence and structure that had eluded traditional approaches.

The breakthrough came from training AI models on vast datasets from the Protein Data Bank. These learning methods could identify patterns in natural proteins that human scientists had missed.

More importantly, they could generate novel combinations. These maintained the essential features of stable, functional proteins while incorporating entirely new sequences.

Modern AI architectures unlock new possibilities

Current AI approaches use sophisticated architectures:

Graph neural networks treat proteins as molecular graphs. Amino acids become nodes, and edges connect residues that are chemically bonded or close together in space. This representation captures the three-dimensional relationships critical for protein function.
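In practice, many protein graph models link each residue to its nearest neighbors in space rather than only to covalently bonded ones. Here is a minimal sketch of that idea, assuming one C-alpha coordinate per residue and a hypothetical helper called `knn_residue_graph`. The toy coordinates and the choice of `k` are illustrative, not taken from any specific tool.

```python
import numpy as np


def knn_residue_graph(ca_coords: np.ndarray, k: int = 3) -> list[tuple[int, int]]:
    """Return directed edges (i, j) linking each residue i to its k nearest neighbors j."""
    diffs = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a residue is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    return [(i, int(j)) for i in range(len(ca_coords)) for j in neighbors[i]]


# Hypothetical toy input: five residues spaced 3.8 Å apart along a straight line.
coords = np.array([[3.8 * i, 0.0, 0.0] for i in range(5)])
edges = knn_residue_graph(coords, k=2)
print(edges)
```

Each node ends up with exactly `k` outgoing edges, which keeps the graph sparse even for large proteins.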

Diffusion models borrow techniques from image generation. They learn to reverse a noising process to create new protein structures from random starting points.

Protein language models treat protein sequences like text. They learn the “grammar” of protein folding from millions of natural sequences. These models can generate new sequences that follow the same folding rules as natural proteins.

The combination of these approaches has pushed success rates above 80% in some applications. This dramatic improvement makes de novo design practically viable for drug discovery and biotechnology applications.

Related Article:  See how AI reshapes pharma manufacturing pipelines in our explainer on AI and ML’s impact on pharmaceutical manufacturing

 

 

| | Traditional (physics-based) design | AI-first design |
|---|---|---|
| Core approach | Sample a protein backbone, score with energy functions, then run sequence design. | Generate backbones with diffusion, assign sequences with ProteinMPNN, validate with AlphaFold/RoseTTAFold. |
| Speed & search | Narrow exploration; slower loops. | Broad exploration; fast loops with high throughput. |
| Control & transparency | Clear physics; strong control over binding interfaces and packing. | Strong priors; needs physics checks to catch clashes and unsatisfied atoms. |
| Expected outcomes | Stable scaffolds after more wet-lab cycles. | Higher success rates by orders of magnitude for protein binder tasks. |
| Best fit | Sparse data; safety-critical programs that need auditability. | General de novo protein design with strict functional site geometry and rapid iteration. |

Key Takeaway
AI shrank an “impossible” search problem into a solvable one. Deep learning tools now spot stable shapes and useful functions with far higher accuracy.

Is AI-driven de novo protein design production-ready?

What “success rate” really means

“Success rate” in AI protein design is often misunderstood. AI can predict that a sequence will fold into a given shape, but not whether that shape will do its job. In other words, a high success rate in predicting a shape does not mean a high success rate in getting a working product.
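A simple funnel calculation makes the distinction concrete. The stage names and counts below are hypothetical, chosen only to show how the headline number changes depending on which stage you call "success".

```python
def funnel_rates(counts: dict[str, int]) -> dict[str, float]:
    """Return the fraction of the initial designs that survive each stage."""
    stages = list(counts)
    total = counts[stages[0]]
    return {stage: counts[stage] / total for stage in stages}


# Hypothetical campaign numbers, for illustration only.
campaign = {
    "designed": 1000,
    "predicted_to_fold": 800,
    "expressed_in_lab": 400,
    "functional": 50,
}

for stage, rate in funnel_rates(campaign).items():
    print(f"{stage}: {rate:.1%}")
```

In this made-up campaign, the same project could be reported as 80% successful (predicted folding) or 5% successful (working product), so it is worth asking which stage a quoted figure refers to.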

Where lab failures occur

Failures often happen in the gap between “folding” and “functioning.” An AI model might generate a protein sequence that folds into a stable shape, just as predicted. However, that shape might sit in a test tube and do absolutely nothing. The failure is usually invisible to the computer. The model sees patterns from old data, but it doesn’t understand the physical forces required for a chemical reaction to happen. When these “black box” designs fail, scientists are left stuck.

What works reliably today vs what doesn’t

What works: Predicting the 3D shape of a protein is now very reliable. If you need to tweak an existing enzyme or design a simple binder, AI does this well.

What doesn’t: Designing complex enzymes from scratch is not here yet. AI struggles to invent brand-new chemical reactions or proteins that need to survive extreme heat or pH without help from physics-based tools. It’s fast, but for complex chemistry, it’s still often just expensive guesswork.

Modern methods of de novo protein design

Contemporary protein design integrates multiple computational approaches. Each addresses different aspects of the design challenge. The field has evolved from single-method approaches to sophisticated pipelines that combine the strengths of various AI architectures.

Computational Foundation

Two-step process drives modern design

De novo design typically involves generating a protein backbone (shape) and then finding the optimal amino acid sequence that folds into that specific structure and performs the desired function. This separation allows designers to tackle structure and sequence optimization as distinct but related problems.

The protein backbone provides the overall architectural framework. The sequence optimization ensures that the framework remains stable and functional. This division enables more targeted approaches to each challenge.

Optimization problem framework

The design process is viewed as an optimization problem: finding the right sequence-structure combination for a given design objective. Success depends on balancing multiple competing factors:

  • thermodynamic stability of the final structure
  • specific binding interactions at functional sites
  • expression and folding efficiency in host systems
  • avoidance of aggregation or misfolding pathways

Modern algorithms can explore vast sequence spaces while maintaining these physical constraints.
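One common way to frame such a trade-off is a weighted score. The sketch below is an assumption-heavy illustration: the metric names, the 0 to 1 scaling, and the weights are all hypothetical, and real pipelines typically use more elaborate filters than a single weighted sum.

```python
from dataclasses import dataclass


@dataclass
class DesignScores:
    """Hypothetical per-design metrics, each scaled to the range 0..1."""
    stability: float         # higher is better
    binding: float           # higher is better
    expression: float        # higher is better
    aggregation_risk: float  # higher is worse


# Illustrative weights only; a real project would tune these to its objective.
WEIGHTS = {"stability": 0.4, "binding": 0.3, "expression": 0.2, "aggregation_risk": 0.1}


def combined_score(s: DesignScores) -> float:
    """Weighted sum that rewards the first three metrics and penalizes aggregation risk."""
    return (
        WEIGHTS["stability"] * s.stability
        + WEIGHTS["binding"] * s.binding
        + WEIGHTS["expression"] * s.expression
        - WEIGHTS["aggregation_risk"] * s.aggregation_risk
    )


candidates = {
    "design_a": DesignScores(stability=0.9, binding=0.4, expression=0.7, aggregation_risk=0.6),
    "design_b": DesignScores(stability=0.7, binding=0.8, expression=0.6, aggregation_risk=0.1),
}
best = max(candidates, key=lambda name: combined_score(candidates[name]))
print(best, round(combined_score(candidates[best]), 2))
```

Here the more stable design loses to the one with better binding and lower aggregation risk, which is the kind of balancing the list above describes.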

AI and Machine Learning models

ProteinMPNN: sequence generation powerhouse

ProteinMPNN represents one of the most successful sequence design tools. It uses message-passing neural networks to generate amino acid sequences for predetermined protein backbones. The model achieves remarkable accuracy by treating protein design as a graph-based optimization problem.

How it works:

  • takes a protein backbone structure as input
  • examines the local environment around each position
  • considers factors like hydrophobic packing and hydrogen bonding
  • predicts which amino acid sequence will fold into that exact shape

Performance metrics:

  • sequence recovery rates above 50%
  • can predict sequences that fold into nearly identical structures
  • high confidence predictions for novel protein designs
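Sequence recovery is simply the fraction of positions where the designed sequence matches the native one for the same backbone. A minimal sketch, using two made-up ten-residue sequences:

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of aligned positions where the designed residue equals the native residue."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must not be empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Hypothetical sequences for illustration; they differ at three of ten positions.
native = "MKTAYIAKQR"
designed = "MKSAYLAKQK"
print(sequence_recovery(designed, native))
```

Note that recovery measures agreement with nature, not function. A design can recover fewer native residues and still fold and work well.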

In practical applications, ProteinMPNN has revolutionized sequence design. Given the backbone of a natural protein, it can propose sequences that are predicted to fold into nearly identical structures. This capability translates directly to de novo design.

Diffusion models: structure generation breakthrough

Diffusion models like RFdiffusion and Chroma represent the most exciting recent development in protein structure generation. These models are trained on protein structures to learn to reverse a noising process. 

That process involves the gradual addition of random distortions that corrupt the original protein structure until it becomes unrecognizable. This training allows them to generate novel, diverse, and functionally relevant protein structures by starting with pure random noise and systematically removing the distortions.

RFdiffusion exemplifies this approach:

Training process:

  • gradually add noise to known protein structures from the Protein Data Bank (PDB)
  • continue until they become random arrangements of atoms
  • train a neural network to reverse this process
  • generate new proteins by starting with noise and applying denoising
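The noising half of this recipe fits in a few lines. The sketch below is a deliberate simplification: it adds Gaussian noise to toy C-alpha coordinates with a made-up noise schedule, whereas RFdiffusion's actual treatment of backbone geometry is more involved. The helper names and the helix-like toy backbone are assumptions for illustration.

```python
import numpy as np


def forward_noising(coords: np.ndarray, betas: np.ndarray, rng: np.random.Generator) -> list:
    """Apply x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise at every step."""
    trajectory = [coords]
    x = coords
    for beta in betas:
        noise = rng.standard_normal(x.shape)
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
        trajectory.append(x)
    return trajectory


def correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two coordinate arrays, flattened."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])


rng = np.random.default_rng(0)

# Hypothetical toy backbone: 50 points on a helix-like curve, normalized per axis.
t = np.linspace(0.0, 8.0 * np.pi, 50)
coords = np.stack([np.cos(t), np.sin(t), t / (8.0 * np.pi)], axis=1)
coords = (coords - coords.mean(axis=0)) / coords.std(axis=0)

betas = np.linspace(1e-4, 0.2, 100)  # illustrative noise schedule
traj = forward_noising(coords, betas, rng)

print("after 10 steps:", round(correlation(traj[0], traj[10]), 2))
print("after 100 steps:", round(correlation(traj[0], traj[-1]), 2))
```

Early steps leave the structure largely recognizable, while the final step is close to pure noise. The network's job is to learn the way back.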

Key advantages:

  • generate diverse, novel structures
  • maintain physical constraints for proper folding
  • create protein topologies never observed in nature
  • enable functions that natural evolution never explored

This approach breaks free from nature’s limitations. Traditional methods modify existing proteins, but diffusion models design completely novel binding interfaces and functional sites. These artificial proteins can perform functions that natural evolution never discovered.

Protein Language Models: pattern recognition

Protein language models are trained on large databases of existing protein sequences. These models can generate new sequences with functions similar to those in the training set. They treat protein sequences like text and learn the “grammar” of protein folding from millions of natural sequences.

Core capabilities:

  • identify sequence patterns that correlate with specific functions
  • generate novel sequences following natural folding rules
  • predict functional variants of existing proteins
  • enable rapid exploration of sequence space around known structures

These models excel at capturing the statistical regularities of natural protein sequences while enabling creative combinations that maintain biological plausibility.
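To make the "sequences as text" idea tangible, here is a toy stand-in: a bigram model that only learns which amino acid tends to follow which. Real protein language models are large neural networks trained on millions of sequences, so this is an analogy and not their architecture. The three training sequences are made up.

```python
import random
from collections import defaultdict


def train_bigram(sequences: list[str]) -> dict:
    """Count how often each residue follows another ('^' marks start, '$' marks end)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, nxt in zip("^" + seq, seq + "$"):
            counts[prev][nxt] += 1
    return counts


def sample(counts: dict, rng: random.Random, max_len: int = 30) -> str:
    """Generate a sequence by repeatedly sampling the next residue from the counts."""
    out, current = [], "^"
    while len(out) < max_len:
        choices = counts[current]
        nxt = rng.choices(list(choices), weights=list(choices.values()))[0]
        if nxt == "$":
            break
        out.append(nxt)
        current = nxt
    return "".join(out)


# Hypothetical training sequences, for illustration only.
training_sequences = ["MKTAYIAKQR", "MKVLAAGIVG", "MKTLLVAGAV"]
model = train_bigram(training_sequences)
print(sample(model, random.Random(0)))
```

Every generated sequence reuses residue-to-residue transitions seen in training but can combine them in new orders, a small-scale version of "novel sequences following natural rules".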

Physical and energy-based approaches

Rosetta Software: physics-based optimization

Rosetta is a leading platform that uses physics-based energy functions to identify amino acid sequences. It targets sequences that lead to stable, low-energy structures and desired functions. Despite AI successes, physics-based approaches like Rosetta remain crucial components of modern design pipelines.

Rosetta strengths:

  • Van der Waals interactions modeling
  • Hydrogen bonding calculations
  • Electrostatic effects prediction
  • Solvation energy optimization

Practical applications:

  • Optimize designs for stability and function simultaneously
  • Fine-tune binding sites for a particular target protein
  • Adjust protein stability for different environments
  • Incorporate non-natural amino acids for specialized applications

Modern workflows often combine AI generation with Rosetta optimization. An AI model generates an initial design. Rosetta then refines the design to optimize specific interactions and ensure thermodynamic stability.
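To give a feel for what a physics-based energy term looks like, here is a minimal sketch of a standard 12-6 Lennard-Jones potential, a textbook model of van der Waals interactions. This is not Rosetta's code or its actual energy function, which combines many terms. The `epsilon` and `sigma` values are illustrative assumptions.

```python
import numpy as np


def lennard_jones_energy(coords: np.ndarray, epsilon: float = 0.2, sigma: float = 3.5) -> float:
    """Sum the 12-6 Lennard-Jones energy over all unique pairs of points."""
    diffs = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)
    i, j = np.triu_indices(len(coords), k=1)  # each pair counted once
    ratio6 = (sigma / dist[i, j]) ** 6
    return float(np.sum(4.0 * epsilon * (ratio6 ** 2 - ratio6)))


def pair_at(distance: float) -> np.ndarray:
    """Two points separated by the given distance along the x axis."""
    return np.array([[0.0, 0.0, 0.0], [distance, 0.0, 0.0]])


clash = lennard_jones_energy(pair_at(0.8 * 3.5))             # too close: strongly repulsive
optimal = lennard_jones_energy(pair_at(2 ** (1 / 6) * 3.5))  # energy minimum
print(f"clash: {clash:.2f}, optimal: {optimal:.2f}")
```

A term like this is what lets a physics check flag atoms that an AI-generated design has packed too tightly.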

Backbone sampling and sequence optimization

These methods explore possible protein backbone structures and then optimize the amino acid sequence to stabilize the chosen backbone. The approach typically involves:

Backbone exploration:

  • Sample conformational space systematically
  • Identify architectures capable of the desired functions
  • Filter based on geometric and chemical constraints

Sequence optimization:

  • Design amino acid sequences for selected backbones
  • Balance stability requirements with functional needs
  • Iterate between structure and sequence refinement

This systematic approach ensures that both structural and sequence elements contribute to overall design success.

Application-driven strategies

Motif scaffolding: function-first design

Motif scaffolding is a method where a functional motif is designed first. A surrounding protein scaffold is then built to stabilize and enhance the motif’s function.  This approach proves particularly powerful for creating proteins with predetermined binding sites or catalytic activities.

Design workflow:

  • Identify or design the essential functional motif
  • Create surrounding scaffold architecture
  • Optimize scaffold-motif interactions
  • Ensure overall structural stability

Applications:

  • Enzyme active site design
  • Specific binding site creation
  • Regulatory domain incorporation

Functional site design: target-specific engineering

Functional site design involves creating a protein to specifically interact with a target, such as binding to a virus or a cell receptor, by creating a tailored binding site within the new protein. This approach enables precise molecular recognition capabilities.

Key considerations:

  • Target molecule geometry and chemical properties
  • Binding interface complementarity
  • Specificity requirements to avoid off-target effects
  • Integration with overall protein stability

Success examples:

  • Viral inhibitor design with novel binding modes
  • Receptor agonists or antagonists with improved selectivity
  • Diagnostic proteins for specific biomarker detection

Comparison of modern approaches

| Method | Primary Strength | Typical Application | Success Rate |
|---|---|---|---|
| ProteinMPNN | Sequence optimization for given backbones | Stabilizing designed structures | >50% sequence recovery |
| RFdiffusion | Novel backbone generation | Creating new protein folds | ~80% for symmetric assemblies |
| Rosetta | Physical accuracy and functional optimization | Fine-tuning binding sites | Variable by application |
| Protein Language Models | Natural sequence patterns | Generating functional variants | ~30-60% functional designs |
| Motif Scaffolding | Function-first design approach | Enzyme and binding site creation | High for well-defined motifs |
| Functional Site Design | Target-specific optimization | Therapeutic and diagnostic proteins | Depends on target complexity |

Key Takeaway
Different methods solve different pieces of the puzzle—backbones, sequences, stability, or function. The most powerful designs come from combining them into a single pipeline.

Case study: RFdiffusion and generative AI for proteins

The experimental validation results for RFdiffusion demonstrate the practical impact of this breakthrough approach. David Baker’s team at the University of Washington has achieved remarkable success rates that translate directly to commercial applications.

The diffusion process explained

The core innovation lies in adapting diffusion models to three-dimensional protein structures. RFdiffusion creates these structures by reversing the gradual corruption of known protein geometries. This mirrors how image diffusion models generate pictures by reversing a noising process. 

Training workflow:

  1. RFdiffusion takes known protein structures from the Protein Data Bank (PDB)
  2. The system progressively adds random noise to atomic coordinates
  3. The process continues until the original structure becomes indistinguishable from random noise
  4. The neural network is trained to predict how to reverse each corruption step
  5. The model learns to “denoise” random atomic arrangements into valid protein structures

The trained model generates entirely new proteins by starting with pure noise. It then applies the learned denoising process step by step.
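The training signal behind this kind of model can be sketched with the generic denoising-diffusion formulation, in which a structure can be noised to any step in closed form and the network is scored on how well it predicts the noise that was added. This is a textbook-style illustration on toy coordinates with an assumed noise schedule. RFdiffusion's own parameterization and loss differ in detail.

```python
import numpy as np

rng = np.random.default_rng(1)
betas = np.linspace(1e-4, 0.02, 1000)  # illustrative noise schedule
alpha_bar = np.cumprod(1.0 - betas)    # fraction of signal variance left at each step


def noise_to_step(x0: np.ndarray, t: int, eps: np.ndarray) -> np.ndarray:
    """Closed form: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps


def estimate_x0(x_t: np.ndarray, t: int, predicted_eps: np.ndarray) -> np.ndarray:
    """Invert the closed form using a prediction of the added noise."""
    return (x_t - np.sqrt(1.0 - alpha_bar[t]) * predicted_eps) / np.sqrt(alpha_bar[t])


def noise_prediction_loss(predicted_eps: np.ndarray, true_eps: np.ndarray) -> float:
    """Mean squared error between predicted and true noise (the training objective)."""
    return float(np.mean((predicted_eps - true_eps) ** 2))


x0 = rng.standard_normal((20, 3))    # hypothetical toy coordinates
eps = rng.standard_normal(x0.shape)  # the noise actually added
t = 500
x_t = noise_to_step(x0, t, eps)

# A perfect noise prediction recovers the original structure; a poor one does not.
recovered = estimate_x0(x_t, t, eps)
print("max recovery error:", float(np.abs(recovered - x0).max()))
print("loss of an all-zero guess:", noise_prediction_loss(np.zeros_like(eps), eps))
```

During generation there is no true noise to compare against, so the model's prediction is applied repeatedly, one small step at a time.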

This approach enables the creation of protein architectures that may never have existed in nature. Yet it maintains the physical constraints necessary for proper folding.

Technical implementation details

RFdiffusion processes protein structures as graphs. Amino acid residues represent nodes, and spatial relationships define edges. The model uses transformer-like architectures to update these representations iteratively.

Key features:

  • each iteration refines the protein structure
  • gradually transforms random coordinates into chemically plausible geometries
  • can incorporate various constraints during generation
  • guides generation while allowing creative freedom

Constraint capabilities:

  • specify binding sites that must interact with a particular target protein
  • enforce specific symmetries for protein complexes
  • constrain certain regions to adopt particular secondary structures
  • design functional site requirements into the generation process
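As one example of what a symmetry constraint means geometrically, the sketch below builds a cyclic assembly by rotating copies of a single subunit around a shared axis. It only illustrates the geometry. How RFdiffusion enforces symmetry inside its generation loop is not shown here, and the subunit coordinates are made up.

```python
import numpy as np


def z_rotation(theta: float) -> np.ndarray:
    """Rotation matrix for an angle theta (radians) about the z axis."""
    return np.array([
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta),  np.cos(theta), 0.0],
        [0.0,            0.0,           1.0],
    ])


def cyclic_assembly(subunit: np.ndarray, n: int) -> np.ndarray:
    """Return n copies of the subunit, rotated by multiples of 360/n degrees about z."""
    copies = [subunit @ z_rotation(2.0 * np.pi * k / n).T for k in range(n)]
    return np.stack(copies)


# Hypothetical subunit: four points placed away from the symmetry axis.
subunit = np.array([
    [10.0, 0.0, 0.0],
    [11.0, 1.5, 1.0],
    [12.0, 0.5, 2.0],
    [13.0, -1.0, 3.0],
])
trimer = cyclic_assembly(subunit, n=3)
print(trimer.shape)
```

Because every copy is an exact rotation of the first, only one subunit needs to be designed. The rest of the assembly follows from the symmetry.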

Experimental validation results

Diagram of the process for designing new proteins using RFdiffusion and artificial intelligence.

The practical success of RFdiffusion has been demonstrated through extensive laboratory testing. In one landmark study, researchers designed symmetric protein assemblies using RFdiffusion. They then synthesized and tested these designs in the laboratory.

Performance metrics:

  • success rate exceeded 80% for symmetric assemblies
  • designed proteins folded into structures matching computational predictions
  • experimental error margins within acceptable ranges
  • functional validation confirmed binding activities
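"Matching computational predictions" is usually quantified as the root-mean-square deviation (RMSD) between two sets of coordinates after superimposing them. Below is a minimal sketch using the standard Kabsch superposition on toy data. Any pass/fail cutoff is project-specific and deliberately left out.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P_c = P - P.mean(axis=0)  # remove translation
    Q_c = Q - Q.mean(axis=0)
    H = P_c.T @ Q_c           # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P_c @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Q_c) ** 2, axis=1))))


rng = np.random.default_rng(2)
design = rng.standard_normal((30, 3)) * 5.0  # hypothetical designed coordinates

# Same structure, rotated 40 degrees about z and shifted: should superimpose exactly.
theta = np.deg2rad(40.0)
rot = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
moved = design @ rot.T + np.array([4.0, -2.0, 7.0])
perturbed = moved + 0.5 * rng.standard_normal(moved.shape)  # simulated deviation

rmsd_identical = kabsch_rmsd(design, moved)
rmsd_perturbed = kabsch_rmsd(design, perturbed)
print(f"identical: {rmsd_identical:.3f}, perturbed: {rmsd_perturbed:.3f}")
```

A low RMSD between the designed model and the experimentally determined structure is the evidence behind claims that a design folded as intended.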

Perhaps more impressively, RFdiffusion has enabled the design of protein binder molecules for specific targets. The system can generate proteins with binding sites complementary to particular ligands or other proteins.

Experimental validation shows that these designed binders often achieve affinities comparable to natural antibodies, but with entirely artificial sequences and structures.

Key Takeaway
Diffusion models can “grow” proteins from random noise, much like AI image generation tools. Lab tests show they reliably produce stable, functional designs.

What problems does de novo protein design solve better than traditional approaches?

| Problem area | Traditional approach (The old way) | De novo design (The AI way) |
|---|---|---|
| Undruggable targets | Needs a pocket. Small-molecule drugs rely on finding a deep groove or pocket to sit in. If the target is flat or slippery, the drug slides off. | Creates a custom grip. AI builds proteins with custom shapes that can wrap around flat surfaces or clamp onto targets that lack deep pockets. |
| Protein-protein interactions | Too big and clumsy. We often use antibodies to block interactions. They are effective but massive and struggle to get inside cells or tissues. | Tiny and stable. AI designs “mini-binders”: tiny, ultra-stable wedges that fit into tight spaces where large antibodies simply cannot fit. |
| Novel enzyme reactions | Tweaking nature. Directed evolution takes an existing enzyme and improves it. | Inventing new chemistry. AI designs brand-new active sites from scratch. |
| Biomaterials | Inconsistent. We use natural materials like silk or collagen. They are useful but vary in quality and are hard to modify. | Tunable and precise. AI treats proteins like LEGO bricks. We can precisely tune stiffness, pore size, and heat resistance. |

Applications of de novo protein design

The practical applications span multiple industries. Each sector leverages designer proteins to solve previously intractable problems. The ability to create proteins with predetermined functions opens new possibilities across therapeutics, materials science, and biotechnology.

Drug discovery breakthroughs

Pharmaceutical applications represent the most immediate and high-impact area. Traditional drug discovery faces significant limitations when targeting “undruggable” proteins. Their binding sites are too shallow, too hydrophobic, or too dynamic for conventional small-molecule drugs.

Designer proteins solve this problem by creating large, stable binding interfaces. These can engage challenging targets that small molecules cannot reach.

Real-world example: 4-1BB receptor targeting

The 4-1BB receptor exemplifies this approach. This immune checkpoint protein plays crucial roles in T-cell activation but proved difficult to target with conventional drugs. Recent research demonstrated the successful design of nanobody binders specifically targeting 4-1BB epitopes (Poddiakov et al., 2025).

Key achievements:

  • computational predictions achieved binding scores comparable to natural antibodies
  • novel binding interfaces created through de novo design
  • entirely artificial sequences with no natural templates
  • potential for improved therapeutic profiles

Antimicrobial peptide development

The antimicrobial peptide field showcases another major pharmaceutical application. AMPGen, an evolutionary information-reserved diffusion model, has generated antimicrobial peptides with impressive results:

  • 81.58% positive rates in experimental validation
  • broad-spectrum activity against both Gram-positive and Gram-negative bacteria
  • novel approaches to combat antibiotic resistance
  • sequences absent from existing AMP databases (Jin et al., 2025)

Protein therapeutics advantages

Protein therapeutics benefit significantly from de novo design approaches:

  • create entirely new molecules optimized for specific therapeutic goals
  • improved pharmacokinetics compared to natural alternatives
  • reduced immunogenicity through careful sequence design
  • enhanced tissue specificity for targeted delivery

Related Article: What Is Personalized Medicine: Principles and Software

De novo protein design enables personalized therapeutics tailored to individual patient genetic profiles.

Biomaterials innovation

Material science applications leverage the precise control over protein structure that de novo design provides. Natural structural proteins like collagen or spider silk have valuable properties. Their complex production requirements, however, limit practical applications.

Designer proteins can replicate these properties while enabling production in simple bacterial systems.

Self-assembling protein materials

Self-assembling protein materials represent a particularly exciting application:

  • proteins with specific interaction interfaces
  • materials that spontaneously organize into desired structures
  • nanofibers, hydrogels, or rigid frameworks
  • responsive elements that change properties with environmental conditions

Tunable material properties

The precision of protein design enables the creation of materials with properties tuned for specific applications. These include controlled stiffness for different mechanical requirements and biodegradability timelines for medical applications. These designed proteins can also feature specific biocompatibility profiles for tissue engineering and tailored surface properties for specific cellular interactions.

Tissue engineering applications

The materials find applications in tissue engineering, where scaffolds provide appropriate mechanical support and allow cellular integration and growth. They enable eventual replacement by natural tissue through controlled degradation as healing progresses.

Enzyme engineering advances

Catalytic applications represent the most challenging aspect of de novo protein design. These systems must not only bind substrates but also facilitate specific chemical transformations. Successful enzyme designs depend on positioning amino acid residues with atomic precision to stabilize transition states and enable bond formation or cleavage.

Design challenges:

  • stabilize transition states for chemical reactions
  • facilitate bond breaking and forming
  • position catalytic residues with atomic precision
  • enable substrate binding and product release

Non-natural reaction capabilities

Success stories include the design of enzymes for non-natural reactions:

  • chemical transformations that don’t occur in biological systems
  • industrial importance for manufacturing processes
  • reactions impossible with traditional chemical catalysts
  • green chemistry applications with environmental benefits

Industrial applications

Industrial applications drive much interest in de novo enzyme design:

  • operate under conditions that denature natural enzymes
  • high temperatures for industrial processes
  • extreme pH levels for specialized applications
  • presence of organic solvents for chemical synthesis
  • enable enzymatic processes where traditional biocatalysis isn’t feasible

Synthetic biology integration

Synthetic biology applications view proteins as components in engineered biological systems. Rather than designing individual proteins, researchers create networks of interacting proteins that implement complex cellular functions.

Protein-based logic circuits

Protein-based logic circuits exemplify this approach:

  • specific binding interactions create biological switches
  • amplifiers and memory devices built from protein components
  • control gene expression through programmable interactions
  • direct cellular behavior via engineered protein networks
  • respond to environmental signals in predetermined ways

Biosensor applications

Biosensor applications combine binding specificity with signal transduction:

  • Designer proteins detect specific molecules
  • Convert presence into detectable outputs
  • Fluorescence, enzymatic activity, or cellular behavior changes
  • Monitor environmental conditions or disease biomarkers
  • Process variables in biotechnology applications

Multi-functional system construction

The modular nature of protein domains enables sophisticated systems:

  • combine binding domains, catalytic domains, and regulatory elements
  • novel arrangements create proteins with complex behaviors
  • programmable behaviors impossible with natural biology
  • foundation for engineered biological systems

Related Article: Clinical Trial Phases: The Full Guide

Understanding clinical development pathways becomes crucial as de novo designed therapeutics enter human testing.

Abstract visualization of protein chains and molecular structures.

Challenges and limitations

Despite remarkable progress, de novo protein design faces persistent challenges. These limitations span technical, experimental, and practical domains. Continued research and development are required to overcome them.

Computational challenges

The protein folding problem remains incompletely solved

AI models like AlphaFold can predict structures for many natural proteins with high accuracy. They still often struggle with de novo designs. Novel sequence patterns or unusual structural features present difficulties.

Current design methods excel at creating proteins that resemble natural structures. They face difficulties when venturing into genuinely unexplored sequence space.

Model limitations:

  • learn patterns from natural proteins
  • may not encompass all possible functional architectures
  • constrains design creativity
  • may miss opportunities for capabilities beyond nature

Computational cost barriers

High-quality protein design requires extensive sampling:

  • thousands of design iterations
  • complex optimization procedures
  • substantial computational resources required
  • limited accessibility for many research groups
  • industrial applications face resource constraints

Experimental validation bottlenecks

The gap between computational prediction and experimental reality continues to challenge the field. Success rates have improved dramatically but remain well below 100%. Even the best computational designs sometimes fail in the laboratory.

Protein expression challenges:

  • many designs cannot be produced in bacterial expression systems
  • aggregation, misfolding, or toxicity issues
  • expression system choice dramatically affects success rates
  • optimizing expression conditions requires extensive trial-and-error

Functional validation complexities:

  • binding assays may not capture all performance aspects
  • enzymatic activity measurements have limitations
  • stability tests may miss important factors
  • laboratory conditions may not reflect real applications
  • complex environments where proteins must function

Design scope limitations

Current AI-based protein design methods work best with single-domain proteins that have well-defined structures. They struggle with multi-domain proteins, membrane proteins, and dynamic systems that need conformational flexibility. While binding site design has seen notable successes, catalytic site design remains difficult. This is due to the precise geometric requirements and the inherently dynamic nature of enzymatic reactions. 

Designed proteins must function within complex cellular networks, not in isolation. They face unexpected interactions with cellular components and may be affected by post-translational modifications. The cellular environment can interfere with normal processes and alter protein behavior unpredictably.

Regulatory and safety considerations

The pharmaceutical industry faces regulatory hurdles because agencies have extensive experience with modified natural proteins but limited precedent for entirely artificial therapeutics. This lack of evolutionary precedent may require more extensive safety testing and longer approval pathways.

Complex intellectual property landscapes around AI-designed proteins complicate commercial viability. The relationships between design methods, resulting sequences, and functional properties create intricate patent situations. Legal frameworks for AI-designed proteins continue to evolve, potentially influencing both design strategies and commercial development.

Key Takeaway
Computers can design faster than biology can prove. Costs, lab bottlenecks, and regulatory uncertainty still limit how quickly designs reach real-world use.

Future of de novo protein design

Graphic representation of a neural network for protein engineering applications.

The trajectory points toward increasingly sophisticated capabilities and broader practical applications. Several emerging trends will shape the field’s development over the next decade.

Integration of multi-modal AI

Future design platforms will integrate multiple types of AI models. Rather than using separate tools for structure generation, sequence optimization, and functional prediction, unified models will optimize all factors together.

Advantages of integration:

  • More sophisticated design objectives
  • Balance multiple constraints simultaneously
  • Joint optimization of stability, expression, immunogenicity, and pharmacokinetics
  • Functional performance optimization in a single workflow
  • Multi-objective optimization reflects practical requirements
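
One common way to balance several constraints at once is Pareto filtering: keep only the candidates that no other candidate beats on every objective. The sketch below illustrates the idea and is not taken from any specific design platform. The `Candidate` class, the three scores, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    stability: float        # higher is better (hypothetical score)
    expression: float       # higher is better (hypothetical score)
    immunogenicity: float   # lower is better (hypothetical risk score)

    def objectives(self) -> tuple[float, float, float]:
        # Negate immunogenicity so every objective is "higher is better".
        return (self.stability, self.expression, -self.immunogenicity)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    pairs = list(zip(a.objectives(), b.objectives()))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


if __name__ == "__main__":
    designs = [
        Candidate("design_A", stability=0.9, expression=0.4, immunogenicity=0.2),
        Candidate("design_B", stability=0.7, expression=0.8, immunogenicity=0.3),
        Candidate("design_C", stability=0.6, expression=0.7, immunogenicity=0.5),
        Candidate("design_D", stability=0.9, expression=0.4, immunogenicity=0.6),
    ]
    for candidate in pareto_front(designs):
        print(candidate.name)
```

In this toy data set, design_B beats design_C on all three scores and design_A beats design_D on immunogenicity while matching it elsewhere, so only design_A and design_B remain. Choosing between those two is a trade-off that still needs a human or an explicit weighting.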

Experimental data integration:

  • Current models rely primarily on structural databases
  • Future approaches will incorporate functional data
  • Binding measurements and stability information included
  • Experimental integration bridges the gap between prediction and reality
  • Improved design accuracy through comprehensive training data

Expansion to complex systems

The field will move beyond single-domain proteins toward complex multi-protein systems. This expansion involves advances in multi-protein interface design and dynamic protein design.

Multi-protein interface design will enable the creation of protein machines and molecular systems with moving parts. These systems will include:

  • biosensors with multiple input signals
  • catalytic systems with regulatory controls
  • coordinated protein complexes that perform complex functions

Dynamic protein design will incorporate conformational flexibility as a design parameter. This approach creates proteins that switch between different structures in response to binding partners or environmental conditions. Such responsive systems enable sophisticated regulatory mechanisms and therapeutic systems with programmable activity.
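
The switching behavior described above can be illustrated with a textbook two-state thermodynamic model. This is a simplified sketch, not a design method. It assumes the protein has only an OFF and an ON conformation and that the binding partner binds only the ON state. The function `fraction_on` and its parameters are hypothetical names chosen for this example.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fraction_on(delta_g: float, ligand: float = 0.0, kd: float = 1.0,
                temperature: float = 298.15) -> float:
    """Equilibrium fraction of protein in the ON conformation.

    delta_g: free energy of ON relative to OFF without ligand, in kcal/mol
             (positive means OFF is favored).
    ligand:  free ligand concentration, in the same units as kd.
    kd:      dissociation constant of the ligand for the ON state.
    """
    if ligand < 0.0 or kd <= 0.0 or temperature <= 0.0:
        raise ValueError("ligand must be >= 0; kd and temperature must be > 0")
    # Boltzmann weight of the unbound ON state relative to OFF (weight 1).
    k_conf = math.exp(-delta_g / (R_KCAL * temperature))
    # Binding adds a ligand-bound ON state with relative weight k_conf * ligand / kd.
    weight_on = k_conf * (1.0 + ligand / kd)
    return weight_on / (1.0 + weight_on)


if __name__ == "__main__":
    # Hypothetical switch that favors OFF by 2 kcal/mol in the absence of ligand.
    for conc in (0.0, 1.0, 10.0, 100.0):
        print(f"ligand = {conc:>5} x Kd -> fraction ON = {fraction_on(2.0, conc):.3f}")
```

The point of the model is the design trade-off: a larger `delta_g` keeps the switch quieter without ligand but requires more ligand, or tighter binding, to turn it on.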

Complex system applications can include protein complexes with coordinated functions and membrane proteins designed for lipid environments. The field may also develop dynamic conformational ensembles and multi-state systems with switching capabilities.

Industrial-scale implementation

The maturation of design methods will drive adoption in industrial settings:

Pharmaceutical integration:

  • companies integrate de novo design into drug discovery pipelines
  • designed proteins as research tools and therapeutic candidates
  • novel drug targets accessible through designer proteins
  • personalized therapeutics through patient-specific designs

Biotechnology applications:

  • use of designed enzymes in industrial processes expands
  • proteins for biomaterial production scale up
  • components for synthetic biology systems reach commercialization
  • success rates improve and costs decrease

Automated design platforms:

  • cloud-based services democratize access
  • researchers without computational resources can participate
  • innovation accelerates across the biotechnology sector
  • reduced barriers to entry for protein design

Regulatory framework evolution

Regulatory agencies will likely develop specific guidelines for evaluating de novo designed proteins. These guidelines will need to balance innovation encouragement with safety assurance.

For therapeutic applications, key elements of these frameworks will likely include:

  • standards for computational validation
  • experimental characterization requirements
  • optimized clinical evaluation pathways

Early approved designed protein therapeutics could establish important precedents that influence future regulatory approaches. Successful examples could demonstrate safety and efficacy potential, facilitating broader acceptance of the technology. 

International harmonization could create consistent standards across major markets and reduce development costs through unified requirements. Such coordination would shorten time-to-market for innovative treatments and promote global collaboration on regulatory frameworks.

Scientific and technological convergence

De novo protein design is bound to intersect with other emerging technologies:

Synthetic biology integration:

  • Design of complete biological systems
  • Networks of interacting proteins
  • Cellular functions programmed through protein design
  • Biological circuits with predictable behaviors

Nanotechnology combination:

  • Hybrid systems merge biological and synthetic components
  • Enhanced functionality through material integration
  • Novel applications impossible with either technology alone
  • Precision assembly of complex functional systems

Advanced computational methods:

  • machine learning advances from outside protein design carry over
  • improved optimization algorithms
  • better uncertainty quantification
  • more efficient computational methods
  • enhanced design accuracy and reduced costs

These developments collectively point toward a future where de novo protein design becomes routine. The combination of improved computational methods, expanded experimental capabilities, and supportive regulatory frameworks will enable applications that seem ambitious today.

Related Article: Legacy System Migration: The Healthcare Perspective

As protein design tools mature, healthcare organizations will need strategies for integrating them with existing research infrastructure.

BGO Software helps teams turn de novo protein design into production software with secure platforms that integrate generative models and lab data. If you plan a binder or enzyme program, our engineers can map the workflow and deliver an audited stack that fits your LIMS and QA rules. Contact us to learn more!

Frequently Asked Questions (FAQ)

What is de novo protein design?

De novo protein design is the computational creation of entirely new proteins from scratch, without using existing natural proteins as templates. These designer proteins are built using first principles of protein folding and function to achieve specific, predetermined capabilities.

What are de novo sequencing methods for proteins?

De novo sequencing methods determine protein sequences without prior knowledge of the protein’s identity, typically using mass spectrometry fragmentation patterns. In protein design contexts, these methods can help confirm that an expressed protein has the designed sequence. They do not show that the protein folds as intended; that requires separate structural characterization.

What is de novo synthesis of proteins?

De novo synthesis refers to the laboratory production of designed proteins using artificial gene synthesis and expression systems. The process involves converting computational protein designs into physical molecules through DNA synthesis, cloning, and protein expression in bacterial or other cellular hosts.

What is the de novo method?

The de novo method represents a computational approach that creates molecular designs from first principles rather than modifying existing structures. In protein design, this method uses physics-based energy functions and AI models to generate novel protein sequences and structures that have never existed in nature.

How successful are current de novo protein design methods?

Modern AI-driven methods achieve success rates above 80% for certain applications like symmetric protein assemblies, representing a dramatic improvement over traditional methods that typically achieved less than 10% success rates. However, success varies significantly depending on the complexity of the design target and functional requirements.

Resources

  • Dauparas, J., Anishchenko, I., Bennett, N., Bai, H., Ragotte, R. J., Milles, L. F., … & Baker, D. (2022). Robust deep learning-based protein sequence design using ProteinMPNN. Science, 378(6615), 49-56.
  • Jin, S., Zeng, Z., Xiong, X., Huang, B., Tang, L., Wang, H., … & Lin, F. (2025). AMPGen: an evolutionary information-reserved and diffusion-driven generative model for de novo design of antimicrobial peptides. Communications Biology, 8(1), 839.
  • Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., … & Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589.
  • Poddiakov, I., Umerenkov, D., Shulcheva, I., Golovina, V., Borisova, V., Pozdnyakova-Filatova, I., … & Blinov, P. (2025). An iterative strategy to design 4-1BB agonist nanobodies de novo with generative AI models. Scientific Reports, 15(1), 25412.
  • Watson, J. L., Juergens, D., Bennett, N. R., Trippe, B. L., Yim, J., Eisenach, H. E., … & Baker, D. (2023). De novo design of protein structure and function with RFdiffusion. Nature, 620(7976), 1089-1100.
  • Yao, J., & Wang, X. (2025). Artificial intelligence in de novo protein design. Medicine in Novel Technology and Devices, 26, 100366.

Yoanna Stefanova

Yoanna is a Technical Copywriter with a keen interest in healthcare innovations and medicine. She is dedicated to crafting clear and engaging content that highlights the latest advancements and trends in the medical field.
