De Novo Protein Design: How AI Creates New Proteins

Updated - 30 Jan 2026 25 min read

Yoanna Stefanova Technical Copywriter at XTATIC HEALTH

3D visualization of de novo protein structures above a transparent digital hand.

The pharmaceutical and biotechnology industries face a critical bottleneck. Traditional drug discovery relies on finding molecules that already exist in nature. This approach limits innovation and often results in treatments that work around biological constraints rather than solving them directly.

Our analysis of AI applications in pharmaceutical manufacturing shows how artificial intelligence transforms entire production pipelines. In 2023, researchers achieved what had seemed impossible just decades earlier: they designed entirely new proteins from scratch using artificial intelligence. According to a 2025 review article, those attempts succeeded in about 80% of cases.

This article explains how de novo protein design represents the next frontier in this digital transformation. It is aimed at readers who want to understand how the technology can be used to design proteins that bind specific targets, catalyze reactions, or form novel biomaterials.

Key takeaways:

De novo protein design = using AI to create entirely new proteins from scratch without natural templates
Success rates jumped from <10% to 80%+ with modern AI methods like RFdiffusion and ProteinMPNN
Applications span drug discovery, biomaterials, and enzyme engineering, with a wide range of commercial potential
AI models process massive Protein Data Bank (PDB) datasets to learn folding patterns and functional sites
Designer proteins solve “undruggable” targets and create novel biomaterials impossible through traditional methods
Success depends on integrated pipelines combining multiple learning methods for optimal results

What exactly is de novo protein design?

De novo protein design creates entirely new proteins from scratch. No natural template. No existing sequence to modify. Just computational “creativity” guided by physical laws and by patterns learned from known structures.

Think of it as molecular architecture. Architects create blueprints for buildings that have never been constructed. Specialists who design proteins create molecular blueprints for structures that have never evolved.

The designer proteins must meet strict requirements:

fold into specific three-dimensional shapes
perform predetermined functions
remain stable under operating conditions
avoid toxic or unwanted side effects

The evolution from traditional methods

Early protein engineering in the 1980s focused on modifying existing proteins. Scientists would take a known protein structure and make incremental changes. This approach worked but remained constrained by the starting material.

You could only modify what nature had already created.

The field shifted dramatically with computational methods in the 2000s. The Protein Data Bank (PDB) grew to contain tens of thousands of solved protein structures. This provided the foundation for understanding protein folding patterns.

This wealth of structural data enabled researchers like David Baker at the Institute for Protein Design to pioneer early computational approaches that laid the groundwork for today’s AI-driven methods.

Core principles of de novo design

Infographic on the evolution of protein design: from the 1980s to today’s AI revolution.

Modern de novo protein design operates on fundamental principles:

Physical constraints: the protein must fold stably into its intended structure. This requires understanding how amino acid sequences translate into three-dimensional shapes.
Functional requirements: the protein must perform its intended task, whether that is binding to disease targets, catalyzing reactions, or forming materials.
Optimization challenges: success rates depend on balancing multiple competing factors. Stability, solubility, and specific binding interactions at functional sites must all work together.

The process typically involves two major steps:

Generate a protein backbone that provides the overall shape
Optimize the amino acid sequence to stabilize that structure and enable the desired function

Key Takeaway
Proteins can now be designed, not discovered. Instead of tweaking what nature gives us, scientists write fresh “blueprints” for brand-new molecules.

How AI changed protein design

Artificial intelligence addressed protein design’s most fundamental challenge: the astronomical number of possible protein sequences, which made exhaustive search with traditional methods impractical.

A typical protein contains 100-300 amino acids. With 20 possible amino acids at each position, the theoretical sequence space exceeds the number of atoms in the observable universe.
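To put that claim in numbers, here is a quick back-of-the-envelope check. It assumes the commonly quoted rough estimate of 10^80 atoms in the observable universe, which is an illustrative figure and not a value taken from the article's sources.

```python
import math

AMINO_ACIDS = 20               # size of the standard amino acid alphabet
ATOMS_IN_UNIVERSE_LOG10 = 80   # rough order of magnitude (assumption)


def sequence_space_log10(length: int, alphabet: int = AMINO_ACIDS) -> float:
    """Return log10 of the number of possible sequences of a given length."""
    return length * math.log10(alphabet)


for length in (100, 300):
    exponent = sequence_space_log10(length)
    surplus = exponent - ATOMS_IN_UNIVERSE_LOG10
    print(f"{length} residues: about 10^{exponent:.0f} sequences, "
          f"roughly 10^{surplus:.0f} times the atom estimate")
```

Even at the short end of the range, a 100-residue chain has about 10^130 possible sequences, so exhaustive search is out of the question.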

Traditional computational methods could explore only tiny fractions of this space. Rosetta software used physics-based energy functions to evaluate protein designs. But it required extensive computational resources and often produced mixed results.

Success rates for creating functional proteins remained frustratingly low. Typically below 10%.

The deep learning revolution

Machine learning methods changed this landscape dramatically starting around 2018. AlphaFold’s success in protein structure prediction demonstrated something crucial. Neural networks could capture complex relationships between protein sequence and structure that had eluded traditional approaches.

The breakthrough came from training AI models on vast datasets from the Protein Data Bank. These learning methods could identify patterns in natural proteins that human scientists had missed.

More importantly, they could generate novel combinations. These maintained the essential features of stable, functional proteins while incorporating entirely new sequences.

Modern AI architectures unlock new possibilities

Current AI approaches use sophisticated architectures:

Graph neural networks treat proteins as molecular graphs. Amino acids become nodes, and chemical bonds or spatial contacts between them become edges. This representation captures the three-dimensional relationships critical for protein function.
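As a rough illustration of the graph idea, the sketch below turns a list of residue positions into a neighbour graph. It assumes one common simplification: each residue is represented by a single C-alpha coordinate and is connected to its k nearest neighbours in space. The function name, the value of k, and the toy coordinates are all made up for this example and do not come from any specific tool.

```python
import math


def build_residue_graph(ca_coords, k=2):
    """Connect each residue to its k nearest neighbours by C-alpha distance."""
    edges = {}
    for i, a in enumerate(ca_coords):
        # Sort every other residue by its distance to residue i.
        dists = sorted(
            (math.dist(a, b), j) for j, b in enumerate(ca_coords) if j != i
        )
        edges[i] = [j for _, j in dists[:k]]
    return edges


# Toy coordinates in angstroms for five residues (invented for illustration).
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
]

graph = build_residue_graph(toy_coords, k=2)
for node, neighbours in graph.items():
    print(node, "->", neighbours)
```

A real model would attach features to each node and edge (residue type, distances, orientations) before passing messages along the edges.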

Diffusion models borrow techniques from image generation. They learn to reverse a noising process to create new protein structures from random starting points.

Protein language models treat protein sequences like text. They learn the “grammar” of protein folding from millions of natural sequences. These models can generate new sequences that follow the same folding rules as natural proteins.

The combination of these approaches has pushed success rates above 80% in some applications. This dramatic improvement makes de novo design practically viable for drug discovery and biotechnology applications.

Aspect	Traditional (physics-based) design	AI-first design
Core approach	Sample a protein backbone, score with energy functions, then run sequence design.	Generate backbones with diffusion, assign sequences with ProteinMPNN, validate with AlphaFold/RoseTTAFold.
Speed & search	Narrow exploration; slower loops.	Broad exploration; fast loops with high throughput.
Control & transparency	Clear physics; strong control over binding interfaces and packing.	Strong priors; needs physics checks to catch clashes and unsatisfied atoms.
Expected outcomes	Stable scaffolds after more wet-lab cycles.	Higher success rates by orders of magnitude for protein binder tasks.
Best fit	Sparse data; safety-critical programs that need auditability.	General de novo protein design with strict functional site geometry and rapid iteration.
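The AI-first column in the table above ends with a validation step: a structure predictor refolds the designed sequence, and the design is kept only if the prediction agrees with the intended backbone. A minimal sketch of such a filter might look like the following. The thresholds (2.0 Å RMSD, mean pLDDT of 80) are illustrative assumptions, the function names are hypothetical, and the RMSD here assumes the two structures are already superimposed.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def passes_filters(designed, predicted, mean_plddt,
                   max_rmsd=2.0, min_plddt=80.0):
    """Keep a design only if the predictor is confident and agrees with it."""
    return mean_plddt >= min_plddt and rmsd(designed, predicted) <= max_rmsd


# Toy C-alpha traces in angstroms (invented for illustration).
designed = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
good_prediction = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]
bad_prediction = [(0.0, 4.0, 0.0), (3.8, 4.0, 0.0), (7.6, 4.0, 0.0)]

print(passes_filters(designed, good_prediction, mean_plddt=90.0))
print(passes_filters(designed, bad_prediction, mean_plddt=90.0))
```

Passing a filter like this only says the design is self-consistent in silico. It says nothing about whether the protein will express or function in the lab.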

Key Takeaway
AI shrank an “impossible” search problem into a solvable one. Deep learning tools now spot stable shapes and useful functions with far higher accuracy.

Is AI-driven de novo protein design production-ready?

What “success rate” really means

“Success rate” in AI protein design is often misunderstood. A model can predict that a sequence will fold into a given shape, but not whether that shape will do its job. In other words, a high success rate in predicting a shape does not guarantee a high success rate in getting a working product.

Where lab failures occur

Failures often happen in the gap between “folding” and “functioning.” An AI model might generate a protein sequence that folds into a stable shape, just as predicted. However, that shape might sit in a test tube and do absolutely nothing. The failure is usually invisible to the computer. The model sees patterns from old data, but it doesn’t understand the physical forces required for a chemical reaction to happen. When these “black box” designs fail, scientists are left stuck.

What works reliably today vs what doesn’t

What works: Predicting the 3D shape of a protein is now very reliable. If you need to tweak an existing enzyme or design a simple binder, AI does this well.

What doesn’t: Designing complex enzymes from scratch is not here yet. AI struggles to invent brand-new chemical reactions or proteins that need to survive extreme heat or pH without help from physics-based tools. It’s fast, but for complex chemistry, it’s still often just expensive guesswork.

Modern methods of de novo protein design

Contemporary protein design integrates multiple computational approaches. Each addresses different aspects of the design challenge. The field has evolved from single-method approaches to sophisticated pipelines that combine the strengths of various AI architectures.

Computational Foundation

Two-step process drives modern design

De novo design typically involves generating a protein backbone (shape) and then finding the optimal amino acid sequence that folds into that specific structure and performs the desired function. This separation allows designers to tackle structure and sequence optimization as distinct but related problems.

The protein backbone provides the overall architectural framework. The sequence optimization ensures that the framework remains stable and functional. This division enables more targeted approaches to each challenge.

Optimization problem framework

The design process is viewed as an optimization problem: finding the right sequence-structure combination for a given design objective. Success depends on balancing multiple competing factors:

thermodynamic stability of the final structure
specific binding interactions at functional sites
expression and folding efficiency in host systems
avoidance of aggregation or misfolding pathways

Modern algorithms can explore vast sequence spaces while maintaining these physical constraints.
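One simple way to picture this balancing act is a weighted score that rewards stability, binding, and expression while penalizing aggregation risk. The sketch below is a toy: the candidate names, the metric values (all scaled to 0-1), and the weights are invented for illustration, and real pipelines use far richer objectives.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    stability: float         # higher is better
    binding: float           # higher is better
    expression: float        # higher is better
    aggregation_risk: float  # lower is better


# Hypothetical weights; in practice these depend on the design objective.
WEIGHTS = {
    "stability": 0.35,
    "binding": 0.35,
    "expression": 0.20,
    "aggregation_risk": 0.10,
}


def combined_score(c: Candidate, weights=WEIGHTS) -> float:
    """Weighted sum of the desirable terms minus the aggregation penalty."""
    return (
        weights["stability"] * c.stability
        + weights["binding"] * c.binding
        + weights["expression"] * c.expression
        - weights["aggregation_risk"] * c.aggregation_risk
    )


candidates = [
    Candidate("design_A", stability=0.9, binding=0.4, expression=0.8, aggregation_risk=0.2),
    Candidate("design_B", stability=0.7, binding=0.8, expression=0.6, aggregation_risk=0.3),
    Candidate("design_C", stability=0.5, binding=0.9, expression=0.3, aggregation_risk=0.7),
]

best = max(candidates, key=combined_score)
print(best.name, round(combined_score(best), 3))
```

Note that the most stable design and the tightest binder both lose to the candidate that is merely good on every axis, which is the point of treating design as a multi-factor optimization.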

AI and Machine Learning models

ProteinMPNN: sequence generation powerhouse

ProteinMPNN represents one of the most successful sequence design tools. It uses message-passing neural networks to generate amino acid sequences for predetermined protein backbones. The model achieves remarkable accuracy by treating protein design as a graph-based optimization problem.

How it works:

takes a protein backbone structure as input
examines the local environment around each position
considers factors like hydrophobic packing and hydrogen bonding
predicts which amino acid sequence will fold into that exact shape

Performance metrics:

sequence recovery rates above 50%
can predict sequences that fold into nearly identical structures
high confidence predictions for novel protein designs
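Sequence recovery, the first metric above, is simply the fraction of positions where the designed sequence matches the native one for the same backbone. A minimal sketch of the calculation, using two short made-up sequences:

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of aligned positions where the designed residue matches native."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for a, b in zip(native, designed) if a == b)
    return matches / len(native)


# Toy sequences invented for illustration; they differ at three positions.
native = "MKTAYIAKQR"
designed = "MKSAYLAKQK"
print(f"{sequence_recovery(native, designed):.0%}")
```

A recovery above 50% does not mean the other half is wrong. Many different sequences can fold into the same backbone, so mismatches are often perfectly valid alternatives.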

In practical applications, ProteinMPNN has revolutionized sequence design. Given the backbone of a natural protein, it can predict sequences that fold into nearly identical structures. This capability translates directly to de novo design.

Diffusion models: structure generation breakthrough

Diffusion models like RFdiffusion and Chroma represent the most exciting recent development in protein structure generation. These models are trained on protein structures to learn to reverse a noising process.

That process involves the gradual addition of random distortions that corrupt the original protein structure until it becomes unrecognizable. This training allows them to generate novel, diverse, and functionally relevant protein structures by starting with pure random noise and systematically removing the distortions.

RFdiffusion exemplifies this approach:

Training process:

gradually add noise to known protein structures from the Protein Data Bank (PDB)
continue until the structures become random arrangements of atoms
train a neural network to reverse this process
generate new proteins by starting with noise and applying denoising
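The noising half of the training steps above can be sketched in a few lines. This is a generic diffusion-style forward process applied to toy C-alpha coordinates, not RFdiffusion's actual code: it assumes a linear noise schedule with illustrative parameters, and it ignores residue orientations, which real models also have to handle.

```python
import math
import random


def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal left after t steps of a linear schedule."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    product = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        product *= 1.0 - beta
    return product


def add_noise(coords, t, num_steps, rng):
    """Sample noisy coordinates at step t: sqrt(a) * x0 + sqrt(1 - a) * noise."""
    a = alpha_bar(t, num_steps)
    return [
        tuple(
            math.sqrt(a) * c + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            for c in xyz
        )
        for xyz in coords
    ]


rng = random.Random(0)
# Toy C-alpha trace (invented); real pipelines typically centre and rescale first.
structure = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]

for t in (0, 500, 1000):
    noisy = add_noise(structure, t, 1000, rng)
    print(t, f"signal kept: {alpha_bar(t, 1000):.4f}", noisy[1])
```

At step 0 the structure is untouched, and by the final step almost none of the original signal remains. The network is then trained to undo these steps one at a time.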

Key advantages:

generate diverse, novel structures
maintain physical constraints for proper folding
create protein topologies never observed in nature
enable functions that natural evolution never explored

This approach breaks free from nature’s limitations. Traditional methods modify existing proteins, but diffusion models design completely novel binding interfaces and functional sites. These artificial proteins can perform functions that natural evolution never discovered.

Protein Language Models: pattern recognition

Protein language models are trained on existing protein data from databases. These models can generate new sequences with functions similar to those in the training set. They treat protein sequences like text and learn the “grammar” of protein folding from millions of natural sequences.

Core capabilities:

identify sequence patterns that correlate with specific functions
generate novel sequences following natural folding rules
predict functional variants of existing proteins
enable rapid exploration of sequence space around known structures

These models excel at capturing the statistical regularities of natural protein sequences while enabling creative combinations that maintain biological plausibility.
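To make the "grammar" idea concrete, here is a deliberately tiny stand-in for a protein language model: a bigram model that counts which residue tends to follow which. Real protein language models are large neural networks trained on millions of sequences. The three training strings below are invented helix-like toy sequences, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def train_bigram_model(sequences, alphabet=ALPHABET, pseudocount=1.0):
    """Estimate P(next residue | previous residue) with add-one smoothing."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev in alphabet:
        total = sum(counts[prev].values()) + pseudocount * len(alphabet)
        model[prev] = {
            nxt: (counts[prev][nxt] + pseudocount) / total for nxt in alphabet
        }
    return model


def log_likelihood(model, seq):
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(model[p][n]) for p, n in zip(seq, seq[1:]))


# Made-up training data, for illustration only.
training = ["AEAAAKEAAAKA", "AEELLKKAEELLKK", "EELLKKLLEELLKK"]
model = train_bigram_model(training)

familiar = log_likelihood(model, "AEELLKKA")
unfamiliar = log_likelihood(model, "WCWCWCWC")
print(f"familiar pattern:   {familiar:.2f}")
print(f"unfamiliar pattern: {unfamiliar:.2f}")
```

The sequence that follows patterns seen in training scores higher than one that does not. Scaled up enormously, the same principle lets a language model rank or generate sequences that look like plausible proteins.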

Physical and energy-based approaches

Rosetta Software: physics-based optimization

Rosetta is a leading platform that uses physics-based energy functions to identify amino acid sequences. It targets sequences that lead to stable, low-energy structures and desired functions. Despite AI successes, physics-based approaches like Rosetta remain crucial components of modern design pipelines.

Rosetta strengths:

Van der Waals interactions modeling
Hydrogen bonding calculations
Electrostatic effects prediction
Solvation energy optimization

Practical applications:

Optimize designs for stability and function simultaneously
Fine-tune binding sites for a particular target protein
Adjust protein stability for different environments
Incorporate non-natural amino acids for specialized applications

Modern workflows often combine AI generation with Rosetta optimization. An AI model generates an initial design. Rosetta then refines the design to optimize specific interactions and ensure thermodynamic stability.

Backbone sampling and sequence optimization

These methods explore possible protein backbone structures and then optimize the amino acid sequence to stabilize the chosen backbone. The approach typically involves:

Backbone exploration:

Sample conformational space systematically
Identify architectures capable of the desired functions
Filter based on geometric and chemical constraints

Sequence optimization:

Design amino acid sequences for selected backbones
Balance stability requirements with functional needs
Iterate between structure and sequence refinement

This systematic approach ensures that both structural and sequence elements contribute to overall design success.

Application-driven strategies

Motif scaffolding: function-first design

Motif scaffolding is a method where a functional motif is designed first. A surrounding protein scaffold is then built to stabilize and enhance the motif’s function. This approach proves particularly powerful for creating proteins with predetermined binding sites or catalytic activities.

Design workflow:

Identify or design the essential functional motif
Create surrounding scaffold architecture
Optimize scaffold-motif interactions
Ensure overall structural stability

Applications:

Enzyme active site design
Specific binding site creation
Regulatory domain incorporation

Functional site design: target-specific engineering

Functional site design involves creating a protein to specifically interact with a target, such as binding to a virus or a cell receptor, by creating a tailored binding site within the new protein. This approach enables precise molecular recognition capabilities.

Key considerations:

Target molecule geometry and chemical properties
Binding interface complementarity
Specificity requirements to avoid off-target effects
Integration with overall protein stability

Success examples:

Viral inhibitor design with novel binding modes
Receptor agonists or antagonists with improved selectivity
Diagnostic proteins for specific biomarker detection

Comparison of modern approaches

Method	Primary Strength	Typical Application	Reported Performance
ProteinMPNN	Sequence optimization for given backbones	Stabilizing designed structures	>50% sequence recovery
RFdiffusion	Novel backbone generation	Creating new protein folds	~80% for symmetric assemblies
Rosetta	Physical accuracy and functional optimization	Fine-tuning binding sites	Variable by application
Protein Language Models	Natural sequence patterns	Generating functional variants	~30-60% functional designs
Motif Scaffolding	Function-first design approach	Enzyme and binding site creation	High for well-defined motifs
Functional Site Design	Target-specific optimization	Therapeutic and diagnostic proteins	Depends on target complexity

Key Takeaway
Different methods solve different pieces of the puzzle—backbones, sequences, stability, or function. The most powerful designs come from combining them into a single pipeline.

Case study: RFdiffusion and generative AI for proteins

The experimental validation results for RFdiffusion demonstrate the practical impact of this breakthrough approach. David Baker’s team at the University of Washington has achieved remarkable success rates that translate directly to commercial applications.

The diffusion process explained

The core innovation lies in adapting diffusion models to three-dimensional protein structures. RFdiffusion creates these structures by reversing the gradual corruption of known protein geometries. This mirrors how image diffusion models generate pictures by reversing a noising process.

Training workflow:

RFdiffusion takes known protein structures from the Protein Data Bank (PDB)
The system progressively adds random noise to atomic coordinates
The process continues until the original structure becomes indistinguishable from random noise
The neural network is trained to predict how to reverse each corruption step
The model learns to “denoise” random atomic arrangements into valid protein structures

The trained model generates entirely new proteins by starting with pure noise. It then applies the learned denoising process step by step.

This approach enables the creation of protein architectures that may never have existed in nature. Yet it maintains the physical constraints necessary for proper folding.

Technical implementation details

RFdiffusion processes protein structures as graphs. Amino acid residues represent nodes, and spatial relationships define edges. The model uses transformer-like architectures to update these representations iteratively.

Key features:

each iteration refines the protein structure
gradually transforms random coordinates into chemically plausible geometries
can incorporate various constraints during generation
guides generation while allowing creative freedom

Constraint capabilities:

specify binding sites that must interact with a particular target protein
enforce specific symmetries for protein complexes
constrain certain regions to adopt particular secondary structures
design functional site requirements into the generation process

Experimental validation results

Diagram of the process for designing new proteins using RFdiffusion and artificial intelligence.

The practical success of RFdiffusion has been demonstrated through extensive laboratory testing. In one landmark study, researchers designed symmetric protein assemblies using RFdiffusion. They then synthesized and tested these designs in the laboratory.

Performance metrics:

success rate exceeded 80% for symmetric assemblies
designed proteins folded into structures matching computational predictions
experimental error margins within acceptable ranges
functional validation confirmed binding activities

Perhaps more impressively, RFdiffusion has enabled the design of protein binder molecules for specific targets. The system can generate proteins with binding sites complementary to particular ligands or other proteins.

Experimental validation shows that these designed binders often achieve affinities comparable to natural antibodies. But with entirely artificial sequences and structures.

Key Takeaway
Diffusion models can “grow” proteins from random noise, much like AI image generation tools. Lab tests show they reliably produce stable, functional designs.

What problems does de novo protein design solve better than traditional approaches?

Problem area	Traditional approach (The old way)	De novo design (The AI way)
Undruggable targets	Needs a pocket. Small-molecule drugs rely on finding a deep groove or pocket to sit in. If the target is flat or slippery, the drug slides off.	Creates a custom grip. AI builds proteins with custom shapes that can wrap around flat surfaces or clamp onto targets that lack deep pockets.
Protein-protein interactions	Too big and clumsy. We often use antibodies to block interactions. They are effective but massive and struggle to get inside cells or tissues.	Tiny and stable. AI designs “mini-binders” – tiny, ultra-stable wedges that fit into tight spaces where large antibodies simply cannot fit.
Novel enzyme reactions	Tweaking nature. Directed evolution takes an existing enzyme and improves it.	Inventing new chemistry. AI designs brand-new active sites from scratch.
Biomaterials	Inconsistent. We use natural materials like silk or collagen. They are useful but vary in quality and are hard to modify.	Tunable and precise. AI treats proteins like LEGO bricks. We can precisely tune stiffness, pore size, and heat resistance.

Applications of de novo protein design

The practical applications span multiple industries. Each sector leverages designer proteins to solve previously intractable problems. The ability to create proteins with predetermined functions opens new possibilities across therapeutics, materials science, and biotechnology.

Drug discovery breakthroughs

Pharmaceutical applications represent the most immediate and high-impact area. Traditional drug discovery faces significant limitations when targeting “undruggable” proteins. These binding sites are too shallow, too hydrophobic, or too dynamic for conventional small-molecule drugs.

Designer proteins solve this problem by creating large, stable binding interfaces. These can engage challenging targets that small molecules cannot reach.

Real-world example: 4-1BB receptor targeting

The 4-1BB receptor exemplifies this approach. This immune checkpoint protein plays crucial roles in T-cell activation but proved difficult to target with conventional drugs. Recent research demonstrated the successful design of nanobody binders specifically targeting 4-1BB epitopes (Poddiakov et al., 2025).

Key achievements:

computational predictions achieved binding scores comparable to natural antibodies
novel binding interfaces created through de novo design
entirely artificial sequences with no natural templates
potential for improved therapeutic profiles

Antimicrobial peptide development

The antimicrobial peptide field showcases another major pharmaceutical application. AMPGen, an evolutionary information-reserved diffusion model, has generated antimicrobial peptides with impressive results:

81.58% positive rates in experimental validation
broad-spectrum activity against both Gram-positive and Gram-negative bacteria
novel approaches to combat antibiotic resistance
sequences absent from existing AMP databases (Jin et al., 2025)

Protein therapeutics advantages

Protein therapeutics benefit significantly from de novo design approaches:

create entirely new molecules optimized for specific therapeutic goals
improved pharmacokinetics compared to natural alternatives
reduced immunogenicity through careful sequence design
enhanced tissue specificity for targeted delivery

De novo protein design could also enable personalized therapeutics tailored to individual patient genetic profiles.

Biomaterials innovation

Material science applications leverage the precise control over protein structure that de novo design provides. Natural structural proteins like collagen or spider silk have valuable properties. Their complex production requirements, however, limit practical applications.

Designer proteins can replicate these properties while enabling production in simple bacterial systems.

Self-assembling protein materials

Self-assembling protein materials represent a particularly exciting application:

proteins with specific interaction interfaces
materials that spontaneously organize into desired structures
nanofibers, hydrogels, or rigid frameworks
responsive elements that change properties with environmental conditions

Tunable material properties

The precision of protein design enables the creation of materials with properties tuned for specific applications. These include controlled stiffness for different mechanical requirements and biodegradability timelines for medical applications. Designed proteins can also feature specific biocompatibility profiles for tissue engineering and tailored surface properties for specific cellular interactions.

Tissue engineering applications

The materials find applications in tissue engineering, where scaffolds provide appropriate mechanical support and allow cellular integration and growth. They enable eventual replacement by natural tissue through controlled degradation as healing progresses.

Enzyme engineering advances

Catalytic applications represent the most challenging aspect of de novo protein design. These systems must not only bind substrates but also facilitate specific chemical transformations. Successful enzyme designs depend on positioning amino acid residues with atomic precision to stabilize transition states and enable bond formation or cleavage.

Design challenges:

stabilize transition states for chemical reactions
facilitate bond breaking and forming
position catalytic residues with atomic precision
enable substrate binding and product release

Non-natural reaction capabilities

Success stories include the design of enzymes for non-natural reactions:

chemical transformations that don’t occur in biological systems
industrial importance for manufacturing processes
reactions impossible with traditional chemical catalysts
green chemistry applications with environmental benefits

Industrial applications

Industrial applications drive much interest in de novo enzyme design:

operate under conditions that denature natural enzymes
high temperatures for industrial processes
extreme pH levels for specialized applications
presence of organic solvents for chemical synthesis
enable enzymatic processes where traditional biocatalysis isn’t feasible

Synthetic biology integration

Synthetic biology applications view proteins as components in engineered biological systems. Rather than designing individual proteins, researchers create networks of interacting proteins that implement complex cellular functions.

Protein-based logic circuits

Protein-based logic circuits exemplify this approach:

specific binding interactions create biological switches
amplifiers and memory devices built from protein components
control gene expression through programmable interactions
direct cellular behavior via engineered protein networks
respond to environmental signals in predetermined ways

Biosensor applications

Biosensor applications combine binding specificity with signal transduction:

Designer proteins detect specific molecules
Convert presence into detectable outputs
Fluorescence, enzymatic activity, or cellular behavior changes
Monitor environmental conditions or disease biomarkers
Process variables in biotechnology applications

Multi-functional system construction

The modular nature of protein domains enables sophisticated systems:

combine binding domains, catalytic domains, and regulatory elements
novel arrangements create proteins with complex behaviors
programmable behaviors impossible with natural biology
foundation for engineered biological systems

Abstract visualization of protein chains and molecular structures.

Challenges and limitations

Despite remarkable progress, de novo protein design faces persistent challenges. These limitations span technical, experimental, and practical domains. Continued research and development are required to overcome them.

Computational challenges

The protein folding problem remains incompletely solved

AI models like AlphaFold can predict structures for many natural proteins with high accuracy. They still often struggle with de novo designs. Novel sequence patterns or unusual structural features present difficulties.

Current design methods excel at creating proteins that resemble natural structures. They face difficulties when venturing into genuinely unexplored sequence space.

Model limitations:

learn patterns from natural proteins
may not encompass all possible functional architectures
constrains design creativity
may miss opportunities for capabilities beyond nature

Computational cost barriers

High-quality protein design requires extensive sampling:

thousands of design iterations
complex optimization procedures
substantial computational resources required
limited accessibility for many research groups
industrial applications face resource constraints

Experimental validation bottlenecks

The gap between computational prediction and experimental reality continues to challenge the field. Success rates have improved dramatically but remain well below 100%. Even the best computational designs sometimes fail in the laboratory.

Protein expression challenges:

many designs cannot be produced in bacterial expression systems
aggregation, misfolding, or toxicity issues
expression system choice dramatically affects success rates
optimizing expression conditions requires extensive trial-and-error

Functional validation complexities:

binding assays may not capture all performance aspects
enzymatic activity measurements have limitations
stability tests may miss important factors
laboratory conditions may not reflect real applications
complex environments where proteins must function

Design scope limitations

Current AI-based protein design methods work best with single-domain proteins that have well-defined structures. They struggle with multi-domain proteins, membrane proteins, and dynamic systems that need conformational flexibility. While binding site design has seen notable successes, catalytic site design remains difficult. This is due to the precise geometric requirements and the inherently dynamic nature of enzymatic reactions.

Designed proteins must function within complex cellular networks, not in isolation. They face unexpected interactions with cellular components and may be affected by post-translational modifications. The cellular environment can interfere with normal processes and alter protein behavior unpredictably.

Regulatory and safety considerations

The pharmaceutical industry faces regulatory hurdles because agencies have extensive experience with modified natural proteins but limited precedent for entirely artificial therapeutics. This lack of evolutionary precedent may require more extensive safety testing and longer approval pathways.

Complex intellectual property landscapes around AI-designed proteins complicate commercial viability. The relationships between design methods, resulting sequences, and functional properties create intricate patent situations. Legal frameworks for AI-designed proteins continue to evolve, potentially influencing both design strategies and commercial development.

Key Takeaway
Computers can design faster than biology can prove. Costs, lab bottlenecks, and regulatory uncertainty still limit how quickly designs reach real-world use.

Future of de novo protein design

Graphic representation of a neural network for protein engineering applications.

The trajectory points toward increasingly sophisticated capabilities and broader practical applications. Several emerging trends will shape the field’s development over the next decade.

Integration of multi-modal AI

Future design platforms will integrate multiple types of AI models. Rather than using separate tools for structure generation, sequence optimization, and functional prediction, unified models will optimize all factors together.

Advantages of integration:

More sophisticated design objectives
Balance multiple constraints simultaneously
Stability, expression, immunogenicity, and pharmacokinetics optimized together
Functional performance optimized in the same workflow
Multi-objective optimization reflects practical requirements
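One common way to balance several constraints at once is to keep only the designs that are not beaten on every objective by some other design (a Pareto front). The sketch below is illustrative only: the `Design` fields, the 0 to 1 scores, and the candidate values are hypothetical, and it does not represent how any specific design platform ranks its outputs.

```python
from typing import NamedTuple

class Design(NamedTuple):
    name: str
    stability: float       # hypothetical score, higher is better
    expression: float      # hypothetical score, higher is better
    immunogenicity: float  # hypothetical score, lower is better

def dominates(a: Design, b: Design) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    at_least_as_good = (
        a.stability >= b.stability
        and a.expression >= b.expression
        and a.immunogenicity <= b.immunogenicity
    )
    strictly_better = (
        a.stability > b.stability
        or a.expression > b.expression
        or a.immunogenicity < b.immunogenicity
    )
    return at_least_as_good and strictly_better

def pareto_front(designs: list[Design]) -> list[Design]:
    """Keep the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]

candidates = [
    Design("design_A", 0.90, 0.40, 0.20),
    Design("design_B", 0.70, 0.80, 0.30),
    Design("design_C", 0.60, 0.70, 0.50),  # worse than design_B on all three
    Design("design_D", 0.85, 0.35, 0.25),  # worse than design_A on all three
]

print([d.name for d in pareto_front(candidates)])
```

Here `design_A` and `design_B` survive because each wins on a different objective, which leaves the final trade-off to the project's priorities instead of hiding it inside a single weighted score.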

Experimental data integration:

Current models rely primarily on structural databases
Future approaches will incorporate functional data
Binding measurements and stability information included
Experimental integration bridges the gap between prediction and reality
Improved design accuracy through comprehensive training data

Expansion to complex systems

The field will move beyond single-domain proteins toward complex multi-protein systems. This expansion involves advances in multi-protein interface design and dynamic protein design.

Multi-protein interface design will enable the creation of protein machines and molecular systems with moving parts. These systems will include:

biosensors with multiple input signals
catalytic systems with regulatory controls
coordinated protein complexes that perform complex functions

Dynamic protein design will incorporate conformational flexibility as a design parameter. This approach creates proteins that switch between different structures in response to binding partners or environmental conditions. Such responsive systems enable sophisticated regulatory mechanisms and therapeutic systems with programmable activity.

Complex system applications can include protein complexes with coordinated functions and membrane proteins designed for lipid environments. The field may also develop dynamic conformational ensembles and multi-state systems with switching capabilities.

Industrial-scale implementation

The maturation of design methods will drive adoption in industrial settings:

Pharmaceutical integration:

companies integrate de novo design into drug discovery pipelines
designed proteins as research tools and therapeutic candidates
novel drug targets accessible through designer proteins
personalized therapeutics through patient-specific designs

Biotechnology applications:

use of designed enzymes in industrial processes expands
production of proteins for biomaterials scales up
components for synthetic biology systems reach the market
adoption grows as success rates improve and costs decrease

Automated design platforms:

cloud-based services democratize access
researchers without computational resources can participate
innovation accelerates across biotechnology sector
reduced barriers to entry for protein design

Regulatory framework evolution

Regulatory agencies will likely develop specific guidelines for evaluating de novo designed proteins. These guidelines will need to balance innovation encouragement with safety assurance.

For therapeutic applications, regulatory frameworks will need to spell out how designed proteins are evaluated. Key elements will include:

standards for computational validation
experimental characterization requirements
optimized clinical evaluation pathways

Early approved designed protein therapeutics could establish important precedents that influence future regulatory approaches. Successful examples could demonstrate safety and efficacy potential, facilitating broader acceptance of the technology.

International harmonization could create consistent standards across major markets and reduce development costs through unified requirements. Such coordination would shorten time-to-market for innovative treatments and promote global collaboration on regulatory frameworks.

Scientific and technological convergence

De novo protein design is bound to intersect with other emerging technologies:

Synthetic biology integration:

Design of complete biological systems
Networks of interacting proteins
Cellular functions programmed through protein design
Biological circuits with predictable behaviors

Nanotechnology combination:

Hybrid systems merge biological and synthetic components
Enhanced functionality through material integration
Novel applications impossible with either technology alone
Precision assembly of complex functional systems

Advanced computational methods:

machine learning advances beyond protein design contribute
improved optimization algorithms
better uncertainty quantification
more efficient computational methods
enhanced design accuracy and reduced costs

These developments collectively point toward a future where de novo protein design becomes routine. The combination of improved computational methods, expanded experimental capabilities, and supportive regulatory frameworks will enable applications that seem ambitious today.

As protein design tools mature, healthcare organizations will need strategies for integrating them with existing research infrastructure.

Frequently Asked Questions (FAQ)

What is de novo protein design?

De novo protein design is the computational creation of entirely new proteins from scratch, without using existing natural proteins as templates. These designer proteins are built using first principles of protein folding and function to achieve specific, predetermined capabilities.

What are de novo sequencing methods for proteins?

De novo sequencing methods determine a protein’s amino acid sequence without prior knowledge of its identity, typically from mass spectrometry fragmentation patterns. In protein design contexts, these methods can confirm that an expressed protein matches its designed sequence. Verifying that the protein folds into its intended structure requires structural methods such as X-ray crystallography, cryo-EM, or NMR.

What is de novo synthesis of proteins?

De novo synthesis refers to the laboratory production of designed proteins using artificial gene synthesis and expression systems. The process involves converting computational protein designs into physical molecules through DNA synthesis, cloning, and protein expression in bacterial or other cellular hosts.
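The first step of that process, turning a designed amino acid sequence into DNA that can be ordered for synthesis, can be sketched as a simple back-translation. The example below is a minimal illustration: the one-codon-per-residue table `CODON_FOR` uses valid standard-genetic-code codons, but the specific choices are an assumption and not a tuned codon-usage table for any host, and real workflows also check for restriction sites, repeats, and GC content.

```python
# Illustrative table: one valid standard-code codon per amino acid.
# The particular codon picked for each residue is an assumption,
# not an optimized usage table for a specific expression host.
CODON_FOR = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"

def back_translate(protein: str) -> str:
    """Convert a one-letter protein sequence into a DNA coding sequence."""
    protein = protein.upper()
    unknown = set(protein) - set(CODON_FOR)
    if unknown:
        raise ValueError(f"Unsupported residues: {sorted(unknown)}")
    # One codon per residue, followed by a stop codon
    return "".join(CODON_FOR[aa] for aa in protein) + STOP_CODON

# Hypothetical five-residue fragment used only to show the mapping
dna = back_translate("MKVLE")
print(dna)
```

The resulting DNA string would then be synthesized, cloned into an expression vector, and expressed in the chosen host.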

What is the de novo method?

The de novo method represents a computational approach that creates molecular designs from first principles rather than modifying existing structures. In protein design, this method uses physics-based energy functions and AI models to generate novel protein sequences and structures that have never existed in nature.

How successful are current de novo protein design methods?

Modern AI-driven methods achieve success rates above 80% for certain applications like symmetric protein assemblies, representing a dramatic improvement over traditional methods that typically achieved less than 10% success rates. However, success varies significantly depending on the complexity of the design target and functional requirements.

Resources

Dauparas, J., Anishchenko, I., Bennett, N., Bai, H., Ragotte, R. J., Milles, L. F., … & Baker, D. (2022). Robust deep learning-based protein sequence design using ProteinMPNN. Science, 378(6615), 49-56.
Jin, S., Zeng, Z., Xiong, X., Huang, B., Tang, L., Wang, H., … & Lin, F. (2025). AMPGen: an evolutionary information-reserved and diffusion-driven generative model for de novo design of antimicrobial peptides. Communications Biology, 8(1), 839.
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., … & Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589.
Poddiakov, I., Umerenkov, D., Shulcheva, I., Golovina, V., Borisova, V., Pozdnyakova-Filatova, I., … & Blinov, P. (2025). An iterative strategy to design 4-1BB agonist nanobodies de novo with generative AI models. Scientific Reports, 15(1), 25412.
Watson, J. L., Juergens, D., Bennett, N. R., Trippe, B. L., Yim, J., Eisenach, H. E., … & Baker, D. (2023). De novo design of protein structure and function with RFdiffusion. Nature, 620(7976), 1089-1100.
Yao, J., & Wang, X. (2025). Artificial intelligence in de novo protein design. Medicine in Novel Technology and Devices, 26, 100366.

Yoanna Stefanova

Yoanna is a Technical Copywriter with a keen interest in healthcare innovations and medicine. She is dedicated to crafting clear and engaging content that highlights the latest advancements and trends in the medical field.