Why You Care
Ever wonder why training massive AI models costs so much and takes forever? What if there was a better way to build AI systems, one that was far more efficient? New research suggests a novel approach to developing large language models (LLMs). This could dramatically change how your favorite AI tools are created and updated.
This method, called ‘growing Transformers,’ focuses on modular composition: layer-wise expansion on top of a frozen substrate. In practice, that means AI models could grow incrementally instead of being retrained from scratch, saving significant resources. This approach directly affects the future of AI development and how accessible it becomes for everyone.
What Actually Happened
A paper titled “Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate” was recently published. It introduces an alternative to the traditional, resource-intensive way of training large language models. The prevailing paradigm is monolithic, end-to-end training; this work proposes a constructive scaling paradigm instead.
The core idea revolves around emergent semantics in Transformers: high-level meaning arises in the deeper layers, not just in the input embeddings. The research team posits that the embedding layer and the trained lower layers can serve as a fixed foundation, so new components can be stacked on top and trained far more cheaply. The strategy combines strict layer freezing in the early stages with parameter-efficient fine-tuning using Low-Rank Adaptation (LoRA).
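To make the idea concrete, here is a minimal PyTorch sketch of layer-wise expansion on a frozen substrate. It is an illustration only, built around a toy encoder stack; the class and method names (GrowingTransformer, grow) and the hyperparameters are assumptions, not the paper’s released code.

```python
# Toy sketch of constructive scaling: freeze the substrate, append a new layer.
# All names and hyperparameters here are illustrative, not the paper's code.
import torch
import torch.nn as nn

class GrowingTransformer(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        ])
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)
        for layer in self.layers:
            x = layer(x)
        return self.head(x)

    def grow(self, n_heads=4):
        """Freeze the existing substrate, then append one new trainable layer."""
        for p in self.embed.parameters():
            p.requires_grad = False
        for p in self.layers.parameters():
            p.requires_grad = False
        d_model = self.embed.embedding_dim
        self.layers.append(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        )

model = GrowingTransformer()
model.grow()  # next stage: only the new layer (and the output head) get gradients
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```

Because each growth stage trains only the newly appended block, the per-stage compute and memory footprint stays far below that of retraining the full stack end to end.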
Why This Matters to You
This approach offers several compelling benefits. It could make AI development more accessible and sustainable: smaller teams could build and adapt AI models without the current reliance on massive computing power and budgets.
Think of it as building a complex LEGO structure. Instead of tearing down and rebuilding the whole thing every time you want to add a new section, you simply attach new pieces to the existing, stable base. This is similar to how the ‘growing Transformers’ method works.
Key Advantages of Growing Transformers:
| Feature | Traditional LLM Training | Growing Transformers Method |
| --- | --- | --- |
| Resource Use | Very High | Significantly Lower |
| Flexibility | Low | High |
| Growth Method | Monolithic | Incremental, Modular |
| Continual Learning | Challenging | More Viable |
The research shows that this method converges stably. What’s more, it reveals a direct correlation between model depth and the emergence of complex reasoning abilities, abilities that are often absent in shallower models. What kind of new AI applications could this more flexible approach unlock for your business or creative projects?
The Surprising Finding
Here’s the really interesting part: the constructively grown model rivals the performance of a monolithically trained baseline of the same size. That is striking, given how much more compute the traditional end-to-end recipe consumes. The study finds that this validates both the efficiency and the efficacy of the new approach.
This challenges the common assumption that bigger, all-at-once training is always better: smart, modular growth can achieve similar results with fewer resources. The finding opens a path toward a more biological or constructive model of AI development, one that moves away from monolithic optimization, as the paper puts it. It could mean a future where AI systems evolve more like living organisms, growing and adapting over time.
What Happens Next
The implications for the AI industry are significant. The method suggests a path toward more resource-efficient scaling, and it makes continual learning and modular system-building more practical. The authors have released all code and models to facilitate further research.
We could see initial applications of this modular growth strategy within the next 12-18 months. Imagine, for example, an AI assistant that learns a new skill or language by adding a new ‘layer’ or adapter without retraining its entire core, making updates much faster and cheaper. Developers should explore integrating modular design principles into their current AI projects to prepare for more agile AI development, a shift that could democratize access to AI capabilities. A rough sketch of what such an add-on might look like follows.
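The snippet below is a hand-rolled, minimal version of the LoRA-style adapter idea mentioned earlier: a frozen linear layer gains a small trainable low-rank update, so a new capability can be learned without touching the core weights. The class name (LoRALinear), the rank, and the scaling are illustrative assumptions, not the paper’s code or any specific library’s API.

```python
# Hypothetical sketch: a LoRA-style low-rank adapter on a frozen linear layer.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update: W x + B(A x)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # the core weights stay frozen
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)   # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_B(self.lora_A(x))

# Usage: swap a frozen projection for its adapted version to learn a new skill.
frozen_proj = nn.Linear(256, 256)
adapted_proj = LoRALinear(frozen_proj, rank=8)
x = torch.randn(2, 10, 256)
y = adapted_proj(x)  # matches frozen_proj(x) exactly until the adapter is trained
```

Only the two small adapter matrices receive gradients, which is why updates of this kind are cheap compared with retraining the full model.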
