Familial Models Redefine AI Scaling Laws

New research introduces 'Granularity' as a key factor for efficient AI deployment.

A new study by Huan Song and colleagues challenges traditional AI scaling laws. They propose 'familial models' that generate multiple sub-models from a single backbone. This approach could make AI deployment more flexible and efficient across various devices.

By Sarah Kline

December 31, 2025

4 min read

Key Facts

  • New research introduces 'familial models' for AI deployment.
  • Familial models generate multiple deployable sub-models from a single backbone.
  • Granularity (G) is added as a fundamental scaling variable alongside model size (N) and training tokens (D).
  • The granularity penalty follows a multiplicative power law with an extremely small exponent.
  • The approach validates the 'train once, deploy many' paradigm for AI.

Why You Care

Ever wonder why deploying AI models on smaller devices feels like fitting a supercomputer into your pocket? New research on familial models could change that. Instead of requiring massive computing power for each application, a single trained model could spawn right-sized versions for your phone, your smart speaker, or the cloud, making bespoke AI far more accessible.

What Actually Happened

Researchers, including Huan Song and Qingfei Zhao, have unveiled a new perspective on AI scaling. Their work, as detailed in the blog post, introduces familial models, which they describe as an “impactful paradigm essential for realizing ubiquitous intelligence across heterogeneous device-edge-cloud hierarchies.” Unlike traditional large language model (LLM) training, which yields a single fixed model, familial models create multiple deployable sub-models from a single shared backbone, using a process called relay-style inference.
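The post doesn't spell out the relay-style mechanism, but one plausible reading is an early-exit-style design in which each sub-model reuses a prefix of the shared backbone. The sketch below illustrates that interpretation only; the class name, layer counts, and depth-based selection are assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn as nn

class FamilialBackbone(nn.Module):
    """Hypothetical familial model: sub-models are prefixes of one backbone."""

    def __init__(self, d_model=512, n_layers=12, vocab=32000):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        # One shared stack of layers; every sub-model reuses a prefix of it.
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            for _ in range(n_layers)
        )
        self.head = nn.Linear(d_model, vocab)  # output head shared by all sub-models

    def forward(self, tokens, depth=None):
        # `depth` selects the sub-model: how many backbone layers to run.
        x = self.embed(tokens)
        for layer in self.layers[: depth if depth is not None else len(self.layers)]:
            x = layer(x)
        return self.head(x)

model = FamilialBackbone()
tokens = torch.randint(0, 32000, (1, 16))
cloud_logits = model(tokens)           # full-depth sub-model for the cloud
edge_logits = model(tokens, depth=4)   # shallow sub-model for an edge device
```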

The team extended the traditional scaling law by adding ‘granularity’ (G) as a fundamental scaling variable alongside model size (N) and training tokens (D). According to the announcement, they proposed a unified functional form L(N, D, G) and parameterized it using large-scale empirical runs. This approach maps how the three factors trade off against one another, promising more efficient AI development and deployment.

Why This Matters to You

This new approach has significant implications for how we build and use AI. Imagine you’re a developer who wants to deploy an AI model across a wide range of devices, from cloud servers to tiny edge devices. This research suggests you won’t need to retrain a separate model for each one: you could train one familial model, then generate specialized sub-models for different needs, saving immense time and computational resources.

For example, think of a smart home assistant. Today, a full AI model might run in the cloud; a familial model could let a smaller, specialized version run directly on your smart speaker, improving response times and privacy while reducing reliance on a constant internet connection. How might this flexibility change the way you interact with AI in your daily life?

The research shows that this deployment flexibility is achievable without compromising the compute-optimality of dense baselines, meaning you get the benefits of tailored AI without sacrificing performance. As mentioned in the release, the concept validates the “train once, deploy many” paradigm, which is crucial for expanding AI’s reach.

Here’s a quick look at the expanded scaling law:

Variable | Description | Impact on AI Deployment
---------|-------------|-------------------------
N | Model size (number of parameters) | Determines model complexity and capacity
D | Training tokens (amount of data) | Influences model knowledge and generalization
G | Granularity (number of sub-models generated) | Enables tailored deployment across diverse hardware
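The announcement doesn’t give the fitted parameterization, but a Chinchilla-style base loss combined with the multiplicative power-law penalty the study describes would take a form like this (the constants E, A, B and exponents α, β, γ are placeholders for illustration, not the paper’s fitted values):

```latex
% Plausible form of the extended law: Chinchilla-style base loss
% times a granularity penalty with a very small exponent gamma.
L(N, D, G) = \left( E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}} \right) G^{\gamma},
\qquad 0 < \gamma \ll 1
```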

The Surprising Finding

One of the most intriguing findings concerns the ‘granularity penalty.’ You might expect that generating multiple sub-models from one backbone would carry a steep cost. The study finds otherwise: the penalty follows a multiplicative power law with an “extremely small exponent.” In other words, the loss penalty for supporting many specialized sub-models at a fixed training budget is minimal, far less than one might intuitively assume.
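To see why a tiny exponent matters, here is a quick numerical sketch. The baseline loss and the exponent value below are made-up placeholders (the post doesn’t report the fitted numbers); the point is how slowly G^γ grows when γ is small:

```python
# Illustrative numbers only: the baseline loss and exponent are assumptions,
# not the paper's fitted constants.
base_loss = 2.0   # hypothetical dense-baseline loss, L(N, D, G=1)
gamma = 0.01      # assumed "extremely small" granularity exponent

for G in (1, 2, 4, 8, 16):
    penalized = base_loss * G ** gamma
    print(f"G={G:>2}: loss={penalized:.4f} (+{100 * (G ** gamma - 1):.2f}%)")
```

Even at G = 16, this assumed exponent adds under 3% to the loss, which is the sense in which flexibility comes nearly for free.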

This challenges the common assumption that increased flexibility always comes with a significant performance hit. According to the team, the result bridges fixed-compute training with dynamic architectures: developers can gain deployment flexibility without compromising the compute-optimality of dense baselines. That suggests a highly efficient path forward for ubiquitous AI.

What Happens Next

This research paves the way for a new era of AI deployment, and we could see familial models in practical applications soon. Over the next 12-18 months, expect more academic papers exploring the practical implications and optimizations of this approach. What’s more, companies developing AI for edge devices might integrate these concepts within the next two to three years.

For example, imagine a self-driving car using a familial model. A large, complex sub-model handles high-level navigation, while smaller, more efficient sub-models manage real-time sensor processing on dedicated, less powerful chips. For you, this means more responsive and reliable AI systems that adapt to various environments.

Developers should begin exploring how these concepts could apply to their projects and consider how their current AI deployment strategies might evolve. This approach offers a path to more efficient and flexible AI solutions, and the “train once, deploy many” paradigm it validates will likely become a key focus in future AI development.
