ElevenLabs CEO Predicts AI Audio Model Commoditization

Mati Staniszewski discusses the future of AI audio, emphasizing a shift towards multi-modal applications.

ElevenLabs CEO Mati Staniszewski believes AI audio models will become commoditized within a few years. He revealed this at TechCrunch Disrupt 2025, explaining his company's strategy to focus on both model building and applications.

By Sarah Kline

October 30, 2025

4 min read

Key Facts

  • ElevenLabs CEO Mati Staniszewski predicts AI audio models will become commoditized in the next few years.
  • Staniszewski made this announcement at TechCrunch Disrupt 2025.
  • ElevenLabs will continue building its own models in the short term for competitive advantage.
  • The company expects a shift towards multi-modal (audio, video, LLM) approaches within 1-2 years.
  • ElevenLabs plans to pursue partnerships and open-source collaborations.

Why You Care

Ever wonder if the AI voices you hear today will soon be indistinguishable and freely available? Mati Staniszewski, CEO of ElevenLabs, just dropped a bombshell. He predicts that AI audio models will become a commodity in the near future. What does this mean for your creative projects or your business? It suggests a future where high-quality AI audio is no longer a premium feature, but a standard expectation. This shift could profoundly impact how you create and consume digital content.

What Actually Happened

ElevenLabs CEO Mati Staniszewski shared his long-term vision for the AI audio space. Speaking at TechCrunch Disrupt 2025, Staniszewski stated that AI audio models are headed for commoditization, meaning the core technology will become widely available and less differentiated over time. He explained that ElevenLabs researchers have tackled significant model architecture challenges, and that this focus will continue for the next year or two. After that, however, the landscape is set to change significantly.

Staniszewski clarified why ElevenLabs continues to build models despite this prediction. He noted that in the short term, building models offers the “biggest advantage and the biggest step change you can have today.” He also highlighted the need to solve problems like poor-sounding AI voices. The only way to solve this, he believes, is by building the models yourself, though he acknowledged that other players will eventually solve these issues too.

Why This Matters to You

This forecast from ElevenLabs has direct implications for anyone working with AI audio. If core AI audio models become commoditized, your focus might shift. You could concentrate more on unique applications rather than foundational model creation. Imagine you’re a podcaster. Instead of worrying about voice quality, you could focus on compelling scripts and delivery methods. This could free up resources and spark new creative avenues for your content.

Staniszewski also pointed to an increasing trend towards multi-modal or fused approaches. “So, you will create audio and video at the same time, or audio and LLMs at the same time in a conversational setting,” he stated. This means combining different AI technologies. Think of it as creating a complete digital persona, not just a voice. For example, Google’s Veo 3 shows what’s possible when models are combined. This development could open up entirely new ways for you to engage your audience.

Key Implications of AI Audio Commoditization:

  • Lower Barrier to Entry: Easier access to high-quality AI voices for everyone.
  • Increased Innovation in Applications: Focus shifts from model building to creative use cases.
  • Multi-modal Content Creation: Integration of audio with video and large language models (LLMs).
  • Enhanced User Experience: More natural and engaging AI interactions.

How will you adapt your content strategy when high-quality AI voices are readily available to everyone?

The Surprising Finding

Here’s the twist: even with the prediction of commoditization, ElevenLabs plans to continue building its own models. This might seem counterintuitive at first glance. Why invest heavily in something that will eventually become commonplace? The reason, as Staniszewski explained, is short-term competitive advantage. He emphasized that building models in-house is currently the most significant advantage, allowing the company to tackle current problems like AI voices that “don’t sound good.” This challenges the assumption that companies would immediately pivot away from core model development once commoditization is on the horizon. Instead, it suggests a strategic window where proprietary models still offer a crucial edge.

What Happens Next

ElevenLabs aims to launch partnerships with other companies in the coming months, and the company also plans to work with open-source technologies. This strategy seeks to combine its audio expertise with other model capabilities. Staniszewski expects an increasing number of multi-modal approaches within the next year or two. For example, we could see more AI systems that generate both audio and video simultaneously, leading to richer, more integrated digital experiences.

For creators and businesses, this means preparing for a more integrated AI landscape. Consider how your projects can benefit from combining AI audio with other AI modalities. Start exploring tools that offer multi-modal capabilities. The company reports that ElevenLabs’ long-term goal is to focus on both model building and applications. This dual approach aims to create lasting value. Staniszewski likened this to Apple’s success with software and hardware. He believes “the product and AI will be the magic for the generation of the best use cases.”
