Why You Care
Ever wonder if the AI tools you use today will become commonplace tomorrow? What if the very technology powering your favorite audio experiences is on a fast track to becoming a commodity? This isn’t just a hypothetical question for AI developers. It’s a direct prediction from a leader in the field, and it has significant implications for your creative projects and business strategies.
ElevenLabs CEO Mati Staniszewski recently shared his insights at TechCrunch Disrupt 2025. He predicted that AI audio models, currently a key competitive advantage, will commoditize over time. This shift could change how you approach AI audio integration and development.
What Actually Happened
Speaking at the TechCrunch Disrupt 2025 conference, Mati Staniszewski, co-founder and CEO of AI audio company ElevenLabs, offered a candid view on the future of AI audio models. He discussed both short-term and long-term perspectives, highlighting that ElevenLabs’ researchers have successfully tackled several model architecture challenges. That focus on model building will continue for the next year or two.
Staniszewski stated, “Over the long term, it will commoditize — over the next couple of years.” He further explained that while differences might persist for specific voices or languages, these distinctions will generally become smaller. When questioned about ElevenLabs’ continued focus on model building despite this prediction, Staniszewski clarified that in the short term, proprietary models represent the “biggest advantage and the biggest step change you can have today.”
Why This Matters to You
This prediction isn’t just industry jargon; it directly impacts your work. If AI audio models become widely available and affordable, it opens up new possibilities for creators and businesses. Imagine having access to voice generation without the hefty price tag or complex development work. This could democratize high-quality audio content creation.
For example, consider a small podcast studio. Currently, achieving diverse and natural-sounding AI voices might require significant investment. However, with commoditization, that same studio could access a wide array of high-quality voice options at a fraction of the cost, leveling the playing field. How will you adapt your content strategy when AI audio becomes universally accessible?
Staniszewski emphasized the importance of solving current audio quality issues. He noted, “The only way to solve it is… building the models yourself, and then, over the long term, there will be other players that will solve that, too.” This indicates a temporary competitive edge for companies investing in core model development now.
Potential Impact of AI Audio Commoditization:
- Increased Accessibility: More affordable and readily available tools for everyone.
- Reduced Development Costs: Lower barriers to entry for integrating AI audio into applications.
- Focus on Applications: Shift from model development to use cases and product development.
- Enhanced Quality Standards: Baseline quality of AI audio is likely to improve across the board.
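If models do commoditize, the durable engineering investment shifts from any single vendor’s model to your integration layer. Here is a minimal sketch, assuming a provider-agnostic design; the class and provider names are hypothetical illustrations, not any real vendor SDK:

```python
from dataclasses import dataclass
from typing import Protocol


class TTSProvider(Protocol):
    """Any text-to-speech backend; swap implementations as models commoditize."""

    def synthesize(self, text: str, voice: str) -> bytes: ...


@dataclass
class StubProvider:
    """Placeholder standing in for a real vendor SDK (hypothetical)."""

    name: str

    def synthesize(self, text: str, voice: str) -> bytes:
        # A real implementation would call the vendor's API here.
        return f"[{self.name}:{voice}] {text}".encode("utf-8")


class AudioPipeline:
    """Application code depends on the abstract interface, not on a vendor."""

    def __init__(self, provider: TTSProvider) -> None:
        self.provider = provider

    def narrate(self, script: str, voice: str = "narrator") -> bytes:
        return self.provider.synthesize(script, voice)


# Switching vendors becomes a one-line change, not a rewrite:
pipeline = AudioPipeline(StubProvider(name="vendor_a"))
audio = pipeline.narrate("Welcome to the show.")
```

The design choice here is the point: if baseline quality converges across providers, code structured this way lets you chase the cheapest adequate backend without touching your application logic.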
The Surprising Finding
The most intriguing revelation from Staniszewski is that ElevenLabs, despite foreseeing the commoditization of AI audio models, plans to continue investing heavily in building these very models. This might seem counterintuitive at first glance. Why pour resources into something you expect to become a basic utility?
Staniszewski’s reasoning challenges the common assumption that companies should only invest in long-term differentiators. He explains that, for now, proprietary models are still crucial: “The only way to solve it is… building the models yourself.” This strategy focuses on addressing current technical challenges that other players haven’t yet mastered. It’s about securing a short-term lead to fund long-term innovation.
This approach suggests a two-pronged strategy. While the core technology may eventually become widespread, the advantage lies in solving today’s complex problems. It’s a race to build the foundational technology before it becomes a standard feature, allowing ElevenLabs to pivot to higher-value applications.
What Happens Next
Looking ahead, Staniszewski predicts a significant move towards multi-modal or fused AI approaches within the next year or two. This means combining different AI models to create more complex and integrated experiences. For instance, you might see AI systems that generate audio and video simultaneously, or audio integrated directly with large language models (LLMs) for conversational settings.
Think of it as the next evolution of AI interaction. For example, imagine a virtual assistant that not only understands your spoken commands but also generates a realistic video response. Staniszewski pointed to Google’s Veo 3 as an excellent example of what is achievable when models are combined effectively. ElevenLabs intends to form partnerships with other companies and utilize open-source technologies, merging its audio expertise with the capabilities of other models.
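The fused, multi-modal direction described above can be pictured as chaining models: a language model produces the conversational reply, and an audio model voices it. A minimal sketch with stubbed models follows; the function names and wiring are illustrative assumptions, not a real API:

```python
def stub_llm(prompt: str) -> str:
    """Stand-in for a large language model generating a text reply."""
    return f"Reply to: {prompt}"


def stub_tts(text: str) -> bytes:
    """Stand-in for an audio model voicing the text."""
    return text.encode("utf-8")


def conversational_turn(user_speech: str) -> bytes:
    """Fuse the two models: transcribed speech in, spoken reply out."""
    reply_text = stub_llm(user_speech)   # reasoning step (LLM)
    return stub_tts(reply_text)          # voicing step (audio model)


audio_out = conversational_turn("What's the weather?")
```

In a production system each stub would be a network call to a hosted model, and the interesting engineering is in streaming the LLM output into the audio model to keep conversational latency low.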
For your business, this means preparing for a future where AI interactions are richer and more interactive. Consider how your products could benefit from integrated audio-visual or conversational AI. ElevenLabs’ long-term strategy involves focusing on both model building and applications. This dual focus aims to create lasting value, much like the software and hardware integration that defined Apple’s success, according to Staniszewski.
