Why You Care
Ever wonder what your speech truly reveals, beyond just the words you say? Imagine a world where every vocal nuance, every pause, and yes, even every “um” and “uh” is meticulously recorded. Why should you care about your vocal disfluencies? Because, according to the announcement, Deepgram’s new feature keeps them in the transcript, giving applications such as coaching and record-keeping detail that clean transcripts discard.
Deepgram has introduced its new Filler Words feature. This allows for verbatim transcription, capturing those often-edited-out vocal habits. This release isn’t just for linguists; it affects anyone who relies on accurate speech-to-text systems. Your ability to analyze speech, whether for coaching or record-keeping, just got a significant upgrade.
What Actually Happened
Deepgram recently announced the release of its new Filler Words feature, as detailed in the blog post. This capability allows their speech-to-text model to transcribe vocal disfluencies, also known as filler words. These include sounds or words like “um” and “uh,” which are common in everyday conversation. Traditionally, these are often removed from transcripts to create “clean copy.”
However, the company reports that some customers require the inclusion of every encountered utterance. This is especially true for those focused on improving public speaking or maintaining official records. The new Filler Words feature works for both pre-recorded and streaming English audio. It is compatible with existing features such as Smart Formatting and Diarization (speaker separation). Initially, it’s available with their Nova general speech-to-text model, with plans for broader model support soon. The team revealed that this feature has no impact on latency or performance, ensuring consistent transcription quality.
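As a rough sketch of what switching the feature on might look like, the snippet below builds the URL for a pre-recorded transcription request that combines Filler Words with Smart Formatting and Diarization on the Nova model. The endpoint and the query parameter names (`model`, `filler_words`, `smart_format`, `diarize`) are assumptions that the announcement does not spell out, and the helper `build_listen_url` is hypothetical; check Deepgram's API documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against Deepgram's API documentation.
BASE_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(filler_words=True, smart_format=True, diarize=True, model="nova"):
    """Return a request URL with the (assumed) feature flags as query parameters."""
    params = {
        "model": model,
        # Booleans are sent as lowercase strings in the query string.
        "filler_words": str(filler_words).lower(),
        "smart_format": str(smart_format).lower(),
        "diarize": str(diarize).lower(),
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(build_listen_url())
```

Only the URL is built here, so the sketch runs without an API key or network access. A real request would also need authentication and the audio payload.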
Why This Matters to You
This new feature opens up a world of possibilities for detailed speech analysis. Imagine you are a public speaking coach. Now, you can provide your clients with incredibly precise feedback, showing them exactly where their filler words occur. This level of detail was previously difficult to capture accurately.
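To make the coaching scenario concrete, here is a minimal sketch of how filler words could be located in a verbatim transcript. It assumes the transcript is available as a list of word records with a start time in seconds. That record shape, the sample data, and the helper `find_fillers` are hypothetical and are not taken from the announcement; only “um” and “uh” are treated as fillers because those are the examples the announcement names.

```python
# The two filler words named in the announcement.
FILLERS = {"um", "uh"}


def find_fillers(words):
    """Return (filler, start_seconds) for each filler in a list of word records."""
    hits = []
    for record in words:
        # Normalize case and trailing punctuation before matching.
        token = record["word"].lower().strip(".,!?")
        if token in FILLERS:
            hits.append((token, record["start"]))
    return hits


# Hypothetical word-level output for one short utterance.
sample = [
    {"word": "So", "start": 0.0},
    {"word": "um,", "start": 0.4},
    {"word": "our", "start": 0.9},
    {"word": "plan", "start": 1.2},
    {"word": "is", "start": 1.6},
    {"word": "uh", "start": 1.9},
    {"word": "simple.", "start": 2.3},
]

for filler, start in find_fillers(sample):
    print(f"{filler} at {start:.1f}s")
```

A coach could pair each timestamp with the recording to replay the exact moment a filler occurred.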
Think of it as a microscope for your speech patterns. The announcement acknowledges that while many customers prefer clean transcripts, specific use cases demand verbatim accuracy. As Josh Fox, Head of Product Marketing, stated, “for some customers—especially those focused on improving the public speaking capabilities of particular end users or tasked with official record keeping—filler words are as important or more important than the words that surround them.” This highlights the diverse needs Deepgram is addressing.
How might capturing every “uh” change your approach to communication or content creation?
Here are some key areas where this feature makes a difference:
| Use Case | Benefit of Filler Word Transcription |
| --- | --- |
| Sales Enablement | Analyze sales calls for confidence and clarity |
| Public Speaking Coaching | Pinpoint and reduce vocal disfluencies |
| English Language Coaching | Identify speech patterns for language learners |
| Legal Transcription | Ensure precise, legally admissible records |
| Human Resources | Document sensitive conversations accurately |
This enhanced accuracy means your transcribed data is richer and more insightful. It provides a complete picture of spoken interactions.
The Surprising Finding
Here’s the twist: while filler words are often considered a nuisance, the announcement describes them as “as important or more important than the words that surround them” for certain applications. This challenges the common assumption that vocal disfluencies should always be removed. We often strive for perfectly clean speech, yet their presence can carry significant meaning.
For example, consider a legal proceeding. Every sound, every pause, and every “um” could be crucial for context or intent. Omitting them could alter the meaning or impact of testimony. The announcement treats these seemingly insignificant utterances as an inescapable part of human communication. The company also reports that filler words are spelled consistently throughout the transcript, which makes them reliable to search for and count. This precise inclusion allows for deeper analysis, moving beyond the words themselves to how those words are delivered.
What Happens Next
Deepgram’s new Filler Words feature is currently available with the Nova general model. The company reports that support for other models will be added shortly. This suggests a broader rollout in the coming months, potentially by late 2024 or early 2025, although that timing is an estimate rather than a stated commitment. This expansion would make verbatim transcription accessible across more of their product offerings.
For example, imagine a podcaster who wants to analyze their speaking style over time. They could use this feature to track their use of filler words and work towards clearer delivery. Actionable advice for you: if precise transcription is vital for your work, explore Deepgram’s API Documentation. You can immediately try out their models in the API Playground. The industry implication is a move towards more granular and accurate speech-to-text services. This will likely set a new standard for transcription detail, especially in professional and analytical contexts.
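The podcaster scenario can be sketched in a few lines. The example below computes fillers per 100 words for each episode so that the trend can be compared over time. The episode transcripts and the helper `filler_rate` are made up for illustration, and the filler list is limited to the two examples the announcement names; a real verbatim transcript may contain others.

```python
import re

# The two filler words named in the announcement.
FILLERS = {"um", "uh"}


def filler_rate(transcript):
    """Return the number of fillers per 100 words in a verbatim transcript."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    if not tokens:
        return 0.0
    count = sum(1 for token in tokens if token in FILLERS)
    return 100.0 * count / len(tokens)


# Hypothetical verbatim transcripts, one per episode.
episodes = {
    "episode 1": "Um so today we are uh talking about um microphones",
    "episode 2": "So today we are talking about uh editing software",
}

for name, text in episodes.items():
    print(f"{name}: {filler_rate(text):.1f} fillers per 100 words")
```

Normalizing by word count keeps long and short episodes comparable, which a raw filler count would not.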
