Beyond-Voice: Hand Tracking on Smart Home Devices

New acoustic sensing system enables 3D hand pose tracking using existing microphones and speakers.

Researchers have developed 'Beyond-Voice,' a novel system that turns commercial home assistants into active sonar devices. This technology allows for continuous 3D hand pose tracking using only onboard microphones and speakers, addressing privacy and usability concerns associated with cameras.


By Katie Rowan

September 18, 2025



Key Facts

  • Beyond-Voice is a novel acoustic sensing system for 3D hand pose tracking.
  • It uses existing microphones and speakers on commercial home assistant devices.
  • The system transforms home assistants into active sonar systems.
  • It predicts 3D positions of 21 finger joints with high granularity.
  • Beyond-Voice achieved an average mean absolute error of 16.47mm in user studies.

Why You Care

Ever wish your smart home assistant could understand your gestures, not just your voice? Imagine controlling your devices with a wave of your hand, without needing a camera watching your every move. A new system called Beyond-Voice aims to make this a reality. It promises to transform how you interact with your smart home, enhancing accessibility and privacy. This could fundamentally change your daily experience with technology.

What Actually Happened

Researchers have unveiled an acoustic sensing system named Beyond-Voice that lets commercial home assistant devices continuously track and reconstruct 3D hand poses, according to the announcement. It turns existing home assistants into active sonar systems using only their onboard microphones and speakers. The system feeds a high-resolution range profile to a deep learning model, which analyzes hand motion and predicts the 3D positions of 21 finger joints. This significantly advances the granularity of acoustic hand tracking, the paper states. The system operates across diverse environments and users. What’s more, it does not require personalized training data.
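To make the "active sonar" idea concrete, here is a minimal sketch of how a range profile can be computed from a speaker and a microphone. It is an illustration under stated assumptions, not the authors' pipeline: the 48 kHz sample rate, the 17-20 kHz chirp, the matched-filter (cross-correlation) approach, and the function names `make_chirp`, `simulate_echo`, and `range_profile` are all hypothetical choices made for this example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value in room-temperature air
SAMPLE_RATE = 48_000    # Hz, assumed audio sample rate (not from the article)


def make_chirp(f0=17_000.0, f1=20_000.0, duration=0.01, fs=SAMPLE_RATE):
    """Linear chirp from f0 to f1 Hz, tapered with a Hann window (assumed band)."""
    t = np.arange(int(duration * fs)) / fs
    sweep_rate = (f1 - f0) / duration
    phase = 2.0 * np.pi * (f0 * t + 0.5 * sweep_rate * t**2)
    return np.sin(phase) * np.hanning(t.size)


def simulate_echo(chirp, distance_m, n_samples, fs=SAMPLE_RATE, gain=0.5):
    """Toy microphone signal: one attenuated copy of the chirp after a round trip."""
    delay = int(round(2.0 * distance_m / SPEED_OF_SOUND * fs))
    received = np.zeros(n_samples)
    received[delay:delay + chirp.size] += gain * chirp
    return received


def range_profile(received, chirp, fs=SAMPLE_RATE):
    """Matched filter: correlate the recording with the chirp, map lag to range."""
    corr = np.correlate(received, chirp, mode="full")
    corr = corr[chirp.size - 1:]                 # keep lags >= 0 only
    delays = np.arange(corr.size) / fs           # round-trip delay in seconds
    ranges = delays * SPEED_OF_SOUND / 2.0       # one-way distance in metres
    return ranges, np.abs(corr)


chirp = make_chirp()
received = simulate_echo(chirp, distance_m=0.30, n_samples=2048)
ranges, profile = range_profile(received, chirp)
estimated_range = float(ranges[np.argmax(profile)])
print(f"strongest reflector at about {estimated_range:.3f} m")
```

In a system like the one described, a sequence of such profiles over time would be the input to the deep learning model; the model itself is outside the scope of this sketch.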

Why This Matters to You

This development means your smart home devices could soon offer more intuitive control. You might no longer need to rely solely on voice commands, which can be limiting. Think about situations where voice commands are inconvenient or impossible. Imagine silently adjusting your smart thermostat with a simple hand gesture. The system could improve accessibility for users with speech impairments. It also offers a more private alternative to camera-based systems. Do you struggle with voice commands in noisy environments? This system could be an important development for your home. As mentioned in the release, “Beyond-Voice can track joints with an average mean absolute error of 16.47mm without any training data provided by the testing subject.”

Here’s how Beyond-Voice could impact your interaction with smart devices:

  • Enhanced Privacy: No cameras needed for gesture control.
  • Improved Accessibility: Offers an alternative to voice commands.
  • Intuitive Control: Use natural hand movements for device interaction.
  • Integration: Works with existing home assistant hardware.

The Surprising Finding

What’s truly remarkable about Beyond-Voice is its precision without needing new hardware. The system achieves fine-grained 3D hand pose tracking using only the microphones and speakers already present in commercial home assistant devices. This challenges the common assumption that gesture recognition requires dedicated cameras or specialized sensors. The study finds that Beyond-Voice can track joints with an average mean absolute error of 16.47mm. This level of accuracy is achieved without any personalized training data from the testing subject, which means it could work right out of the box for new users. It’s surprising because it leverages existing, often underutilized, components to perform a complex task. This capability was previously thought to require additional, and often more intrusive, hardware.
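For readers wondering how an error figure like this is computed, the sketch below shows one plausible reading: the Euclidean distance between each predicted and ground-truth joint, averaged over all 21 joints and all frames. The article does not spell out the exact definition the authors used, so the formula, the array layout `(frames, 21, 3)`, and the function name `mean_joint_error_mm` are assumptions made for illustration.

```python
import numpy as np

NUM_JOINTS = 21  # joints per hand, as reported in the article


def mean_joint_error_mm(predicted, ground_truth):
    """Average per-joint Euclidean error in mm (assumed metric definition).

    Both inputs are arrays shaped (frames, 21, 3) holding x, y, z in millimetres.
    """
    predicted = np.asarray(predicted, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    if predicted.shape != ground_truth.shape:
        raise ValueError("predicted and ground_truth must have the same shape")
    if predicted.shape[-2:] != (NUM_JOINTS, 3):
        raise ValueError("expected trailing dimensions (21, 3)")
    per_joint_error = np.linalg.norm(predicted - ground_truth, axis=-1)
    return float(per_joint_error.mean())


# Toy data: every predicted joint is shifted 10 mm along x from the truth.
truth = np.zeros((2, NUM_JOINTS, 3))
prediction = truth.copy()
prediction[..., 0] += 10.0
toy_error = mean_joint_error_mm(prediction, truth)
print(f"mean joint error: {toy_error:.2f} mm")
```

Under this definition, the reported 16.47mm would mean each estimated joint lands, on average, about a centimetre and a half from its true position.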

What Happens Next

Looking ahead, we can expect further development and potential integration of the Beyond-Voice system. The research, accepted by IPSN 2024, suggests a strong future for this approach. Developers could begin incorporating the technique into new smart home assistant models within the next 12-18 months. For example, imagine a future smart speaker that lets you mute music with a simple ‘shush’ gesture. You could also adjust volume by mimicking a turning knob. The industry implications are significant, potentially leading to a new standard for human-computer interaction. This could reduce reliance on visual data for control. For you, this means a future where your smart home is more responsive and less intrusive. The team revealed that this system operates across different environments and users. This broad applicability makes it a strong candidate for widespread adoption.
