
Covert Speech Recovery via Vibrations:
A Review of Eavesdropping Techniques and Real-World Viability
1. Introduction
Live human speech is produced when the vocal system generates vibrations that travel through the vocal tract, resulting in the sounds we recognize as spoken language. Machine-rendered speech, in contrast, is audio reproduced by electromechanical devices such as loudspeakers. Despite their differences, both live speech and machine-rendered audio share a common feature: they are carried by vibrations. For example, when a smartphone plays an audio file through its speaker, it generates vibrations similar to those produced by a human voice.
These vibrations, however, can be exploited by attackers to gain access to sensitive information. The key to such attacks lies not in access to the speech signal itself but in capturing and interpreting the vibrations associated with it. Attackers can record these vibrations with a wide range of sensors: MEMS motion sensors, LiDAR systems, electro-optical sensors, high-speed cameras, hard-drive position error signals, and piezoelectric disks, several of which are commonly found in smartphones and tablets. This white paper explores the methods by which these vibration-based attacks can occur and assesses how serious a threat they pose in real-world scenarios, building on conceptual insights introduced by Walker (2021).
2. Side-Channel Attacks Using Vibrations
Some might distinguish between the audio domain and the vibration domain, but the two are deeply interconnected. Audio signals are essentially vibrations transmitted through air or solid materials. Attackers can passively extract information by recording these vibrations in the environment surrounding the source, whether the source is producing audio or other mechanical signals.
For instance, non-audio side-channel attacks have demonstrated that it’s possible to infer smartphone passwords by analyzing vibrations detected by the device’s own MEMS sensors as users type on the soft keyboard. When it comes to speech, various everyday objects can unintentionally leak information. The vibrations of a glass resting on a desk might carry subtle clues about nearby speech, while the structure of a speaker itself vibrates in a way that can reveal the audio it’s playing. After all, microphones work by converting these vibrations into electrical signals that represent sound. Figure 1 shows an example of a similar scenario.

While the practical success of these attacks depends on many factors, research has shown that under controlled conditions, vibration leakage can indeed be exploited to recover speech information. This white paper details how such attacks succeed and under what constraints, and it also explains why those conditions rarely align in real-world environments, which is why the practical risk is currently limited.
3. Eavesdropping via Different Sensors
Several innovative methods have been proposed and tested for covertly eavesdropping on conversations by capturing vibration data:
- Multi-sensor Fusion: Combining data from geophones, accelerometers, and gyroscopes to create a network that reconstructs speech from vibrations.
- MEMS Gyroscope Exploitation: Using a smartphone’s MEMS gyroscope to detect vibrations caused by nearby acoustic signals.
- Accelerometer-Based Reverberation Capture: Measuring speech reverberations from a loudspeaker using the motion sensors embedded in smartphones.
- LiDAR Vibration Sensing: Detecting vibrations on surfaces like trash cans or windows using LiDAR sensors to infer the audio content played nearby.
- Electro-Optical Sensors: Employing electro-optical devices (such as laser Doppler vibrometers) aimed through telescopes to measure minute vibrations on objects like light bulbs and reconstruct speech. Figure 2 illustrates this process.

- High-Speed Cameras: Using high-speed video combined with image processing to analyze vibrations on surfaces affected by sound waves.
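Although the sensors listed above differ, each ultimately yields a one-dimensional time series (displacement, acceleration, angular rate, or light intensity) sampled at some rate. The sketch below illustrates only the generic first inspection step, under stated assumptions: the trace is a plain list of floats sampled at a known rate `fs`, the constant offset (for an accelerometer axis, largely gravity) is removed by subtracting the mean, and a naive discrete Fourier transform reports the strongest frequency. The helper name `dominant_frequency`, the 500 Hz rate, and the synthetic 100 Hz vibration are hypothetical; the attacks surveyed here rely on far more elaborate filtering and learned models.

```python
import cmath
import math


def dominant_frequency(samples: list[float], fs: float) -> float:
    """Return the frequency (Hz) of the strongest non-DC component in a trace.

    Hypothetical helper for illustration: it uses a naive O(N^2) DFT, which is
    fine for short traces but much slower than an FFT library.
    """
    n = len(samples)
    mean = sum(samples) / n
    # Remove the constant offset (e.g. gravity on an accelerometer axis).
    centered = [x - mean for x in samples]

    best_bin, best_mag = 0, -1.0
    # For a real-valued signal, bins 1..n//2 carry all unique information.
    for k in range(1, n // 2 + 1):
        acc = sum(
            centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    # DFT bin k corresponds to k * fs / n Hz.
    return best_bin * fs / n


if __name__ == "__main__":
    fs = 500.0  # assumed sensor sampling rate in Hz (illustrative)
    # Synthetic trace: gravity offset plus a faint 100 Hz vibration.
    trace = [9.81 + 0.01 * math.sin(2 * math.pi * 100 * t / fs) for t in range(500)]
    print(dominant_frequency(trace, fs))
```

Running the script prints `100.0`: the faint vibration is recovered even though it is about a thousand times smaller than the gravity offset, because the offset is removed before the spectrum is examined.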
4. Challenges Facing Vibration-Based Eavesdropping
Despite the intriguing possibilities, several significant challenges limit the effectiveness of these eavesdropping techniques:
- Live Speech Variability: Unlike machine-generated audio, live human speech is highly variable and lacks consistent acoustic properties, making it harder to decode from vibrations alone.
- Sound Pressure Level (SPL): Most methods require speech to be relatively loud (above 70 dB SPL), whereas typical conversational speech ranges between 40 and 60 dB.
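- Propagation Medium: Sound reaches nearby objects primarily through air, but vibrations also propagate through solid materials such as desks and walls. The medium linking the source, the vibrating object, and the sensor affects how strongly the signal is transmitted and which frequencies survive to be recorded.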
- Sampling Rates and Sensor Sensitivity: Different sensors have varying sampling frequencies and sensitivities. Some cannot sample at the Nyquist rate for speech (at least twice the highest frequency to be captured), so higher-frequency speech components are lost or aliased into lower frequencies, while low sensitivity further reduces the signal-to-noise ratio.
- Background Noise: Ambient noise can easily overwhelm the subtle vibrations attackers seek to measure, reducing the quality and reliability of the data.
- Distance Between Source and Sensor: The effectiveness of vibration capture diminishes with distance, and the relative positioning between the sound source, vibrating object, and sensor must be favorable.
- Speech Recognition Complexity: Even if vibrations are captured, accurately recognizing and reconstructing speech from noisy or incomplete data remains a formidable challenge.
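The decibel figures in the sound pressure level challenge (70 dB versus 40 to 60 dB) hide how large the gap is in physical terms. Sound pressure level is defined as SPL = 20·log10(p / p0) with reference pressure p0 = 20 µPa, so every 20 dB corresponds to a factor of ten in RMS pressure. The sketch below converts between the two and, under the simplifying assumption of a point source in free field (inverse-square law, roughly 6 dB lost per doubling of distance), estimates the level reaching an object farther from the speaker. The helper names and the "60 dB at 1 m" starting point are illustrative assumptions, not figures from the text; real rooms add reflections that this model ignores.

```python
import math

P_REF = 20e-6  # reference RMS pressure for SPL in air, in pascals (20 µPa)


def spl_db(p_rms: float) -> float:
    """Hypothetical helper: sound pressure level (dB SPL) for an RMS pressure in Pa."""
    return 20.0 * math.log10(p_rms / P_REF)


def pressure_pa(spl: float) -> float:
    """Hypothetical helper: RMS pressure in pascals for a level in dB SPL."""
    return P_REF * 10.0 ** (spl / 20.0)


def spl_at_distance(spl_ref: float, r_ref: float, r: float) -> float:
    """Hypothetical helper: level at distance r given spl_ref measured at r_ref.

    Assumes a point source in free field (inverse-square law), so the level
    falls by 20*log10(r / r_ref) dB; room reflections are ignored.
    """
    return spl_ref - 20.0 * math.log10(r / r_ref)


if __name__ == "__main__":
    # Pressure ratio between the ~70 dB many methods need and quieter speech.
    for quiet in (60.0, 40.0):
        ratio = pressure_pa(70.0) / pressure_pa(quiet)
        print(f"70 dB vs {quiet:.0f} dB: {ratio:.1f}x the pressure")
    # Illustrative assumption: speech at 60 dB measured 1 m away, object at 4 m.
    print(f"{spl_at_distance(60.0, 1.0, 4.0):.1f} dB at 4 m")
```

The script prints ratios of 3.2x and 31.6x and a level of 48.0 dB at 4 m: under these assumptions, 40 dB speech delivers roughly one thirtieth of the pressure amplitude of a 70 dB signal, and moving the object from 1 m to 4 m costs about another 12 dB.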
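The sampling-rate limitation listed above can be made concrete. A sensor sampling at `fs` can only represent frequencies up to `fs / 2`; a component above that limit folds back (aliases) to a lower apparent frequency unless it is filtered out before sampling. The sketch below computes the folded frequency and confirms it numerically for one illustrative case: a 300 Hz tone, well inside the speech band, sampled by a hypothetical sensor running at 500 Hz. The 500 Hz rate is an assumption chosen for illustration, and `aliased_frequency` and `strongest_bin_hz` are hypothetical helper names.

```python
import cmath
import math


def aliased_frequency(f: float, fs: float) -> float:
    """Hypothetical helper: apparent frequency (Hz) of a real tone at f Hz sampled at fs Hz."""
    # Fold f into [0, fs/2] by subtracting the nearest integer multiple of fs.
    # At an exact tie (f = fs/2 + m*fs) either rounding gives fs/2.
    return abs(f - fs * round(f / fs))


def strongest_bin_hz(samples: list[float], fs: float) -> float:
    """Hypothetical helper: frequency of the largest non-DC DFT magnitude (naive DFT)."""
    n = len(samples)
    mags = [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(1, n // 2 + 1)
    ]
    k_best = 1 + mags.index(max(mags))  # list index 0 corresponds to bin k = 1
    return k_best * fs / n


if __name__ == "__main__":
    f_tone, fs = 300.0, 500.0  # illustrative tone and assumed sensor rate
    x = [math.sin(2 * math.pi * f_tone * t / fs) for t in range(500)]
    print(aliased_frequency(f_tone, fs))  # predicted apparent frequency
    print(strongest_bin_hz(x, fs))        # measured from the sampled data
```

Both lines print `200.0`: energy from the 300 Hz tone shows up at 200 Hz in the sampled data, so an under-sampled sensor does not merely lose high-frequency speech content but can also misplace it in the lower band.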
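The background-noise challenge is usually quantified as a signal-to-noise ratio, SNR = 10·log10(P_signal / P_noise), where P is mean power. The sketch below computes it from separate signal and noise sample lists, which is only possible in a synthetic setup; in a real capture the two arrive mixed. The 150 Hz tone, the noise levels, the 1 kHz rate, and the seed are arbitrary illustrative choices, and `mean_power` and `snr_db` are hypothetical helper names.

```python
import math
import random


def mean_power(samples: list[float]) -> float:
    """Hypothetical helper: mean of the squared samples (average power)."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal: list[float], noise: list[float]) -> float:
    """Hypothetical helper: SNR in dB from separately known signal and noise."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


if __name__ == "__main__":
    fs, n = 1000.0, 10_000  # illustrative sampling rate (Hz) and length
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Unit-amplitude tone over whole periods: mean power is 0.5.
    tone = [math.sin(2 * math.pi * 150 * t / fs) for t in range(n)]
    for sigma in (0.2, 0.7, 2.0):  # increasing ambient noise (std. deviation)
        noise = [rng.gauss(0.0, sigma) for _ in range(n)]
        print(f"noise sigma {sigma}: SNR = {snr_db(tone, noise):.1f} dB")
```

The three cases come out at approximately 11 dB, 0 dB, and -9 dB. In the last two, the noise carries about as much power as the target vibration or more; for broadband speech, unlike a single tone, that leaves little to recover, which illustrates why ambient noise is such a limiting factor.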
5. Conclusion
While the concept of eavesdropping through vibration-based side channels is both fascinating and technically feasible under laboratory conditions, the practical risk of such attacks in everyday environments remains low. The numerous constraints, ranging from inconsistent speech patterns and ambient noise to sensor limitations and physical distances, make successful exploitation highly unlikely outside controlled settings.
That said, as sensor technology advances and devices become more interconnected, it’s important to stay vigilant. Continued research and awareness can help anticipate emerging threats and develop countermeasures to protect sensitive information. For now, these vibration-based attacks represent a niche area of concern rather than an immediate, widespread security risk.