Speech enhancement algorithms use a voice activity detector (VAD), not only to restrict processing to frames that actually contain speech, but also to track the noise floor dynamically. In an adaptive VAD, the threshold for speech detection is updated continuously.
Consider an energy-based VAD, in which the energy is computed as an average of the instantaneous temporal energies. Suppose the signal received at the $i$-th microphone is given by:
$$x_i(n) = s(n - \tau_i) + v_i(n)$$
where $s(n)$ is the desired speech signal, $\tau_i$ is the relative delay with respect to microphone 1, and $v_i(n)$ is i.i.d. zero-mean Gaussian noise. Then, the threshold can be adaptively computed using the equation:
Here, the threshold depends on the number of samples per frame, the number of microphones in the array, and a design parameter. Sample performance of this algorithm is shown in Figure 1 below:
Figure 1: Adaptive VAD thresholding
The processed speech is illustrated in Figure 2 below:
Figure 2: Processed speech using adaptive VAD
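The energy-based adaptive VAD described above can be sketched in a few lines of Python. This is a minimal illustration, not VOCAL's implementation. The frame energy follows the description in the text (an average of the instantaneous energies over all samples in the frame and all microphones). The threshold update rule is an assumption made for this sketch: the noise floor is tracked as an exponential average of the energies of frames judged to be non-speech, and the threshold is that noise floor multiplied by a design parameter. The names `frame_energy`, `adaptive_vad`, `scale`, `smoothing`, and `init_frames` are hypothetical.

```python
import math
import random


def frame_energy(frame):
    """Average instantaneous energy over all microphones and samples.

    frame: list of M channels (one per microphone), each a list of N samples.
    """
    num_mics = len(frame)
    num_samples = len(frame[0])
    total = sum(x * x for channel in frame for x in channel)
    return total / (num_mics * num_samples)


def adaptive_vad(frames, scale=3.0, smoothing=0.9, init_frames=5):
    """Classify each frame as speech (True) or non-speech (False).

    Assumed update rule (not taken from the article):
      - the first `init_frames` frames are treated as noise only and
        their mean energy initializes the noise floor;
      - threshold = scale * noise_floor, where `scale` is the design parameter;
      - the noise floor is updated only on frames judged non-speech.
    Returns (decisions, thresholds), one entry per frame.
    """
    energies = [frame_energy(f) for f in frames]
    if not energies:
        return [], []
    init = min(init_frames, len(energies))
    noise_floor = sum(energies[:init]) / init

    decisions, thresholds = [], []
    for energy in energies:
        threshold = scale * noise_floor
        is_speech = energy > threshold
        if not is_speech:
            # Track slow changes in the background noise level.
            noise_floor = smoothing * noise_floor + (1.0 - smoothing) * energy
        decisions.append(is_speech)
        thresholds.append(threshold)
    return decisions, thresholds


if __name__ == "__main__":
    # Synthetic two-microphone example: noise, then a delayed tone, then noise.
    random.seed(0)
    n, delay = 160, 3  # samples per frame, delay (in samples) at microphone 2

    def make_frame(active):
        mic1 = [random.gauss(0.0, 0.1) for _ in range(n)]
        mic2 = [random.gauss(0.0, 0.1) for _ in range(n)]
        if active:
            for k in range(n):
                mic1[k] += math.sin(0.2 * k)
                mic2[k] += math.sin(0.2 * (k - delay))
        return [mic1, mic2]

    frames = [make_frame(10 <= i < 20) for i in range(30)]
    decisions, _ = adaptive_vad(frames)
    print(decisions)
```

Freezing the noise-floor update during detected speech keeps loud speech frames from inflating the threshold. A larger `scale` makes the detector more conservative, at the cost of missing low-energy speech.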
VOCAL Technologies offers custom designed solutions for beamforming with a robust voice activity detector, acoustic echo cancellation and noise suppression. Our custom implementations of such systems are meant to deliver optimum performance for your specific beamforming task. Contact us today to discuss your solution!