Speech enhancement algorithms typically rely on a voice activity detector (VAD), both to restrict processing to frames that actually contain speech and to track the noise floor dynamically. A VAD is essentially designed to distinguish noise frames from non-noise frames, so it can exploit any characteristic of noise that is not present in speech. One such characteristic is the number of zero crossings, which for speech is on average lower than the number observed in i.i.d. noise. Consider a zero-crossing-count-based VAD in which the count is computed with a simple difference and comparator unit: within each frame, we count the samples whose sign differs from that of the preceding sample. Notice that this condition holds for both positive-to-negative and negative-to-positive transitions. Suppose the signal received at the microphones is the sum of the desired speech signal and i.i.d. zero-mean Gaussian noise. The detection threshold can then be computed adaptively.
The threshold depends on the number of samples per frame and on a design parameter. The maximum and minimum levels are also gradually decayed and magnified, respectively, so that the threshold does not get stuck at a spurious point. A sample performance of this algorithm is shown in Figure 1 below:
Figure 1: Adaptive VAD thresholding
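The idea can be illustrated with a minimal Python sketch. It is not VOCAL's implementation: the threshold rule (a point between a tracked minimum and maximum zero-crossing rate), the function names `zero_crossings` and `zc_vad`, and the parameters `alpha`, `leak` and `min_gap` are all assumptions made for illustration, and a 200 Hz tone stands in for speech.

```python
import math
import random


def zero_crossings(frame):
    """Count sign changes between consecutive samples, in either direction."""
    return sum((a < 0) != (b < 0) for a, b in zip(frame, frame[1:]))


def zc_vad(x, frame_len=256, alpha=0.5, leak=0.005, min_gap=0.25):
    """Flag a frame as speech when its zero-crossing rate falls below an
    adaptive threshold (assumed rule, not the article's equation)."""
    flags = []
    lo = hi = None  # tracked minimum and maximum zero-crossing rates
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        # Normalize by the number of sample pairs in the frame.
        rate = zero_crossings(frame) / (frame_len - 1)
        if lo is None:
            lo = hi = rate
        else:
            gap = hi - lo
            # Let the maximum decay and the minimum grow slowly, so that a
            # spurious extreme is eventually forgotten.
            hi = max(hi - leak * gap, rate)
            lo = min(lo + leak * gap, rate)
        threshold = lo + alpha * (hi - lo)
        # Require a clear spread between the levels before declaring speech.
        flags.append(hi - lo > min_gap and rate < threshold)
    return flags


# Synthetic input: i.i.d. zero-mean Gaussian noise for 3 s at 8 kHz, with a
# 200 Hz tone (a crude stand-in for speech) added during the middle second.
rng = random.Random(0)
fs = 8000
x = [rng.gauss(0.0, 0.05) for _ in range(3 * fs)]
for n in range(fs, 2 * fs):
    x[n] += math.sin(2 * math.pi * 200 * n / fs)

flags = zc_vad(x)
```

With this input, the frames lying entirely inside the tone burst are flagged as active and the noise-only frames are not. The `min_gap` guard is a design choice of this sketch: while only noise has been observed, the tracked minimum and maximum stay close together, and a threshold placed between them would otherwise trigger on ordinary noise fluctuations.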
VOCAL Technologies offers custom designed solutions for beamforming with a robust voice activity detector, acoustic echo cancellation and noise suppression. Our custom implementations of such systems are meant to deliver optimum performance for your specific beamforming task. Contact us today to discuss your solution!