The use of automated speech recognition (ASR) engines for mobile applications is on the rise. Though, in theory, increasing the number of microphones affords an increase in signal-to-noise ratio (SNR), practical constraints such as computational complexity and latency limit the realistic number of microphones that can be used. Algorithms that rely on single pre-computed filters are a gold standard. We present the use of four microphones, which is the minimum number of microphones required for three-dimensional disambiguation of the spatial direction of audio sources. Without loss of generality, we consider the relaxed case of two-dimensional audio beamforming with four microphones in a square topology.
Consider an acoustic signal impinging on a microphone array as shown in Figure 1 below:
Figure 1: Four microphones in square topology
Using the geometric arrangement of the microphones, a filter can readily be generated that corresponds to the aggregate of the signals impinging on the array. Every acoustic source impinges at an angle which obeys . Using this information, a single distortionless response for can be generated for all angles within the said range, as shown in Figure 2 below:
Figure 2: Frequency response of the single filter for all angles in the stated range
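To make the geometric idea concrete, here is a minimal sketch of a far-field steering vector for four microphones at the corners of a square, together with delay-and-sum weights that give a distortionless (unity) response in one look direction. It is an illustration only, not the single-filter design described above, which covers a whole range of angles. The side length, the speed of sound, the plane-wave assumption and all function names are assumptions introduced for this example.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def square_array(side):
    """Return (x, y) positions of four microphones at the corners of a
    square of the given side length, centred on the origin."""
    h = side / 2.0
    return [(-h, -h), (h, -h), (h, h), (-h, h)]


def steering_vector(positions, angle_rad, freq_hz, c=SPEED_OF_SOUND):
    """Far-field (plane-wave) steering vector for a source at the given azimuth."""
    ux, uy = math.cos(angle_rad), math.sin(angle_rad)
    vec = []
    for x, y in positions:
        # Microphones with a positive projection onto the source direction
        # receive the wavefront earlier, hence the negative delay.
        tau = -(x * ux + y * uy) / c
        vec.append(cmath.exp(-2j * math.pi * freq_hz * tau))
    return vec


def delay_and_sum_weights(positions, look_angle_rad, freq_hz):
    """Weights w = a / N, which satisfy the distortionless constraint w^H a = 1."""
    a = steering_vector(positions, look_angle_rad, freq_hz)
    return [ai / len(a) for ai in a]


def response(weights, positions, angle_rad, freq_hz):
    """Complex array response w^H a(angle) at one frequency."""
    a = steering_vector(positions, angle_rad, freq_hz)
    return sum(w.conjugate() * ai for w, ai in zip(weights, a))


mics = square_array(0.05)  # 5 cm side length, an assumed value
w = delay_and_sum_weights(mics, math.radians(30), 1000.0)
on_target = response(w, mics, math.radians(30), 1000.0)
off_target = response(w, mics, math.radians(120), 1000.0)
```

In the look direction the response magnitude is exactly one, and in every other direction it cannot exceed one. A filter that must hold over a range of angles, as in the text, would need additional constraints beyond this single-direction sketch.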
The single filter can give us gains up to points using only 4 microphones. A sample of the results on real data is shown below on Figure 3, with a SNR improvement of .
Figure 3: Four microphones showing a improvement
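As a point of reference for reading SNR figures like these, the following sketch computes an SNR in decibels and the white noise gain of a set of beamformer weights. The white noise gain is the SNR improvement expected under the idealized assumption that the noise is uncorrelated and equally strong at every microphone. That assumption and the function names are introduced here for illustration, and the computed value is a textbook quantity, not the measured result reported in this article.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from two power values."""
    return 10.0 * math.log10(signal_power / noise_power)


def white_noise_gain_db(weights, steering):
    """SNR improvement in dB for spatially uncorrelated, equal-power noise:
    |w^H a|^2 / (w^H w)."""
    num = abs(sum(w.conjugate() * a for w, a in zip(weights, steering))) ** 2
    den = sum(abs(w) ** 2 for w in weights)
    return 10.0 * math.log10(num / den)


# Four uniformly weighted microphones with a source whose wavefront reaches
# all of them in phase (steering vector of ones).
uniform_weights = [0.25 + 0j] * 4
in_phase_steering = [1 + 0j] * 4
gain_db = white_noise_gain_db(uniform_weights, in_phase_steering)
```

For N uniformly weighted microphones this quantity equals 10·log10(N). Real recordings can show more or less improvement, because real noise is rarely uncorrelated across microphones.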
The SNR improvement can be enhanced further by applying single-channel noise suppression and automatic gain control (AGC) to the beamformer output.
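The AGC stage can be pictured with a very small frame-based sketch, shown here under stated assumptions: the target level, frame length, smoothing factor and gain cap are hypothetical values chosen for illustration. This is not VOCAL's implementation, and a production AGC would normally be gated by a voice activity detector so that noise-only frames are not amplified.

```python
import math


def simple_agc(samples, target_rms=0.1, frame_len=160, smoothing=0.5, max_gain=10.0):
    """Scale each frame toward a target RMS level with a smoothed, capped gain."""
    gain = 1.0
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        # Hold the current gain on (near-)silent frames to avoid dividing by zero.
        desired = min(max_gain, target_rms / rms) if rms > 1e-9 else gain
        # First-order smoothing so the gain does not jump between frames.
        gain += smoothing * (desired - gain)
        out.extend(gain * s for s in frame)
    return out


# A quiet 200 Hz tone sampled at 8 kHz stands in for the beamformer output.
quiet = [0.01 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(1600)]
levelled = simple_agc(quiet)
```

The gain rises gradually over the first few frames and then settles at the cap, which is why the smoothing factor and the cap are the two parameters most worth tuning in a sketch like this.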
VOCAL Technologies offers custom designed solutions for beamforming with a robust voice activity detector, acoustic echo cancellation and noise suppression. Our custom implementations of such systems are meant to deliver optimum performance for your specific beamforming task. Contact us today to discuss your solution!