In minimum variance distortionless response (MVDR) beamforming, the filter coefficients used to remove noise and interfering signals are adjusted based on the statistics of the received audio samples. Consider a far-field audio signal impinging on an arbitrary array of N microphones, such that the short-time frequency-domain signal at the array can be represented as:

{\bf x}(t,\omega) = s(t,\omega) {\bf d}(\theta) + {\bf v}(t,\omega)

where boldface letters denote vectors and matrices. Here, s(t,\omega) is the desired speech signal and {\bf d}(\theta) is the steering vector, given by:

{\bf d}(\theta) =\left[\alpha_1, \alpha_2 e^{-j\omega d_2 \sin{\theta_2}/c}, \cdots, \alpha_N e^{-j\omega d_N \sin{\theta_N}/c}\right]^T

where the \alpha_i's are the attenuation factors, the d_i's are the distances from microphone one, and the \theta_i's are the incident angles with respect to the broadside of the line joining microphone i and the reference microphone one. c is the speed of propagation of sound in the medium of interest. In a uniform linear array, the distances are integer multiples of a base distance d_0 (d_i = (i-1) d_0), whilst the incident angles are all identical. Further, {\bf v}(t,\omega) is a vector containing i.i.d. noise. Now consider a filtered SIMO system in which the desired signal is estimated with a filter {\bf W}(\omega), such that, with the time index dropped for brevity:
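As a minimal sketch of this signal model, the Python/NumPy code below builds the steering vector for a uniform linear array and synthesizes one snapshot {\bf x} = s{\bf d} + {\bf v}. It assumes d_i = (i-1) d_0 with a common incident angle, unit attenuation unless \alpha is supplied, c = 343 m/s for air, and circular complex Gaussian noise for {\bf v}; the helper names, the 2 kHz bin and the 4 cm spacing are illustrative choices, not part of the original article.

```python
import numpy as np

def ula_steering_vector(omega, theta, n_mics, d0, c=343.0, alpha=None):
    """Hypothetical helper: steering vector d(theta) for a uniform linear array.

    omega : angular frequency in rad/s
    theta : incident angle in radians, measured from broadside
    d0    : spacing between adjacent microphones in metres
    alpha : optional per-microphone attenuation factors (default: all ones)
    """
    if alpha is None:
        alpha = np.ones(n_mics)
    # Distances from microphone one: d_i = (i - 1) * d0, so d_1 = 0.
    dist = d0 * np.arange(n_mics)
    # Element i is alpha_i * exp(-j * omega * d_i * sin(theta) / c).
    return np.asarray(alpha) * np.exp(-1j * omega * dist * np.sin(theta) / c)

def array_snapshot(s, d, noise_std, rng):
    """Hypothetical helper: one bin of x = s * d + v, with v i.i.d. complex Gaussian."""
    n = d.shape[0]
    v = noise_std / np.sqrt(2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    return s * d + v

omega = 2 * np.pi * 2000.0  # 2 kHz frequency bin (illustrative)
d = ula_steering_vector(omega, np.deg2rad(30.0), n_mics=4, d0=0.04)
x = array_snapshot(1.0 + 0.5j, d, noise_std=0.1, rng=np.random.default_rng(0))
print(np.round(np.abs(d), 6))  # every element has unit magnitude when alpha = 1
```

At broadside (\theta = 0) all phase terms vanish, so with unit attenuation the steering vector reduces to a vector of ones.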

y(\omega) ={\bf W}^\dagger (\omega) {\bf x}(\omega)

where \dagger denotes the Hermitian (conjugate) transpose. A cost function, equal to half the expected output power, is then defined as:

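J({\bf W}) \doteq \frac{1}{2} \mathbb{E} \left[ |y(\omega)|^2\right] = \frac{1}{2}{\bf W}^\dagger (\omega) {\bf \Phi}_{{\bf x}(\omega),{\bf x}(\omega)} {\bf W} (\omega)

where {\bf \Phi}_{{\bf x}(\omega),{\bf x}(\omega)} = \mathbb{E}\left[{\bf x}(\omega){\bf x}^\dagger(\omega)\right] is the power spectral density matrix of the microphone signals.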
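The filter output and the cost can be checked numerically. The sketch below assumes NumPy and a hypothetical set of T snapshots stacked as the columns of an N-by-T matrix X; it computes y = {\bf W}^\dagger {\bf x} for every frame, replaces the expectation with a sample average to estimate {\bf \Phi}_{{\bf x},{\bf x}}, and confirms that \frac{1}{2}{\bf W}^\dagger {\bf \Phi} {\bf W} equals half the mean output power. Using the sample average in place of \mathbb{E} is an assumption of this sketch.

```python
import numpy as np

def filter_output(W, X):
    """y = W^H x for every column (frame) of X; W has shape (N,), X has shape (N, T)."""
    return W.conj() @ X

def sample_psd_matrix(X):
    """Sample estimate of Phi_xx = E[x x^H] from the T columns of X."""
    return (X @ X.conj().T) / X.shape[1]

def cost(W, Phi):
    """J(W) = 1/2 W^H Phi W; the imaginary part is zero because Phi is Hermitian."""
    return 0.5 * np.real(W.conj() @ Phi @ W)

rng = np.random.default_rng(1)
N, T = 4, 500
X = rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T))
W = rng.standard_normal(N) + 1j * rng.standard_normal(N)

y = filter_output(W, X)
Phi = sample_psd_matrix(X)
# With the sample average, both expressions for the cost coincide.
print(np.isclose(cost(W, Phi), 0.5 * np.mean(np.abs(y) ** 2)))
```

Note that cost evaluates to zero for W = 0, which is exactly the trivial solution the linear constraint is introduced to exclude.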

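A minimum variance algorithm finds the minimum of this cost. However, the trivial solution {\bf W}(\omega) = 0 also minimizes it, removing the desired speech along with the noise. Thus a linear constraint {\bf C}^\dagger {\bf W}(\omega) = {\bf c} is imposed to preclude this undesired solution. For MVDR, this is the single distortionless constraint {\bf d}^\dagger(\theta) {\bf W}(\omega) = 1, i.e. {\bf C} = {\bf d}(\theta) and {\bf c} = 1, so that a signal arriving from the look direction passes through the filter unchanged. The constrained problem leads to the Lagrangian cost function: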

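L({\bf W},{\bf \lambda}) = \frac{1}{2}{\bf W}^\dagger (\omega) {\bf \Phi}_{{\bf x}(\omega),{\bf x}(\omega)} {\bf W} (\omega) + {\bf \lambda}^\dagger ({\bf C}^\dagger {\bf W} (\omega) - {\bf c})

Treating {\bf W}(\omega) and {\bf W}^\dagger(\omega) as independent variables (Wirtinger calculus), the stationarity conditions are:

\frac{\partial L({\bf W},{\bf \lambda}) }{\partial {\bf W}} = 0 \Rightarrow \frac{1}{2}{\bf W}^\dagger (\omega) {\bf \Phi}_{{\bf x}(\omega),{\bf x}(\omega)} + {\bf \lambda}^\dagger {\bf C}^\dagger = 0

\frac{\partial L({\bf W},{\bf \lambda}) }{\partial {\bf \lambda}^*} = 0 \Rightarrow {\bf C}^\dagger {\bf W} (\omega) - {\bf c} = 0

Under the assumption that the power spectral density matrix is non-singular, and noting that it is Hermitian so that {\bf \Phi}^{-\dagger} = {\bf \Phi}^{-1}, we have:

{\bf W}^\dagger (\omega) = -2{\bf \lambda}^\dagger {\bf C}^\dagger {\bf \Phi}_{{\bf x}(\omega),{\bf x}(\omega)}^{-1}

Substituting into the conjugate transpose of the constraint, {\bf W}^\dagger(\omega){\bf C} = {\bf c}^\dagger, gives:

\Rightarrow {\bf c}^\dagger= -2{\bf \lambda}^\dagger {\bf C}^\dagger {\bf \Phi}_{{\bf x}(\omega),{\bf x}(\omega)}^{-1} {\bf C}

\Rightarrow {\bf c}= -2{\bf C}^\dagger {\bf \Phi}_{{\bf x}(\omega),{\bf x}(\omega)}^{-1} {\bf C} {\bf \lambda}

\Rightarrow {\bf \lambda} = -\frac{1}{2} \left({\bf C}^\dagger {\bf \Phi}_{{\bf x}(\omega),{\bf x}(\omega)}^{-1} {\bf C}\right)^{-1} {\bf c}

Therefore, taking the conjugate transpose of the expression for {\bf W}^\dagger(\omega) and substituting {\bf \lambda}, we have:

{\bf W}(\omega) = {\bf \Phi}_{{\bf x}(\omega),{\bf x}(\omega)}^{-1} {\bf C} \left({\bf C}^\dagger {\bf \Phi}_{{\bf x}(\omega),{\bf x}(\omega)}^{-1} {\bf C}\right)^{-1} {\bf c}

For MVDR, with {\bf C} = {\bf d}(\theta) and {\bf c} = 1, this reduces to:

{\bf W}(\omega) = \frac{{\bf \Phi}_{{\bf x}(\omega),{\bf x}(\omega)}^{-1} {\bf d}(\theta)}{{\bf d}^\dagger(\theta) {\bf \Phi}_{{\bf x}(\omega),{\bf x}(\omega)}^{-1} {\bf d}(\theta)}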
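A minimal NumPy sketch of the closed-form solution follows. It assumes a Hermitian, non-singular {\bf \Phi}_{{\bf x},{\bf x}} and uses np.linalg.solve instead of forming the inverse explicitly, which is a common numerical choice rather than something the derivation requires. It also evaluates the first stationarity condition with {\bf \lambda} = -\frac{1}{2}({\bf C}^\dagger {\bf \Phi}^{-1} {\bf C})^{-1}{\bf c} as a consistency check. The scenario (four microphones, 4 cm spacing, broadside look direction, a unit-power interferer at 40 degrees, white noise of power 0.01) and all helper names are hypothetical, chosen only for illustration.

```python
import numpy as np

def lcmv_weights(Phi, C, c):
    """W = Phi^{-1} C (C^H Phi^{-1} C)^{-1} c, computed with linear solves."""
    Phi_inv_C = np.linalg.solve(Phi, C)            # Phi^{-1} C, shape (N, K)
    gram = C.conj().T @ Phi_inv_C                  # C^H Phi^{-1} C, shape (K, K)
    return Phi_inv_C @ np.linalg.solve(gram, c)    # shape (N,)

def mvdr_weights(Phi, d):
    """MVDR special case: C = d(theta) as an N x 1 matrix and c = 1."""
    return lcmv_weights(Phi, d[:, None], np.array([1.0 + 0.0j]))

def stationarity_residual(Phi, C, c, W):
    """Evaluate 1/2 W^H Phi + lambda^H C^H with lambda = -1/2 (C^H Phi^{-1} C)^{-1} c."""
    gram = C.conj().T @ np.linalg.solve(Phi, C)
    lam = -0.5 * np.linalg.solve(gram, c)
    return 0.5 * W.conj() @ Phi + lam.conj() @ C.conj().T

def ula_steering_vector(omega, theta, n_mics, d0, c=343.0):
    """Hypothetical helper: unit-attenuation ULA steering vector with d_i = (i - 1) d0."""
    dist = d0 * np.arange(n_mics)
    return np.exp(-1j * omega * dist * np.sin(theta) / c)

omega, N, d0 = 2 * np.pi * 2000.0, 4, 0.04
d_look = ula_steering_vector(omega, 0.0, N, d0)               # broadside look direction
d_int = ula_steering_vector(omega, np.deg2rad(40.0), N, d0)   # interferer at 40 degrees

# Model PSD matrix: unit-power interferer plus white noise of power 0.01.
Phi = np.outer(d_int, d_int.conj()) + 0.01 * np.eye(N)

W = mvdr_weights(Phi, d_look)
print(abs(W.conj() @ d_look))  # distortionless response: 1 up to rounding
print(abs(W.conj() @ d_int))   # interferer response: much smaller than 1
```

Passing a matrix with several columns to lcmv_weights, for example stacking d_look and d_int with {\bf c} = [1, 0]^T, imposes an explicit null toward the interferer in addition to the distortionless condition.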

VOCAL Technologies offers custom designed solutions for robust beamforming, voice activity detection, acoustic echo cancellation, noise suppression, and active noise cancellation. Our custom implementations of such systems are meant to deliver optimum performance for your specific task. Contact us today to discuss your solution!
