Implementing MUSIC Algorithm

For a general microphone array, each element has a response to a source that is a function of the source direction. For a source in direction $\theta_i$, we denote the response of microphone $m$ as $a_m\left(\theta_i\right)$. If $K$ sources exist, we have the spatial matrix below,

$A\ =\left[\begin{matrix}a_1\left(\theta_1\right)&\cdots&a_1\left(\theta_K\right)\\\vdots&\ddots&\vdots\\a_M\left(\theta_1\right)&\cdots&a_M\left(\theta_K\right)\\\end{matrix}\right]$

and the microphone array output,

$\left[\begin{matrix}x_1\left(t\right)\\\vdots\\x_M\left(t\right)\\\end{matrix}\right]=\left[\begin{matrix}a_1\left(\theta_1\right)&\cdots&a_1\left(\theta_K\right)\\\vdots&\ddots&\vdots\\a_M\left(\theta_1\right)&\cdots&a_M\left(\theta_K\right)\\\end{matrix}\right]\left[\begin{matrix}s_1\left(t\right)\\\vdots\\s_K\left(t\right)\\\end{matrix}\right]+\left[\begin{matrix}n_1\left(t\right)\\\vdots\\n_M\left(t\right)\\\end{matrix}\right]$

or, in compact form,

$x\left(t\right)=A\left(\theta\right)s\left(t\right)+n\left(t\right)$,

where $x\left(t\right)$ is the snapshot of the array at time $t$.

Each snapshot follows the structure of this data model. Pack a block of $L$ consecutive snapshots into a matrix,

$X(t)=\left[\left[\begin{matrix}x_1\left(t\right)\\\vdots\\x_M\left(t\right)\\\end{matrix}\right]\ldots\ \left[\begin{matrix}x_1\left(t+L\ -\ 1\right)\\\vdots\\x_M\left(t+L\ -\ 1\right)\\\end{matrix}\right]\right]$

and estimate the data covariance matrix from this block.
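As a rough illustration of how a block of snapshots is turned into a covariance estimate, the expectation is commonly replaced by an average over the $L$ snapshots in the block. The sketch below assumes NumPy and uses a random placeholder matrix `X` instead of real recordings; the function name `sample_covariance` and the sizes `M` and `L` are illustrative choices, not part of the original text.

```python
import numpy as np

def sample_covariance(X):
    """Estimate the M x M covariance from an M x L block of snapshots."""
    num_snapshots = X.shape[1]
    # Average of the outer products x(t) x(t)^H over the block.
    return X @ X.conj().T / num_snapshots

# Placeholder data (assumption): M = 4 microphones, L = 256 complex snapshots.
rng = np.random.default_rng(0)
M, L = 4, 256
X = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))

R_hat = sample_covariance(X)
```

The estimate is Hermitian by construction, and with more snapshots than microphones it is positive definite for generic data.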

In this short paper, the basic idea and implementation of the MUSIC algorithm are described. Its use for resolving multiple sound sources with an antenna array is discussed.

Generalizing Figure 1 to $N$ sources and $M$ microphones, the sound captured by microphone $m$ is

$y_m\left(t\right)=\sum_{n=1}^{N}{s_n\left(t\right)e^{-j2\pi f\left(m-1\right)d\sin\left(\theta_n\right)/v}}+noise_m\left(t\right)=\sum_{n=1}^{N}{s_n\left(t\right)a_m\left(\theta_n\right)}+noise_m\left(t\right)$
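The per-microphone response $a_m(\theta_n)=e^{-j2\pi f(m-1)d\sin(\theta_n)/v}$ in the expression above can be evaluated for all microphones at once. The following is a minimal sketch assuming NumPy, a uniform linear array, and a speed of sound of 343 m/s; the function name `steering_vector` and the example values (4 microphones, 5 cm spacing, 1 kHz) are hypothetical.

```python
import numpy as np

def steering_vector(theta_rad, num_mics, spacing_m, freq_hz, speed=343.0):
    """Return [a_1(theta), ..., a_M(theta)] for a uniform linear array."""
    # np.arange gives 0..M-1, i.e. the factor (m - 1) for m = 1..M.
    m_minus_1 = np.arange(num_mics)
    phase = 2 * np.pi * freq_hz * m_minus_1 * spacing_m * np.sin(theta_rad) / speed
    return np.exp(-1j * phase)

# Example: source at 30 degrees (illustrative values).
a = steering_vector(np.deg2rad(30.0), num_mics=4, spacing_m=0.05, freq_hz=1000.0)
```

Every entry has unit magnitude, and the first microphone serves as the phase reference, so its response is 1.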

We can further write this into the following matrix form,

$Y=AS+N$

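where $Y=\left[y_1\left(t\right),y_2\left(t\right),\cdots,y_M\left(t\right)\right]^T$, $S=\left[s_1\left(t\right),s_2\left(t\right),\cdots,s_N\left(t\right)\right]^T$, and $A=\left[a\left(\theta_1\right),a\left(\theta_2\right),\cdots,a\left(\theta_N\right)\right]$ is the $M\times N$ matrix whose $n$-th column is $a\left(\theta_n\right)=\left[a_1\left(\theta_n\right),\cdots,a_M\left(\theta_n\right)\right]^T$. In $Y=AS+N$, the symbol $N$ denotes the vector of noise terms $noise_m\left(t\right)$; elsewhere $N$ is the number of sources.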
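To make the matrix form concrete, the sketch below simulates $Y=AS+N$ for a block of time instants, storing one column of `Y` per instant. It assumes NumPy, a uniform linear array, complex Gaussian sources and noise, and a speed of sound of 343 m/s; all numeric values and variable names (`N_src`, `noise`, and so on) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

M, N_src, L = 6, 2, 200             # microphones, sources, time instants (illustrative)
f, d, v = 1000.0, 0.1, 343.0        # frequency (Hz), spacing (m), speed of sound (m/s)
thetas = np.deg2rad([-20.0, 35.0])  # assumed source directions

# Steering matrix A: entry (m, n) is a_m(theta_n), shape M x N.
m_idx = np.arange(M)[:, None]
A = np.exp(-1j * 2 * np.pi * f * m_idx * d * np.sin(thetas)[None, :] / v)

# Source signals S (N x L) and additive noise (M x L), both complex Gaussian.
S = (rng.standard_normal((N_src, L)) + 1j * rng.standard_normal((N_src, L))) / np.sqrt(2)
noise = 0.1 * (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))) / np.sqrt(2)

# One column of Y per time instant: Y = A S + N.
Y = A @ S + noise
```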

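The covariance matrix, $R_y=E\left[YY^H\right]=AR_sA^H+R_n$, with $R_s=E\left[SS^H\right]$ and $R_n=E\left[NN^H\right]$, is Hermitian and positive definite, since $AR_sA^H$ is positive semidefinite and $R_n$ is positive definite.
By eigenvalue decomposition, with $v_i$ and $\lambda_i$ denoting the $i$-th eigenvector and its corresponding eigenvalue, we have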

$R_yv_i=\lambda_iv_i$ .
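A small sketch of this decomposition, assuming NumPy: `numpy.linalg.eigh` is intended for Hermitian matrices and returns real eigenvalues in ascending order together with orthonormal eigenvectors. The matrix `R_y` below is a random Hermitian, positive definite stand-in, not data from the paper.

```python
import numpy as np

# Illustrative Hermitian, positive definite matrix standing in for R_y.
rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
R_y = B @ B.conj().T + 0.1 * np.eye(4)

# eigh exploits the Hermitian structure: real eigenvalues, orthonormal eigenvectors.
eigvals, eigvecs = np.linalg.eigh(R_y)

# eigh returns ascending order; reorder so that lambda_1 >= ... >= lambda_M.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Column i of eigvecs satisfies R_y v_i = lambda_i v_i, so this should be ~0.
residual = R_y @ eigvecs - eigvecs * eigvals
```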

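The dimensions that contain only noise have small eigenvalues, which sit at the noise floor, while the dimensions that contain signals have larger eigenvalues. Sorting the eigenvalues in descending order, $\lambda_1\geq\lambda_2\geq\cdots\geq\lambda_M$, the noise subspace is spanned by the eigenvectors of the $M-N$ smallest eigenvalues and can be constructed by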

$E_N=\left[v_{N+1},v_{N+2},\cdots,v_M\right]$.
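The eigenvalue split and the construction of $E_N$ can be checked on an idealized covariance. The sketch below assumes NumPy, a uniform linear array, unit-power uncorrelated sources ($R_s=I$), and spatially white noise ($R_n=\sigma^2 I$); these modeling choices and all numeric values are assumptions made for illustration.

```python
import numpy as np

M, N_src = 6, 2                     # microphones, sources (illustrative)
f, d, v = 1000.0, 0.1, 343.0        # frequency (Hz), spacing (m), speed of sound (m/s)
thetas = np.deg2rad([-20.0, 35.0])  # assumed source directions
sigma2 = 0.01                       # assumed noise power

m_idx = np.arange(M)[:, None]
A = np.exp(-1j * 2 * np.pi * f * m_idx * d * np.sin(thetas)[None, :] / v)

# Idealized model covariance with R_s = I and R_n = sigma2 * I.
R_y = A @ A.conj().T + sigma2 * np.eye(M)

# Sort eigenpairs so that lambda_1 >= ... >= lambda_M.
eigvals, eigvecs = np.linalg.eigh(R_y)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Noise subspace: eigenvectors v_{N+1}, ..., v_M.
E_N = eigvecs[:, N_src:]

noise_floor = eigvals[N_src:]                # should all equal sigma2
leakage = np.linalg.norm(E_N.conj().T @ A)   # steering vectors projected on E_N
```

In this idealized setting the $M-N$ smallest eigenvalues equal the noise power, and the steering vectors of the true directions are orthogonal to $E_N$ up to rounding error.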

The steering vector of a true source direction lies in the signal subspace, so its projection onto the noise subspace is small. Therefore, the following expression takes a large value and appears as a peak when $\theta$ matches a source direction,

$\frac{1}{\left\|E_N^Ha\left(\theta\right)\right\|^2}\ =\ \frac{1}{a^H\left(\theta\right)E_NE_N^Ha\left(\theta\right)}$

The above formula defines the MUSIC algorithm. The number of peaks indicates the number of independent sound sources, and the direction $\theta$ of each peak gives the incoming direction of the corresponding sound source.
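Putting the pieces together, the following is a minimal end-to-end sketch under stated assumptions rather than a reference implementation: NumPy, a uniform linear array, a known number of sources, simulated narrowband data, a speed of sound of 343 m/s, and a simple local-maximum search on a 0.5 degree grid. The helper names `steering_matrix`, `music_spectrum`, and `pick_peaks` are hypothetical.

```python
import numpy as np

def steering_matrix(thetas_rad, num_mics, spacing_m, freq_hz, speed=343.0):
    """Columns are a(theta) for a uniform linear array (assumed geometry)."""
    m_idx = np.arange(num_mics)[:, None]
    return np.exp(-1j * 2 * np.pi * freq_hz * m_idx * spacing_m
                  * np.sin(thetas_rad)[None, :] / speed)

def music_spectrum(Y, num_sources, grid_rad, spacing_m, freq_hz, speed=343.0):
    """Evaluate 1 / (a^H E_N E_N^H a) on a grid of candidate directions."""
    num_mics, num_snapshots = Y.shape
    R_hat = Y @ Y.conj().T / num_snapshots          # sample covariance
    eigvals, eigvecs = np.linalg.eigh(R_hat)        # ascending eigenvalues
    # With ascending order, the first M - N columns span the noise subspace.
    E_N = eigvecs[:, :num_mics - num_sources]
    A_grid = steering_matrix(grid_rad, num_mics, spacing_m, freq_hz, speed)
    projection = E_N.conj().T @ A_grid
    return 1.0 / np.sum(np.abs(projection) ** 2, axis=0)

def pick_peaks(spectrum, grid_rad, num_sources):
    """Return the directions (degrees) of the num_sources largest local maxima."""
    inner = spectrum[1:-1]
    is_peak = (inner > spectrum[:-2]) & (inner > spectrum[2:])
    peak_idx = np.flatnonzero(is_peak) + 1
    strongest = peak_idx[np.argsort(spectrum[peak_idx])[-num_sources:]]
    return np.sort(np.rad2deg(grid_rad[strongest]))

# Simulated narrowband scene (all values are illustrative assumptions).
rng = np.random.default_rng(0)
M, N_src, L = 8, 2, 500            # microphones, sources, snapshots
f, d = 2000.0, 0.05                # frequency (Hz), spacing (m)
true_deg = np.array([-20.0, 35.0])

A = steering_matrix(np.deg2rad(true_deg), M, d, f)
S = (rng.standard_normal((N_src, L)) + 1j * rng.standard_normal((N_src, L))) / np.sqrt(2)
noise = 0.1 * (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))) / np.sqrt(2)
Y = A @ S + noise

grid = np.deg2rad(np.arange(-90.0, 90.5, 0.5))
P = music_spectrum(Y, N_src, grid, d, f)
estimated_deg = pick_peaks(P, grid, N_src)
print(estimated_deg)
```

The number of sources is passed in here for simplicity; in practice it would have to be inferred, for example from the gap between large and small eigenvalues or from the number of clear peaks.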

It is important to remember that the MUSIC algorithm implementation described here applies only to narrowband signals. The performance of the algorithm also depends on the relation between the inter-microphone distance and the sampling frequency.
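One common way to quantify the spacing constraint, offered here as an assumption rather than something stated in the text, is the half-wavelength rule: directions remain unambiguous when $d\le v/(2f)$. Since a signal sampled at $f_s$ contains frequencies up to $f_s/2$, a spacing of $d\le v/f_s$ keeps the whole sampled band free of spatial aliasing. The helper names and the 343 m/s speed of sound below are illustrative.

```python
def max_unaliased_frequency(spacing_m, speed=343.0):
    """Highest frequency with spacing <= half a wavelength: f = v / (2 d)."""
    return speed / (2.0 * spacing_m)

def max_spacing_for_sample_rate(sample_rate_hz, speed=343.0):
    """Largest spacing that keeps every frequency up to f_s / 2 unaliased: d = v / f_s."""
    return speed / sample_rate_hz

f_max = max_unaliased_frequency(0.05)          # 5 cm spacing (example value)
d_max = max_spacing_for_sample_rate(16000.0)   # 16 kHz sampling (example value)
```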