In a speech signal, we often have non-stationary statistics, and in an echo path, we often have transients. To ensure we capture all of this dynamic behavior for an echo cancellation implementation, we need to sample with infinite precision. Since this is impossible, the next best thing we can do is sample at an incredibly high rate. From a practical standpoint, this is undesirable because it will lead to incredibly long FIR filters. What we need is a way to keep our FIR filters short, while keeping our frequency resolution high. For acoustic echo cancellation (AEC), multi-resolution analysis can do just that.
Figure 1: Multi Resolution Analysis
Multi-Resolution Analysis
As Figure 1 shows, multi-resolution analysis recreates the original signal using sub-signals with increased resolution. To arrive at such a reconstruction [1], we first consider the continuous time speech signal . We first sample it at a rate αj forming the approximate signal where . This sampling operation can be viewed as the projection . Similarly, at a rate α j +1 > αj, we have due to the increase in resolution.
Since , we let to write . Wj is called the detail of the resolution αj. Essentially, it contains the information left out by Vj in the approximation. Via a recursion, we can write. Denoting the basis as the basis of the jth subspace of V with ith element where and similarly as the basis of Wj with ith element we can write any function where is the space of causal finite energy signals as:
Where cj and dj are respectively the scaling and detail coefficients.
The Haar Basis
Figure 2 shows the scaling (a) and detail (b) functions of the Haar basis. They are descibed as:
With this in mind, we can now begin to describe how to construct the scaling and detail functions at an arbitrary scale j+1 from the functions at scale j. For a real-time acoustic echo canceller, such a recursion is an efficient and robust simplification. To do so, we use the multi-resolution equations for the scaling and details respectively [2]:
Where h0 and h1 are the scaling and detail filter coefficients, which are related via:
Thus, we only need to create h0 which is simply a low pass filter as the filter h1 will then be a high pass filter comprising the remaining bandwidth. Continually subdividing the available bandwidth, the spectrum is soon covered by detail banks, thus we arrive at the wavelet decomposition filter bank structure shown below:
Figure 2: The Wavelet Filter Bank Decomposition [2]
The Haar Filter
The Haar transform is an isomorphism represented by a matrix H whose rows represent the successive high pass filters used in the wavelet decomposition above. Therefore, this matrix is orthogonal. Denote z[k] = Hx[n] as the Haar domain representation of the far end speech signal and v0 as the optimal Haar domain adaptive weight vector. The Wiener solution is then:
Where and . Then using the definition of the Haar transform, we can solve for the Wiener solution:
So the optimal Haar domain solution is simply the Haar Transform of the optimal time domain solution. Just like in the time domain, our acoustic echo cancellation system can create an adaptive approximation of the Wiener solution in the Haar domain. Assuming a tap length of L, we define the Haar domain approximation to the echo path as:
Where (∙|∙) denotes the inner product. With the error defined as ε[k] = (d – γ)[k] we track the average power in the Haar domain input as:
Where θ is a positive exponential smoothing factor. By placing the elements of this vector along the main diagonal of a diagonal matrix D[k] we can define the Haar domain Normalized Least Mean Square Update Equation as [3]:
The heirarchical structure of the wavelet decomposition can be used to simplfy the implementation of such an algorithm [4]. Recalling that , we can select one subspace whose coefficients we will update and update the rest vicariously through their relations with the chosen space. Denoting this space as the control space are called ‘parent’ spaces while are called ‘child’ spaces.
Exploiting these relationships translates to updating only the coefficients in the parent and child spaces that temporally overlap with the control coefficients that are significant. In this way, the Haar domain approximation becomes:
Where Δ, P, C denote the sets of control, parent, and child coefficients respectively. The update equation is similarly restricted to these subsets, for example, the parent coefficients are updated as:
Thus a real-time implementation of the Haar wavelet decomposition has been realized. For echo cancellation, the ability to simultaneously have good frequency and time resolution is paramount. Using the above method, acoustic echo cancellation can be greatly improved.
References
[1] S. Mallat, “Multiresolution approximation and wavelet orthonormal bases of L2,” Transactions of the American Mathematical Society, June 1989.
[2] P. Kechichian, On the Partial Haar Dual Adaptive Filter for Sparse Echo Cancellation. M.E. thesis, McGill University, Canada, 2006.
[3] H. Deng, M. Doroslovacki. “Wavelet-Based MPNLMS Adaptive Algorithm for Network Echo Cancellation” EURASIP Journal on Audio, Speech, and Music Processing, 2007
[4] K. C. Ho, S. D. Blunt. “Rapid Identi_cation of a Sparse Impulse Response Using an Adaptive Algorithm in the Haar Domain”, IEEE Transactions on Signal Processing, vol. 51, no. 3, March 2003, pp.628-638