Deep Neural Network Residual Echo Suppression

Nonlinear processing (NLP) and residual echo suppression are required in real-world acoustic echo cancellation (AEC) software. Additive noise, time-varying echo paths, and the nonlinear characteristics of the loudspeaker-microphone enclosure all prevent a linear adaptive filter from cancelling the echo perfectly.
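The effect of loudspeaker nonlinearity can be illustrated with a minimal sketch. It assumes a single-tap least-squares fit as a stand-in for a fully converged linear adaptive filter; a real AEC would use a multi-tap adaptive filter such as NLMS. The function `linear_residual_energy` and the `tanh` saturation model of the loudspeaker are hypothetical illustrations, not part of the original text.

```python
import math

def linear_residual_energy(far_end, echo):
    """Fit the best single-tap linear echo estimate h * x (least squares)
    and return the energy left after subtracting it from the echo."""
    num = sum(x * d for x, d in zip(far_end, echo))
    den = sum(x * x for x in far_end)
    h = num / den  # optimal linear coefficient in the least-squares sense
    return sum((d - h * x) ** 2 for x, d in zip(far_end, echo))

# Far-end (loudspeaker) signal: a 200 Hz tone sampled at 8 kHz.
fs = 8000
far_end = [0.9 * math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]

# Hypothetical echo paths: one purely linear, one with loudspeaker saturation (tanh).
linear_echo = [0.5 * x for x in far_end]
saturated_echo = [0.5 * math.tanh(3.0 * x) for x in far_end]

print(linear_residual_energy(far_end, linear_echo))     # 0.0: a linear path is cancelled exactly
print(linear_residual_energy(far_end, saturated_echo))  # nonzero: residual echo remains
```

No choice of linear coefficient can reproduce the harmonics generated by the saturation. That leftover energy is what the residual echo suppressor has to remove.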

There are many formulations of nonlinear echo suppression. One approach is based on Wiener filter theory, in which the optimal (minimum mean-square error) echo and noise suppression gain at frequency $f$ is:

$H(f)\ =\ \frac{SNR(f)}{SNR(f)\ +\ 1}$

where $SNR(f)\ =\ \frac{S_s(f)}{S_{res}(f)+S_n(f)}$ is the ratio of the near-end signal power $S_s(f)$ to the sum of the residual echo power $S_{res}(f)$ and the noise power $S_n(f)$. None of these components is known directly; each must be estimated over time. Although this suppression rule is effective, it still produces noticeable artifacts. Because the estimates lag behind the instantaneous signal components, the gain can over-suppress and distort the near-end signal, under-suppress and leave residual echo, or both.
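The following sketch shows how estimation lag distorts the near-end signal. It computes the Wiener gain $H(f)$ for a single frequency bin from first-order recursive (exponentially smoothed) power estimates. It assumes the instantaneous power of each component is known (an oracle), which isolates the lag effect; a real suppressor would have to estimate these powers as well. The names `smooth`, `wiener_gain`, `simulate`, the smoothing factor `alpha`, and the signal levels are hypothetical choices for illustration.

```python
def smooth(prev, inst, alpha=0.9):
    """First-order recursive average: the estimate lags the instantaneous power."""
    return alpha * prev + (1.0 - alpha) * inst

def wiener_gain(s_s, s_res, s_n, eps=1e-12):
    """H = SNR / (SNR + 1) with SNR = S_s / (S_res + S_n), for one bin."""
    snr = s_s / (s_res + s_n + eps)
    return snr / (snr + 1.0)

def simulate(num_echo_frames=20, num_speech_frames=60, alpha=0.9):
    """One frequency bin: echo-only frames, then near-end speech only.
    Instantaneous component powers are assumed known (oracle)."""
    noise = 0.01
    est_s, est_res, est_n = 0.0, 0.0, noise
    gains = []
    for t in range(num_echo_frames + num_speech_frames):
        speech_pow = 0.0 if t < num_echo_frames else 1.0
        echo_pow = 1.0 if t < num_echo_frames else 0.0
        est_s = smooth(est_s, speech_pow, alpha)
        est_res = smooth(est_res, echo_pow, alpha)
        est_n = smooth(est_n, noise, alpha)
        gains.append(wiener_gain(est_s, est_res, est_n))
    return gains

gains = simulate()
print(round(gains[10], 3))  # 0.0: echo-only frame is fully suppressed
print(round(gains[20], 3))  # first near-end frame: strongly attenuated because of the lag
print(round(gains[-1], 3))  # estimates have caught up: gain close to 1
```

A smaller `alpha` shortens the lag but makes the estimates noisier. That trade-off is one source of the artifacts described above.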

Applying deep neural networks (DNNs) to residual echo suppression has been gaining attention recently. DNNs are well suited to this task because they can model nonlinear processes. A DNN can be designed and trained to classify whether each time-frequency component of the linear adaptive filter output is local speech, noise, or echo, and to attenuate it accordingly. The echo excitation signal (the far-end reference driving the loudspeaker) can be used as an additional input feature to help the classification.
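The per-bin classification idea can be sketched as follows. This is a minimal sketch under stated assumptions, not a trained or published model. A tiny two-layer network receives two features per bin: the log-magnitude of the linear filter output and the log-magnitude of the far-end reference. It outputs probabilities for speech, noise, and echo, and the speech probability is used as a soft suppression gain. The class `TinyBinClassifier`, its random placeholder weights, the feature choice, and the input magnitudes are all hypothetical.

```python
import math
import random

CLASSES = ("speech", "noise", "echo")

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

class TinyBinClassifier:
    """Hypothetical two-layer network: per-bin features -> class probabilities.
    Weights are random placeholders; a real system would learn them from data."""

    def __init__(self, num_features=2, num_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(num_features)] for _ in range(num_hidden)]
        self.b1 = [0.0] * num_hidden
        self.w2 = [[rng.gauss(0.0, 0.5) for _ in range(num_hidden)] for _ in CLASSES]
        self.b2 = [0.0] * len(CLASSES)

    def predict(self, features):
        hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
                  for row, b in zip(self.w1, self.b1)]
        logits = [sum(w * h for w, h in zip(row, hidden)) + b
                  for row, b in zip(self.w2, self.b2)]
        return softmax(logits)

def suppress_frame(model, aec_out_mag, far_end_mag, eps=1e-8):
    """Classify each bin of one frame of the linear filter output and attenuate it.
    The gain is the predicted probability of local speech (a soft mask)."""
    enhanced, gains = [], []
    for e_mag, x_mag in zip(aec_out_mag, far_end_mag):
        # Features: log-magnitudes of the linear filter output and the far-end reference.
        features = [math.log(e_mag + eps), math.log(x_mag + eps)]
        probs = model.predict(features)
        gain = probs[CLASSES.index("speech")]
        gains.append(gain)
        enhanced.append(gain * e_mag)
    return enhanced, gains

model = TinyBinClassifier()
aec_out = [0.8, 0.05, 0.3, 0.6]  # hypothetical |E(t, f)| for 4 bins
far_end = [0.1, 0.02, 0.9, 0.7]  # hypothetical |X(t, f)| for the same bins
enhanced, gains = suppress_frame(model, aec_out, far_end)
print([round(g, 3) for g in gains])
```

Using the speech probability directly as the gain is only one simple choice. A practical network would typically also see neighbouring frames and frequencies, and it might predict a continuous gain directly rather than class probabilities.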