Spectrum Subtraction: Implementation

For a noisy signal,

$x\left(t\right)\ =\ s(t)\ +\ n(t)$

we would like to perform the following subtraction in the frequency domain,

$\hat{S}\left(f\right)=X\left(f\right)-N\left(f\right)$

provided the noise spectrum $N\left(f\right)$ is available or can be estimated.
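As a minimal sketch of the ideal case, the snippet below assumes the noise is known exactly, so the subtraction recovers the clean signal up to rounding error. The sample rate `fs`, the 440 Hz tone, and the noise level are illustrative choices, not values from the text. In practice only an estimate of the noise spectrum is available.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 8000                                  # sample rate in Hz (illustrative)
t = np.arange(256) / fs
s = np.sin(2 * np.pi * 440 * t)            # clean signal s(t) (hypothetical)
n = 0.3 * rng.standard_normal(t.size)      # noise n(t), assumed known exactly here
x = s + n                                  # noisy signal x(t) = s(t) + n(t)

X = np.fft.rfft(x)                         # X(f)
N = np.fft.rfft(n)                         # N(f)
S_hat = X - N                              # spectrum subtraction
s_hat = np.fft.irfft(S_hat, n=t.size)      # back to the time domain

# Largest sample-wise deviation from the clean signal
max_error = np.max(np.abs(s_hat - s))
```

Because the Fourier transform is linear, `max_error` is at the level of floating-point rounding when the noise is known exactly.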

We expect the recovered signal to be an improvement over the original noisy signal under some quality measure, such as SNR.

This procedure has dominated noise reduction research and practice since the 1980s. Its main drawbacks have been thoroughly investigated and remedied. The two most important improvements are 1) a scaling factor, $\alpha$, which controls how aggressively noise is removed, and 2) a noise floor.

In practice, the following rule is usually implemented,

$\left|\hat{S}\left(f\right)\right|^2=\begin{cases}\left|X\left(f\right)\right|^2-\alpha\left|N\left(f\right)\right|^2,&\left|X\left(f\right)\right|^2>\left(\alpha+\beta\right)\left|N\left(f\right)\right|^2\\ \beta\left|N\left(f\right)\right|^2,&\left|X\left(f\right)\right|^2\le\left(\alpha+\beta\right)\left|N\left(f\right)\right|^2\end{cases}$

where $\alpha > 0$ and $\beta \ll 1$.
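The rule can be sketched per frequency bin as follows. The function name `subtract_power` and its default parameter values are hypothetical. The sketch also assumes the floor is applied whenever the subtracted power does not exceed $\beta\left|N\left(f\right)\right|^2$, which keeps the output non-negative.

```python
import numpy as np

def subtract_power(x_pow, n_pow, alpha=2.0, beta=0.01):
    """Power spectral subtraction with a scaling factor and a noise floor.

    x_pow: power spectrum of the noisy signal, one value per bin
    n_pow: estimated noise power spectrum, same shape
    """
    x_pow = np.asarray(x_pow, dtype=float)
    n_pow = np.asarray(n_pow, dtype=float)
    subtracted = x_pow - alpha * n_pow     # scaled subtraction
    floor = beta * n_pow                   # noise floor
    # Assumption: fall back to the floor when the subtraction drops to or below it
    return np.where(subtracted > floor, subtracted, floor)

# Three bins with a flat noise estimate of 1.0
s_pow = subtract_power([10.0, 2.5, 1.0], [1.0, 1.0, 1.0], alpha=2.0, beta=0.01)
# s_pow is [8.0, 0.5, 0.01]: the last bin is clamped to the floor
```

The first two bins keep the subtracted value, while the third would go negative and is replaced by the floor instead.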

The two parameters are introduced specifically to suppress musical tone artifacts, the isolated spectral peaks left over after subtraction that are heard as short random tones.

If $\alpha > 1$, the recovered signal has a higher SNR, but distortion in the recovered speech may become perceptually more pronounced. Therefore, $\alpha$ must be tuned carefully to limit both musical tone artifacts and speech distortion.
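The trade-off can be illustrated numerically. The sketch below assumes the per-bin power of a noise-only frame fluctuates around its mean with an exponential distribution (a modeling assumption that holds for Gaussian noise), and uses a hypothetical speech bin with power 9 on top of the average noise power of 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pow = 1.0                                   # average noise power per bin (hypothetical)
noise_bins = rng.exponential(n_pow, 10000)    # noise-only bins fluctuating around the mean
speech_bin = 9.0 + n_pow                      # bin holding speech power 9 plus average noise

results = []
for alpha in (1.0, 2.0, 3.0):
    # Fraction of noise-only bins that still stick out after subtraction
    surviving = float(np.mean(noise_bins - alpha * n_pow > 0))
    # Estimated speech power in the speech bin (true value is 9)
    speech_est = speech_bin - alpha * n_pow
    results.append((alpha, surviving, speech_est))
```

As `alpha` grows, fewer isolated residual noise peaks survive, but the speech estimate falls further below its true value of 9, which is the distortion the text refers to.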

The noise floor is determined by the second parameter, $\beta$, which sets a lower bound of $\beta\left|N\left(f\right)\right|^2$ on the output power. As a result, deep valleys are filled in and the residual noise peaks become less prominent.
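A small numerical sketch of this effect is shown below. It compares the peak-to-valley range of the output for a nearly absent floor and for a raised floor. The bin values, the helper names `subtract` and `peak_to_valley_db`, and the two $\beta$ settings are all hypothetical.

```python
import numpy as np

n_pow = np.ones(6)                                  # flat noise estimate (hypothetical)
x_pow = np.array([0.2, 3.0, 0.5, 2.5, 0.1, 4.0])    # noise-only frame with peaks and valleys
alpha = 1.0

def subtract(x_pow, n_pow, alpha, beta):
    diff = x_pow - alpha * n_pow
    floor = beta * n_pow
    # Assumption: use the floor whenever the subtraction does not exceed it
    return np.where(diff > floor, diff, floor)

def peak_to_valley_db(p):
    # Ratio of the largest to the smallest bin power, in dB
    return 10 * np.log10(p.max() / p.min())

tiny_floor = subtract(x_pow, n_pow, alpha, beta=1e-4)
raised_floor = subtract(x_pow, n_pow, alpha, beta=0.1)

range_tiny = peak_to_valley_db(tiny_floor)       # about 44.8 dB
range_raised = peak_to_valley_db(raised_floor)   # about 14.8 dB
```

Raising the floor leaves the peaks unchanged but lifts the valleys, so the surviving peaks stand out less against their surroundings.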

The above describes the basic noise reduction implementation. However, the basic implementation is of limited use because the parameters are fixed, and it is difficult to choose values that work for all noise conditions. The choice is especially difficult in applications with a low signal-to-noise ratio.