Reducing non-stationary noise is a requirement for speech enhancement in telecommunications and automatic speaker recognition systems, which may operate in noisy environments with dynamic noise sources. In conference settings where users are far from the microphone, such noise can significantly lower both the perceptual quality of the communication and the effectiveness of the conference.
A common approach to speech enhancement using single-channel noise control is spectral subtraction. Because the noise is assumed to be uncorrelated with, and independent of, the signal of interest, an estimate of the noise spectrum can be subtracted from the spectrum of the captured signal, improving the quality of the communication.
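A minimal sketch of power spectral subtraction for a single frame, assuming the captured signal has already been transformed into a per-bin power spectrum and that a noise power estimate is available. The function name `spectral_subtract` and the `floor` parameter are hypothetical; the floor is a common safeguard that keeps the subtracted power non-negative, not something specified in the text above.

```python
import math

def spectral_subtract(noisy_power, noise_power, floor=0.01):
    """Power spectral subtraction for one frame (illustrative sketch).

    noisy_power: |Y(k)|^2 for each frequency bin k of the captured frame
    noise_power: estimated noise power for each bin
    floor: hypothetical spectral floor, as a fraction of the noisy power,
           that keeps the result non-negative and limits "musical noise"
    Returns a per-bin magnitude gain to apply to the noisy spectrum.
    """
    gains = []
    for y2, n2 in zip(noisy_power, noise_power):
        # Subtract the noise estimate, then clamp to the spectral floor.
        clean2 = max(y2 - n2, floor * y2)
        # Convert the power ratio to a magnitude gain; silent bins get zero gain.
        gains.append(math.sqrt(clean2 / y2) if y2 > 0 else 0.0)
    return gains

# Bins where the noise estimate exceeds the captured power fall to the floor.
print(spectral_subtract([4.0, 1.0, 0.5], [1.0, 1.0, 1.0]))
```

Returning a gain rather than a cleaned spectrum lets the caller multiply each complex bin of the noisy spectrum by its gain, which reuses the noisy phase unchanged.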
There are several approaches to obtaining an estimate of the noise spectrum. Early approaches relied on voice activity detectors (VADs) and sampled the noise spectrum during periods when the speaker was inactive. This approach proved adequate in stationary noise environments but inadequate in non-stationary ones, because the noise spectrum cannot be sampled during voice activity, so changes in the noise go untracked while the speaker is talking.
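A minimal sketch of VAD-gated noise estimation, assuming a per-bin power spectrum for each frame and a boolean speech/no-speech decision supplied by some external detector. The function `update_noise_vad` and the smoothing constant `alpha` are hypothetical. The sketch makes the limitation concrete: while the VAD reports speech, the estimate is frozen.

```python
def update_noise_vad(noise_est, frame_power, speech_active, alpha=0.9):
    """Recursively average the noise power spectrum, but only in frames
    where the (external) VAD reports that the speaker is inactive.

    noise_est: current noise power estimate per bin
    frame_power: power spectrum of the current frame per bin
    speech_active: VAD decision for this frame
    alpha: hypothetical smoothing constant (closer to 1 = slower update)
    """
    if speech_active:
        # Frozen during speech: any change in the noise is not tracked.
        return list(noise_est)
    # Speaker inactive: blend the current frame into the running estimate.
    return [alpha * n + (1.0 - alpha) * p
            for n, p in zip(noise_est, frame_power)]

est = [1.0, 1.0]
est = update_noise_vad(est, [10.0, 10.0], speech_active=True)   # unchanged
est = update_noise_vad(est, [2.0, 0.0], speech_active=False)    # updated
```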
Rainer Martin's application of minimum statistics emerged as a simple but effective means of estimating the noise spectrum even during speech activity. The main concept behind the minimum statistics technique is that speech is sparse in the time-frequency domain: even while someone is talking, the power in each frequency subband frequently drops to the level of the noise. This sparsity can be exploited by tracking the minimum value of each input frequency subband signal, $y_x(n)$, over a limited number of blocks, $N_{MS}$:
$$S_b(\Omega_x, n) = \min\{\, y_x(n),\ y_x(n-1),\ \ldots,\ y_x(n - N_{MS}) \,\}$$
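The minimum in the equation above can be sketched directly with a bounded history per subband. This is a simplified illustration, assuming one real-valued subband value $y_x(n)$ per band per block; the class name `MinimumTracker` is hypothetical, and it omits the smoothing and bias compensation used in Martin's full method.

```python
from collections import deque

class MinimumTracker:
    """Per-subband minimum over the most recent N_MS + 1 blocks, i.e.
    S_b(n) = min{y(n), y(n-1), ..., y(n - N_MS)} for each band.
    Simplified sketch: no smoothing or bias compensation."""

    def __init__(self, num_bands, n_ms):
        # The window spans blocks n - N_MS .. n, which is N_MS + 1 values.
        self.history = [deque(maxlen=n_ms + 1) for _ in range(num_bands)]

    def update(self, frame):
        """frame: one subband value y_x(n) per band x.
        Returns the current minimum S_b for every band."""
        minima = []
        for hist, y in zip(self.history, frame):
            hist.append(y)          # the oldest value drops out automatically
            minima.append(min(hist))
        return minima

tracker = MinimumTracker(num_bands=1, n_ms=2)
outs = [tracker.update([v])[0] for v in [5.0, 3.0, 4.0, 6.0, 7.0]]
```

The scan over the history costs $O(N_{MS})$ per band and block, which keeps the sketch simple; practical implementations reduce this cost, for example by splitting the window into sub-windows. The example input also shows the lag of the method: the values rise to 6 and 7, but the tracked minimum stays at 3 until the low value leaves the window.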
While the minimum statistics technique improves on earlier methods, it can still take up to a full window length for the noise estimate to follow an increase in the noise level. A window that is too long can track only slowly varying noise sources, while a window that is too short can cause speech distortion, because speech energy that persists across the whole window is mistaken for noise. To allow fast tracking without speech distortion, minimum tracking can be combined with a speech presence probability. Biasing the noise spectrum estimate according to the probability that speech is present allows the noise spectrum to be updated more aggressively than in the straightforward minimum statistics approach.
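One common way to combine minimum tracking with a speech presence probability is sketched below, in the style of Cohen's minima-controlled recursive averaging (MCRA); it is an illustration of the idea, not necessarily the method intended here. It assumes a per-band power value, the tracked minimum for that band, and the previous probability. The function name `spp_noise_update` and the constants `alpha_d`, `alpha_p`, and `delta` are hypothetical, illustrative choices.

```python
def spp_noise_update(noise, power, s_min, p_prev,
                     alpha_d=0.95, alpha_p=0.2, delta=5.0):
    """One frame of a speech-presence-weighted noise update (MCRA-style sketch).

    noise:  previous noise power estimate per band
    power:  (smoothed) power of the current frame per band
    s_min:  tracked minimum per band (minimum statistics)
    p_prev: previous speech presence probability per band
    alpha_d, alpha_p, delta: illustrative constants (assumptions)
    Returns (new_noise, new_p).
    """
    new_noise, new_p = [], []
    for n, s, m, p in zip(noise, power, s_min, p_prev):
        # Speech is deemed present when the power is well above the minimum.
        indicator = 1.0 if s > delta * m else 0.0
        # Smooth the binary decision into a probability.
        p = alpha_p * p + (1.0 - alpha_p) * indicator
        # Bias the smoothing factor: p near 1 holds the estimate,
        # p near 0 lets it adapt at the rate set by alpha_d.
        a = alpha_d + (1.0 - alpha_d) * p
        new_noise.append(a * n + (1.0 - a) * s)
        new_p.append(p)
    return new_noise, new_p

noise, prob = spp_noise_update([1.0], [10.0], [1.0], [0.0])  # speech-like frame
noise, prob = spp_noise_update([1.0], [1.2], [1.0], [0.0])   # noise-only frame
```

When speech is judged absent, the estimate adapts in every frame instead of waiting for the windowed minimum to move, which is what makes the update more aggressive; when speech is judged present, the smoothing factor approaches 1 and the estimate is largely held.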