Voice quality enhancement is an important feature of any speech communication system. Speech enhancement and noise reduction techniques can greatly improve the recognition rates of automatic speech recognition systems and the perceptual voice quality of hands-free communication systems. Noise Reduction of Non-stationary Noise Sources discussed several methods for estimating the noise spectrum for speech enhancement. These single-channel methods have been relatively successful at achieving noise reduction, but they can often distort the desired speech signal and/or introduce musical noise from over-subtraction.
Microphone arrays and acoustic beamforming have garnered attention for their ability to use spatial filtering to separate speech from noise sources. Most beamforming techniques rely on either knowing or being able to estimate the direction of arrival (DOA) of the desired signal with little uncertainty. Estimation of the DOA can become erroneous in reverberant environments or in environments with high levels of ambient noise. We introduce a dual-channel noise estimation technique that does not rely on direction of arrival information.
This dual-channel noise estimation technique is essentially an extension of single-channel noise estimation, but it uses the phase difference between the two channels to help determine the probability that speech is present. In the evolution of noise estimation, Rainer Martin established minimum statistics as an efficient way of tracking the noise spectrum even in the presence of speech, and Ephraim and Malah used the minimum mean square error estimator to determine the probability of speech presence and to estimate the speech spectral components. The dual-channel noise estimation method assumes that the phase information of the desired speech signal is more stationary than that of the noise components. Thus, if the current phase difference between the two channels differs significantly from the long-term average phase difference, the probability that speech is present is low. In a multichannel system in which the direction of arrival information is unreliable, this technique can provide improved speech quality over acoustic beamforming and single-channel speech enhancement methods.
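The phase-difference idea described above can be illustrated with a minimal sketch. It is not the exact estimator of this method: the function names, the smoothing constants beta and alpha, the width sigma, and the Gaussian mapping from phase deviation to speech presence probability are all assumptions chosen for illustration. The inputs X1 and X2 are assumed to be complex short-time Fourier transform arrays of the two channels, with shape (frames, bins).

```python
import numpy as np

def phase_speech_probability(X1, X2, beta=0.98, sigma=0.5):
    """Speech presence probability per frame and bin (illustrative only).

    beta  : assumed smoothing factor for the long-term phase average
    sigma : assumed tolerance (radians) on the phase deviation
    """
    cross = X1 * np.conj(X2)
    # Unit phasor carrying the inter-channel phase difference.
    phasor = cross / np.maximum(np.abs(cross), 1e-12)
    avg = phasor[0].copy()  # long-term average, kept as a phasor
    prob = np.empty(X1.shape, dtype=float)
    for t in range(X1.shape[0]):
        # Deviation of the current phase difference from the long-term
        # average; np.angle returns a value wrapped to [-pi, pi].
        dev = np.angle(phasor[t] * np.conj(avg))
        # Assumed mapping: large deviation -> low speech probability.
        prob[t] = np.exp(-(dev / sigma) ** 2)
        # Recursive (circular) averaging of the phase difference.
        avg = beta * avg + (1.0 - beta) * phasor[t]
    return prob

def update_noise_psd(X1, prob, alpha=0.9):
    """Probability-weighted recursive noise power estimate (assumed form)."""
    noise = np.abs(X1[0]) ** 2
    out = np.empty(X1.shape, dtype=float)
    for t in range(X1.shape[0]):
        # When speech is likely (prob near 1) the estimate is held;
        # when speech is unlikely it tracks the observed power.
        a = alpha + (1.0 - alpha) * prob[t]
        noise = a * noise + (1.0 - a) * np.abs(X1[t]) ** 2
        out[t] = noise
    return out
```

Averaging unit phasors rather than raw angles avoids wrap-around problems near plus or minus pi. In practice the probability from the phase test would likely be combined with a single-channel estimate such as minimum statistics rather than used alone.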