Energy comparison, when used to detect double-talk (cross-talk), responds both to the presence of a near-end speaker and to a non-convergent echo. Energy detection schemes, however, cannot discriminate between the two. Under the assumption that the desired near-end speech is uncorrelated with the far-end speech, the inner product of the error signal and the near-end (microphone) signal can be used to detect double-talk.
Consider the system depicted in Figure 1 below:
Figure 1: Single line AEC architecture
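$x(n)$ is the far-end speech whilst $s(n)$ is the near-end speech. Denote a frame of the $L$ most recent far-end samples by $\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-L+1)]^T$ and the echo path filter by $\mathbf{h}$, so that the echo is:
$$y(n) = \mathbf{h}^T \mathbf{x}(n)$$
Then the received near-end microphone signal is:
$$d(n) = y(n) + s(n) + w(n)$$
where $w(n)$ is zero-mean i.i.d. ambient noise. The error signal is given as:
$$e(n) = d(n) - \hat{y}(n) = d(n) - \hat{\mathbf{h}}^T \mathbf{x}(n)$$
where $\hat{\cdot}$ denotes an estimated variable. We are interested in the expectation of the cross product between the error signal and the microphone signal. Taking $x(n)$, $s(n)$ and $w(n)$ to be zero mean and mutually uncorrelated:
$$E[e(n)\,d(n)] = (\mathbf{h} - \hat{\mathbf{h}})^T \mathbf{R}_{xx} \mathbf{h} + \sigma_s^2 + \sigma_w^2$$
where $\mathbf{R}_{xx} = E[\mathbf{x}(n)\mathbf{x}^T(n)]$ is the far-end autocorrelation matrix, and $\sigma_s^2$ and $\sigma_w^2$ are the powers of the near-end speech and the ambient noise.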
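As a rough numerical check of this cross product, the sketch below estimates $E[e(n)d(n)]$ from synthetic signals. Everything in it is an assumption made for illustration: white Gaussian noise stands in for far-end speech, near-end speech and ambient noise, the echo path `h` is a made-up 8-tap decaying filter, and `cross_moment` and its parameters are hypothetical names rather than part of any library.

```python
import numpy as np

def cross_moment(near_end_power, filter_error=0.0, n=200_000, taps=8, seed=0):
    """Sample estimate of E[e(n) d(n)] for a synthetic echo scenario."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                            # far-end signal (white, assumed)
    s = np.sqrt(near_end_power) * rng.standard_normal(n)  # near-end "speech" stand-in
    w = 0.1 * rng.standard_normal(n)                      # ambient noise, variance 0.01
    h = 0.5 ** np.arange(taps)                            # hypothetical echo path
    h_hat = h + filter_error                              # echo path estimate
    y = np.convolve(x, h)[:n]                             # echo: y(n) = h^T x(n)
    y_hat = np.convolve(x, h_hat)[:n]                     # estimated echo
    d = y + s + w                                         # microphone signal
    e = d - y_hat                                         # error signal
    return float(np.mean(e * d))

if __name__ == "__main__":
    print("far end only :", cross_moment(near_end_power=0.0))
    print("double-talk  :", cross_moment(near_end_power=1.0))
```

With a converged filter (`filter_error=0.0`) the estimate should sit close to the noise power of 0.01 when only the far end is active, and close to 1.01 once unit-power near-end speech is added, which is the gap a threshold test can exploit.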
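It can be seen that with a convergent filter, the echo estimate $\hat{y}(n)$ is orthogonal to the error signal $e(n)$, that is $E[e(n)\hat{y}(n)] = 0$, so the far-end echo makes no contribution to the cross product $E[e(n)d(n)]$ between the error signal and the microphone signal $d(n)$. Under the assumption that the near-end and far-end speech are orthogonal, the time-frequency domain representation, with $E(k,m)$ and $D(k,m)$ the transforms of $e(n)$ and $d(n)$ at frequency bin $k$ and frame $m$, becomes
$$E[E(k,m)\,D^*(k,m)] \approx \sigma_s^2(k,m) + \sigma_w^2(k)$$
when there is near-end speech, and hence exceeds the noise floor $\sigma_w^2(k)$ that remains when only the far end is active. The orthogonality-based detection scheme is then given as:
$$\xi(m) = \left| \sum_k E(k,m)\,D^*(k,m) \right| > T$$
where $T$ is a threshold parameter and double-talk is declared for frame $m$ whenever the inequality holds. A smoothing window is usually applied to $\xi(m)$ over several frames to remove spurious detections caused by noise.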
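A frame-based detector along these lines might be sketched as follows. This is a minimal illustration under stated assumptions, not a production implementation: the function name `orthogonality_dtd`, the Hann analysis window, the frame length, the moving-average smoother and the normalisation of the statistic are all choices made here for the example.

```python
import numpy as np

def orthogonality_dtd(e, d, threshold, frame_len=256, smooth=3):
    """Return (flags, xi): per-frame double-talk decisions and the smoothed statistic.

    e, d : error and microphone signals (1-D arrays of equal length)
    """
    e = np.asarray(e, dtype=float)
    d = np.asarray(d, dtype=float)
    n_frames = min(len(e), len(d)) // frame_len
    window = np.hanning(frame_len)
    # Assumed normalisation: by Parseval, sum_k E D* = frame_len * sum_n win^2 e d,
    # so dividing by this factor turns xi into a windowed average of e(n) d(n).
    norm = frame_len * np.sum(window ** 2)
    xi = np.empty(n_frames)
    for m in range(n_frames):
        seg = slice(m * frame_len, (m + 1) * frame_len)
        E = np.fft.fft(window * e[seg])
        D = np.fft.fft(window * d[seg])
        xi[m] = np.abs(np.sum(E * np.conj(D))) / norm
    # Moving average across frames to suppress isolated spikes; the first and
    # last frames are averaged against implicit zeros (mode="same").
    xi_smooth = np.convolve(xi, np.ones(smooth) / smooth, mode="same")
    return xi_smooth > threshold, xi_smooth
```

The threshold is in the same units as signal power, so in practice it would need tuning against the expected noise floor. Smoothing trades detection latency for fewer false alarms: frames next to a transition are blended with their neighbours, so onset and offset decisions are delayed by about one frame with `smooth=3`.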
VOCAL Technologies offers custom-designed solutions for AEC with robust double-talk detection, voice activity detection, beamforming and noise suppression. Our custom implementations of such systems are meant to deliver optimum performance for your specific task. Contact us today to discuss your solution!