Amplitude modulation spectrum detection for speech

Speech signals have a periodic envelope that can be exploited for speech separation. The choice of the amplitude modulation spectrum (AMS) for speech separation is motivated by biological observations that variations in modulation frequency provide a long-term window for characterizing speech information, especially in noisy and reverberant environments.
Suppose the received signal at the microphone is given as:

$y[n]= s[n] + \nu[n]$

where $s[n]$ is the desired speech signal and $\nu[n]$ is i.i.d. zero-mean Gaussian noise with variance $\sigma_\nu^2$. The short-term spectrum of each frame is then computed; because the Fourier transform is linear, it satisfies:

$y(\omega)= s(\omega)+ \nu(\omega)$
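To make the signal model and the per-frame spectrum concrete, the sketch below frames a toy noisy signal, takes the DFT of each frame, and checks that the noisy spectrum is the sum of the speech and noise spectra. It is a minimal sketch assuming NumPy, an 8 kHz sampling rate, 256-sample Hann-windowed frames with 50% overlap, and an amplitude-modulated tone standing in for speech; none of these choices come from the text, and `short_term_spectra` is a hypothetical helper.

```python
import numpy as np

def short_term_spectra(x, frame_len=256, hop=128):
    """Hypothetical helper: DFT of each Hann-windowed frame of x.

    frame_len, hop and the Hann window are illustrative assumptions.
    Returns an array of shape (num_frames, frame_len).
    """
    window = np.hanning(frame_len)
    num_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(num_frames)])
    return np.fft.fft(frames, axis=1)

fs = 8000                      # assumed sampling rate F_s (Hz)
n = np.arange(fs)              # one second of samples
# Toy stand-in for speech (assumption): 500 Hz tone with a 4 Hz envelope.
s = (1 + 0.8 * np.cos(2 * np.pi * 4 * n / fs)) * np.sin(2 * np.pi * 500 * n / fs)
sigma_nu = 0.1                 # assumed noise standard deviation
rng = np.random.default_rng(0)
nu = rng.normal(0.0, sigma_nu, size=n.size)   # i.i.d. zero-mean Gaussian, variance sigma_nu**2
y = s + nu                     # received signal y[n] = s[n] + nu[n]

Y, S, Nu = (short_term_spectra(v) for v in (y, s, nu))
print(Y.shape)                 # (61, 256): 61 frames of 256 bins
print(np.allclose(Y, S + Nu))  # True: the DFT is linear, so y(w) = s(w) + nu(w) per frame
```

Windowing and framing are both linear operations, so the additive model survives them unchanged; only the frame length and overlap trade time resolution against frequency resolution.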

The negative-frequency components of $y(\omega)$ are set to zero to form a one-sided spectrum $\hat{y}(\omega)$:

$\hat{y}(\omega)= \begin{cases}y(\omega) & 0 \le \omega < \frac{F_s}{2}\\0 &\text{otherwise}\end{cases}$

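The inverse short-term spectrum of $\hat{y}(\omega)$ is then taken, giving the complex signal $\hat{y}[n]$. The AMS is twice the magnitude of this signal, $\text{AMS}[n] = 2|\hat{y}[n]|$; the factor of two restores the amplitude halved by discarding the negative frequencies. A sample result of this algorithm is shown in Figure 1 below: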

Figure 1: Speech amplitude modulation detection
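The AMS steps described above (DFT of a frame, zeroing the negative-frequency half, inverse DFT, twice the magnitude) can be sketched in Python with NumPy as follows. This is a minimal sketch, not the original implementation: `ams_envelope` is a hypothetical helper, no windowing or overlap-add is applied, and the 8 kHz rate, 1024-sample frame and amplitude-modulated test frame are assumptions chosen so the result can be checked exactly.

```python
import numpy as np

def ams_envelope(frame):
    """Hypothetical helper: AMS of one frame, following the steps in the text.

    1. Take the DFT of the frame (no window, as an assumption).
    2. Zero bins k >= N/2, i.e. frequencies at or above F_s/2, which are the
       negative frequencies in DFT ordering.
    3. Inverse DFT to get the complex signal y_hat[n].
    4. Return 2 * |y_hat[n]|.
    """
    n_fft = len(frame)
    spectrum = np.fft.fft(frame)
    k = np.arange(n_fft)
    one_sided = np.where(k < n_fft / 2, spectrum, 0.0)
    y_hat = np.fft.ifft(one_sided)
    return 2.0 * np.abs(y_hat)

# Toy check (assumed values): fs = 8000 Hz, 1024-sample frame.
fs, N = 8000, 1024
n = np.arange(N)
envelope = 1 + 0.5 * np.cos(2 * np.pi * 4 * n / N)   # slow modulation, 4 cycles per frame
frame = envelope * np.cos(2 * np.pi * 500 * n / fs)  # 500 Hz carrier lands exactly on bin 64
ams = ams_envelope(frame)
print(np.allclose(ams, envelope))   # True
```

Because the carrier and its modulation sidebands fall exactly on DFT bins away from DC and Nyquist, the recovered AMS matches the envelope to within rounding error. For real speech frames, spectral leakage at the frame edges and the DC and Nyquist bins make the match only approximate.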

The AMS signal can also be used to estimate the pitch (fundamental frequency) of speech; however, this approach is very sensitive to noise.
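The text does not specify how pitch is read off the AMS. One plausible approach, shown here only as an assumption, uses the fact that several adjacent harmonics of a voiced sound beat at the fundamental frequency $F_0$, so the AMS envelope repeats with period $1/F_0$; the strongest autocorrelation peak of the mean-removed envelope within a plausible pitch range then gives $F_0$. The helper names, the 60 to 400 Hz search range and the three-harmonic test frame are all hypothetical.

```python
import numpy as np

def ams_envelope(frame):
    """Hypothetical helper: 2 * |IDFT of the positive-frequency half of the DFT|."""
    spectrum = np.fft.fft(frame)
    k = np.arange(len(frame))
    y_hat = np.fft.ifft(np.where(k < len(frame) / 2, spectrum, 0.0))
    return 2.0 * np.abs(y_hat)

def pitch_from_ams(frame, fs, f0_min=60.0, f0_max=400.0):
    """Hypothetical pitch estimator: lag of the largest autocorrelation value of
    the mean-removed AMS envelope within [fs/f0_max, fs/f0_min] samples.
    The 60-400 Hz range is an assumed search range for speech pitch."""
    env = ams_envelope(frame)
    env = env - env.mean()
    # Full autocorrelation; keep lags 0, 1, 2, ...
    r = np.correlate(env, env, mode="full")[len(env) - 1:]
    lag_min = int(fs / f0_max)
    lag_max = min(int(fs / f0_min), len(env) - 1)
    best_lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / best_lag

# Toy voiced frame (assumption): harmonics 8, 9 and 10 of a 125 Hz fundamental.
fs, N = 8000, 1024
n = np.arange(N)
frame = sum(np.cos(2 * np.pi * f * n / fs) for f in (1000, 1125, 1250))
print(pitch_from_ams(frame, fs))   # 125.0
```

Adding Gaussian noise to `frame` before the call is a simple way to explore the noise sensitivity mentioned above, since noise distorts the envelope that the autocorrelation relies on.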

VOCAL Technologies offers custom designed solutions for beamforming with a robust voice activity detector, acoustic echo cancellation and noise suppression. Our custom implementations of such systems are meant to deliver optimum performance for your specific beamforming task. Contact us today to discuss your solution!
