The pitch of a speech signal is used in various speech applications such as speaker diarization, speaker recognition, secure voice activation, music stream recognition and person verification systems. Different pitch detection algorithms are available, with varying error rates and computational burdens. The most commonly used pitch detection algorithm for real-time operating systems is the average magnitude difference function (AMDF), due to its low computational complexity.

Suppose the received signal at the microphone is given as:

y[n]= s[n] + \nu[n]

where s[n] is the desired speech signal and \nu[n] is i.i.d. zero-mean Gaussian noise. The AMDF algorithm proceeds on a frame-by-frame basis. Suppose a frame of length N is available, such that 0 \le n \le N-1. Then the AMDF is defined as:

AMDF[k] = \frac{1}{N-k} \sum\limits_{n=0}^{N-k-1} \left|y[n] - y[n+k]\right|, ~~ 0 \le k \le N-1

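The pitch period is found as the lag that minimizes the AMDF, excluding the trivial minimum at k = 0, where AMDF[0] = 0:

P[n] = \underset{k > 0}{\operatorname{argmin}} ~~AMDF[k]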
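The definition and the minimization above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `amdf` and `estimate_pitch_lag`, the 8 kHz sampling rate, the 240-sample frame and the lag search bounds are all assumptions made for the example. The sketch normalizes each lag by the number of terms actually summed, N - k, which avoids a division by zero at the largest lag.

```python
import math

def amdf(frame):
    """Return AMDF[k] for k = 0 .. N-1 for one frame (hypothetical helper)."""
    n_samples = len(frame)
    values = []
    for k in range(n_samples):
        terms = n_samples - k  # number of sample pairs available at lag k
        total = sum(abs(frame[n] - frame[n + k]) for n in range(terms))
        values.append(total / terms)
    return values

def estimate_pitch_lag(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] that minimizes the AMDF.

    min_lag must be at least 1 so the trivial minimum at k = 0 is skipped.
    """
    values = amdf(frame)
    return min(range(min_lag, max_lag + 1), key=lambda k: values[k])

# Synthetic, noise-free test frame: a sinusoid with a 40-sample period.
fs = 8000        # assumed sampling rate in Hz
period = 40      # samples, i.e. 200 Hz at the assumed rate
frame = [math.sin(2 * math.pi * n / period) for n in range(240)]

lag = estimate_pitch_lag(frame, min_lag=20, max_lag=60)
pitch_hz = fs / lag  # convert the lag in samples to a frequency in Hz
```

Restricting the search to a band of lags keeps k = 0 and multiples of the true period out of the minimization. On real, noisy speech the minimum is less sharp than in this clean example.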

A smoothing function can be applied to P[n] to remove spurious estimates. A sample of AMDF performance is shown in Figure 1 below:

Figure 1: Pitch detection in speech using AMDF
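The smoothing function applied to the pitch track is not specified here. One common choice, assumed purely for illustration, is a short median filter over the per-frame lag estimates, which removes isolated outliers such as a single doubled or halved lag. The function name `median_smooth`, the window width of 5 and the sample track are hypothetical.

```python
from statistics import median_low

def median_smooth(track, width=5):
    """Median-filter a sequence of per-frame pitch lags (illustrative only).

    Windows are truncated at the ends of the track. median_low keeps the
    output restricted to values that actually occur in the window.
    """
    half = width // 2
    smoothed = []
    for i in range(len(track)):
        lo = max(0, i - half)
        hi = min(len(track), i + half + 1)
        smoothed.append(median_low(track[lo:hi]))
    return smoothed

# Made-up lag track with one doubled (80) and one halved (20) estimate.
track = [40, 41, 40, 80, 40, 39, 40, 20, 41, 40]
smoothed = median_smooth(track)
```

A median filter is used in this sketch rather than a moving average because an average would pull neighbouring frames toward the outlier instead of discarding it.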

Higher lag values use fewer samples in the average (only N-k terms at lag k), which makes the AMDF unreliable at lags greater than half the frame size. With very noisy samples, AMDF can exhibit half-pitch or double-pitch errors, for example by always picking the maximum lag.
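One way to limit both problems is to restrict the lag search to a plausible pitch range and to cap it at half the frame size. The sketch below assumes a 60 to 400 Hz pitch range, an 8 kHz sampling rate and a 240-sample frame. None of these values come from the text above, and `lag_search_range` is a hypothetical helper.

```python
def lag_search_range(fs, frame_len, f_min=60.0, f_max=400.0):
    """Return (min_lag, max_lag) in samples for the AMDF minimization.

    f_min and f_max are an assumed pitch range in Hz. The upper lag is
    capped at half the frame so at least frame_len / 2 terms are averaged.
    """
    min_lag = max(1, int(fs / f_max))                # shortest period considered
    max_lag = min(int(fs / f_min), frame_len // 2)   # longest period, capped
    return min_lag, max_lag

min_lag, max_lag = lag_search_range(fs=8000, frame_len=240)
terms_at_max_lag = 240 - max_lag  # sample pairs still averaged at the largest lag
```

With these assumed values the cap at half the frame is the binding limit, so a longer frame would be needed to cover the lowest pitches in the assumed range.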

VOCAL Technologies offers custom designed solutions for beamforming with a robust voice activity detector, acoustic echo cancellation and noise suppression. Our custom implementations of such systems are meant to deliver optimum performance for your specific beamforming task. Contact us today to discuss your solution!
