Speech pitch detection has many uses, such as speaker diarization and speech activity detection. The average magnitude difference function (AMDF) is prevalent in real-time operating system (RTOS) implementations because of its low complexity. A natural alternative to the AMDF is the magnitude average product function (MAPF).
Suppose the received signal at the microphone is given as:

y[n]= s[n] + \nu[n]

where s[n] is the desired speech signal and \nu[n] is i.i.d. zero-mean Gaussian noise. The MAPF algorithm proceeds on a frame-by-frame basis. Suppose a frame of length N is available, such that 0 \le n \le N-1. Then the MAPF is defined as:

MAPF[k] = \left| \sum\limits_{n=0}^{N-1} y[n] \times y[(n+k) \bmod N] \right|, ~~ 0 \le k \le N-1
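As a rough illustration of the definition above, the following Python sketch evaluates the MAPF for a single frame directly from the formula. The function name `mapf` and the sinusoidal test frame are assumptions made for this example and are not taken from any particular implementation. The direct double loop costs on the order of N^2 multiplications per frame.

```python
import math


def mapf(frame):
    """Return |sum_n y[n] * y[(n + k) mod N]| for every lag k in 0..N-1."""
    n_samples = len(frame)
    values = []
    for k in range(n_samples):
        acc = 0.0
        for n in range(n_samples):
            # Circular indexing: the frame wraps around at its end.
            acc += frame[n] * frame[(n + k) % n_samples]
        values.append(abs(acc))
    return values


# Hypothetical test frame: four periods of a sinusoid with an 8-sample period.
frame = [math.sin(2 * math.pi * n / 8) for n in range(32)]
values = mapf(frame)
```

For this zero-mean test tone the MAPF is large at lag 0 and at the period (lag 8), and close to zero at a quarter period (lag 2). Because of the magnitude, it is also large at half the period (lag 4), where the underlying sum is large and negative.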

A smoother version, which we use, is the binarized MAPF (bMAPF), in which each comparison (y[n]>0) evaluates to 1 when true and 0 otherwise. It is defined as:

bMAPF[k] = \left| \sum\limits_{n=0}^{N-1} (y[n]>0) \times (y[(n+k) \bmod N]>0) \right|, ~~ 0 \le k \le N-1
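The binarized form can be sketched in the same way. In this minimal example the frame is first reduced to 0/1 sign indicators, so the sum becomes a count of positions where both the sample and its circularly shifted partner are positive. The name `bmapf` and the square-wave test frame are assumptions made for illustration.

```python
def bmapf(frame):
    """Return the count of n with y[n] > 0 and y[(n + k) mod N] > 0, per lag k."""
    n_samples = len(frame)
    # Binarize once: 1 for strictly positive samples, 0 otherwise.
    positive = [1 if sample > 0 else 0 for sample in frame]
    counts = []
    for k in range(n_samples):
        total = 0
        for n in range(n_samples):
            total += positive[n] * positive[(n + k) % n_samples]
        # The sum of 0/1 products is never negative, so abs() is a no-op here.
        counts.append(abs(total))
    return counts


# Hypothetical test frame: four periods of an 8-sample square wave.
frame = [1, 1, 1, 1, -1, -1, -1, -1] * 4
counts = bmapf(frame)
```

Since only 0/1 values are multiplied, the products can be replaced by logical ANDs and the sum by a counter, which is one reason a binarized form is attractive on constrained hardware.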

The pitch is found using:

P[n] = \underset{k}{\operatorname{argmax}} ~~ bMAPF[k]
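The argmax step might be sketched as follows. Two assumptions here go beyond the text. First, the search is restricted to a lag window from `k_min` to `k_max`, because lag 0 always attains the maximum of the bMAPF (every positive sample matches itself). Second, the winning lag is converted to hertz using a sampling rate `fs`. The names `pitch_lag`, `k_min`, `k_max` and the 8000 Hz rate are hypothetical.

```python
import math


def bmapf(frame):
    """Binarized MAPF: count of n with y[n] > 0 and y[(n + k) mod N] > 0."""
    n_samples = len(frame)
    positive = [1 if sample > 0 else 0 for sample in frame]
    return [
        sum(positive[n] * positive[(n + k) % n_samples] for n in range(n_samples))
        for k in range(n_samples)
    ]


def pitch_lag(frame, k_min, k_max):
    """Return the lag in [k_min, k_max] that maximizes the bMAPF of the frame."""
    scores = bmapf(frame)
    # max() returns the first maximizer, so ties resolve to the smallest lag.
    return max(range(k_min, k_max + 1), key=lambda k: scores[k])


# Hypothetical setup: 8 kHz sampling, 240-sample frame, 200 Hz test tone
# (40-sample period). The half-sample phase offset avoids exact zero samples.
fs = 8000
frame = [math.sin(2 * math.pi * (n + 0.5) / 40) for n in range(240)]

# Search lags 20..100, i.e. roughly 80 Hz to 400 Hz at this sampling rate.
lag = pitch_lag(frame, 20, 100)
pitch_hz = fs / lag
```

On this noise-free tone the selected lag is the 40-sample period, which corresponds to 200 Hz. With real speech the window bounds would be chosen to match the expected pitch range of the talkers.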

A smoothing function can be applied to P[n] to remove spurious estimates. A sample of bMAPF performance is shown in Figure 1 below:


Figure 1: Pitch detection in speech using bMAPF
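The smoothing of P[n] mentioned above is not specified further. One common choice, assumed here purely for illustration, is a short median filter over the per-frame pitch lags, which discards isolated outliers such as octave errors. The function name `median_smooth`, the window width and the sample track are hypothetical.

```python
from statistics import median_low


def median_smooth(track, width=3):
    """Median-filter a list of per-frame pitch lags with a centered window."""
    half = width // 2
    smoothed = []
    for i in range(len(track)):
        # Shrink the window at the edges instead of padding.
        start = max(0, i - half)
        stop = min(len(track), i + half + 1)
        # median_low always returns a value that occurs in the window.
        smoothed.append(median_low(track[start:stop]))
    return smoothed


# Hypothetical pitch track (lags in samples) with one octave error at index 3.
track = [40, 41, 40, 80, 40, 39, 40]
smoothed = median_smooth(track)
```

In this example the isolated value of 80 is replaced by 40, while the slowly varying part of the track is left nearly unchanged. A wider window removes longer bursts of errors at the cost of slower tracking.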

