Complete Communications Engineering

VOCAL offers a Forensic Speaker Recognition Preprocessor solution that significantly improves speaker recognition rates across a range of signal-to-noise ratios (SNRs) and noise types.

Figure 1. Text-Independent Speaker Identification Rates Versus Signal-to-Noise Ratio (dB).

The concept of using a person’s voice as a biometric feature to verify or identify an individual has existed for many decades. Although current state-of-the-art speaker recognition systems are not as reliable as other biometric features such as fingerprints and DNA, automatic speaker recognition (ASR) can be a cost-effective solution. ASR can be used for facility access control and for transaction authentication in the financial and commerce industries. A growing and important application is law enforcement, where it can serve as a tool for detectives tracking criminals and, potentially, in a court of law to support a conviction.

In controlled environments ASR performs very well, with close to 100% accuracy. In practical situations, however, and especially in law enforcement, recordings of a speaker are usually made under noisy conditions. This results in a mismatch between the speaker’s training model and the model under test, which significantly reduces recognition rates. The additional signal components unrelated to the speaker make the extracted features more random and obscure the speaker’s unique characteristics.

Figure 2. Text-Independent Speaker Identification Rates Compared With Different Noise Types.

Speech enhancement, or a noise preprocessor, can be applied to the recorded signals before the feature extraction stage of an ASR system to help improve recognition rates. Removing the signal components related to noise while preserving the speech helps recover the uniqueness of the speaker.
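To make the idea of a noise preprocessor concrete, here is a minimal sketch of magnitude spectral subtraction, one classic textbook speech enhancement technique. It is not a description of VOCAL’s preprocessor, whose algorithm is not given here. The function names, the `floor` parameter, and the use of a naive DFT (chosen to keep the example dependency-free; a real system would use an FFT library with overlapping windowed frames) are all assumptions made for illustration.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), adequate for short demo frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def spectral_subtract(frame, noise_mag, floor=0.05):
    """Subtract an estimated noise magnitude spectrum from one frame.

    noise_mag: per-bin noise magnitude estimate (hypothetical input, e.g. averaged
               over frames known to contain no speech).
    floor:     fraction of the noisy magnitude kept as a lower bound, so that
               bins are attenuated rather than driven negative.
    The noisy phase is reused unchanged.
    """
    cleaned = []
    for bin_value, n_mag in zip(dft(frame), noise_mag):
        mag = abs(bin_value)
        new_mag = max(mag - n_mag, floor * mag)
        cleaned.append(cmath.rect(new_mag, cmath.phase(bin_value)))
    return idft(cleaned)
```

The enhanced frames would then be passed to the feature extraction stage in place of the raw recording. How aggressively to subtract is a trade-off: removing too much distorts the speech that carries the speaker’s identity.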

The recognition task in Figure 1 and Figure 2 is to identify the speaker from a set of speakers using text that was not used when modeling the speaker. This is considered the most difficult speaker recognition task, because the speaker must be picked out of a group of different speakers from speech whose content differs from what the model was trained on. The data used was from the NOIZEUS corpus.
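The identification task described above can be sketched as follows: each enrolled speaker has a statistical model of their feature vectors, and a test utterance is assigned to the speaker whose model scores it highest. The single diagonal Gaussian per speaker used here is an assumed simplification chosen for brevity; the model actually used to produce the figures is not stated, and the speaker names and toy features are hypothetical.

```python
import math
import random


def fit_gaussian(frames):
    """Fit a diagonal Gaussian (per-dimension mean and variance) to feature frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6) for d in range(dim)]
    return mean, var


def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of the frames under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def identify(frames, models):
    """Return the enrolled speaker whose model gives the highest score."""
    return max(models, key=lambda name: avg_log_likelihood(frames, models[name]))


# Toy data: two hypothetical speakers whose 2-D features cluster in different places.
rng = random.Random(0)


def toy_frames(center, count):
    return [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)] for _ in range(count)]


models = {
    "speaker_a": fit_gaussian(toy_frames(0.0, 200)),
    "speaker_b": fit_gaussian(toy_frames(3.0, 200)),
}
unseen = toy_frames(0.0, 50)  # new frames, not used when fitting the models
print(identify(unseen, models))
```

Because the score is averaged over frames and does not depend on which words were spoken, the same procedure applies when the test text differs from the training text. Noise shifts the test features away from the region the model was fitted on, which is the mismatch that lowers recognition rates.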

The results show that recognition rates fall as the SNR falls. Babble (multi-talker) noise yields the lowest recognition rates because the noise source itself has speech characteristics. As can be observed, VOCAL’s preprocessor increases recognition rates by over 10% across all SNRs. In addition to improving rates under all noise types, the preprocessor increased recognition rates by 25% for babble noise.
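For readers unfamiliar with how an SNR condition in dB is defined, the sketch below scales a noise signal so that the ratio of speech power to noise power matches a target value, then adds it to the speech. It is only an illustration of the definition, with assumed function names and synthetic stand-in signals; it does not describe how the NOIZEUS corpus was constructed.

```python
import math
import random


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR in dB, treating everything in `noisy` that is not `speech` as noise."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))


rng = random.Random(0)
speech = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]  # stand-in for speech
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]  # stand-in for a noise recording
for target in (0, 5, 10, 15):
    noisy = mix_at_snr(speech, noise, target)
    print(target, round(measured_snr_db(speech, noisy), 3))
```

At 0 dB the noise carries as much power as the speech, and every 10 dB step down multiplies the noise power by ten relative to the speech.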

VOCAL’s speaker recognition software may be licensed standalone, as a library, or as part of a complete design. Our software libraries are optimized for leading DSPs and microprocessors from TI, ADI, Intel, AMD, ARM and other vendors. Custom solutions are also available for your voice recognition application needs.
