Voice Activity Detection (VAD) software is an important component of many speech signal processing algorithms. In VoIP applications, these detectors can lower the bit rate of speech coders by transmitting only when a talker is active. VAD also plays an important role in controlling the estimation routines used in echo cancellation and noise reduction algorithms, and it is useful in automatic gain control routines. VOCAL’s robust Voice Activity Detection software algorithms support engineering trade-offs between the latency and accuracy of detection in different applications.
VOCAL’s VAD software is available for license in a variety of forms, including ANSI C and assembly language implementations optimized for leading DSP architectures from TI, ADI, AMD, ARM, MIPS, CEVA, LSI Logic ZSP and other vendors. These libraries are modular and can be executed as a single task under a variety of operating systems or standalone with their own microkernel. Please contact us to discuss your voice application requirements.
Voice Activity Detection Features
- Integrated with speech coders (e.g. ITU G.711, G.729)
- Easily integrated with Voice Quality Enhancement modules
- Low signal latency (less than 200 ms)
- Adjustable sensitivity and thresholding
- Functions are C callable
Voice Activity Detection in Speech Coders
The most notable application of Voice Activity Detection is in speech coders. In a balanced conversation, each participant talks only about 50% of the time, so a large fraction of frames are inactive. During these inactive periods, a silence insertion descriptor (SID) packet can be sent instead of a coded speech packet. Because a SID packet requires fewer bits than a voice packet, the bandwidth savings can be around 40%. VAD can be embedded in the codec, as in G.729AB, or it can be external to the codec, as with G.711. In either case, the requirements for VAD are the same.
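The bandwidth arithmetic can be illustrated with a short C sketch. It assumes, purely for illustration, 80-bit speech frames (one 10 ms frame of an 8 kbit/s coder such as G.729) and a hypothetical 16-bit SID payload sent on every inactive frame. Packet headers are ignored, and the names `dtx_total_bits` and `dtx_savings` are illustrative, not part of any VOCAL API.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed payload sizes per frame (illustrative only). */
#define SPEECH_FRAME_BITS 80u /* e.g. one 10 ms frame of an 8 kbit/s coder */
#define SID_FRAME_BITS    16u /* hypothetical SID payload size */

/* Total payload bits for a run of frames: a coded speech frame when the
 * VAD flag is set, a SID frame otherwise. */
static unsigned long dtx_total_bits(const int *vad_flags, size_t n_frames)
{
    unsigned long bits = 0;
    for (size_t i = 0; i < n_frames; ++i)
        bits += vad_flags[i] ? SPEECH_FRAME_BITS : SID_FRAME_BITS;
    return bits;
}

/* Fractional payload saving relative to coding every frame as speech. */
static double dtx_savings(const int *vad_flags, size_t n_frames)
{
    assert(vad_flags != NULL && n_frames > 0);
    double full = (double)SPEECH_FRAME_BITS * (double)n_frames;
    return 1.0 - (double)dtx_total_bits(vad_flags, n_frames) / full;
}
```

With these assumed sizes, a flag sequence that is active half the time yields a 40% payload saving. Practical discontinuous transmission schemes, such as G.729 Annex B, send SID updates only when the background noise changes, so actual savings depend on the codec, its SID policy and packetization overhead.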
In most VoIP applications, latency must be kept to a minimum, because an increase in round-trip delay lowers the perceived quality of the conversation. In addition, the probability of missing voice activity should be essentially zero, since any missed speech, such as a clipped word onset, is detrimental to the conversation. Fortunately, false alarms are easily tolerated; their main cost is an increase in bandwidth. Thus, energy-based VAD schemes with low detection thresholds are sufficient for most speech coding applications.
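A minimal energy-based detector of the kind described above might look like the following C sketch. The mean-square frame energy is compared with a fixed, deliberately low threshold, and a short hangover keeps the decision active for a few frames after the energy drops, trading extra false alarms for fewer missed word endings. The threshold value, hangover length and the `energy_vad` type are assumptions for illustration, not VOCAL’s implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state for a minimal energy-based VAD. */
typedef struct {
    double threshold;    /* mean-square energy threshold (linear PCM units) */
    int hangover_frames; /* frames to stay active after energy drops */
    int hangover_left;   /* hangover frames remaining */
} energy_vad;

static void energy_vad_init(energy_vad *v, double threshold, int hangover_frames)
{
    assert(v != NULL && threshold >= 0.0 && hangover_frames >= 0);
    v->threshold = threshold;
    v->hangover_frames = hangover_frames;
    v->hangover_left = 0;
}

/* Returns 1 if the frame is classified as active speech, 0 otherwise. */
static int energy_vad_process(energy_vad *v, const short *frame, size_t n)
{
    assert(v != NULL && frame != NULL && n > 0);
    double energy = 0.0;
    for (size_t i = 0; i < n; ++i)
        energy += (double)frame[i] * (double)frame[i];
    energy /= (double)n; /* mean-square energy of the frame */

    if (energy > v->threshold) {
        v->hangover_left = v->hangover_frames; /* re-arm hangover on every active frame */
        return 1;
    }
    if (v->hangover_left > 0) {
        v->hangover_left--; /* bridge short pauses so word endings are not clipped */
        return 1;
    }
    return 0;
}
```

Production detectors typically adapt the threshold to a running estimate of the noise floor rather than using a fixed value.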
Voice Activity Detection in Voice Quality Enhancement
In voice quality enhancement algorithms, such as echo cancellation and noise reduction, VAD plays more of an indicator role. For example, earlier noise reduction methods used VAD to determine when to estimate the noise spectrum: periods without voice activity served as a reference for the noise spectrum. As noise estimation techniques have advanced, VAD plays a smaller role in estimation, but it remains important to algorithms that use a speech presence probability metric.
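The VAD-gated noise estimate can be sketched in C as first-order recursive averaging of a per-bin noise power spectrum that is frozen whenever the VAD reports speech. The soft variant shows, in simplified form, how a speech presence probability can replace the binary flag; using a single probability for all bins is a simplification, and the function names and the smoothing factor `alpha` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hard-gated update: average the frame power spectrum into the noise
 * estimate only when the VAD reports no speech. */
static void update_noise_psd(double *noise_psd, const double *frame_psd,
                             size_t n_bins, int vad_active, double alpha)
{
    assert(noise_psd != NULL && frame_psd != NULL);
    assert(alpha >= 0.0 && alpha <= 1.0);
    if (vad_active)
        return; /* freeze the estimate while speech is present */
    for (size_t k = 0; k < n_bins; ++k)
        noise_psd[k] = alpha * noise_psd[k] + (1.0 - alpha) * frame_psd[k];
}

/* Soft variant: weight the update by a speech presence probability in [0, 1].
 * p_speech = 1 freezes the estimate; p_speech = 0 gives the ungated update. */
static void update_noise_psd_soft(double *noise_psd, const double *frame_psd,
                                  size_t n_bins, double p_speech, double alpha)
{
    assert(noise_psd != NULL && frame_psd != NULL);
    assert(p_speech >= 0.0 && p_speech <= 1.0);
    assert(alpha >= 0.0 && alpha <= 1.0);
    double a = alpha + (1.0 - alpha) * p_speech; /* effective smoothing factor */
    for (size_t k = 0; k < n_bins; ++k)
        noise_psd[k] = a * noise_psd[k] + (1.0 - a) * frame_psd[k];
}
```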
VAD plays a similar role in echo cancellation. An important quantity in echo cancellation is the coupling factor, which estimates what fraction of the far-end signal is received as echo in the near-end signal. The best time to estimate this factor is during periods of far-end activity, so VAD is used to control when the estimate is updated. In echo cancellation, the accuracy and latency requirements for VAD differ from those of speech coders, because a false alarm can be more damaging to the coupling factor estimate than a miss: a false alarm updates the estimate from frames that contain no far-end signal, and therefore no echo.
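A far-end VAD can gate a coupling factor estimate as in the following C sketch, which tracks a smoothed ratio of near-end to far-end frame power and updates it only while the far-end VAD is active. The power-ratio definition, the smoothing factor `beta` and the `coupling_est` type are assumptions; the sketch also ignores the echo path delay, which a real canceller would have to account for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical coupling-factor tracker (linear power ratio). */
typedef struct {
    double coupling; /* estimate of near-end echo power / far-end power */
    double beta;     /* smoothing factor in [0, 1); larger means slower */
} coupling_est;

/* Mean-square power of one frame. */
static double frame_power(const double *x, size_t n)
{
    double p = 0.0;
    for (size_t i = 0; i < n; ++i)
        p += x[i] * x[i];
    return p / (double)n;
}

/* Update the estimate only when the far-end VAD reports activity. */
static void coupling_update(coupling_est *c, const double *far_end,
                            const double *near_end, size_t n, int far_vad_active)
{
    assert(c != NULL && far_end != NULL && near_end != NULL && n > 0);
    if (!far_vad_active)
        return; /* no far-end signal, so the near end carries no echo to measure */
    double pf = frame_power(far_end, n);
    if (pf <= 0.0)
        return; /* avoid division by zero on digital silence */
    double ratio = frame_power(near_end, n) / pf;
    c->coupling = c->beta * c->coupling + (1.0 - c->beta) * ratio;
}
```

Because a false alarm lets non-echo frames into the average, the far-end detector in this role would typically be tuned more conservatively than the low-threshold detector used for speech coding. A practical canceller may also freeze the update during double-talk.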