Many features have been used successfully for voice activity detection (VAD) in practice. The VOCAL Technologies Acoustic Library provides a robust and reliable VAD module that includes the following feature set.
Energy based
The decision mechanism compares the short-term energy of the current frame with the long-term energy. Both values can be calculated frame by frame in the time domain; the short-term energy is computed over the N samples of a frame, where N is the frame length.
The long-term energy can be obtained from a frame-by-frame moving average that updates the long-term value from the previous frame using an adjustable decay parameter.
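The energy comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the library's implementation: the function names, the mean-square energy definition, the update rule `decay * long_term + (1 - decay) * short_term`, and the `decay`, `ratio`, and `floor` values are all hypothetical choices.

```python
def frame_energy(frame):
    """Short-term energy: mean of the squared samples over the N-sample frame."""
    n = len(frame)
    return sum(x * x for x in frame) / n


def energy_vad(frames, decay=0.95, ratio=2.0, floor=1e-8):
    """Return one True/False decision per frame.

    A frame is flagged active when its short-term energy exceeds
    ratio * long-term energy. All parameter values are illustrative
    assumptions, not defaults of any real library.
    """
    decisions = []
    long_term = None
    for frame in frames:
        short_term = frame_energy(frame)
        if long_term is None:
            # Assumption: seed the long-term estimate with the first frame.
            long_term = short_term
        decisions.append(short_term > ratio * max(long_term, floor))
        # Moving average: blend the previous long-term value with the
        # current short-term energy using the decay parameter.
        long_term = decay * long_term + (1.0 - decay) * short_term
    return decisions
```

A decay close to 1 makes the long-term estimate adapt slowly, so short bursts stand out against it; a smaller decay tracks level changes faster but also absorbs speech into the estimate sooner.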
Correlation based
The decision logic compares the normalized correlation function at a given delay with a threshold. The correlation can be calculated frame by frame in the time domain over the N samples of a frame, where N is the frame length.
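As a rough sketch of the correlation test, the Python below computes a normalized correlation between a frame and a delayed copy of itself and compares it with a threshold. The function names, the search over a range of delays, and the threshold value are assumptions made for illustration; the text above only states that the correlation at a delay is compared with a threshold.

```python
import math


def normalized_correlation(frame, delay):
    """Normalized correlation between the frame and itself shifted by `delay`."""
    n = len(frame)
    if not 0 < delay < n:
        raise ValueError("delay must be between 1 and len(frame) - 1")
    a = frame[delay:]
    b = frame[:n - delay]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    # A silent frame has zero energy; report zero correlation instead of dividing by 0.
    return num / den if den > 0.0 else 0.0


def correlation_vad(frame, min_delay, max_delay, threshold=0.5):
    """Flag the frame as active if any delay in the range correlates strongly.

    Searching a delay range and the 0.5 threshold are illustrative assumptions.
    """
    peak = max(normalized_correlation(frame, d)
               for d in range(min_delay, max_delay + 1))
    return peak > threshold
```

Periodic signals such as voiced speech produce a correlation near 1 at delays equal to their period, which is why this measure complements a pure energy test.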
Spectrum Flatness
The decision logic is based on SF, the spectrum flatness measure. SF can be calculated in the frequency domain as the ratio of the geometric mean to the arithmetic mean of the signal spectrum.
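The flatness measure can be illustrated with a short Python sketch that uses the standard definition of spectral flatness: geometric mean divided by arithmetic mean of the power spectrum. The naive DFT, the `eps` guard, the function names, and the decision direction and threshold in `flatness_vad` are assumptions for illustration only; a real implementation would use an FFT.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of a real frame via a naive DFT (bins 0 .. N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum, in (0, 1]."""
    power = [p + eps for p in power_spectrum(frame)]  # eps avoids log(0)
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / arithmetic


def flatness_vad(frame, threshold=0.3):
    """Assumed rule: a peaky (non-flat) spectrum suggests voiced speech."""
    return spectral_flatness(frame) < threshold
```

A flat, noise-like spectrum gives a value near 1, while a spectrum dominated by a few strong peaks gives a value near 0.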
Cepstral Based
The idea is that energy-based VAD tends to fail in high-noise applications, whereas a cepstral vector is less prone to error under high noise. The decision logic is similar, using the short-term and long-term differential cepstral vectors, which can be computed in the frequency domain.
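One possible reading of the cepstral test is sketched below in Python. It is an interpretation, not the library's method: it assumes the real cepstrum (inverse DFT of the log-magnitude spectrum) as the cepstral vector, a running average as the long-term vector, and a Euclidean distance compared with a threshold. All names and parameter values are hypothetical.

```python
import cmath
import math


def real_cepstrum(frame, n_coeffs=12, eps=1e-12):
    """First n_coeffs real-cepstrum coefficients (index 0 is skipped)."""
    n = len(frame)
    log_mag = []
    for k in range(n):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        log_mag.append(math.log(abs(acc) + eps))
    # The log-magnitude of a real signal is even, so the inverse DFT
    # reduces to a cosine sum.
    return [sum(v * math.cos(2 * math.pi * k * q / n)
                for k, v in enumerate(log_mag)) / n
            for q in range(1, n_coeffs + 1)]


def cepstral_distance(a, b):
    """Euclidean distance between two cepstral vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cepstral_vad(frames, decay=0.9, threshold=0.2, n_coeffs=12):
    """Flag frames whose cepstrum departs from the long-term average.

    decay and threshold are illustrative assumptions.
    """
    decisions = []
    long_term = None
    for frame in frames:
        short_term = real_cepstrum(frame, n_coeffs)
        if long_term is None:
            long_term = short_term  # assumption: seed with the first frame
        active = cepstral_distance(short_term, long_term) > threshold
        decisions.append(active)
        if not active:
            # Assumption: track the background only on inactive frames.
            long_term = [decay * l + (1.0 - decay) * s
                         for l, s in zip(long_term, short_term)]
    return decisions
```

Skipping coefficient 0 removes the overall gain term, so the comparison depends on spectral shape rather than level, which matches the motivation of being less sensitive to noise energy.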
Besides the features above, VOCAL Technologies also incorporates other, simpler measures in the VAD module, such as spectral peak information, energy ratio, etc.