
Common practices for measuring audio signal levels include measuring active speech levels as unweighted RMS levels in dB and measuring levels with predefined frequency weighting (weighting curves A, B, C, and D). Both practices are well supported by standards documents. Active speech levels are determined using the ITU-T P.56 methodology.
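As a minimal sketch of the unweighted RMS measurement, the Python function below converts a block of samples to a level in dB relative to full scale (labelled dBov here). It assumes float samples normalized to a full scale of 1.0 and takes 0 dB to be the RMS of a full-scale square wave, so a full-scale sine reads about -3.01 dB. The function name `rms_level_dbov` is hypothetical. Note that it averages over the whole segment, silence included, which is exactly what an active speech level measurement avoids.

```python
import math


def rms_level_dbov(samples):
    """Unweighted RMS level in dB relative to full scale (assumed 1.0)."""
    if not samples:
        raise ValueError("empty signal")
    # Mean of squared samples over the whole segment (silence included).
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 10.0 * math.log10(mean_square)


# Usage: a full-scale 1 kHz sine sampled at 8 kHz.
fs = 8000
sine = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
print(round(rms_level_dbov(sine), 2))  # -3.01
```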

Figure 1: Approximate examples of P.56 active speech levels (ASL) for three waveforms: noisy speech, a sine wave, and noise; all amplitudes are normalized to a full scale of 1 (as in the *.wav format)

P.56 outlines three basic methods of measuring the active speech level (ASL): Method A, Method B, and Method B-equivalent (cf. Ref. 1).

Method A was devised to give an immediate indication of speech volume in real-time applications. These applications are listed in Table 1 of P.56.

Method B was developed for measuring the active speech level in all other applications, i.e. those not listed under Method A. For example, Method B shall be used to establish the SNR improvement (SNRI) when estimating the performance of a noise reduction feature. Method B is defined algorithmically, in a form suitable for implementation in DSP code. In essence, it:

  1. computes the signal energy of the audio segment under consideration,
  2. applies two-stage exponential averaging to the absolute sample values over the audio segment (producing an intermediate envelope and a final envelope in the process),
  3. uses the two-stage averages and the signal energy in an iterative process to determine a threshold THR: where the waveform is above it, it represents speech; elsewhere, it represents silence or background noise.
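The three Method B steps can be sketched in Python as follows. This is a simplified illustration, not the normative P.56 algorithm: the parameter values (30 ms envelope time constant, 0.2 s hangover, 15.9 dB margin, thresholds spaced 6.02 dB apart from 2^-15 to 1) are taken from a reading of P.56 and should be verified against the Recommendation. The function name `p56_method_b_sketch` is hypothetical, and samples are assumed to be floats normalized to a full scale of 1.0.

```python
import math


def p56_method_b_sketch(x, fs, margin_db=15.9, time_const_s=0.03,
                        hangover_s=0.2, num_thresholds=16):
    """Simplified P.56 Method B-style active speech level (ASL).

    x: float samples normalized to full scale 1.0 (assumption).
    Returns (asl_db, activity_factor); asl_db is in dB re full scale.
    """
    g = math.exp(-1.0 / (fs * time_const_s))   # envelope smoothing coefficient
    hang_max = math.ceil(hangover_s * fs)       # hangover length in samples
    # Thresholds c_j = 2^-15 ... 2^0, i.e. 6.02 dB apart (assumed range).
    c = [2.0 ** (j - num_thresholds + 1) for j in range(num_thresholds)]
    active = [0] * num_thresholds               # active-sample count per threshold
    hang = [hang_max] * num_thresholds          # start with hangover expired
    p = q = energy = 0.0

    for s in x:
        energy += s * s                         # step 1: signal energy
        p = g * p + (1.0 - g) * abs(s)          # step 2: intermediate envelope
        q = g * q + (1.0 - g) * p               #         final envelope
        for j in range(num_thresholds):
            if q >= c[j]:                       # envelope above threshold: active
                active[j] += 1
                hang[j] = 0
            elif hang[j] < hang_max:            # still within hangover: active
                active[j] += 1
                hang[j] += 1

    if energy == 0.0:
        return float("-inf"), 0.0

    # Step 3: search the thresholds for the point where the level of the
    # active samples (A_j) exceeds the threshold level (C_j) by the margin M.
    asl_db, prev = None, None
    for j in range(num_thresholds):
        if active[j] == 0:
            break
        a_db = 10.0 * math.log10(energy / active[j])
        delta = a_db - 20.0 * math.log10(c[j])
        if delta <= margin_db:
            if prev is None:
                asl_db = a_db
            else:                               # interpolate between neighbours
                a0, d0 = prev
                asl_db = a0 + (a_db - a0) * (d0 - margin_db) / (d0 - delta)
            break
        prev = (a_db, delta)
    if asl_db is None:                          # no crossing: fall back (simplification)
        asl_db = prev[0] if prev else 10.0 * math.log10(energy / len(x))

    mean_square = energy / len(x)
    activity = mean_square / 10.0 ** (asl_db / 10.0)
    return asl_db, activity
```

For a steady sine the estimate stays close to the plain RMS level, while for a signal that is silent half of the time it comes out noticeably above the long-term RMS level, because silent samples beyond the hangover are excluded from the average.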

Method B-equivalent is defined only in very general terms: it is any method that broadly follows the principles of Method B and produces results similar to those of Method B. Similarity is defined as measurement results within ±1 dB of Method B with a confidence level greater than 95%. Figure 1 illustrates three example waveforms with their estimated ASL values in dBov (RMS, no weighting).
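A simplified reading of the ±1 dB criterion can be expressed as a check over paired measurements of the same test files. This sketch treats the 95% confidence level simply as the fraction of files whose difference lies within the tolerance, which is an assumption rather than a statistical procedure taken from P.56; the function name `looks_b_equivalent` is hypothetical.

```python
def looks_b_equivalent(reference_db, candidate_db, tol_db=1.0, required=0.95):
    """Simplified Method B-equivalence check (assumed interpretation).

    reference_db: ASL values from a Method B implementation.
    candidate_db: ASL values from the candidate method for the same files.
    Returns (passes, fraction_within_tolerance).
    """
    if not reference_db or len(reference_db) != len(candidate_db):
        raise ValueError("need equally sized, non-empty measurement lists")
    within = sum(1 for r, c in zip(reference_db, candidate_db)
                 if abs(r - c) <= tol_db)
    fraction = within / len(reference_db)
    # "Greater than 95%" is read here as a strict inequality on the fraction.
    return fraction > required, fraction
```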

Figure 2: Weighting curves A, B, C, and D. The most common weighting curves are A and C.

When working with experimental audio files, particularly files recorded in real-life acoustic scenes, it is helpful to pre-process them by removing frequencies outside the audio channel band for which they are intended. In addition, when measuring the audio levels of these files, typical practice requires applying a weighting curve such as A, B, C, or D (depending on the specific application). Figure 2 depicts the four typical weighting curves.
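The sketch below combines both pre-processing steps in the frequency domain: it discards spectral bins outside a chosen channel band and applies an A- or C-weighting gain before summing power (Parseval's theorem). The weighting formulas are the analytic A and C curves from IEC 61672-1. The function names, the default 300 to 3400 Hz band (narrowband telephony), and the naive O(N²) DFT are illustrative assumptions suitable only for short clips; a production implementation would more likely use an FFT or time-domain weighting filters.

```python
import cmath
import math


def a_weight_db(f):
    """A-weighting gain in dB at frequency f (Hz), IEC 61672-1 analytic form."""
    if f <= 0.0:
        return float("-inf")
    f2 = f * f
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2))
    return 20.0 * math.log10(ra) + 2.00      # +2.00 dB normalizes 1 kHz to ~0 dB


def c_weight_db(f):
    """C-weighting gain in dB at frequency f (Hz), IEC 61672-1 analytic form."""
    if f <= 0.0:
        return float("-inf")
    f2 = f * f
    rc = (12194.0 ** 2 * f2) / ((f2 + 20.6 ** 2) * (f2 + 12194.0 ** 2))
    return 20.0 * math.log10(rc) + 0.06      # +0.06 dB normalizes 1 kHz to ~0 dB


def weighted_level_db(x, fs, weight_db=a_weight_db, band=(300.0, 3400.0)):
    """Band-limited, weighted RMS level in dB re full scale (naive DFT)."""
    n = len(x)
    power = 0.0
    for k in range(n // 2 + 1):              # one-sided spectrum
        f = k * fs / n
        if not band[0] <= f <= band[1]:
            continue                         # drop bins outside the channel band
        xk = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        gain = 10.0 ** (weight_db(f) / 20.0)  # weighting curve as linear gain
        # Bins other than DC and Nyquist stand for a positive and a negative bin.
        both = 1.0 if k == 0 or 2 * k == n else 2.0
        power += both * (abs(xk) * gain) ** 2
    mean_square = power / (n * n)            # Parseval's theorem
    return 10.0 * math.log10(mean_square) if mean_square > 0.0 else float("-inf")
```

Passing `c_weight_db` as `weight_db` gives a C-weighted level instead; widening `band` adapts the sketch to wideband channels.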

VOCAL Technologies engineering practices include standards-compliant active speech level measurement and estimation, as well as standards-compliant A and C weighting.
