Complete Communications Engineering

G.723.1 vocoder
G.723.1 dual rate vocoder is used in low bit rate multimedia services

G.723.1 Codec

VOCAL’s G.723.1 codec is a dual rate vocoder used for compressing the speech or other audio signal component of multimedia services at a very low bit rate. Contact us to discuss your voice codec application requirements.


VOCAL’s G.723.1 software is optimized for leading DSPs and RISC/CISC processors from TI, ADI, AMD, ARM, Intel and other vendors. G.723.1 voice compression software may be licensed as a standalone algorithm, as a library, and with a VoIP stack. Custom designs are also available to meet unique G.723.1 application requirements.


The G.723.1 algorithm specifies a coded representation to compress speech or audio for multimedia services, primarily very low bit rate visual telephony as part of the overall H.324 family of standards. G.723.1 has two bit rates associated with it, 5.3 kbit/s and 6.3 kbit/s. The higher bit rate has better voice quality. The lower bit rate still gives good quality and provides system designers with additional flexibility. Both rates are a mandatory part of the G.723.1 encoder and decoder, and it is possible to switch between the two rates at any 30 ms frame boundary. An option for variable rate operation using discontinuous transmission and noise fill during non-speech intervals is also available. G.723.1 Annex A defines 4-byte SID (Silence Insertion Descriptor) frames.
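As an illustration of rate switching at frame boundaries, the following minimal sketch walks a buffer of concatenated G.723.1 frames and identifies each one. It assumes the commonly used packed frame layout in which the two least significant bits of the first octet select the frame type, giving 24-octet frames at 6.3 kbit/s, 20-octet frames at 5.3 kbit/s and 4-octet SID frames. The function and table names are hypothetical and are not part of any VOCAL API.

```python
# Frame type table keyed by the two least significant bits of the first
# octet of a frame (assumed packed layout; 0b11 is treated as reserved).
FRAME_TYPES = {
    0b00: ("6.3 kbit/s speech", 24),
    0b01: ("5.3 kbit/s speech", 20),
    0b10: ("SID", 4),
}


def classify_frame(first_octet):
    """Return (description, size_in_octets) for a frame's first octet."""
    frame_type = first_octet & 0x03
    if frame_type not in FRAME_TYPES:
        raise ValueError("reserved frame type 0b11")
    return FRAME_TYPES[frame_type]


def split_frames(payload):
    """Split a buffer of back-to-back frames; the rate may change per frame."""
    frames = []
    pos = 0
    while pos < len(payload):
        name, size = classify_frame(payload[pos])
        if pos + size > len(payload):
            raise ValueError("truncated frame at offset %d" % pos)
        frames.append((name, payload[pos:pos + size]))
        pos += size
    return frames
```

Because every frame carries its own type bits, a decoder built this way needs no out-of-band signalling to follow a rate change or a transition into silence.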

A G.723.1 speech coder is designed to operate with a digital signal obtained by first performing telephone bandwidth filtering (Recommendation G.712) of the analogue input, then sampling at 8,000 Hz and then converting to 16-bit linear PCM for the input to the encoder. The output of the G.723.1 decoder should be converted back to analogue by similar means. Other input/output characteristics, such as those specified by Recommendation G.711 for 64 kbit/s PCM data, should be converted to 16-bit linear PCM before encoding or from 16-bit linear PCM to the appropriate format after decoding.
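The conversion from G.711 to linear PCM mentioned above can be sketched as follows for the μ-law variant. This is the widely used bit-manipulation form of μ-law expansion, shown only as an example; A-law input would need a different expansion, and the function names are hypothetical.

```python
def ulaw_to_linear(u):
    """Expand one 8-bit G.711 mu-law code word to a linear PCM sample."""
    u = ~u & 0xFF                 # mu-law code words are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07    # segment number
    mantissa = u & 0x0F           # position within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def ulaw_bytes_to_pcm16(data):
    """Convert a bytes object of mu-law samples to a list of PCM integers."""
    return [ulaw_to_linear(b) for b in data]
```

The resulting values fit in a signed 16-bit range and can be grouped into 240-sample frames for the encoder.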


G.723.1 Encoder

The coder is based on the principles of linear prediction analysis-by-synthesis coding and attempts to minimize a perceptually weighted error signal. The encoder operates on blocks (frames) of 240 samples each, which is equal to 30 msec at an 8 kHz sampling rate. Each block is first high-pass filtered to remove the DC component and then divided into four subframes of 60 samples each. For every subframe, a 10th order Linear Prediction Coder (LPC) filter is computed using the unprocessed input signal. The LPC filter for the last subframe is quantized using a Predictive Split Vector Quantizer (PSVQ). The unquantized LPC coefficients are used to construct the short-term perceptual weighting filter, which is used to filter the entire frame and to obtain the perceptually weighted speech signal.
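The framing and LPC analysis described above can be illustrated with a simplified sketch: split a 240-sample frame into four 60-sample subframes and derive 10 prediction coefficients for each with the autocorrelation method and the Levinson-Durbin recursion. This is an assumption-laden teaching example, not the procedure of the Recommendation, which also applies an analysis window and further conditioning before solving. All names are hypothetical.

```python
FRAME_SAMPLES = 240
SUBFRAME_SAMPLES = 60
LPC_ORDER = 10


def split_subframes(frame):
    """Divide one 240-sample frame into four 60-sample subframes."""
    if len(frame) != FRAME_SAMPLES:
        raise ValueError("expected %d samples" % FRAME_SAMPLES)
    return [frame[i:i + SUBFRAME_SAMPLES]
            for i in range(0, FRAME_SAMPLES, SUBFRAME_SAMPLES)]


def autocorrelation(x, order):
    """Autocorrelation r[0..order] of the sample block x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for a[1..order] such that x[n] is predicted by sum(a[k]*x[n-k]).

    Returns (coefficients, residual_energy).
    """
    a = [0.0] * (order + 1)
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:            # silent or degenerate block: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err             # reflection coefficient for this stage
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:], err


def lpc_per_subframe(frame):
    """Return one list of LPC_ORDER coefficients per subframe."""
    return [levinson_durbin(autocorrelation(sub, LPC_ORDER), LPC_ORDER)[0]
            for sub in split_subframes(frame)]
```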

For every two subframes (120 samples), the open loop pitch period, L_OL, is computed using the weighted speech signal. The pitch period is searched in the range from 18 to 142 samples. From this point the speech is processed on a 60 samples per subframe basis.
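A minimal sketch of an open loop pitch search over the 18 to 142 sample range is shown below. It scores each candidate lag by the squared cross-correlation between the 120-sample block and its delayed copy, normalized by the energy of the delayed copy, and keeps the first lag with the highest score. This criterion is an assumption chosen for illustration; the Recommendation applies its own rule for preferring smaller lags over their multiples, which this sketch only approximates. The names are hypothetical.

```python
PITCH_MIN = 18
PITCH_MAX = 142
BLOCK_SAMPLES = 120


def open_loop_pitch(weighted, start):
    """Estimate the pitch lag for the block weighted[start:start + 120].

    At least PITCH_MAX samples of history must precede `start`.
    """
    if start < PITCH_MAX or start + BLOCK_SAMPLES > len(weighted):
        raise ValueError("not enough history or block out of range")
    block = range(start, start + BLOCK_SAMPLES)
    best_lag, best_score = PITCH_MIN, 0.0
    for lag in range(PITCH_MIN, PITCH_MAX + 1):
        cross = sum(weighted[n] * weighted[n - lag] for n in block)
        energy = sum(weighted[n - lag] ** 2 for n in block)
        if cross <= 0 or energy <= 0:
            continue              # only positively correlated lags qualify
        score = cross * cross / energy
        if score > best_score:    # strict test keeps the smallest best lag
            best_lag, best_score = lag, score
    return best_lag
```

For a signal that repeats exactly every 40 samples, lags 40, 80 and 120 score equally, and the strict comparison keeps 40.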

Using the estimated pitch period computed previously, a harmonic noise shaping filter is constructed. The combination of the LPC synthesis filter, the formant perceptual weighting filter, and the harmonic noise shaping filter is used to create an impulse response. The impulse response is then used for further computations.

Using the open loop pitch period estimate, L_OL, and the impulse response, a closed loop pitch predictor is computed. A fifth order pitch predictor is used. The pitch period is computed as a small differential value around the open loop pitch estimate. The contribution of the pitch predictor is then subtracted from the initial target vector. Both the pitch period and the differential value are transmitted to the decoder.

Finally, the non-periodic component of the excitation is approximated. For the high bit rate, Multi-Pulse Maximum Likelihood Quantization (MP-MLQ) excitation is used, and for the low bit rate, Algebraic Code Excited Linear Prediction (ACELP) excitation is used.

G.723.1 Decoder

The G.723.1 decoder operation is also performed on a frame-by-frame basis. First the quantized LPC indices are decoded, then the speech decoder constructs the LPC synthesis filter. For every subframe, both the adaptive codebook excitation and fixed codebook excitation are decoded and input to the synthesis filter. The adaptive postfilter consists of a formant postfilter and a forward-backward pitch postfilter. The excitation signal is input to the pitch postfilter, whose output is input to the synthesis filter, whose output is in turn input to the formant postfilter. A gain scaling unit maintains the energy at the input level of the formant postfilter.

Delay Factor

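The G.723.1 coder encodes speech or other audio signals in 30 msec frames. In addition, there is a look ahead of 7.5 msec, resulting in a total algorithmic delay of 37.5 msec. All additional delays in the implementation and operation of this coder are due to the processing time in the encoder and decoder, the transmission time on the communication link, and any additional buffering delay for the multiplexing protocol.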
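The delay figures above follow directly from the frame length and the sampling rate, as this small worked example shows. The 60-sample look ahead is derived here from 7.5 msec at 8 kHz, and the constant and function names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000
FRAME_SAMPLES = 240        # one frame
LOOKAHEAD_SAMPLES = 60     # 7.5 msec at 8 kHz


def samples_to_ms(samples, rate_hz=SAMPLE_RATE_HZ):
    """Convert a sample count to milliseconds at the given sampling rate."""
    return 1000.0 * samples / rate_hz


frame_ms = samples_to_ms(FRAME_SAMPLES)          # 30.0
lookahead_ms = samples_to_ms(LOOKAHEAD_SAMPLES)  # 7.5
algorithmic_delay_ms = frame_ms + lookahead_ms   # 37.5
```

Processing, transmission and buffering delays are added on top of this fixed algorithmic figure and depend on the particular system.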





VOCAL’s optimized vocoder software is available for the following platforms. Please contact us for details of specific supported platforms and performance data.

Processors
  • Texas Instruments – C6xx (TMS320C62x, TMS320C64x, TMS320C645x, TMS320C66x, TMS320C67x), DaVinci, OMAP, C5xx (TMS320C54x, TMS320C55x)
  • Analog Devices – Blackfin, ADSP-21xx, TigerSHARC, SHARC
  • PowerPC, PowerQUICC
  • MIPS – MIPS32, MIPS64, MIPS4Kc
  • ARM – ARM7, ARM9, ARM9E, ARM10E, ARM11, StrongARM, ARM Cortex-A8/A9, Cortex-M3/M4
  • Intel / AMD – x86, x64 (both 32 and 64 bit modes)

Operating Systems
  • Linux, uClinux, BSD, Unix
  • Microsoft Windows ACM / RTC / CE / Mobile
  • Apple iOS / iPhone / iPad & MacOS
  • eCos / eCosPro
  • Google Android
  • Green Hills Integrity
  • Micrium μC/OS
  • Symbian
  • Wind River VxWorks