VOCAL’s GSM Adaptive Multi-Rate (AMR) codec is available as a real-time implementation that can be configured to support multichannel applications. GSM AMR software may be licensed as a standalone algorithm, as part of an embedded library suite, or with a VoIP stack for integration with user applications. Contact us to discuss your voice compression application requirements.
VOCAL’s GSM AMR source code is optimized to run on leading DSPs and conventional processors from TI, ADI, AMD, Intel and other vendors. Our GSM AMR vocoder software may be customized to meet your specific requirements.
GSM AMR
Originally developed by ETSI [GSM 06.90], the Adaptive Multi-Rate (AMR) speech codec is also specified by 3GPP [TS 26.071]; its wideband extension (AMR-WB) is standardized by the ITU [GSM-AMR-WB].
The GSM AMR standard specifies the detailed mapping from input blocks of 160 speech samples in 13-bit uniform PCM format to encoded blocks of 95, 103, 118, 134, 148, 159, 204, or 244 bits, and from those encoded blocks back to output blocks of 160 reconstructed speech samples. At a sampling rate of 8,000 samples/s, each 160-sample block spans 20 ms, so the encoded bit stream runs at 4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, or 12.2 kbit/s, respectively.
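The frame sizes and bit rates above are tied together by simple arithmetic: every mode emits a fixed number of bits per 20 ms frame, and bits per millisecond equals kbit/s. A minimal Python check follows; the mode table and the `bitrate_kbps` name are illustrative and not part of any VOCAL or 3GPP API.

```python
# Bits produced per 20 ms frame for each AMR mode, keyed by nominal rate in kbit/s.
# Illustrative table built from the figures quoted in the text.
AMR_MODE_BITS = {
    4.75: 95, 5.15: 103, 5.90: 118, 6.70: 134,
    7.40: 148, 7.95: 159, 10.2: 204, 12.2: 244,
}

SAMPLE_RATE_HZ = 8000
FRAME_SAMPLES = 160
FRAME_MS = FRAME_SAMPLES * 1000 // SAMPLE_RATE_HZ  # 160 samples at 8 kHz = 20 ms


def bitrate_kbps(bits_per_frame: int) -> float:
    """Bits per millisecond equals kbit/s, so divide by the frame length in ms."""
    return bits_per_frame / FRAME_MS


for rate, bits in AMR_MODE_BITS.items():
    assert bitrate_kbps(bits) == rate, (rate, bits)
    print(f"{bits:3d} bits / {FRAME_MS} ms = {bitrate_kbps(bits):.2f} kbit/s")
```

For example, the 12.2 kbit/s mode carries 244 bits per frame, and 244 / 20 ms = 12.2 kbit/s.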
All of the multi-rate coding modes use Algebraic Code-Excited Linear Prediction (ACELP); the multi-rate ACELP coder is referred to as MR-ACELP. The transcoding procedure specified in GSM 06.90 applies to the adaptive multi-rate full-rate and half-rate speech traffic channels (TCH) in the GSM system.
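The "algebraic" part of ACELP refers to a fixed codebook of sparse vectors: a 40-sample subframe contains only a few nonzero pulses of amplitude ±1, so a codevector is described by its pulse positions and signs instead of being stored in a table. The Python sketch below builds such a vector under an assumed interleaved-track layout (five tracks, where track t holds positions t, t+5, ..., t+35). The number of pulses and tracks differs per mode and is defined in the specification, so treat the layout and the function names as hypothetical.

```python
SUBFRAME_LEN = 40
NUM_TRACKS = 5  # assumed interleaved layout: track t covers t, t+5, ..., t+35


def track_positions(track: int) -> list[int]:
    """Pulse positions that belong to one interleaved track."""
    return list(range(track, SUBFRAME_LEN, NUM_TRACKS))


def build_algebraic_codevector(pulses: list[tuple[int, int]]) -> list[int]:
    """Build a sparse codevector from (position, sign) pairs, sign in {+1, -1}.

    Pulses that land on the same position add together.
    """
    c = [0] * SUBFRAME_LEN
    for pos, sign in pulses:
        if not 0 <= pos < SUBFRAME_LEN:
            raise ValueError(f"pulse position {pos} outside subframe")
        if sign not in (1, -1):
            raise ValueError("pulse sign must be +1 or -1")
        c[pos] += sign
    return c


# Hypothetical example: one pulse on each of the five tracks.
pulses = [(0, 1), (6, -1), (12, 1), (18, -1), (24, 1)]
code = build_algebraic_codevector(pulses)
print([i for i, v in enumerate(code) if v != 0])  # [0, 6, 12, 18, 24]
```

Because only positions and signs are transmitted, the codebook needs no storage, and the encoder's search can exploit the sparse structure.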
GSM AMR Encoder
The GSM AMR encoder takes as input a 13-bit uniform PCM signal, either from the audio part of the Mobile Station or, on the network side, from the PSTN via an 8-bit A-law or μ-law to 13-bit uniform PCM conversion. The encoded speech at the encoder output is delivered to a channel encoder unit. In the receive direction, the inverse operations take place.
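On the network side, 8-bit G.711 samples must be expanded to uniform PCM before they reach the encoder. The Python sketch below follows the well-known bit manipulation of the public-domain Sun Microsystems `g711.c` expansion routines. The function names are hypothetical, and because μ-law natively expands to 14 bits, the μ-law path here simply truncates to 13 bits; that truncation is an assumption for illustration, not a procedure taken from GSM 06.90.

```python
def alaw_to_linear13(a_val: int) -> int:
    """Expand one 8-bit G.711 A-law byte to a 13-bit uniform PCM sample."""
    a_val ^= 0x55                     # undo the even-bit inversion used on the line
    t = (a_val & 0x0F) << 4           # 4-bit step within the segment
    seg = (a_val & 0x70) >> 4         # 3-bit segment number
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t = (t + 0x108) << (seg - 1)
    t >>= 3                           # 16-bit scale -> 13-bit scale (t is a multiple of 8)
    return t if a_val & 0x80 else -t  # sign bit set means a positive sample


def ulaw_to_linear13(u_val: int) -> int:
    """Expand one 8-bit G.711 mu-law byte, then truncate to 13-bit precision."""
    u_val = ~u_val & 0xFF             # mu-law bytes are transmitted bit-inverted
    t = ((u_val & 0x0F) << 3) + 0x84  # step within segment plus the bias of 132
    t <<= (u_val & 0x70) >> 4         # scale by the segment number
    linear16 = (0x84 - t) if u_val & 0x80 else (t - 0x84)
    return linear16 >> 3              # floor shift: 16-bit scale -> 13-bit scale
```

A 20 ms frame of 160 A-law bytes would then be converted with `[alaw_to_linear13(b) for b in frame_bytes]`, where `frame_bytes` is a hypothetical input buffer.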
- The AMR codec uses eight source codecs with bit rates of 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, and 4.75 kbit/s. The codec is based on the code-excited linear predictive (CELP) coding model. A 10th-order linear prediction (LP), or short-term, synthesis filter is used. The pitch synthesis filter is implemented using the so-called adaptive codebook approach.
- In the CELP speech synthesis model, the excitation signal at the input of the short-term LP synthesis filter is constructed by adding two excitation vectors, one from the adaptive codebook and one from the fixed (innovative) codebook. The speech is synthesized by feeding these two properly chosen vectors through the short-term synthesis filter. The optimum excitation sequence in a codebook is chosen using an analysis-by-synthesis search procedure, in which the error between the original and synthesized speech is minimized according to a perceptually weighted distortion measure. The weighting filter uses the unquantized LP parameters.
- The coder operates on speech frames of 20 ms, corresponding to 160 samples at the sampling frequency of 8,000 samples/s. For every 160-sample frame, the speech signal is analysed to extract the parameters of the CELP model (LP filter coefficients, adaptive and fixed codebook indices, and gains). These parameters are encoded and transmitted. At the decoder, the parameters are decoded and the speech is synthesized by filtering the reconstructed excitation signal through the LP synthesis filter.
- LP analysis is performed twice per frame for the 12.2 kbit/s mode and once per frame for the other modes. For the 12.2 kbit/s mode, the two sets of LP parameters are converted to line spectrum pairs (LSPs) and jointly quantized using split matrix quantization (SMQ) with 38 bits. For the other modes, the single set of LP parameters is converted to LSPs and quantized using split vector quantization (SVQ).
- The speech frame is divided into 4 subframes of 5 ms each (40 samples). The adaptive and fixed codebook parameters are transmitted every subframe. The quantized and unquantized LP parameters, or their interpolated versions, are used depending on the subframe. An open-loop pitch lag is estimated in every other subframe (except for the 5.15 and 4.75 kbit/s modes, for which it is estimated once per frame) based on the perceptually weighted speech signal.
- The following operations are then repeated for each subframe:
  - The target signal is computed by filtering the LP residual through the weighted synthesis filter, with the initial filter states updated by filtering the error between the LP residual and the excitation (this is equivalent to the common approach of subtracting the zero-input response of the weighted synthesis filter from the weighted speech signal).
  - The impulse response of the weighted synthesis filter is computed.
  - Closed-loop pitch analysis is performed (to find the pitch lag and gain) using the target and the impulse response, searching around the open-loop pitch lag. Fractional pitch lags with 1/6- or 1/3-sample resolution (depending on the mode) are used.
  - The target signal is updated by removing the adaptive codebook contribution (the filtered adaptive codevector), and this new target is used in the fixed algebraic codebook search to find the optimum innovation.
  - The adaptive and fixed codebook gains are either scalar quantized with 4 and 5 bits, respectively, or vector quantized with 6 to 7 bits (with moving-average (MA) prediction applied to the fixed codebook gain).
  - Finally, the filter memories are updated using the determined excitation signal, so that the target signal can be found in the next subframe.
- In each 20 ms speech frame, 95, 103, 118, 134, 148, 159, 204 or 244 bits are produced, corresponding to a bit-rate of 4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2 or 12.2 kbit/s.
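The heart of the per-subframe loop is the analysis-by-synthesis criterion: each candidate codevector is passed through the weighted synthesis filter, and the candidate whose scaled output best matches the target wins. The Python sketch below shows that criterion with an exhaustive search over a tiny, made-up codebook. Real AMR searches use fast structured pulse searches, a 10th-order filter, and mode-specific parameters; the first-order toy polynomial, the weighting factors `gamma1`/`gamma2`, the codebook, and all function names here are illustrative assumptions.

```python
def fir(b: list[float], x: list[float]) -> list[float]:
    """Zero-state FIR filter: y[n] = sum_k b[k] * x[n - k], same length as x."""
    return [sum(b[k] * x[n - k] for k in range(min(len(b), n + 1))) for n in range(len(x))]


def all_pole(a: list[float], x: list[float]) -> list[float]:
    """Zero-state 1/A(z) filter, where A(z) = 1 + a[1] z^-1 + ... (a[0] == 1)."""
    y: list[float] = []
    for n, xn in enumerate(x):
        acc = xn
        for i in range(1, min(len(a), n + 1)):
            acc -= a[i] * y[n - i]
        y.append(acc)
    return y


def expand(a: list[float], gamma: float) -> list[float]:
    """Coefficients of A(z/gamma): the i-th coefficient is scaled by gamma**i."""
    return [ai * gamma ** i for i, ai in enumerate(a)]


def weighted_impulse_response(a_q, a_unq, gamma1, gamma2, length=40):
    """Impulse response of A(z/g1) / (Aq(z) * A(z/g2)), truncated to `length`."""
    impulse = [1.0] + [0.0] * (length - 1)
    h = fir(expand(a_unq, gamma1), impulse)    # weighting-filter numerator (unquantized LP)
    h = all_pole(a_q, h)                       # synthesis filter 1/Aq(z) (quantized LP)
    return all_pole(expand(a_unq, gamma2), h)  # weighting-filter denominator


def dot(u: list[float], v: list[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def search_codebook(target, h, codebook):
    """Exhaustive analysis-by-synthesis search.

    Minimizing ||target - g*y||^2 over the gain g is equivalent to maximizing
    (target . y)^2 / (y . y); the optimal gain is (target . y) / (y . y).
    """
    best_idx, best_score, best_gain = -1, -1.0, 0.0
    for idx, c in enumerate(codebook):
        y = fir(h, c)  # filtered codevector (zero-state convolution with h)
        energy = dot(y, y)
        if energy == 0.0:
            continue
        corr = dot(target, y)
        score = corr * corr / energy
        if score > best_score:
            best_idx, best_score, best_gain = idx, score, corr / energy
    return best_idx, best_gain


SUB = 40
a = [1.0, -0.9]  # toy first-order LP polynomial, used for both Aq(z) and A(z)
h = weighted_impulse_response(a, a, gamma1=0.94, gamma2=0.6, length=SUB)


def pulse_vector(pulses):
    c = [0.0] * SUB
    for pos, sign in pulses:
        c[pos] = float(sign)
    return c


codebook = [pulse_vector([(0, 1)]), pulse_vector([(10, 1)]), pulse_vector([(3, 1), (20, -1)])]
target = [0.5 * v for v in fir(h, codebook[2])]  # built so entry 2 with gain 0.5 is optimal
idx, gain = search_codebook(target, h, codebook)
print(idx, round(gain, 6))  # 2 0.5
```

The same criterion drives both the closed-loop pitch search and the fixed-codebook search; only the candidate set and the target differ.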
GSM AMR Decoder
- At the decoder, the transmitted indices are extracted from the received bitstream according to the chosen mode. The indices are decoded to obtain the coder parameters for each transmission frame: the LSP vectors, the fractional pitch lags, the innovative codevectors, and the pitch and innovative gains. The LSP vectors are converted to LP filter coefficients and interpolated to obtain an LP filter for each subframe. Then, at each 40-sample subframe:
  - The excitation is constructed by adding the adaptive and innovative codevectors, each scaled by its gain.
  - The speech is reconstructed by filtering the excitation through the LP synthesis filter.
  - Finally, the reconstructed speech signal is passed through an adaptive postfilter.
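The per-subframe decoder steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it uses an integer pitch lag (AMR uses fractional lags with interpolation), omits the adaptive postfilter, and all function names are hypothetical.

```python
def adaptive_codevector(past_exc: list[float], lag: int, length: int) -> list[float]:
    """Adaptive codebook vector for an integer pitch lag.

    Copies the excitation from `lag` samples back; when the lag is shorter than
    the subframe, newly produced samples are reused (periodic extension).
    """
    if not 0 < lag <= len(past_exc):
        raise ValueError("lag must be between 1 and len(past_exc)")
    buf = list(past_exc)
    start = len(buf) - lag
    for n in range(length):
        buf.append(buf[start + n])
    return buf[len(past_exc):]


def lp_synthesis(a: list[float], x: list[float], mem: list[float]):
    """Filter x through 1/A(z) (a[0] == 1), carrying filter state across subframes.

    `mem` holds the most recent past outputs, oldest first; returns (y, new_mem).
    """
    p = len(a) - 1
    if p < 1:
        raise ValueError("A(z) must have order >= 1")
    hist = ([0.0] * p + list(mem))[-p:]  # zero-pad missing history
    y = []
    for xn in x:
        acc = xn
        for i in range(1, p + 1):
            acc -= a[i] * hist[-i]       # hist[-i] is the output i samples ago
        hist.append(acc)
        y.append(acc)
    return y, hist[-p:]


def decode_subframe(a_q, past_exc, lag, gain_pitch, code, gain_code, syn_mem):
    """Excitation = gp * adaptive + gc * fixed, then LP synthesis (no postfilter)."""
    v = adaptive_codevector(past_exc, lag, len(code))
    exc = [gain_pitch * vi + gain_code * ci for vi, ci in zip(v, code)]
    speech, syn_mem = lp_synthesis(a_q, exc, syn_mem)
    return exc, speech, syn_mem
```

After each call, the caller would append `exc` to its excitation history so that the next subframe's adaptive codebook can draw on it, mirroring the memory update described for the encoder.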
Configurations
- DAA interface using linear codec at 8.0 kHz sample rate
- Direct interface to 8.0 kHz PCM data stream (A-law or μ-law)
- North American/International Telephony (including caller ID) support available
- Simultaneous DTMF detector operation available (typically fewer than 150 hits on the Bellcore test tape)
- MF tone detectors and general-purpose programmable tone detectors/generators available
- Data/Facsimile/Voice Distinction available
- Common compressed speech frame stream interface to support systems with multiple speech coders
- Dynamic speech coder selection when multiple voice codecs are available
- Can be integrated with Acoustic Echo Canceller, G.168 Line Echo Canceller and Tone Detection/Regeneration modules
- Available with VoIP stack
Features
- Full and half duplex modes of operation
- Passes ETSI test vectors
- Compliant with the GSM 06.90 Recommendation
- Optimized for high performance on leading DSP architectures
- Multichannel implementation
- Multi-tasking environment compatible
More Information
- Audio Examples
- MIPS/memory requirements
- PSQM/PSQM+ values
- ETSI Recommendation GSM 06.90
- RFC 3267 – RTP Packetization
- RTP Parameters
Platforms
VOCAL’s optimized vocoder software is available for a range of processors and operating systems. Please contact us for specific supported platforms and performance information.