Due to the limited processing power of most mobile devices, the advanced signal processing required for voice quality enhancement (VQE), such as echo cancellation, has to be performed at a centralized network location. Performing echo cancellation further back in the transmission channel puts low bit-rate coded speech (e.g. GSM-AMR) in the echo path. Typically in this scenario, the coded speech has to be decoded first, and then echo cancellation and other voice quality enhancement techniques are applied. Finally, the enhanced speech is re-encoded to be sent to the far-end user. Another approach is to perform VQE directly on the coded speech parameters, as seen in the figure below. The potential benefits of this method are reductions in complexity, delay, and quantization noise, because it avoids the additional decode and encode steps needed to work on uncoded data.
The main difficulty of performing echo cancellation in a centralized network and on codec parameters is the estimation of the bulk delay of the echo path. Due to possible transmission errors and packet losses, the bulk delay is more variable than in applications in which acoustic echo cancellation is performed at the loudspeaker-microphone enclosure. The delay that maximizes the cross-correlation of the codec parameters from the far-end and near-end signals serves as an estimate of the bulk delay of the echo path. Since the gains of the fixed and adaptive codebooks directly affect the signal energy in the decoder, these gains can be attenuated in proportion to the probability that echo is present. Conveniently, the cross-correlation used for the delay estimate can also be used as a probability measure: the higher the maximum correlation, the greater the likelihood that the codec parameters represent echo. This direct modification of the codec parameters provides a low-complexity solution to voice quality enhancement in cellular networks.
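
The delay search and gain attenuation described above can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes each frame is summarized by a single scalar codec parameter (for example a codebook gain), uses the Pearson correlation as the cross-correlation measure, and maps the peak correlation linearly to an attenuation factor. The function names, the `max_delay` search range, and the `max_attenuation` parameter are all hypothetical.

```python
import math


def normalized_correlation(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def estimate_bulk_delay(far_params, near_params, max_delay):
    """Return (delay_in_frames, peak_correlation), searching lags 0..max_delay.

    The lag at which the far-end parameter track best matches the near-end
    track is taken as the bulk delay estimate (assumed search strategy).
    """
    n = min(len(far_params), len(near_params))
    best_delay, best_corr = 0, -1.0
    for delay in range(max_delay + 1):
        overlap = n - delay
        if overlap < 2:  # too few frames to correlate
            break
        corr = normalized_correlation(
            far_params[:overlap], near_params[delay:delay + overlap]
        )
        if corr > best_corr:
            best_delay, best_corr = delay, corr
    return best_delay, best_corr


def attenuate_gains(fixed_gain, adaptive_gain, peak_corr, max_attenuation=0.9):
    """Scale both codebook gains down as the echo likelihood rises.

    The peak correlation, clamped to [0, 1], is used as the echo probability;
    the linear mapping to a scale factor is an assumption of this sketch.
    """
    echo_prob = min(max(peak_corr, 0.0), 1.0)
    scale = 1.0 - max_attenuation * echo_prob
    return fixed_gain * scale, adaptive_gain * scale


if __name__ == "__main__":
    # Hypothetical per-frame parameter track; the near end carries a
    # half-amplitude copy delayed by three frames.
    far = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.5, 0.05, 0.6, 0.4, 0.95, 0.15]
    near = [0.0, 0.0, 0.0] + [0.5 * v for v in far[:-3]]
    delay, corr = estimate_bulk_delay(far, near, max_delay=6)
    print(delay, corr, attenuate_gains(1.0, 0.5, corr))
```

A practical system would need to re-run the search continuously, because packet loss and transmission errors make the bulk delay vary over time, and it would need to account for the effect of a modified adaptive codebook gain on the decoder's long-term prediction memory.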