Voice quality in the Public Switched Telephone Network (PSTN) is the benchmark for voice communications and is referred to as toll quality. Packet-based transport is increasingly replacing these legacy communication systems, and the deployment of Voice over IP (VoIP) networks presents a new set of challenges in optimizing voice quality and delivering the expected toll quality. For example, limited bandwidth makes voice compression necessary. This compression is often lossy, so some fidelity may be lost. In addition, packet loss, increased round-trip delay, and delay variability (jitter) all degrade voice quality. Despite these limitations, VoIP has the potential to surpass toll quality through wideband and super-wideband audio and through signal processing techniques such as echo cancellation, speech enhancement, and acoustic beamforming.
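The impairments listed above can be combined into a rough quality estimate. The sketch below is not from the source: it assumes the ITU-T G.107 E-model, in which a transmission rating R is reduced by a delay impairment Id and an effective equipment impairment Ie,eff (codec distortion plus packet loss) and then mapped to an estimated mean opinion score (MOS). The delay term uses a commonly cited simplified approximation rather than the full G.107 expression, and the codec defaults (`ie`, `bpl`) are illustrative placeholders; consult ITU-T G.113 for per-codec values. All function names are hypothetical.

```python
# Sketch of an E-model style estimate (ITU-T G.107). The delay term is a
# simplified approximation and the codec defaults are assumed placeholders.


def r_to_mos(r: float) -> float:
    """Map a transmission rating R to an estimated MOS (G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)


def delay_impairment(one_way_delay_ms: float) -> float:
    """Simplified delay impairment Id; grows faster beyond about 177 ms."""
    d = one_way_delay_ms
    extra = 0.11 * (d - 177.3) if d > 177.3 else 0.0
    return 0.024 * d + extra


def effective_equipment_impairment(ie: float, bpl: float,
                                   loss_pct: float, burst_r: float = 1.0) -> float:
    """Ie,eff: codec impairment Ie increased by packet loss (loss_pct in %)."""
    return ie + (95 - ie) * loss_pct / (loss_pct / burst_r + bpl)


def estimate_mos(one_way_delay_ms: float, loss_pct: float,
                 ie: float = 0.0, bpl: float = 4.3) -> float:
    """Estimated MOS; 93.2 is the E-model rating with all default parameters."""
    r = (93.2
         - delay_impairment(one_way_delay_ms)
         - effective_equipment_impairment(ie, bpl, loss_pct))
    return r_to_mos(r)


print(round(estimate_mos(20, 0.0), 2))   # short delay, no loss
print(round(estimate_mos(150, 2.0), 2))  # longer delay, 2% random loss
```

Treat the numbers this produces as a planning aid for comparing scenarios, not as a substitute for measured or listening-test quality.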
Echoes play an important role in the perception of voice quality. For example, with legacy devices, most users unknowingly expect to hear an echo of their own voice (as long as the delay is less than 25 ms and the echo return loss is at least 45 dB). IP phones therefore deliberately create this echo, called sidetone, to recreate the experience; otherwise, users may think the line is dead. Similarly, silence can detract from the experience of a call. Because echo cancellation and suppression can leave gaps of unnatural silence, comfort noise generation is required in nearly all applications of acoustic echo cancellation (AEC).
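A minimal comfort noise generator might look like the following. It is a sketch under simplifying assumptions, not the method the text refers to: it learns a background-noise level only from frames that an assumed external voice activity detector marks as non-speech, then fills otherwise silent output with white noise at that level. Practical CNG schemes (for example the comfort noise payload defined in RFC 3389) also reproduce the noise spectrum, which this sketch omits. The class and parameter names are hypothetical.

```python
import math
import random


def frame_rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


class ComfortNoiseGenerator:
    """Hypothetical white-noise CNG driven by an external speech/non-speech flag."""

    def __init__(self, smoothing=0.9, seed=None):
        self.smoothing = smoothing   # weight given to the previous noise estimate
        self.noise_rms = 0.0         # tracked background-noise level
        self._rng = random.Random(seed)

    def update(self, frame, is_speech):
        # Learn the noise floor only from frames without speech, so talker
        # energy does not inflate the estimate.
        if not is_speech:
            a = self.smoothing
            self.noise_rms = a * self.noise_rms + (1 - a) * frame_rms(frame)

    def generate(self, n):
        # Uniform noise on [-s, s] has RMS s / sqrt(3), so s = sqrt(3) * target.
        s = math.sqrt(3.0) * self.noise_rms
        return [self._rng.uniform(-s, s) for _ in range(n)]
```

In use, `update()` would be called on every incoming frame, and `generate()` would supply the samples for frames that the echo suppressor has muted, so the far-end listener hears a steady background instead of dead air.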
Echo cancellation in IP networks faces challenges similar to those in cellular networks. Because most Wi-Fi handsets have limited processing power, echo cancellation has to be performed at a centralized network location. Additional variability in the echo path can be handled by modifying codec parameters or by applying a post-filter.
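One common way to realize such a post-filter is residual echo suppression: after the linear canceller, attenuate the output when the estimated leftover echo is large relative to the residual signal. The sketch below is an illustration under stated assumptions, not the approach the text prescribes. It models the residual echo power as an assumed fraction (`leakage`) of the linear echo estimate's power and applies a single broadband, spectral-subtraction-style gain with a floor; practical post-filters usually compute such gains per frequency band. All names and default values are hypothetical.

```python
def mean_power(frame):
    """Average power of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def suppression_gain(error_power, echo_estimate_power, leakage=0.5, floor=0.1):
    """Gain in [floor, 1]: 1 leaves the frame untouched, floor suppresses most."""
    if error_power <= 0.0:
        return 1.0  # nothing left to suppress
    # Assumed model: residual echo power = leakage * linear echo estimate power.
    residual_echo_power = leakage * echo_estimate_power
    gain = 1.0 - residual_echo_power / error_power
    return max(floor, min(1.0, gain))


def post_filter(error_frame, echo_estimate_frame, leakage=0.5, floor=0.1):
    """Scale one canceller output frame by the residual echo suppression gain."""
    g = suppression_gain(mean_power(error_frame),
                         mean_power(echo_estimate_frame), leakage, floor)
    return [g * e for e in error_frame], g
```

The gain floor keeps the output from dropping to true silence, which is also where comfort noise would typically be mixed in.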
In PSTN-to-VoIP gateways, the echo canceller must cover a longer echo span because of the additional delay between the gateway and the user. This delay comes from several stages of data buffering along the transmission path (e.g., vocoder, packetization, and queuing delays). The echo tail length itself does not increase; instead, the larger span is compensated for by the bulk delay of the echo canceller, so the adaptive filter only needs to model the echo tail that follows that delay.
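The split between bulk delay and echo tail can be illustrated with a simple delay estimate. In this sketch, which is one possible approach and not necessarily how any particular gateway does it, the bulk delay is taken as the lag that maximizes the magnitude of the cross-correlation between the far-end reference and the signal returning from the echo path. The adaptive filter then needs only enough taps to cover the tail starting at that lag, so the total span is the bulk delay plus the tail length. Function names, the search range, the 8 kHz sample rate, and the synthetic echo path are assumptions.

```python
import random


def estimate_bulk_delay(far_end, returned, max_delay):
    """Lag (in samples) maximizing |cross-correlation| of returned vs far_end.

    far_end:  reference signal sent toward the echo path
    returned: signal coming back, containing the delayed echo
    """
    n = min(len(far_end), len(returned))
    best_lag, best_score = 0, -1.0
    for lag in range(max_delay + 1):
        score = abs(sum(returned[i] * far_end[i - lag] for i in range(lag, n)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def canceller_layout(bulk_delay, tail_taps, fs_hz=8000):
    """Express the fixed bulk delay, adaptive tail, and total span in ms."""
    to_ms = 1000.0 / fs_hz
    return {
        "bulk_delay_ms": bulk_delay * to_ms,
        "tail_ms": tail_taps * to_ms,
        "total_span_ms": (bulk_delay + tail_taps) * to_ms,
    }


# Usage: synthetic far end and an echo path with a 120-sample bulk delay.
rng = random.Random(0)
far = [rng.gauss(0.0, 1.0) for _ in range(2000)]
h = [0.5, 0.3, 0.1]  # short dispersive echo path (assumed)
returned = [sum(h[k] * far[i - 120 - k] for k in range(len(h)) if i - 120 - k >= 0)
            for i in range(len(far))]
bulk = estimate_bulk_delay(far, returned, max_delay=300)
print(bulk, canceller_layout(bulk, tail_taps=64))
```

Increasing the delay in front of the echo path changes only `bulk`; the number of adaptive taps stays the same, which is the point made above.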
In a multi-party VoIP conferencing system, running fully adaptive, sample-by-sample echo cancellation on a large number of channels may not be achievable. Allocating echo cancellation resources appropriately can help alleviate this burden. For example, once a channel has reached steady-state convergence, it can switch from adapting its filter on a sample-by-sample basis to a frame-by-frame basis. Alternatively, the number of coefficient updates can be divided among the channels according to each channel's convergence state.
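Both resource-saving strategies can be sketched around a normalized least-mean-squares (NLMS) adaptive filter, a standard choice for echo cancellation that the text itself does not name. In the hypothetical sketch below, a channel adapts on every sample until its smoothed echo return loss enhancement (ERLE) exceeds an assumed threshold, then adapts only once per frame, and falls back to per-sample adaptation if ERLE drops (for example after an echo path change). A second helper splits an assumed per-frame budget of coefficient updates across channels in proportion to how far each is from a target ERLE. Thresholds, frame size, and names are illustrative, and double-talk detection, which a real canceller needs, is omitted.

```python
import math
import random


class NlmsEchoCanceller:
    """NLMS canceller: per-sample adaptation while converging, per-frame after."""

    def __init__(self, taps, mu=0.5, frame=80, erle_threshold_db=20.0, eps=1e-6):
        self.w = [0.0] * taps        # echo path estimate
        self.x = [0.0] * taps        # far-end history, newest sample first
        self.mu, self.frame, self.eps = mu, frame, eps
        self.erle_threshold_db = erle_threshold_db
        self.p_near = eps            # smoothed power of the microphone (echo) signal
        self.p_err = eps             # smoothed power of the residual
        self.n = 0                   # samples processed
        self.updates = 0             # coefficient-vector updates performed

    @property
    def erle_db(self):
        return 10.0 * math.log10(max(self.p_near, self.eps) / max(self.p_err, self.eps))

    @property
    def converged(self):
        return self.erle_db >= self.erle_threshold_db

    def process(self, far, near):
        """Filter one sample pair and return the echo-cancelled output."""
        self.x.insert(0, far)
        self.x.pop()
        y_hat = sum(wi * xi for wi, xi in zip(self.w, self.x))
        e = near - y_hat
        self.p_near = 0.99 * self.p_near + 0.01 * near * near
        self.p_err = 0.99 * self.p_err + 0.01 * e * e
        # Adapt every sample until converged; afterwards only on frame boundaries.
        if not self.converged or self.n % self.frame == 0:
            step = self.mu * e / (self.eps + sum(xi * xi for xi in self.x))
            self.w = [wi + step * xi for wi, xi in zip(self.w, self.x)]
            self.updates += 1
        self.n += 1
        return e


def allocate_updates(erle_db_per_channel, budget, target_db=30.0):
    """Share a per-frame update budget, favoring channels furthest from target_db."""
    deficits = [max(0.0, target_db - erle) for erle in erle_db_per_channel]
    total = sum(deficits)
    if total == 0.0:
        return [budget // len(deficits)] * len(deficits)
    return [int(budget * d / total) for d in deficits]


# Usage with a synthetic echo path (assumed values, no near-end talker).
rng = random.Random(0)
h = [0.6, -0.3, 0.2, 0.1]
aec = NlmsEchoCanceller(taps=8)
history = [0.0] * len(h)
for _ in range(4000):
    far = rng.gauss(0.0, 1.0)
    history = [far] + history[:-1]
    near = sum(hk * xk for hk, xk in zip(h, history))
    aec.process(far, near)
print(round(aec.erle_db, 1), aec.updates)
```

Here "frame-by-frame" is interpreted as a single NLMS update at each frame boundary; a block-LMS update that averages the gradient over the frame would be another reasonable reading.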