With the rapid growth of the Internet, real-time Voice-over-IP (VoIP) has become an attractive alternative to conventional public telephony. The quality of a decoded VoIP stream degrades when packets are lost or delayed, because lost packets cannot be retransmitted in real time. Although channel coding can protect the audio from packet loss, it adds redundancy and therefore increases the payload. To achieve higher quality in low-delay, real-time voice transmission over the Internet, effective error concealment mechanisms must be developed; these typically extract features from the audio and use them to recover the lost data.
Error concealment techniques can be divided into two types, according to where the concealment is carried out: receiver-based and sender-receiver-based. Receiver-based schemes perform loss concealment only at the receiver. In the simplest receiver-based reconstruction schemes, a lost packet is recreated by padding with silence or white noise, or by repeating the last received packet. To further improve quality and make the error less perceptible, a lost packet can instead be replaced by a previously received packet selected through some form of pattern matching, or reconstructed by pitch period replication. The pitch period is estimated from the speech segments immediately preceding the lost packets; alternatively, waveform substitution based on previously received frames can be performed on each sub-band of the linear-prediction (LP) residual.
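The receiver-based options described above can be illustrated with a minimal Python sketch. It is not taken from any particular codec: the function names, the use of normalized autocorrelation to estimate the pitch period, and the lag bounds `min_lag` and `max_lag` are assumptions made for illustration only.

```python
import math

def conceal_silence(frame_len):
    """Replace a lost frame with silence (all zeros)."""
    return [0.0] * frame_len

def conceal_repeat(last_frame):
    """Replace a lost frame with a copy of the last received frame."""
    return list(last_frame)

def estimate_pitch_period(history, min_lag, max_lag):
    """Estimate the pitch period (in samples) of the signal in `history`.

    Assumption: the period is the lag in [min_lag, max_lag] that maximizes
    the normalized autocorrelation of the received samples.
    """
    if not 0 < min_lag <= max_lag < len(history):
        raise ValueError("need 0 < min_lag <= max_lag < len(history)")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        a = history[lag:]       # signal
        b = history[:-lag]      # same signal delayed by `lag` samples
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        score = num / den if den > 0.0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def conceal_pitch_replication(history, frame_len, min_lag, max_lag):
    """Fill a lost frame by repeating the last pitch cycle of `history`."""
    period = estimate_pitch_period(history, min_lag, max_lag)
    cycle = history[-period:]
    return [cycle[i % period] for i in range(frame_len)]
```

For a strictly periodic input, the replicated frame continues the waveform without a phase jump, whereas silence padding and plain repetition generally introduce a discontinuity at the frame boundary. Real speech is only quasi-periodic, so the match degrades as the lost segment gets longer.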
These strategies work well only when losses are infrequent and frame sizes are small. Stochastic packet reconstruction is a typical receiver-based speech error concealment scheme. It reconstructs a missing audio segment in the frequency domain, using a stochastic equivalent of the last successfully received data packet. Moreover, a number of packet concealment techniques apply time-domain windowing functions to improve the perceived quality of the inserted packet, while others additionally use interleaving to improve concealment performance in the presence of burst packet losses. Such interleaving techniques are usually combined with interpolation-based repair methods, which interpolate from the packets surrounding a loss to produce a replacement for the lost packet.
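The effect of combining interleaving with interpolation can be sketched as follows. This is a simplified illustration under stated assumptions: each "unit" is a single number standing in for a frame or feature value, the interleaver is a plain block interleaver of hypothetical depth `depth`, and the repair step is a neighbour average rather than any specific published method.

```python
def _order(n, depth):
    """Transmission order: original indices 0, depth, 2*depth, ..., then 1, depth+1, ..."""
    return [i for start in range(depth) for i in range(start, n, depth)]

def interleave(units, depth):
    """Reorder units so that neighbours in time are sent far apart."""
    if len(units) % depth != 0:
        raise ValueError("len(units) must be a multiple of depth")
    return [units[i] for i in _order(len(units), depth)]

def deinterleave(received, depth):
    """Restore the original time order; lost units are given as None."""
    out = [None] * len(received)
    for pos, i in enumerate(_order(len(received), depth)):
        out[i] = received[pos]
    return out

def longest_gap(units):
    """Length of the longest run of consecutive lost (None) units."""
    longest = run = 0
    for u in units:
        run = run + 1 if u is None else 0
        longest = max(longest, run)
    return longest

def interpolate_isolated(units):
    """Fill each lost unit with the mean of its available direct neighbours."""
    out = list(units)
    for i, u in enumerate(units):
        if u is None:
            left = units[i - 1] if i > 0 else None
            right = units[i + 1] if i + 1 < len(units) else None
            known = [v for v in (left, right) if v is not None]
            if known:
                out[i] = sum(known) / len(known)
    return out

# Usage: a burst of three consecutive losses on the channel
sent = interleave(list(range(12)), depth=4)
received = [None if 3 <= pos <= 5 else u for pos, u in enumerate(sent)]
restored = deinterleave(received, depth=4)   # losses are now isolated
repaired = interpolate_isolated(restored)
```

In this example the burst of three lost units on the channel becomes three isolated single-unit gaps after deinterleaving, each of which has both neighbours available for interpolation. The cost is added delay, since the receiver must buffer a full interleaving block before playout.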
Sender-receiver-based schemes are usually more effective but also more complex. A common approach is for the sender to first process the input stream, extract speech features, and transmit them to the receiver along with the audio. The receiver can then estimate lost packets more accurately than it could under a receiver-based scheme.
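A minimal sketch of the sender-receiver idea is given below. The specific design is an assumption chosen for illustration, not a scheme described in the text: the only feature is the RMS energy of a frame, it is piggybacked on the following packet in a hypothetical `prev_rms` field, and the receiver uses it to rescale a repetition of the last good frame.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Packet:
    seq: int
    samples: List[float]
    # Side information (hypothetical): RMS energy of frame seq - 1.
    prev_rms: Optional[float]

def rms(samples):
    """Root-mean-square energy of a frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def packetize(frames):
    """Sender: attach to each packet the RMS of the preceding frame."""
    packets, prev = [], None
    for seq, frame in enumerate(frames):
        packets.append(Packet(seq, list(frame), prev))
        prev = rms(frame)
    return packets

def conceal_with_side_info(last_good, next_packet):
    """Receiver: repeat the last good frame, rescaled to the energy that
    the sender reported for the lost frame. Falls back to plain
    repetition when no side information is available."""
    base = rms(last_good)
    if next_packet is None or next_packet.prev_rms is None or base == 0.0:
        return list(last_good)
    gain = next_packet.prev_rms / base
    return [x * gain for x in last_good]

# Usage: packet 1 is lost; packet 2 carries the energy of frame 1
frames = [[1.0, -1.0, 1.0, -1.0],
          [0.5, -0.5, 0.5, -0.5],
          [0.25, -0.25, 0.25, -0.25]]
packets = packetize(frames)
estimate = conceal_with_side_info(packets[0].samples, packets[2])
```

Compared with plain repetition, the receiver here recovers the energy of the lost frame at the cost of a few extra bytes per packet and one packet of additional delay, since it must wait for the following packet before concealing the loss.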