In many current networking problems, in-network image or video scalability could improve overall network performance (in terms of sum image quality, throughput, fairness, etc.). It would allow a node to adjust the data rate of an image to match the rate decision of a transport-layer congestion control mechanism, such as TCP or TFRC. Ideally, the distortion of the received scaled image or video, D(N → L), would be close to the distortion of an image compressed directly to the same length, i.e. D(L) ≈ D(N → L), where N is the length of the original compressed image, L is the length of the scaled image or video, N → L represents an image of length N scaled to length L, and D(L) is the distortion of an image compressed directly to length L. JPEG2000 offers mechanisms for accomplishing this.
In-network rate control and scaling in JPEG2000 can be implemented simply by truncating the bitstream. To make this possible, the encoder needs to create an embedded bitstream, that is, a bitstream in which the lower-quality representations of the image are contained in the prefixes of the higher-quality ones. This can also be viewed as enhancement: the base image is encoded at the beginning of the bitstream, and the remainder of the bitstream enhances the quality of that base image. Before discussing the overall bitstream format, we will clarify the difference between quality (or distortion) scalability and resolution scalability:
Resolution Scalable
A resolution scalable image is one in which each added element increases the resolution of the image. In JPEG2000, this corresponds to the different subband levels produced by the discrete wavelet transform (DWT). For example, if there are 4 subband levels in the DWT, the beginning of the bitstream would consist of the elements representing LL4, which can be decoded independently. The next elements would represent LH4, HL4 and HH4, which are synthesized together with LL4 to recreate LL3. Because of the nature of the DWT, the image reconstructed from LL3 has twice the resolution, in each dimension, of the image reconstructed from LL4 alone.
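The resolution progression described above can be sketched with a toy transform. The example below is a minimal illustration, not JPEG2000 itself: it assumes a simple 2x2 Haar averaging/differencing filter in place of the wavelet filters the standard actually uses, and the function names (`analyze`, `synthesize`, `decompose`, `reconstructions`) and the test image are made up for this sketch.

```python
def analyze(img):
    """One Haar-style DWT level: split an image with even dimensions into LL, HL, LH, HH."""
    h, w = len(img), len(img[0])
    ll = [[0.0] * (w // 2) for _ in range(h // 2)]
    hl = [[0.0] * (w // 2) for _ in range(h // 2)]
    lh = [[0.0] * (w // 2) for _ in range(h // 2)]
    hh = [[0.0] * (w // 2) for _ in range(h // 2)]
    for r in range(0, h, 2):
        for c in range(0, w, 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            ll[r // 2][c // 2] = (a + b + d + e) / 4  # local average
            hl[r // 2][c // 2] = (a - b + d - e) / 4  # horizontal detail
            lh[r // 2][c // 2] = (a + b - d - e) / 4  # vertical detail
            hh[r // 2][c // 2] = (a - b - d + e) / 4  # diagonal detail
    return ll, hl, lh, hh


def synthesize(ll, hl, lh, hh):
    """Invert analyze(): combine four subbands into the next-finer LL image."""
    h, w = len(ll), len(ll[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for c in range(w):
            s, x, y, z = ll[r][c], hl[r][c], lh[r][c], hh[r][c]
            out[2 * r][2 * c] = s + x + y + z
            out[2 * r][2 * c + 1] = s - x + y - z
            out[2 * r + 1][2 * c] = s + x - y - z
            out[2 * r + 1][2 * c + 1] = s - x - y + z
    return out


def decompose(img, levels):
    """Return the coarsest LL band and the detail triples ordered coarsest first."""
    details = []
    ll = img
    for _ in range(levels):
        ll, hl, lh, hh = analyze(ll)
        details.append((hl, lh, hh))
    details.reverse()
    return ll, details


def reconstructions(ll, details):
    """Yield the image available after each successive resolution increment."""
    yield ll
    for hl, lh, hh in details:
        ll = synthesize(ll, hl, lh, hh)
        yield ll


# Hypothetical 16x16 test image with integer samples.
image = [[(r * 16 + c) % 251 for c in range(16)] for r in range(16)]
ll4, details = decompose(image, 4)
stages = list(reconstructions(ll4, details))
sizes = [(len(s), len(s[0])) for s in stages]
print(sizes)
```

With 4 levels, the first stage corresponds to LL4 and each later stage to LL3, LL2, LL1 and the full image, so each increment doubles the width and height of what the decoder can display.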
Quality Scalable
Quality scalability follows the basic definition of an embedded bitstream in that each added element enhances the quality. In other words, adding an element can never reduce the quality. Embedded block coding such as bit-plane coding can accomplish this. Essentially, in bit-plane coding, successive portions of the bitstream represent a finer quantization of the image, resulting in increased quality.
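To illustrate how bit-plane coding yields successively finer quantization, here is a minimal sketch that sends the bit-planes of a few non-negative integer coefficients from most to least significant and measures the error after each plane. The coefficient values and helper names are assumptions for illustration; the coder used in JPEG2000 additionally handles signs, splits each bit-plane into several coding passes and applies arithmetic coding, none of which is modeled here.

```python
def bit_planes(coeffs, num_planes):
    """Split non-negative integer magnitudes into bit-planes, most significant first."""
    return [[(c >> p) & 1 for c in coeffs] for p in range(num_planes - 1, -1, -1)]


def reconstruct(planes, num_planes, count):
    """Rebuild magnitudes from however many leading bit-planes were received."""
    values = [0] * count
    for k, plane in enumerate(planes):
        weight = 1 << (num_planes - 1 - k)
        values = [v + bit * weight for v, bit in zip(values, plane)]
    return values


def squared_error(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical coefficient magnitudes, each fitting in 4 bits.
coeffs = [13, 2, 7, 0, 9, 5]
planes = bit_planes(coeffs, 4)

# Error after receiving 0, 1, 2, 3 and 4 bit-planes.
errors = [
    squared_error(coeffs, reconstruct(planes[:k], 4, len(coeffs)))
    for k in range(len(planes) + 1)
]
print(errors)
```

Because every received plane can only set a bit that is actually present in the original magnitude, the error never increases as more planes arrive, which is the property the embedded bitstream relies on.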
To realize this, the JPEG2000 standard uses embedded block coding with optimal truncation (EBCOT). With EBCOT, each subband is divided into small code blocks, which are encoded independently using an embedded block coder. The length of each block's bitstream is then chosen such that the overall distortion for the entire subband is minimized for a given overall length restriction. To make the stream scalable, the portions of each code block that contribute to each of a series of quality layers are then placed in order (and labeled).
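The truncation step can be sketched as a small constrained search: pick one truncation point per code block so that the total length fits a budget and the total distortion is as small as possible. The sketch below uses exhaustive search for clarity, which is only practical for a handful of blocks; practical EBCOT encoders rely on a rate-distortion slope threshold instead. The (length, distortion) numbers and the name `best_truncation` are invented for illustration.

```python
from itertools import product

# Hypothetical (length_in_bytes, distortion) pairs for each truncation point of
# each code block. Index 0 means "nothing sent" for that block.
blocks = [
    [(0, 100), (10, 60), (20, 35), (30, 22)],
    [(0, 80), (8, 30), (16, 15), (24, 10)],
    [(0, 50), (12, 40), (24, 32), (36, 28)],
]


def best_truncation(blocks, budget):
    """Return (distortion, length, choice) minimizing distortion within the budget.

    choice[i] is the index of the truncation point selected for block i.
    """
    best = None
    for choice in product(*(range(len(b)) for b in blocks)):
        length = sum(b[i][0] for b, i in zip(blocks, choice))
        if length > budget:
            continue  # this combination does not fit the length restriction
        distortion = sum(b[i][1] for b, i in zip(blocks, choice))
        if best is None or distortion < best[0]:
            best = (distortion, length, choice)
    return best


distortion, length, choice = best_truncation(blocks, budget=40)
print(distortion, length, choice)
```

In this made-up data the third block improves slowly per byte, so the search spends the budget on the first two blocks and leaves the third untransmitted, which is the kind of uneven allocation that independent block coding makes possible.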
For example, assume that there are three quality layers, Q0, Q1 and Q2, and 4 code blocks, B0 – B3. The code blocks have natural truncation points that result in performance close to the ideal rate-distortion curve for that block, so we will only consider truncating the code blocks at those points. For this example, assume that there are 4 truncation points in each code block, T(1) – T(4) (T(0) represents no coding elements included). One potential scalable ordering is shown in the table below.
Layer | B0 | B1 | B2 | B3 |
Q2 | T(4) | T(4) | T(4) | T(4) |
Q1 | T(2) | T(2) | T(3) | T(4) |
Q0 | T(0) | T(2) | T(1) | T(2) |
As long as all of Q0 is transmitted, each additional element received can be used to improve the quality of one block within the subband, which results in higher quality for the subband and therefore lower distortion in the recovered image.
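The ordering in the table can be turned into an explicit transmission sequence. The sketch below assumes that each table entry is the truncation point a block has reached by the end of that layer, so a block contributes an element to a layer only when its truncation point advances. The tuple layout and the function names are hypothetical.

```python
# Truncation point reached by blocks B0..B3 at the end of each quality layer,
# taken from the table and listed in transmission order (Q0 first).
LAYERS = [
    ("Q0", [0, 2, 1, 2]),
    ("Q1", [2, 2, 3, 4]),
    ("Q2", [4, 4, 4, 4]),
]


def stream_order(layers):
    """List (layer, block, from_point, to_point) elements in transmission order."""
    order = []
    reached = [0] * len(layers[0][1])
    for name, targets in layers:
        for block, target in enumerate(targets):
            if target > reached[block]:
                # This block advances from T(reached) to T(target) in this layer.
                order.append((name, block, reached[block], target))
                reached[block] = target
    return order


def decode_state(order, received, num_blocks=4):
    """Truncation point of each block when only the first `received` elements arrive."""
    state = [0] * num_blocks
    for _, block, _, to_point in order[:received]:
        state[block] = to_point
    return state


order = stream_order(LAYERS)
for element in order:
    print(element)

# Truncating the stream after Q0 plus two elements of Q1:
print(decode_state(order, 5))
```

Under this reading, B0 contributes nothing to Q0 and B3 contributes nothing to Q2, and cutting the stream after any element leaves every block at one of its own truncation points.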