The H.264/AVC bitstream consists of a hierarchy of layers.
The first layer is the Network Abstraction Layer (NAL), in which the bitstream is divided into a sequence of NAL units. Some NAL units carry control parameters that are shared by many pictures, such as the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS). Others, the Video Coding Layer (VCL) NAL units, carry slices of coded video. The NAL units that make up one coded frame or field form an access unit, and the coded picture can be encoded as one or more slices.
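As an illustration of how a stream is divided into NAL units, the sketch below splits a buffer on start codes. It assumes the Annex B byte stream format, in which each NAL unit is preceded by the start code 0x000001 (optionally with extra leading zero bytes); the text above does not say which transport format is used, so this is only one possible case. The function name `split_annex_b` is hypothetical.

```python
from typing import List

START_CODE = b"\x00\x00\x01"  # Annex B start code prefix (assumed format)


def split_annex_b(stream: bytes) -> List[bytes]:
    """Return the NAL units found in an Annex B byte stream, start codes removed."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes before the next start code belong to that start code
        # (or are trailing padding), not to this NAL unit: a NAL unit never
        # ends in a 0x00 byte.
        unit = stream[start:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units


if __name__ == "__main__":
    demo = b"\x00\x00\x00\x01\x67\xaa\x00\x00\x01\x68\xbb\x00\x00\x00\x01\x65\xcc"
    for nal in split_annex_b(demo):
        print(nal.hex())
```

Streams carried in other containers (for example with length-prefixed NAL units) would need a different splitter.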
A coded video sequence starts with an Instantaneous Decoder Refresh (IDR) picture. All following video frames or fields are coded as slices. A new IDR picture signals that the previous coded video sequence has ended and a new one is beginning.
Each NAL unit begins with a one-byte header followed by the Raw Byte Sequence Payload (RBSP). For a VCL NAL unit, the RBSP contains an encoded slice. Because a slice is coded as a string of bits rather than bytes, the RBSP ends with trailing bits (a stop bit set to 1 followed by zero bits) so that its length is an integer number of bytes.
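The one-byte header can be unpacked with a few bit operations. The sketch below assumes the H.264 header layout of a 1-bit `forbidden_zero_bit`, a 2-bit `nal_ref_idc` and a 5-bit `nal_unit_type`; the `NalHeader` class, the `parse_nal_header` function and the small name table are hypothetical helpers that list only a few common types.

```python
from dataclasses import dataclass

# A few common nal_unit_type values; the table is deliberately incomplete.
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    7: "SPS",
    8: "PPS",
}


@dataclass
class NalHeader:
    forbidden_zero_bit: int  # must be 0 in a valid stream
    nal_ref_idc: int         # 0 means the unit is not used for reference
    nal_unit_type: int       # identifies the payload kind


def parse_nal_header(nal: bytes) -> NalHeader:
    """Decode the first byte of a NAL unit (start code already removed)."""
    if not nal:
        raise ValueError("empty NAL unit")
    first = nal[0]
    return NalHeader(
        forbidden_zero_bit=(first >> 7) & 0x01,
        nal_ref_idc=(first >> 5) & 0x03,
        nal_unit_type=first & 0x1F,
    )


if __name__ == "__main__":
    hdr = parse_nal_header(b"\x67\x42\x00\x1e")
    print(hdr, NAL_TYPE_NAMES.get(hdr.nal_unit_type, "other"))
```

Types 1 and 5 are VCL NAL units carrying slices, while types 7 and 8 carry the SPS and PPS mentioned earlier.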
In the slice layer, each slice consists of a slice header and slice data. The slice data is a series of macroblocks (MBs), together with skip indicators that signal macroblock positions for which no data is coded.
The MB layer specifies the MB structure. Each MB consists of:
- MB type (I, P or B)
- Prediction information, which includes the prediction mode for I macroblocks, and the reference frames and motion vectors for P and B macroblocks
- Coded Block Pattern (CBP) that indicates which luma and chroma blocks have non-zero residual coefficients
- Quantization Parameter (QP), for macroblocks with non-zero residual
- Residual data, for macroblocks with non-zero residual
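To make the Coded Block Pattern item above more concrete, the following sketch unpacks a CBP value into its luma and chroma parts. It assumes the already-decoded `coded_block_pattern` value (0 to 47) for 4:2:0 video, where the low four bits flag the four 8x8 luma blocks and the remaining part is a chroma indicator of 0, 1 or 2. The function name `decode_cbp` is hypothetical.

```python
from typing import List, Tuple

# Meaning of the chroma part of the CBP (assumed 4:2:0 case).
CHROMA_MEANING = {
    0: "no non-zero chroma coefficients",
    1: "only chroma DC coefficients may be non-zero",
    2: "chroma AC coefficients may be non-zero",
}


def decode_cbp(cbp: int) -> Tuple[List[bool], int]:
    """Split a CBP value into per-8x8 luma flags and a chroma indicator."""
    if not 0 <= cbp <= 47:
        raise ValueError("coded_block_pattern out of range")
    luma_bits = cbp % 16   # one bit per 8x8 luma block
    chroma = cbp // 16     # 0, 1 or 2
    luma_flags = [bool((luma_bits >> i) & 1) for i in range(4)]
    return luma_flags, chroma


if __name__ == "__main__":
    flags, chroma = decode_cbp(21)
    print(flags, CHROMA_MEANING[chroma])
```

A value of 0 means that no luma or chroma block of the macroblock carries residual coefficients.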