Levels for MPEG-4 Visual Profiles
by Fernando Pereira and Paulo Nunes (Instituto Superior Técnico, Lisboa - Portugal)
The MPEG-4 Visual standard defines (by October 2001) 18 visual object types and 19 visual profiles. Nine visual profiles have been defined in MPEG-4 Visual Version 1 [MPEG4-2]: Simple, Simple Scalable, Core, Main, N-bit, Scalable Texture, Simple Face Animation, Basic Animated Texture, and Hybrid.
Six additional visual profiles have been defined in MPEG-4 Visual Version 2 [MPEG4-2]: Core Scalable, Advanced Core, Advanced Coding Efficiency, Advanced Real Time Simple, Advanced Scalable Texture, and Simple FBA.
Moreover, two additional profiles have been defined in the 1st Extension to the 2nd Edition of the MPEG-4 Visual standard [MPEG01a]: Simple Studio and Core Studio. Two further profiles have been defined in the 2nd Extension to the 2nd Edition of the MPEG-4 Visual standard [MPEG01b]: Advanced Simple and Fine Granularity Scalability.
In the following, the mechanism specified to define video levels – Video Buffering Verifier – as well as the visual levels defined for all visual profiles will be presented.
A.1 Video Buffering Verifier Mechanism
The idea of using a Video Buffering Verifier mechanism to bound the decoding complexity of a given set of bitstreams is not new, and was already adopted in previous MPEG video coding standards, MPEG-1 [MPEG1-2] and MPEG-2 [MPEG2-2]. In these standards, the major purpose of the Video Buffering Verifier mechanism was to set some restrictions on the maximum variability of the number of bits per picture, especially in the case of constant bitrate operation, and thus on the complexity of the encoded video streams.
Generically, the complexity of the encoded video is directly related to the encoded bitrate and to the decoded video data rate that the decoder generates, e.g. measured in terms of the number of MB/s. For frame-based video coding, e.g. MPEG-1 and MPEG-2, the decoded video data rate is typically constant since the frames have fixed dimensions and are usually encoded at fixed frame rates. This is not the general case for object-based video coding, as in MPEG-4, since the several video objects composing a scene may vary in size along time and may be encoded at different VOP rates. Therefore, the amount and type[1] of MB/s that a given object-based video decoder has to process may largely vary over time in comparison with frame-based coding solutions [Nunes].
In the MPEG-4 context, to limit the decoding complexity of a set of bitstreams corresponding to a video scene, it is then necessary to set some limits on the variability of the number of decoded MB/s, and their complexity, and also on the picture memory required to store the decoded data. This constitutes the major novelty of the MPEG-4 Video Buffering Verifier mechanism relative to the previous MPEG standards, since it bounds not only the bitstream buffer memory but also the MB decoding capacity and the MB picture memory.
The MPEG-4 Video Buffering Verifier mechanism [MPEG4-2; Annex D] consists of three normative models, see Figure A.1, each one defining a set of rules and limits to verify if the amount required for a specific type of decoding resource is within the values allowed by the corresponding profile and level specification, see Table A.1:
- Video Rate Buffer Verifier (VBV) – This model is used to verify that the bitstream memory required at the decoder(s) does not exceed the values specified for the corresponding profile and level. The model is defined in terms of the VBV buffer sizes for all the VOLs corresponding to the objects building the scene. Each VBV buffer size corresponds to the maximum amount of bits that the decoder can store in the bitstream memory for the corresponding VOL; there is, however, also a limitation on the sum of the VOL VBV buffer sizes. The bitstream memory is the memory where the decoder puts the bits received for a VOL while waiting to be decoded.
- Video Complexity Verifier (VCV) – This model is used to verify that the computational power (processing speed), defined in terms of MB/s, required at the decoder does not exceed the values specified for the corresponding profile and level. The model is defined in terms of the VCV MB/s decoding rate and VCV buffer size and is applied to all MBs in the scene. If arbitrarily shaped VOs exist in the scene, an additional VCV buffer and VCV decoding rate is also defined, to be applied only to the boundary MBs.
- Video Reference Memory Verifier (VMV) – This model is used to verify that the picture memory required at the decoder for the decoding of a given scene does not exceed the values specified for the corresponding profile and level. The model is defined in terms of the VMV buffer size, which is the maximum number of decoded MBs that the decoder can store during the decoding process of all VOLs corresponding to the scene.

Figure A.1 Video buffering verifier model [MPEG4-2]
The Video Presentation Model (VPM) is not a normative part of the MPEG-4 Visual specification [MPEG4-2]. It is an algorithm for checking that the set of bitstreams corresponding to a scene does not require more presentation memory than a given amount, expressed in units of MB. It is also used to constrain the speed of the compositor in terms of the maximum number of MB/s. The Video Presentation Verifier (VPV) operates in the same way as the VCV in terms of occupancy dynamics [MPEG4-2].
In order that the set of visual elementary streams corresponding to a given scene may be considered compliant with a given profile and level, the encoder must guarantee that none of the above mentioned buffers overflows and, additionally, it must also guarantee that, in certain circumstances, the VBV buffer never underflows.
A.1.1 Video Rate Buffer Verifier Definition
The MPEG-4 VBV model defines a set of rules and limits for examining a video elementary bitstream with a delivery rate function, R(t). This model simulates the occupancy of the decoder bitstream buffer in order to control the amount of bitstream memory required at the decoder. Its purpose is to guarantee that the bitstream memory required does not exceed the specified buffer size, i.e. to verify that the decoder bitstream buffer occupancy never goes beyond the limits of the specified buffer size for the relevant profile@level. In the case of visual scenes composed of multiple VOs, each with one or more VOLs, the MPEG-4 Visual standard specifies that the video rate buffer model shall be applied independently to each VOL (using a particular buffer size and rate function for each VOL). Additionally, the maximum total bitstream buffer size (defined as the sum of all VOL bitstream buffer sizes) for the given profile and level shall not be exceeded, see Table A.1. Notice that the allocation of bitrate and buffer size among the several VOs and, for each VO, among the several VOLs, is a non-normative issue; since it can significantly determine the performance of object-based video encoders, it deserves careful attention.
The VBV applies to video data encoded as a combination of I-, P-, B-, and S-VOPs, using several coding tools organized in terms of video object types. Face animation, still texture, and mesh objects are not constrained by the VBV model. The coded video bitstreams shall be constrained to comply with the requirements of the VBV specified in the following sections.
A.1.1.1 VBV Model Parameters
The VBV model for a given elementary stream (ES) is defined by the following three parameters: vbv_buffer_size, vbv_occupancy, and bit_rate. These parameters have to be defined for all the ESs corresponding to the various objects in a scene. They can be specified at video level, i.e. through the video ES, or by means of systems level configuration information [MPEG4-1]. In the first case, the VBV model parameters are specified in the VOL header, when the one-bit flag vbv_parameters is set to ‘1’. In the second case, the VBV model parameters are conveyed to the video decoder through the Object Description Information, more precisely through the DecoderConfigDescriptor field of the ES_Descriptor associated with the ES in question.
When the vbv_buffer_size and vbv_occupancy parameters are specified by systems level configuration information, the bitstream shall be constrained according to the specified values, and these values shall not be part of the video ES. It may happen, however, that these parameters are not explicitly specified; in this case, it is assumed that the ES is constrained according to the default values of the corresponding profile and level combination[2].
- VBV Buffer Size
The VBV buffer size for a VOL specifies the minimum bitstream memory required at the decoder to properly decode the corresponding VOL ES. The VBV buffer size for a VOL is defined by the 18-bit vbv_buffer_size field in units of 16384 bits (the value zero is forbidden). The maximum VBV buffer size in bits, vbvES, is then given by
vbvES = 16384 × vbv_buffer_size
The vbv_buffer_size value is bounded by Max VOL VBV buffer size in Table A.1, which specifies the levels’ constraints, and the sum of all these values for all VOLs is bounded by Max total VBV buffer size.
The default value of vbv_buffer_size for a VOL is the maximum value of vbv_buffer_size allowed for the profile and level combination in question (called Max VOL VBV buffer size). Still, it must be checked that the sum of the vbv_buffer_size default values does not exceed Max total VBV buffer size.
In terms of the levels specification shown in Table A.1, there are two constraints defined: Max VOL VBV buffer size, which sets the limit for each VOL, and Max total VBV buffer size, which sets the limit on the sum of all the VOL buffer sizes.
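The two buffer-size constraints can be checked with a few lines of code. The following is a minimal sketch, not part of the standard: the function names are hypothetical, and the limits used in the usage note are the Simple@L1 values listed in Table A.1.

```python
VBV_UNIT_BITS = 16384  # vbv_buffer_size is expressed in units of 16384 bits


def vbv_buffer_bits(vbv_buffer_size: int) -> int:
    """Convert the 18-bit vbv_buffer_size field to a buffer size in bits."""
    if not 0 < vbv_buffer_size < 2 ** 18:
        raise ValueError("vbv_buffer_size must be a non-zero 18-bit value")
    return VBV_UNIT_BITS * vbv_buffer_size


def check_vbv_sizes(vol_sizes, max_vol_size, max_total_size) -> bool:
    """Check the per-VOL and total VBV limits (all values in 16384-bit units).

    vol_sizes      -- one vbv_buffer_size value per VOL in the scene
    max_vol_size   -- "Max VOL VBV buffer size" of the profile@level
    max_total_size -- "Max total VBV buffer size" of the profile@level
    """
    per_vol_ok = all(0 < size <= max_vol_size for size in vol_sizes)
    total_ok = sum(vol_sizes) <= max_total_size
    return per_vol_ok and total_ok
```

For Simple@L1 both limits are 10 units, so three VOLs with sizes 4, 3 and 3 pass the check, whereas two VOLs that both use the per-VOL default of 10 units fail it because their sum exceeds the total limit. This is why the sum of the default values has to be checked.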
- VBV Occupancy
The VBV occupancy for a VOL specifies the initial occupancy of the VBV buffer for that VOL, i.e. the occupancy that the VBV must reach so that the decoding process may start with the removal of the first VOP following the VOL header; this parameter, together with the bit_rate parameter, establishes the initial decoding delay, the so-called VBV latency. The VBV occupancy is defined by the 26-bit vbv_occupancy field in units of 64 bits[3].
The default value of vbv_occupancy for a VOL, in 64-bit units, is given by 170 × vbv_buffer_size (for that VOL), where vbv_buffer_size is in 16384-bit units; of course, the initial occupancy can never exceed the VBV buffer size of the corresponding VOL. This default corresponds to an initial occupancy (before the removal of the first VOP from the buffer) in bits, vbv0-, of approximately two-thirds of the defined buffer size, i.e.
$$vbv_0^- = 64 \times 170 \times \text{vbv\_buffer\_size} = 10880 \times \text{vbv\_buffer\_size} \approx \tfrac{2}{3} \times 16384 \times \text{vbv\_buffer\_size}$$
Note that there is no explicit limitation on vbv_occupancy in terms of levels definition.
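The unit conversions involved in the default occupancy are easy to get wrong, so a short sketch may help. The function names below are hypothetical. The latency estimate assumes that the channel delivers bits at a constant rate until the initial occupancy is reached, which is a simplification and not a requirement of the standard.

```python
OCCUPANCY_UNIT_BITS = 64   # vbv_occupancy is expressed in units of 64 bits
BUFFER_UNIT_BITS = 16384   # vbv_buffer_size is expressed in units of 16384 bits


def default_vbv_occupancy(vbv_buffer_size: int) -> int:
    """Default vbv_occupancy (64-bit units) for a given vbv_buffer_size."""
    return 170 * vbv_buffer_size


def initial_occupancy_bits(vbv_occupancy: int) -> int:
    """Initial VBV occupancy in bits, before the first VOP is removed."""
    return OCCUPANCY_UNIT_BITS * vbv_occupancy


def fill_fraction(vbv_buffer_size: int, vbv_occupancy: int) -> float:
    """Initial occupancy as a fraction of the VBV buffer size."""
    return initial_occupancy_bits(vbv_occupancy) / (BUFFER_UNIT_BITS * vbv_buffer_size)


def vbv_latency_seconds(vbv_occupancy: int, rate_bits_per_second: float) -> float:
    """Time needed to reach the initial occupancy at a constant delivery rate.

    Assumption: constant-rate delivery from an empty buffer.
    """
    return initial_occupancy_bits(vbv_occupancy) / rate_bits_per_second
```

With vbv_buffer_size = 10, the default occupancy is 1700 units, i.e. 108800 bits out of a 163840-bit buffer, a fill fraction of 0.6640625 (slightly below two-thirds). Delivered at a constant 64000 bit/s, this initial occupancy is reached after 1.7 s.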
- Bitrate
When present for a VOL, the bitrate parameter, bit_rate, defined by the 30-bit bit_rate field in units of 400 bits per second (value zero is forbidden), specifies the ES peak bitrate for VOLij [4], such that
$$R_{VOL_{ij}}(t) \le 400 \times \text{bit\_rate}$$
where RVOLij(t) is defined as the instantaneous VOL channel bitrate for VOLij (in bits per second) counting only the visual syntax.
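If the channel, with a total instantaneous channel rate, R(t), is a serial time multiplex of several streams (e.g. as defined by MPEG-4 Systems [MPEG4-1]) then RVOLij(t) = R(t) for the time instants where the channel is occupied by the relevant VOLij bits; otherwise it is zero [MPEG4-2], i.e.
$$R_{VOL_{ij}}(t) = \begin{cases} R(t), & \text{if the channel carries bits of } VOL_{ij} \text{ at time } t \\ 0, & \text{otherwise} \end{cases}$$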
Notice that the purpose of the bitrate parameter is to provide an upper bound on the VOL ES bitrate rather than a precise value of the actual VOL bitrate since MPEG-4 Visual does not specify any temporal window to measure the actual ES bitrate.
In terms of the levels specification shown in Table A.1, only the sum of the bitrate for all the VOLs for all the objects in the scene is bounded, assuming that this total bitrate can be shared among the VOLs at author’s wishes (signaled using the bit_rate field for each VOL).
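The sharing of the total bitrate among VOLs can be illustrated as follows. This is a minimal sketch with hypothetical function names. It assumes that 1 kbit/s in Table A.1 means 1000 bit/s.

```python
BIT_RATE_UNIT = 400  # the bit_rate field is expressed in units of 400 bit/s


def peak_rate_bps(bit_rate: int) -> int:
    """Peak ES bitrate in bit/s signalled by a 30-bit bit_rate field."""
    if not 0 < bit_rate < 2 ** 30:
        raise ValueError("bit_rate must be a non-zero 30-bit value")
    return BIT_RATE_UNIT * bit_rate


def total_rate_within_level(bit_rate_fields, max_bitrate_kbps: int) -> bool:
    """Check that the signalled peak rates of all VOLs fit the level limit.

    Assumption: 1 kbit/s = 1000 bit/s.
    """
    total_bps = sum(peak_rate_bps(field) for field in bit_rate_fields)
    return total_bps <= max_bitrate_kbps * 1000
```

For Simple@L1 (64 kbit/s in Table A.1), two VOLs signalling bit_rate values of 100 and 60 (40000 and 24000 bit/s) exactly use up the budget, while values of 100 and 61 exceed it by 400 bit/s.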
A.1.1.2 VBV Occupancy Dynamics
The VBV occupancy dynamics specifies when the bitstream bits enter the VBV buffer and when they are removed from it to be decoded, i.e. the process by which the VBV buffer is filled and drained. This process is mainly driven by the time instants at which the VOP bits are removed from the VBV.
- VBV Buffer Filling
The VBV buffer for each ES is initially empty and filled as coded data arrives, until it reaches the value specified in the vbv_occupancy field or the first VOP decoding time arrives. The first bit that is put in the VBV buffer is the first bit of the elementary stream (the VOL header bits are not taken into account since they are not considered to be part of the elementary stream data, see [MPEG4-2]).
- VBV Buffer Draining
The VBV buffer is instantaneously emptied at the VOP decoding times (see Figure A.2, which shows the VBV occupancy for a VOL, vbv(t), as a function of time). This instantaneous removal property distinguishes the VBV buffer model from a real bitstream buffer. This way, the model accommodates the worst case scenario, i.e. the case where the decoder stores all the encoded data for the current VOP in its bitstream buffer before it starts decoding it.

Figure A.2 Dynamics of the VBV occupancy for one VOL [Nunes]
- VOP Decoding Time Computation
In order to keep a good estimate of the decoder bitstream buffer occupancy, the encoder needs to know when the encoded data shall be removed from the VBV buffer, i.e. the VOP decoding times. Since the VOP time information carried in the VOP ES is the VOP composition time, the encoder needs to compute the corresponding VOP decoding time from this information. In MPEG-4 Visual [MPEG4-2], the time at which each VOP must be available in the composition memory for composition is given by this VOP composition time plus a fixed delay: VCV Latency (see description in section A.1.2). This delay sets the minimum latency of the decoding process.
The usage, in some profiles, of B-VOPs, which may be coded using more than one prediction (i.e. may be predicted from preceding I- or P-VOPs - forward prediction, and from upcoming I- or P-VOPs - backward prediction), implies that the VOP decoding order and the VOP composition order are different for these cases. In fact, some VOPs must be decoded in advance, i.e. before their natural composition order, because they are needed for the prediction of other VOPs. In terms of decoder operation, this implies additional delay and VOP memory for the decoding and storage of the backward predictions.
MPEG-4 Visual clearly defines the time instants at which all the bits of a given VOP have to be available in the bitstream buffer for decoding; these time instants have to be computed by the encoder in order to track the occupancy of the decoder bitstream buffer. For further details, see [MPEG4-2; Annex D].
A.1.1.3 VBV Model Constraints
This section applies to all the cases considered in the VBV model except for basic sprites, which have a special treatment. The first I-VOP of a sprite VO is divided into N sections of 396 MBs and each section is treated as a different VOP. The remaining S-VOPs are treated as any other VOP.
- Constraints on VBV Occupancy
The main constraint imposed on the VBV model is that each VOL VBV buffer shall never overflow or underflow. The VBV buffer occupancy for a VOL, immediately following the removal of VOP i from the bitstream buffer, vbvi, as shown in Figure A.2, can be iteratively defined by equation (A.1)[5]
$$vbv_0 = vbv_0^- - d_0, \qquad vbv_{i+1} = vbv_i + \int_{t_i}^{t_{i+1}} R_{VOL_{ij}}(t)\,dt - d_{i+1} \qquad \text{(A.1)}$$
where vbv0- is the initial VBV occupancy just before the removal of the first VOP from the buffer, d0 is the number of bits for the first VOP in the ES (VOP 0), di is the number of bits for VOP i, and the integral is the number of bits delivered for the VOL between the decoding times ti and ti+1.
The conditions that the VBV buffer never overflows or underflows can then be expressed by
$$vbv_i \ge 0 \quad \text{and} \quad vbv_i + \int_{t_i}^{t_{i+1}} R_{VOL_{ij}}(t)\,dt \le vbv_{BS} \quad \text{for all } i,$$
where vbvBS is the buffer size in bit units for the relevant VOL.
- Constraints on VOP coded size
The VBV occupancy constraints for a VOL imply that the coded VOP size must always be less than the VBV buffer size in bits, i.e. di < 16384 × vbv_buffer_size for all i.
Annex D of MPEG-4 Visual includes the VBV model restrictions that apply in the case the Short Video Header is in use [MPEG4-2].
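An encoder typically tracks the VBV occupancy with a small simulation of the filling and draining process described above. The sketch below is illustrative only. It assumes a constant delivery rate between consecutive VOP decoding times, whereas the model allows a general rate function, and it ignores the special handling of basic sprites and of the Short Video Header. The function name and argument layout are hypothetical.

```python
def simulate_vbv(vop_bits, decode_times, rate_bps, buffer_bits, initial_bits):
    """Return the VBV occupancy (in bits) right after each VOP is removed.

    vop_bits     -- coded size of each VOP in bits, in decoding order
    decode_times -- decoding time of each VOP in seconds, increasing
    rate_bps     -- constant delivery rate in bit/s (simplifying assumption)
    buffer_bits  -- VBV buffer size in bits
    initial_bits -- occupancy just before the first VOP is removed

    Raises ValueError if the buffer overflows or underflows.
    """
    occupancies = []
    before = initial_bits  # occupancy just before removing VOP 0
    for i, bits in enumerate(vop_bits):
        if i > 0:
            # Bits that arrived since the previous VOP was removed.
            arrived = rate_bps * (decode_times[i] - decode_times[i - 1])
            before = occupancies[-1] + arrived
        if before > buffer_bits:
            raise ValueError(f"VBV overflow before removing VOP {i}")
        after = before - bits  # the whole VOP is removed instantaneously
        if after < 0:
            raise ValueError(f"VBV underflow when removing VOP {i}")
        occupancies.append(after)
    return occupancies
```

For example, with a 163840-bit buffer, an initial occupancy of 108800 bits, a 64000 bit/s channel and VOPs of 20000, 5000, 8000 and 8000 bits decoded every 0.125 s, the occupancy after each removal is 88800, 91800, 91800 and 91800 bits. A first VOP of 200000 bits would raise an underflow error, since it cannot be fully present in the buffer at its decoding time.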
A.1.2 Video Complexity Verifier Definition
The MPEG-4 VCV model defines a set of rules and limits for examining a set of ESs building a visual scene to control if the required amount of decoder processing power is less than the maximum complexity specified for the given profile and level, both measured in MBs per second, see Table A.1. This model is applied to all MBs of all ESs of the scene together.
The VCV applies to video objects encoded as a combination of I-, P-, B- and S-VOPs[6]. A separate VCV model applies to still texture objects [MPEG4-2]. Face animation and mesh objects are not constrained by this model.
The coded video bitstreams for a certain scene shall be constrained to globally comply with the requirements of the VCV defined in the following sections.
A.1.2.1 VCV Model Parameters
The VCV model consists of two virtual buffers accumulating the number of MBs in the encoded data:
- The VCV Buffer accumulates all MBs of all VOLs for the scene.
- The Boundary MB VCV Buffer (B-VCV)[7] accumulates only boundary MBs.
Notice that boundary MBs (i.e. MBs including shape information which is not totally transparent or totally opaque) are included in both the VCV and the B-VCV buffers.
The VCV model is defined by the size of the buffers mentioned above, the corresponding draining rates (i.e. the VCV and B-VCV decoding rates), and the latency of the VCV model (which depends on the VCV buffer size and VCV decoding rate).
- VCV Buffer Sizes and VCV Decoding Rates
Each VCV buffer can be seen as a queue, instantaneously filled with all the MBs of each VOP at the VOP decoding time, and delivering MB encoded data to the decoding process at a constant rate.
The size of each VCV buffer, respectively vcv_buffer_size and boundary_vcv_buffer_size, defines the maximum number of MBs that a given decoder can instantaneously have in the decoding queue to process, i.e. the maximum occupancy of the VCV buffers in MB units. In the current MPEG-4 Visual specification [MPEG4-2], the two buffers always have the same maximum dimension for all profile@level combinations.
These MBs are consumed by the decoder, from each buffer, at a given VCV decoding rate, in MB/s, as specified for each profile@level. The VCV decoding rate, H, specifies the draining rate of the VCV buffer while the B-VCV decoding rate, HB , specifies the draining rate of the B-VCV. Together they define the maximum speed of the decoding process. As can be seen in Table A.1, the B-VCV decoding rate, HB , is typically half the VCV decoding rate, H.
For each profile@level combination, MPEG-4 Visual defines the maximum VCV buffer size (the same for the VCV and B-VCV buffers) and the draining rates for the VCV and B-VCV buffers.
- VCV Latency
The VCV Latency, L, is defined as the time it takes to decode a full VCV buffer, and thus is given by the following equation
$$L = \frac{\text{vcv\_buffer\_size}}{H} \qquad \text{(A.2)}$$
This parameter imposes a minimum latency in the decoding process, as explained in section A.1.1. Notice that, by definition, the latency of the VCV model is imposed by the VCV buffer not by the B-VCV buffer. Since the B-VCV decoding rate, HB, is typically half the VCV decoding rate, H, this means that it is not possible to decode a full B-VCV during a time interval of L since the two buffers have the same size. This implies that at full decoding rate, the amount of boundary MBs in the scene cannot exceed 50 % of the total number of MBs.
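As a quick numerical illustration, the latency and the boundary MB limit can be computed directly from the values in Table A.1. The function names below are hypothetical; the latency is simply the VCV buffer size divided by the VCV decoding rate, as defined above.

```python
def vcv_latency(vcv_buffer_size_mb: int, vcv_rate_mb_per_s: int) -> float:
    """Time in seconds needed to decode a full VCV buffer."""
    return vcv_buffer_size_mb / vcv_rate_mb_per_s


def max_boundary_fraction(vcv_rate_mb_per_s: int, boundary_rate_mb_per_s: int) -> float:
    """Largest share of boundary MBs sustainable at full decoding rate."""
    return min(1.0, boundary_rate_mb_per_s / vcv_rate_mb_per_s)
```

For Simple@L1 (99 MBs, 1485 MB/s) the latency is 1/15 s, about 67 ms; for Simple@L3 (396 MBs, 11880 MB/s) it is 1/30 s. For Core@L1 (5940 MB/s overall, 2970 MB/s for boundary MBs) the sustainable share of boundary MBs is 0.5.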
A.1.2.2 VCV Occupancy Dynamics
The VCV dynamics simulates the VOP decoding process. At the VOP decoding times, the VOP encoded data is added to the VCV buffers and is removed from these buffers as the decoding process progresses. The time instant at which a given VOP is completely decoded depends on the amount and type of MBs to be decoded, the occupancy of the VCV buffers at the VOP decoding time, and the maximum decoding speed specified through the VCV decoding rates for the profile@level in question.
- VCV Buffer Filling
Let Mi be the total number of MBs in VOP i, and MBi the number of boundary MBs in the same VOP. For S-VOPs, Mi is given by the hypothetical number of MBs in a S-VOP, MBS-VOP, as specified in [MPEG4-2; Annex D].
The VCV buffer is empty at the start of decoding and is filled instantaneously with encoded data at VOP decoding times as the decoding process advances. At the VOP decoding time, ti, Mi is added to the VCV buffer occupancy, vcv(t), and simultaneously MBi is added to the B-VCV buffer occupancy, b-vcv(t).
- VCV Buffer Draining
The VCV buffer occupancies decrease linearly at rates H and HB, respectively for the VCV buffer and for the B-VCV buffer, until the occupancy is zero or until the next VOP decoding time, tnext, where tnext is the earliest VOP decoding time greater than ti for any VOP of any ES of the scene.
If the occupancy of the VCV buffers becomes zero, the VCV model decoder becomes idle and remains idle until tnext, as exemplified in Figure A.3.
Figure A.3 Dynamics of the VCV occupancy [Nunes]
- VOP Decoding Duration
In order to avoid the violation of the VCV model, each VOP must be decoded in time. The interval of time where VOP i is being decoded extends from si to ei which are defined by equation (A.3)
(A.3)
where vcv(ti) is the VCV occupancy before the MBs representing VOP i, Mi, are added to vcv(t), H is the VCV decoding rate, b-vcv(ti) is the B-VCV occupancy before the boundary MBs of VOP i, MBi, are added to b-vcv(t), and HB is the B-VCV decoding rate.
A.1.2.3 VCV Model Constraints
Compliance with the VCV model can only be guaranteed if the set of ESs building a scene fulfills the constraints that the model imposes on the occupancy of the VCV buffers and on the VOP decoding duration, defined as follows:
- Constraints on VCV Occupancy
A given set of visual ESs building a scene conforms with a given profile@level with respect to the VCV model if they never overflow the VCV buffers.
When the VCV buffers become empty, the decoder simply remains idle and the VCV buffer occupancies, vcv(t) and b-vcv(t), remain unchanged during the idle period; this is illustrated in Figure A.3, which shows the occupancy of a VCV buffer, vcv(t), as a function of time.
- Constraints on VOP Decoding Duration
In addition to not overflowing the VCV buffer, the decoding of each VOP i must be completed by τi + L (composition time plus the latency of the VCV decoding process). Notice that the latency L of the VCV decoding process is constant for all VOPs.
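The two VCV constraints can be checked together in a simple event-driven simulation. The sketch below is an illustration under stated assumptions, not the normative procedure: the function name and the tuple layout are hypothetical, and it assumes that VOP i is completely decoded once both queues have drained everything queued up to and including VOP i. The B-VCV buffer is only simulated when a boundary MB decoding rate is given.

```python
def simulate_vcv(vops, buffer_mb, rate, boundary_rate=None):
    """Check the VCV constraints and return the decoding end time of each VOP.

    vops          -- list of (decode_time, composition_time, total_mbs,
                     boundary_mbs) tuples, sorted by decode_time (seconds)
    buffer_mb     -- VCV buffer size in MBs (same size for the B-VCV buffer)
    rate          -- VCV decoding rate H in MB/s
    boundary_rate -- B-VCV decoding rate HB in MB/s, or None if not applicable

    Raises ValueError on a buffer overflow or a late VOP.
    """
    latency = buffer_mb / rate  # VCV latency L
    vcv = bvcv = 0.0
    last_time = None
    end_times = []
    for time, composition_time, total_mbs, boundary_mbs in vops:
        if last_time is not None:
            # Linear draining since the previous decoding time, floored at 0.
            elapsed = time - last_time
            vcv = max(0.0, vcv - rate * elapsed)
            if boundary_rate:
                bvcv = max(0.0, bvcv - boundary_rate * elapsed)
        # Instantaneous filling at the VOP decoding time.
        vcv += total_mbs
        if boundary_rate:
            bvcv += boundary_mbs
        if vcv > buffer_mb or bvcv > buffer_mb:
            raise ValueError(f"VCV overflow at t={time}")
        # Assumed end of decoding: both queues drained up to this VOP.
        end = time + vcv / rate
        if boundary_rate:
            end = max(end, time + bvcv / boundary_rate)
        if end > composition_time + latency + 1e-9:
            raise ValueError(f"VOP decoded too late at t={time}")
        end_times.append(end)
        last_time = time
    return end_times
```

With the Simple@L1 values (99 MBs, 1485 MB/s), QCIF VOPs of 99 MBs decoded every 0.125 s are each finished exactly one latency period (1/15 s) after their decoding time. Two such VOPs only 0.03 s apart would overflow the VCV buffer, because fewer than 45 MBs of the first VOP have been decoded when the second one arrives.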
A.1.3 Video Reference Memory Verifier Definition
The MPEG-4 VMV model defines a set of rules and limits for examining the set of ESs building a visual scene to control if the required amount of decoder picture memory, measured in MB units, is less than the maximum memory specified for the chosen profile and level, see Table A.1. The VMV models the memory requirements of all VOLs of all VOs in the scene (this model assumes a common memory space, shared by all VOLs of all VOs).
The VMV applies to video objects encoded as a combination of I-, P-, B-, S-VOPs, and still texture objects. Face animation, mesh objects, and I-VOPs in basic sprite sequences are not constrained by this model.
The coded video bitstreams shall be constrained to comply with the requirements of the VMV defined in the following sections.
A.1.3.1 VMV Model Parameters
The VMV model consists of a MB buffer that accumulates all the decoded MBs of all VOPs and stores them until they are no longer needed for the prediction of other VOPs. The VMV model is defined by the size of this buffer, the vmv_buffer_size, defining the maximum amount of decoded MBs that the decoder can store at any time instant, see Table A.1.
A.1.3.2 VMV Occupancy Dynamics
The VMV dynamics simulates the decoded VOP memory allocation and de-allocation process. As each VOP is being processed, the decoder needs to allocate memory to store the decoded data. This data remains in the decoder memory until it is no longer needed, e.g. for prediction. At this point in time, the memory allocated to store this data is instantaneously released and can be used again.
- VMV Buffer Filling
The VMV buffer is initially empty and is filled with decoded data as each MB is decoded (see Figure A.4). For I-, P-, and B-VOPs, the amount of picture memory required for the decoding of the i-th VOP is defined as the number of MBs in the VOP, Mi. This memory, called reference memory in the MPEG-4 Visual standard [MPEG4-2], is consumed at the same constant rate specified for the VCV buffer (i.e. H MB/s) as the decoding process takes place. This solution contemplates the worst case scenario in terms of memory consumption since the VCV has the highest decoding rate (consumes memory faster than the B-VCV) and accumulates all the MBs (consumes all the needed memory).
For S-VOPs, the amount of picture memory required for the decoding of the VOP is defined as the number of MBs in the reconstructed VOP. The memory used for storing the sprite is not constrained by the VMV model.
The decoding duration of VOP i, Ti, is identical in the VCV and VMV models and starts at si and ends at ei , as defined in section A.1.2.
Figure A.4 Dynamics of the VMV occupancy [Nunes]
- VMV Buffer Draining
The VMV draining depends on the coding type of the VOP being decoded, as explained in the following [MPEG4-2]:
- I- and P-VOPs
At the VOP composition time (or presentation time in a no-compositor decoder) plus VCV latency, τi + L, the total memory allocated to the previous I- or P-VOP in the decoding order is instantaneously released.
- B-VOPs
At the VOP composition time (or presentation time in a no-compositor decoder) plus VCV latency, τi + L, the total memory allocated to the current B-VOP is instantaneously released.
A.1.3.3 VMV Model Constraints
A given set of visual ESs building a scene conforms with a given profile@level, with respect to the VMV model, if it never overflows the VMV model buffer.
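A rough upper bound on the VMV occupancy can be obtained with an event list. The sketch below is deliberately conservative and is not the normative model: it allocates the whole memory of a VOP at the start of its decoding instead of filling it linearly at H MB/s, so it may flag a stream that the exact model would accept. S-VOPs and still texture objects are not handled, and the function name and tuple layout are hypothetical.

```python
def vmv_peak(vops, latency):
    """Return a conservative peak VMV occupancy in MBs.

    vops    -- list of (kind, decode_start, composition_time, mbs) tuples in
               decoding order, with kind in {"I", "P", "B"} and times in seconds
    latency -- VCV latency L in seconds
    """
    events = []  # (time, order, delta); order 0 = release, 1 = allocation
    previous_reference_mbs = None
    for kind, decode_start, composition_time, mbs in vops:
        # Simplification: allocate the whole VOP when its decoding starts.
        events.append((decode_start, 1, mbs))
        release_time = composition_time + latency
        if kind in ("I", "P"):
            # An I- or P-VOP releases the previous I- or P-VOP.
            if previous_reference_mbs is not None:
                events.append((release_time, 0, -previous_reference_mbs))
            previous_reference_mbs = mbs
        elif kind == "B":
            # A B-VOP releases its own memory.
            events.append((release_time, 0, -mbs))
        else:
            raise ValueError(f"unsupported VOP type: {kind}")
    occupancy = peak = 0
    for _, _, delta in sorted(events):  # releases first at equal times
        occupancy += delta
        peak = max(peak, occupancy)
    return peak
```

For an I-P-P sequence of QCIF VOPs (99 MBs each) decoded every 0.125 s with L = 1/15 s, the peak is 198 MBs, which matches the Max. VMV buffer size listed for Simple@L1 in Table A.1. A stream conforms with respect to this sketch when the returned peak does not exceed the VMV buffer size of the profile@level.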
A.1.4 Interaction between the VBV, VCV, and VMV Models
A given set of ESs building a visual scene is considered compliant with a given profile and level if it fulfills all the constraints defined by the several Video Buffering Verifier models. Bitstream compliance with a given profile@level guarantees that the resources required at the decoder do not exceed a certain pre-defined amount corresponding to the relevant profile@level. Moreover it defines strict timing for completion of decoding and composition of VOPs as explained in the following:
- The VBV model defines the time at which the coded bits for each VOP are available for decoding and the time at which they should be removed from the VBV buffer - the coded bits for each VOP should be removed from the VBV buffer at the VOP decoding times, ti, computed from the composition time information in the video ES or conveyed by systems decoding time stamps.
- The VCV model defines the decoding speed of the MB data, and, thus, the time at which each VOP is available for composition - a given VOP should be available for composition, at most, at the VOP composition time plus the VCV latency, i.e. at the time it is supposed to be available to the compositor.
- The VMV model defines the amount of picture memory allocated at each time instant and the time it should be released - a given VOP should be removed from the VMV buffer at its composition time plus the VCV latency (B-VOP) or at the composition time plus the VCV latency of the next P or I VOP (I or P-VOPs).
The various models are independent but interact with each other in the following way:
- The Decoder Cannot Decode Arbitrarily in Advance
From a decoding point of view, it could be advantageous to process the incoming data as far in advance as possible; this is however constrained by two factors:
- The decoder can only start decoding if the bits are available for decoding - constraint imposed by the VBV model.
- As the decoder decodes the incoming data, it generates macroblocks that consume picture memory; if the decoder decodes too fast, it may not have enough picture memory to store the decoded data - constraint imposed by the VMV model.
- The Decoder Cannot Decode Too Late
If the decoder starts decoding too late, then it may not be able to complete the decoding on time, and the following situations may occur:
- The VOP bits may be removed from the bitstream buffer before they could be decoded.
- The composition time for the current VOP may arrive without the VOP being completely decoded.
- The time to release the picture memory required for the prediction of the current VOP may arrive before the VOP could be decoded.
In order to avoid these situations, the Video Buffering Verifier mechanism imposes strict times for starting and ending any VOP decoding - constraint imposed by the VCV model.
The Video Buffering Verifier models provide the mechanism allowing any encoder to produce bitstreams that will be decodable by any decoder compliant with the selected profile@level. This mechanism makes it possible to simultaneously limit the amount of decoding resources needed at the receiving terminals and ensure the timely reconstruction of the encoded information.
It is important to highlight that it is a major task of the encoder to simulate each of the Video Buffering Verifier models in order to produce bitstreams compliant with the intended profile and level. If any of these models tends to be violated, the encoder has to take appropriate countermeasures to avoid it. Although the Video Buffering Verifier is defined for the decoders, it is in fact a major module of any encoder generating compliant sets of bitstreams.
A.2 Definition of Levels for Video Profiles
Table A.1 describes the MPEG-4 Visual levels for the Version 1 and Version 2 profiles that include only natural visual (or video) data, i.e. the so-called MPEG-4 video profiles. Note that Level 0 for the Simple profile has been defined in the 2nd Extension to the 2nd Edition of the MPEG-4 Visual standard [MPEG01b].
Table A.1 Levels for the MPEG-4 video profiles
| Visual profile | Level | Typical visual session size | Max. number of objects¹ | Max. number of objects per type | Max. unique quant. tables | Max. VMV buffer size (MB units)² | Max. VCV buffer size (MB)⁸ | VCV decoder rate (MB/s)⁴ | VCV boundary MB decoder rate (MB/s)⁹ | Max. total VBV buffer size (units of 16384 bits)⁵ | Max. VOL VBV buffer size (units of 16384 bits) | Max. video packet length (bits)⁶ | Max. sprite size (MB units) | Wavelet restrictions | Max. bitrate (kbit/s) | Max. enhancement layers per object |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Simple¹⁰ | L0 | QCIF | 1 | 1 × Simple | 1 | 198 | 99 | 1485 | N.A. | 10 | 10 | 2048 | N.A. | N.A. | 64 | N.A. |
| Simple | L1 | QCIF | 4 | 4 × Simple | 1 | 198 | 99 | 1485 | N.A. | 10 | 10 | 2048 | N.A. | N.A. | 64 | N.A. |
| Simple | L2 | CIF | 4 | 4 × Simple | 1 | 792 | 396 | 5940 | N.A. | 40 | 40 | 4096 | N.A. | N.A. | 128 | N.A. |
| Simple | L3 | CIF | 4 | 4 × Simple | 1 | 792 | 396 | 11880 | N.A. | 40 | 40 | 8192 | N.A. | N.A. | 384 | N.A. |
| Advanced Real Time Simple | L1 | QCIF | 4 | 4 × Simple or Adv. Real Time Simple | 1 | 198 | 99 | 1485 | N.A. | 10 | 10 | 8192 | N.A. | N.A. | 64 | N.A. |
| Advanced Real Time Simple | L2 | CIF | 4 | 4 × Simple or Adv. Real Time Simple | 1 | 792 | 396 | 5940 | N.A. | 40 | 40 | 16384 | N.A. | N.A. | 128 | N.A. |
| Advanced Real Time Simple | L3 | CIF | 4 | 4 × Simple or Adv. Real Time Simple | 1 | 792 | 396 | 11880 | N.A. | 40 | 40 | 16384 | N.A. | N.A. | 384 | N.A. |
| Advanced Real Time Simple | L4 | CIF | 16 | 16 × Simple or Adv. Real Time Simple | 1 | 792 | 396 | 11880 | N.A. | 80 | 80 | 16384 | N.A. | N.A. | 2000 | N.A. |
| Simple Scalable | L1 | CIF | 4 | 4 × Simple or Simple Scalable | 1 | 1782 | 495 | 7425 | N.A. | 40 | 40 | 2048 | N.A. | N.A. | 128 | 1 spatial or temporal enhancement layer |
| Simple Scalable³ | L2 | CIF | 4 | 4 × Simple or Simple Scalable | 1 | 3168 | 792 | 23760 | N.A. | 40 | 40 | 4096 | N.A. | N.A. | 256 | 1 spatial or temporal enhancement layer |
|
Core
|
L1
|
QCIF
|
4
|
4 x Core
or Simple
|
4
|
594
|
198
|
5940
|
2970
|
| |