Levels for MPEG-4 Visual Profiles

by Fernando Pereira and Paulo Nunes (Instituto Superior Técnico, Lisboa - Portugal)

The MPEG-4 Visual standard defines (as of October 2001) 18 visual object types and 19 visual profiles. Nine visual profiles have been defined in MPEG-4 Visual Version 1 [MPEG4-2]: Simple, Simple Scalable, Core, Main, N-bit, Scalable Texture, Simple Face Animation, Basic Animated Texture, and Hybrid.

Six additional visual profiles have been defined in MPEG-4 Visual Version 2 [MPEG4-2]: Core Scalable, Advanced Core, Advanced Coding Efficiency, Advanced Real Time Simple, Advanced Scalable Texture, and Simple FBA.

Moreover, two additional profiles have been defined in the 1st Extension to the 2nd Edition of the MPEG-4 Visual standard [MPEG01a], Simple Studio and Core Studio, and two more in the 2nd Extension to the 2nd Edition of the MPEG-4 Visual standard [MPEG01b]: Advanced Simple and Fine Granularity Scalability.

In the following, the mechanism specified to define video levels – Video Buffering Verifier – as well as the visual levels defined for all visual profiles will be presented.

A.1 Video Buffering Verifier Mechanism

The idea of using a Video Buffering Verifier mechanism to bound the decoding complexity of a given set of bitstreams is not new, and was already adopted in previous MPEG video coding standards, MPEG-1 [MPEG1-2] and MPEG-2 [MPEG2-2]. In these standards, the major purpose of the Video Buffering Verifier mechanism was to set some restrictions on the maximum variability of the number of bits per picture, especially in the case of constant bitrate operation, and thus on the complexity of the encoded video streams.

In general, the complexity of the encoded video is directly related to the encoded bitrate and to the decoded video data rate that the decoder generates, e.g. measured in terms of the number of MB/s. For frame-based video coding, e.g. MPEG-1 and MPEG-2, the decoded video data rate is typically constant since the frames have fixed dimensions and are usually encoded at fixed frame rates. This is not the general case for object-based video coding, as in MPEG-4, since the several video objects composing a scene may vary in size over time and may be encoded at different VOP rates. Therefore, the amount and type[1] of MB/s that a given object-based video decoder has to process may vary much more over time than in frame-based coding solutions [Nunes].

In the MPEG-4 context, to limit the decoding complexity of a set of bitstreams corresponding to a video scene it is then necessary to set some limits on the variability of the number of decoded MB/s, and their complexity, and also on the picture memory required to store the decoded data. This constitutes the major novelty of the MPEG-4 Video Buffering Verifier mechanism, relative to the previous MPEG standards, since it bounds not only the bitstream buffer memory but also the MB decoding capacity and the MB picture memory.

The MPEG-4 Video Buffering Verifier mechanism [MPEG4-2; Annex D] consists of three normative models, see Figure A.1, each one defining a set of rules and limits to verify if the amount required for a specific type of decoding resource is within the values allowed by the corresponding profile and level specification, see Table A.1:

  1. Video Rate Buffer Verifier (VBV) – This model is used to verify that the bitstream memory required at the decoder(s) does not exceed the values specified for the corresponding profile and level. The model is defined in terms of the VBV buffer sizes for all the VOLs corresponding to the objects building the scene. Each VBV buffer size corresponds to the maximum amount of bits that the decoder can store in the bitstream memory for the corresponding VOL; there is, however, also a limitation on the sum of the VOL VBV buffer sizes. The bitstream memory is the memory where the decoder puts the bits received for a VOL while waiting to be decoded.
  2. Video Complexity Verifier (VCV) – This model is used to verify that the computational power (processing speed), defined in terms of MB/s, required at the decoder does not exceed the values specified for the corresponding profile and level. The model is defined in terms of the VCV MB/s decoding rate and VCV buffer size and is applied to all MBs in the scene. If arbitrarily shaped VOs exist in the scene, an additional VCV buffer and VCV decoding rate is also defined, to be applied only to the boundary MBs.
  3. Video Reference Memory Verifier (VMV) – This model is used to verify that the picture memory required at the decoder for the decoding of a given scene does not exceed the values specified for the corresponding profile and level. The model is defined in terms of the VMV buffer size, which is the maximum number of decoded MBs that the decoder can store during the decoding process of all VOLs corresponding to the scene.


Figure A.1 Video buffering verifier model [MPEG4-2]

The Video Presentation Model (VPM) is not a normative part of the MPEG-4 Visual specification [MPEG4-2]. It is an algorithm for checking that the set of bitstreams corresponding to a scene does not require more presentation memory than a given amount expressed in units of MB. It is also used to constrain the speed of the compositor in terms of the maximum number of MB/s. The Video Presentation Verifier (VPV) operates in the same way as the VCV in terms of occupancy dynamics [MPEG4-2].

In order that the set of visual elementary streams corresponding to a given scene may be considered compliant with a given profile and level, the encoder must guarantee that none of the above mentioned buffers overflows and, additionally, it must also guarantee that, in certain circumstances, the VBV buffer never underflows.

A.1.1 Video Rate Buffer Verifier Definition

The MPEG-4 VBV model defines a set of rules and limits for examining a video elementary bitstream with a delivery rate function, R(t). This model simulates the occupancy of the decoder bitstream buffer in order to control the amount of bitstream memory required at the decoder. Its purpose is to guarantee that the bitstream memory required is less than the specified buffer size, i.e. to verify that the decoder bitstream buffer occupancy never goes beyond the limits of the specified buffer size for the relevant profile@level. In the case of visual scenes composed of multiple VOs, each with one or more VOLs, the MPEG-4 Visual standard specifies that the video rate buffer model shall be applied independently to each VOL (using a particular buffer size and rate function for each VOL). Additionally, the maximum total bitstream buffer size (defined as the sum of all VOL bitstream buffer sizes) for the given profile and level shall not be exceeded, see Table A.1. Notice that the allocation of bitrate and buffer size among the several VOs and, for each VO, among the several VOLs, is a non-normative issue, although it can significantly affect the performance of object-based video encoders and thus deserves careful attention.

The VBV applies to video data encoded as a combination of I-, P-, B-, and S-VOPs, using several coding tools organized in terms of video object types. Face animation, still texture, and mesh objects are not constrained by the VBV model. The coded video bitstreams shall be constrained to comply with the requirements of the VBV specified in the following sections.

A.1.1.1 VBV Model Parameters

The VBV model for a given elementary stream (ES) is defined by the following three parameters: vbv_buffer_size, vbv_occupancy, and bit_rate. These parameters have to be defined for all the ESs corresponding to the various objects in a scene. They can be specified at video level, that is, through the video ES, or by means of systems level configuration information [MPEG4-1]. In the first case, the VBV model parameters are specified in the VOL header, when the one-bit flag vbv_parameters is set to ‘1’. In the second case, the VBV model parameters are conveyed to the video decoder through the Object Description Information, more precisely through the DecoderConfigDescriptor field of the ES_Descriptor associated with the ES in question.

When the vbv_buffer_size and vbv_occupancy parameters are specified by systems level configuration information, the bitstream shall be constrained according to the specified values, and these values shall not be part of the video ES. It may happen, however, that these parameters are not explicitly specified; in this case, it is assumed that the ES is constrained according to the default values of the corresponding profile and level combination[2].

  • VBV Buffer Size
    The VBV buffer size for a VOL specifies the minimum bitstream memory required at the decoder to properly decode the corresponding VOL ES. The VBV buffer size for a VOL is defined by the 18-bit vbv_buffer_size field in units of 16384 bits (the value zero is forbidden). The maximum VBV buffer size in bits, vbvES, is then given by
    vbvES = 16384 × vbv_buffer_size


    The vbv_buffer_size value is bounded by Max VOL VBV buffer size in Table A.1, which specifies the levels’ constraints, and the sum of all these values for all VOLs is bounded by Max total VBV buffer size.

    The default value of vbv_buffer_size for a VOL is the maximum value of vbv_buffer_size allowed for the profile and level combination in question (called Max VOL VBV buffer size). Still, it must be checked that the sum of the vbv_buffer_size default values does not exceed Max total VBV buffer size.

    In terms of the levels specification shown in Table A.1, there are two constraints defined: Max VOL VBV buffer size, which sets the limit for each VOL, and Max total VBV buffer size, which sets the limit on the sum of all the VOL buffer sizes.

  • VBV Occupancy
    The VBV occupancy for a VOL specifies the initial occupancy of the VBV buffer for that VOL, that is, the occupancy that the VBV must reach so that the decoding process may start with the removal of the first VOP following the VOL header; this parameter, together with the bit_rate parameter, establishes the initial decoding delay, the so-called VBV latency. The VBV occupancy is defined by the 26-bit vbv_occupancy field in units of 64 bits[3].

    The default value of vbv_occupancy for a VOL, in 64-bit units, is given by 170 × vbv_buffer_size (for that VOL), where vbv_buffer_size is in 16384-bit units; of course, the occupancy can never exceed the VBV buffer size of the corresponding VOL. This default corresponds to an initial occupancy (before the removal of the first VOP from the buffer) in bits, vbv0, of approximately two-thirds of the defined buffer size, i.e.
    $$ vbv_0 = 64 \times 170 \times \mathrm{vbv\_buffer\_size} \approx \tfrac{2}{3} \times 16384 \times \mathrm{vbv\_buffer\_size} $$

    Note that there is no explicit limitation on vbv_occupancy in terms of levels definition.

  • Bitrate
    When present for a VOL, the bitrate parameter, bit_rate, defined by the 30-bit bit_rate field in units of 400 bits per second (value zero is forbidden), specifies the ES peak bitrate for VOLij [4], such that

    $$ R_{VOL_{ij}}(t) \le 400 \times \mathrm{bit\_rate} $$

    where RVOLij(t) is defined as the instantaneous VOL channel bitrate for VOLij (in bits per second) counting only the visual syntax.

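    If the channel, with a total instantaneous channel rate, R(t), is a serial time multiplex of several streams (e.g. as defined by MPEG-4 Systems [MPEG4-1]) then RVOLij(t) = R(t) for the time instants where the channel is occupied by the relevant VOLij bits; otherwise it is zero [MPEG4-2], i.e.

    $$ R_{VOL_{ij}}(t) = \begin{cases} R(t) & \text{if the channel carries bits of } VOL_{ij} \text{ at time } t \\ 0 & \text{otherwise} \end{cases} $$
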
    Notice that the purpose of the bitrate parameter is to provide an upper bound on the VOL ES bitrate rather than a precise value of the actual VOL bitrate since MPEG-4 Visual does not specify any temporal window to measure the actual ES bitrate.

    In terms of the levels specification shown in Table A.1, only the sum of the bitrates of all the VOLs of all the objects in the scene is bounded; this total bitrate can be shared among the VOLs as the author wishes (signaled using the bit_rate field for each VOL).
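
The unit conversions above can be illustrated with a short Python sketch. The function and constant names are hypothetical (they are not part of the MPEG-4 syntax); only the units (16384 bits, 64 bits, 400 bit/s), the factor 170 and the two buffer-size limits come from the text.

```python
# Units of the three VBV fields, as described in the text.
VBV_BUFFER_UNIT_BITS = 16384    # vbv_buffer_size: units of 16384 bits
VBV_OCCUPANCY_UNIT_BITS = 64    # vbv_occupancy: units of 64 bits
BIT_RATE_UNIT_BPS = 400         # bit_rate: units of 400 bit/s


def vbv_buffer_bits(vbv_buffer_size: int) -> int:
    """VBV buffer size in bits for one VOL (the value zero is forbidden)."""
    if vbv_buffer_size <= 0:
        raise ValueError("vbv_buffer_size must be a positive integer")
    return VBV_BUFFER_UNIT_BITS * vbv_buffer_size


def default_vbv_occupancy(vbv_buffer_size: int) -> int:
    """Default vbv_occupancy field value, in 64-bit units."""
    return 170 * vbv_buffer_size


def peak_bitrate_bps(bit_rate: int) -> int:
    """Upper bound on the VOL bitrate, in bit/s (the value zero is forbidden)."""
    if bit_rate <= 0:
        raise ValueError("bit_rate must be a positive integer")
    return BIT_RATE_UNIT_BPS * bit_rate


def check_vbv_allocation(vol_buffer_sizes, max_vol, max_total) -> bool:
    """Check the per-VOL limit and the limit on the sum (all in 16384-bit units)."""
    per_vol_ok = all(0 < size <= max_vol for size in vol_buffer_sizes)
    return per_vol_ok and sum(vol_buffer_sizes) <= max_total
```

For example, with the Simple@L1 limits of Table A.1 (10 units per VOL and 10 units in total), `check_vbv_allocation([4, 6], 10, 10)` succeeds, whereas four VOLs that all use the default size of 10 units fail the check on the sum. The default occupancy for a 10-unit buffer is 1700 × 64 = 108800 bits out of 163840 bits, i.e. roughly two-thirds.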

A.1.1.2 VBV Occupancy Dynamics

The VBV occupancy dynamics specifies when the bitstream bits enter the VBV buffer and when they are removed from it to be decoded, i.e. the process by which the VBV buffer is filled and drained. This process is mainly driven by the time instants at which the VOP bits are removed from the VBV.

  • VBV Buffer Filling
    The VBV buffer for each ES is initially empty and filled as coded data arrives, until it reaches the value specified in the vbv_occupancy field or the first VOP decoding time arrives. The first bit that is put in the VBV buffer is the first bit of the elementary stream (the VOL header bits are not taken into account since they are not considered to be part of the elementary stream data, see [MPEG4-2]).

  • VBV Buffer Draining
    At each VOP decoding time, all the bits of the corresponding VOP are instantaneously removed from the VBV buffer (see Figure A.2, which shows the VBV occupancy for a VOL, vbv(t), as a function of time). This instantaneous removal property distinguishes the VBV buffer model from a real bitstream buffer. This way, the model accommodates the worst case scenario, i.e. the case where the decoder stores all the encoded data for the current VOP in its bitstream buffer before it starts decoding it.

    Figure A.2 Dynamics of the VBV occupancy for one VOL [Nunes]


  • VOP Decoding Time Computation
    In order to keep a good estimate of the decoder bitstream buffer occupancy, the encoder needs to know when the encoded data shall be removed from the VBV buffer, i.e. the VOP decoding times. Since the VOP time information carried in the VOP ES is the VOP composition time, the encoder needs to compute the corresponding VOP decoding time from this information. In MPEG-4 Visual [MPEG4-2], the time at which each VOP must be available in the composition memory for composition is given by this VOP composition time plus a fixed delay: VCV Latency (see description in section A.1.2). This delay sets the minimum latency of the decoding process.

    The usage, in some profiles, of B-VOPs, which may be coded using more than one prediction (i.e. may be predicted from preceding I- or P-VOPs - forward prediction, and from upcoming I- or P-VOPs - backward prediction), implies that the VOP decoding order and the VOP composition order are different for these cases. In fact, some VOPs must be decoded in advance, i.e. before their natural composition order, because they are needed for the prediction of other VOPs. In terms of decoder operation, this implies additional delay and VOP memory for the decoding and storage of the backward predictions.

    MPEG-4 Visual clearly defines the time instants at which all the bits of a given VOP have to be available in the bitstream buffer for decoding; these time instants have to be computed by the encoder in order to track the occupancy of the decoder bitstream buffer. For further details, see [MPEG4-2; Annex D].

A.1.1.3 VBV Model Constraints

This section applies to all the cases considered in the VBV model except for basic sprites, which have a special treatment. The first I-VOP of a sprite VO is divided into N sections of 396 MBs and each section is treated as a different VOP. The remaining S-VOPs are treated as any other VOP.

  • Constraints on VBV Occupancy
    The main constraint imposed to the VBV model is that each VOL VBV buffer shall never overflow or underflow. The VBV buffer occupancy for a VOL, immediately following the removal of VOP i from the bitstream buffer, vbvi, as shown in Figure A.2, can be iteratively defined by equation (A.1)[5]
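    $$ vbv_0 = vbv_0^- - d_0, \qquad vbv_i = vbv_{i-1} + \int_{t_{i-1}}^{t_i} R_{VOL}(t)\,dt - d_i, \quad i \ge 1 \qquad \text{(A.1)} $$


    where vbv0- is the initial VBV occupancy just before the removal of the first VOP from the buffer, d0 is the number of bits for the first VOP in the ES (VOP 0), di is the number of bits for VOP i, ti is the decoding time of VOP i, and RVOL(t) is the instantaneous channel bitrate of the VOL.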

    The conditions that the VBV buffer never overflows or underflows, can then be expressed by
    $$ vbv_i \ge 0 \quad \text{and} \quad vbv_i + d_i \le vbv_{BS} \quad \text{for all } i, $$


    where vbvBS is the buffer size in bit units for the relevant VOL.

  • Constraints on VOP coded size
    The VBV occupancy constraints for a VOL imply that the coded VOP size must always be less than the VBV buffer size in bits, i.e. di < vbvBS for all i.

    Annex D of MPEG-4 Visual includes the VBV model restrictions that apply in the case the Short Video Header is in use [MPEG4-2].
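
The VBV occupancy check for one VOL can be sketched in Python as follows. This is a minimal illustration, not the normative procedure: it assumes a constant delivery rate between VOP decoding times (the standard allows a general rate function), it ignores the special handling of sprites and of the Short Video Header, and the function name and argument names are hypothetical.

```python
def simulate_vbv(vop_bits, decode_times, rate_bps, buffer_bits, initial_bits):
    """Track the VBV occupancy of one VOL.

    vop_bits      -- coded size of each VOP, in bits (d_i)
    decode_times  -- VOP decoding times in seconds, increasing (t_i)
    rate_bps      -- constant delivery rate in bit/s (assumption of this sketch)
    buffer_bits   -- VBV buffer size in bits
    initial_bits  -- occupancy just before the first VOP is removed

    Returns the occupancy right after each VOP removal; raises ValueError
    on overflow or underflow.
    """
    occupancy_after = []
    occupancy = initial_bits
    previous_time = decode_times[0]
    for i, (bits, time) in enumerate(zip(vop_bits, decode_times)):
        # Bits keep arriving between two consecutive decoding times.
        occupancy += rate_bps * (time - previous_time)
        if occupancy > buffer_bits:
            raise ValueError(f"VBV overflow before removing VOP {i}")
        # All bits of the VOP are removed instantaneously at its decoding time.
        occupancy -= bits
        if occupancy < 0:
            raise ValueError(f"VBV underflow when removing VOP {i}")
        occupancy_after.append(occupancy)
        previous_time = time
    return occupancy_after
```

With a 163840-bit buffer, an initial occupancy of 108800 bits, a 64 kbit/s channel and VOPs of 20000, 6000 and 6000 bits decoded at 0 s, 0.125 s and 0.25 s, the occupancies after removal are 88800, 90800 and 92800 bits. An encoder would typically run such a simulation while coding and adapt the VOP sizes before a violation occurs.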

A.1.2 Video Complexity Verifier Definition

The MPEG-4 VCV model defines a set of rules and limits for examining a set of ESs building a visual scene to check whether the required amount of decoder processing power is less than the maximum complexity specified for the given profile and level, both measured in MBs per second, see Table A.1. This model is applied to all MBs of all ESs of the scene together.

The VCV applies to video objects encoded as a combination of I-, P-, B- and S-VOPs[6]. A separate VCV model applies to still texture objects [MPEG4-2]. Face animation and mesh objects are not constrained by this model.

The coded video bitstreams for a certain scene shall be constrained to globally comply with the requirements of the VCV defined in the following sections.

A.1.2.1 VCV Model Parameters

The VCV model consists of two virtual buffers accumulating the number of MBs in the encoded data:

  1. The VCV Buffer accumulates all MBs of all VOLs for the scene.
  2. The Boundary MB VCV Buffer (B-VCV)[7] accumulates only boundary MBs.

Notice that boundary MBs (i.e. MBs including shape information which is not totally transparent or totally opaque) are included in both the VCV and the B-VCV buffers.

The VCV model is defined by the size of the buffers mentioned above, the corresponding draining rates (i.e. the VCV and B-VCV decoding rates), and the latency of the VCV model (which depends on the VCV buffer size and VCV decoding rate).

  • VCV Buffer Sizes and VCV Decoding Rates
    Each VCV buffer can be seen as a queue, instantaneously filled with all the MBs of each VOP at the VOP decoding time, and delivering MB encoded data to the decoding process at a constant rate.

    The size of each VCV buffer, respectively vcv_buffer_size and boundary_vcv_buffer_size, defines the maximum number of MBs that a given decoder can instantaneously have in the decoding queue to process, i.e. the maximum occupancy of the VCV buffers in MB units. In the current MPEG-4 Visual specification [MPEG4-2], the two buffers always have the same maximum dimension for all profile@level combinations.

    These MBs are consumed by the decoder, from each buffer, at a given VCV decoding rate, in MB/s, as specified for each profile@level. The VCV decoding rate, H, specifies the draining rate of the VCV buffer while the B-VCV decoding rate, HB, specifies the draining rate of the B-VCV. Together they define the maximum speed of the decoding process. As can be seen in Table A.1, the B-VCV decoding rate, HB, is typically half the VCV decoding rate, H.

    For each profile@level combination, MPEG-4 Visual defines the maximum VCV buffer size (the same for the VCV and B-VCV buffers) and the draining rates for the VCV and B-VCV buffers.

  • VCV Latency
    The VCV Latency, L, is defined as the time it takes to decode a full VCV buffer, and thus is given by the following equation

    $$ L = \frac{\mathrm{vcv\_buffer\_size}}{H} \qquad \text{(A.2)} $$

    This parameter imposes a minimum latency in the decoding process, as explained in section A.1.1. Notice that, by definition, the latency of the VCV model is imposed by the VCV buffer not by the B-VCV buffer. Since the B-VCV decoding rate, HB, is typically half the VCV decoding rate, H, this means that it is not possible to decode a full B-VCV during a time interval of L since the two buffers have the same size. This implies that at full decoding rate, the amount of boundary MBs in the scene cannot exceed 50 % of the total number of MBs.
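
As a worked example of the latency definition (the time needed to decode a full VCV buffer at rate H), the following Python sketch evaluates it for two rows of Table A.1. The function names are hypothetical; the numeric values are the ones listed in the table.

```python
def vcv_latency(vcv_buffer_size_mb: int, vcv_decoder_rate: float) -> float:
    """Time, in seconds, needed to drain a full VCV buffer at rate H."""
    return vcv_buffer_size_mb / vcv_decoder_rate


def max_boundary_fraction(vcv_decoder_rate: float, boundary_decoder_rate: float) -> float:
    """Largest share of boundary MBs sustainable at full decoding rate (HB / H)."""
    return boundary_decoder_rate / vcv_decoder_rate


# Simple@L1: VCV buffer of 99 MBs, H = 1485 MB/s.
simple_l1_latency = vcv_latency(99, 1485)

# Core@L2: VCV buffer of 792 MBs, H = 23760 MB/s, HB = 11880 MB/s.
core_l2_latency = vcv_latency(792, 23760)
core_l2_boundary_share = max_boundary_fraction(23760, 11880)
```

For Simple@L1 the latency is 1/15 s (one QCIF frame period at 15 Hz); for Core@L2 it is 1/30 s, and the boundary MB share at full decoding rate is limited to 0.5.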

A.1.2.2 VCV Occupancy Dynamics

The VCV dynamics simulates the VOP decoding process. At the VOP decoding times, the VOP encoded data is added to the VCV buffers and is removed from these buffers as the decoding process progresses. The time instant at which a given VOP is completely decoded depends on the amount and type of MBs to be decoded, the occupancy of the VCV buffers at the VOP decoding time, and the maximum decoding speed specified through the VCV decoding rates for the profile@level in question.

  • VCV Buffer Filling
    Let Mi be the total number of MBs in VOP i, and MBi the number of boundary MBs in the same VOP. For S-VOPs, Mi is given by the hypothetical number of MBs in a S-VOP, MBS-VOP, as specified in [MPEG4-2; Annex D].

    The VCV buffer is empty at the start of decoding and is filled instantaneously with encoded data at VOP decoding times as the decoding process advances. At the VOP decoding time, ti, Mi is added to the VCV buffer occupancy, vcv(t), and simultaneously MBi is added to the B-VCV buffer occupancy, b-vcv(t).

  • VCV Buffer Draining
    The occupancies of the VCV and B-VCV buffers decrease linearly at rates H and HB, respectively, until the occupancy of the buffer in question reaches zero or until the next VOP decoding time, tnext, where tnext is the earliest VOP decoding time greater than ti for any VOP of any ES of the scene.

    If the occupancy of the VCV buffers becomes zero, the VCV model decoder becomes idle and remains idle until tnext, as exemplified in Figure A.3.

    Figure A.3 Dynamics of the VCV occupancy [Nunes]

  • VOP Decoding Duration
    In order to avoid the violation of the VCV model, each VOP must be decoded in time. The interval of time where VOP i is being decoded extends from si to ei which are defined by equation (A.3)

    (A.3)

    where vcv(ti) is the VCV occupancy before the MBs representing VOP i, Mi, are added to vcv(t), H is the VCV decoding rate, b-vcv(ti) is the B-VCV occupancy before the boundary MBs of VOP i, MBi, are added to b-vcv(t), and HB is the B-VCV decoding rate.

A.1.2.3 VCV Model Constraints

Compliance with the VCV model can only be guaranteed if the set of ESs building a scene fulfills the constraints that the model imposes on the occupancy of the VCV buffers and on the VOP decoding duration, defined as follows:

  • Constraints on VCV Occupancy
    A given set of visual ESs building a scene conforms with a given profile@level with respect to the VCV model if they never overflow the VCV buffers.

    When the VCV buffers become empty, the decoder simply remains idle and the VCV buffer occupancies, vcv(t) and b-vcv(t), remain unchanged during the idle period; this is illustrated in Figure A.3, which shows the occupancy of a VCV buffer, vcv(t), as a function of time.

  • Constraints on VOP Decoding Duration
    In addition to not overflowing the VCV buffer, the decoding of each VOP i must be completed by τi + L (composition time plus the latency of the VCV decoding process). Notice that the latency L of the VCV decoding process is constant for all VOPs.
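
The VCV filling, draining and constraint checks can be sketched in Python as below. This is an illustrative simplification with hypothetical names. In particular, the completion-time rule used here (a VOP is finished once everything queued up to and including it has drained from both buffers) is an assumption of this sketch, based on first-in first-out draining, and should be checked against [MPEG4-2; Annex D].

```python
def check_vcv(vops, buffer_size, rate_h, rate_hb):
    """Check a scene against the VCV model.

    vops        -- list of (decode_time, composition_time, total_mbs, boundary_mbs),
                   for all VOPs of all ESs, sorted by decode_time (seconds)
    buffer_size -- size of the VCV and B-VCV buffers, in MBs
    rate_h      -- VCV decoding rate H, in MB/s
    rate_hb     -- B-VCV decoding rate HB, in MB/s
    """
    latency = buffer_size / rate_h          # time to drain a full VCV buffer
    vcv = 0.0                               # VCV occupancy, in MBs
    bvcv = 0.0                              # B-VCV occupancy, in MBs
    last_time = None
    for decode_time, composition_time, total_mbs, boundary_mbs in vops:
        if last_time is not None:
            elapsed = decode_time - last_time
            # Linear draining; an empty buffer stays empty while the decoder idles.
            vcv = max(0.0, vcv - rate_h * elapsed)
            bvcv = max(0.0, bvcv - rate_hb * elapsed)
        # Instantaneous filling at the VOP decoding time.
        vcv += total_mbs
        bvcv += boundary_mbs
        if vcv > buffer_size or bvcv > buffer_size:
            return False                    # VCV or B-VCV overflow
        # Assumed completion time: both queues drained up to this VOP.
        end_time = decode_time + max(vcv / rate_h, bvcv / rate_hb)
        if end_time > composition_time + latency:
            return False                    # VOP not decoded in time
        last_time = decode_time
    return True
```

Using the Core@L1 values of Table A.1 (buffer of 198 MBs, H = 5940 MB/s, HB = 2970 MB/s), a QCIF object of 99 MBs with 30 boundary MBs per VOP at 15 Hz passes the check. A single VOP of 150 MBs of which 120 are boundary MBs fits in the buffers but fails the timing constraint, because the boundary MBs alone need more than the latency L to decode.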

A.1.3 Video Reference Memory Verifier Definition

The MPEG-4 VMV model defines a set of rules and limits for examining the set of ESs building a visual scene to check whether the required amount of decoder picture memory, measured in MB units, is less than the maximum memory specified for the chosen profile and level, see Table A.1. The VMV models the memory requirements of all VOLs of all VOs in the scene (this model assumes a common memory space, shared by all VOLs of all VOs).

The VMV applies to video objects encoded as a combination of I-, P-, B-, S-VOPs, and still texture objects. Face animation, mesh objects, and I-VOPs in basic sprite sequences are not constrained by this model.

The coded video bitstreams shall be constrained to comply with the requirements of the VMV defined in the following sections.

A.1.3.1 VMV Model Parameters

The VMV model consists of a MB buffer that accumulates all the decoded MBs of all VOPs and stores them until they are no longer needed for the prediction of other VOPs. The VMV model is defined by the size of this buffer, the vmv_buffer_size, defining the maximum amount of decoded MBs that the decoder can store at any time instant, see Table A.1.

A.1.3.2 VMV Occupancy Dynamics

The VMV dynamics simulates the decoded VOP memory allocation and de-allocation process. As each VOP is being processed, the decoder needs to allocate memory to store the decoded data. This data remains in the decoder memory until it is no longer needed, e.g. for prediction. At this point in time, the memory allocated to store this data is instantaneously released and can be used again.

  • VMV Buffer Filling
    The VMV buffer is initially empty and is filled with decoded data as each MB is decoded (see Figure A.4). For I-, P-, and B-VOPs, the amount of picture memory required for the decoding of the i-th VOP is defined as the number of MBs in the VOP, Mi. This memory, called reference memory in the MPEG-4 Visual standard [MPEG4-2], is consumed at the same constant rate specified for the VCV buffer (i.e. H MB/s) as the decoding process takes place. This solution contemplates the worst case scenario in terms of memory consumption since the VCV has the highest decoding rate (consumes memory faster than the B-VCV) and accumulates all the MBs (consumes all the needed memory).

    For S-VOPs, the amount of picture memory required for the decoding of the VOP is defined as the number of MBs in the reconstructed VOP. The memory used for storing the sprite is not constrained by the VMV model.

    The decoding duration of VOP i, Ti, is identical in the VCV and VMV models; it starts at si and ends at ei, as defined in section A.1.2.

    Figure A.4 Dynamics of the VMV occupancy [Nunes]

  • VMV Buffer Draining
    The VMV draining depends on the coding type of the VOP being decoded, as explained in the following [MPEG4-2]:
    • I- and P-VOPs
      At the VOP composition time (or presentation time in a no-compositor decoder) plus VCV latency, τi + L, the total memory allocated to the previous I- or P-VOP in the decoding order is instantaneously released.

    • B-VOPs
      At the VOP composition time (or presentation time in a no-compositor decoder) plus VCV latency, τi + L, the total memory allocated to the current B-VOP is instantaneously released.

A.1.3.3 VMV Model Constraints

A given set of visual ESs building a scene conforms with a given profile@level, with respect to the VMV model, if it never overflows the VMV model buffer.
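
The VMV filling and draining rules can be sketched in Python as follows. The sketch takes the decoding start and end times of each VOP as inputs (they come from the VCV model), covers only I-, P- and B-VOPs, and assumes that memory is allocated linearly between these two instants. All names are hypothetical. Since the occupancy only grows between two release instants, it is enough to evaluate it at the end of each decoding period and just before each release.

```python
import math


def vmv_peak_occupancy(vops, latency):
    """Peak VMV occupancy, in MBs.

    vops    -- list of (coding_type, start, end, composition_time, mbs) in
               decoding order; coding_type is "I", "P" or "B"
    latency -- VCV latency L, in the same time unit as the other times
    """
    # Release instants: a B-VOP frees its own memory at tau + L; an I- or
    # P-VOP frees the memory of the previous I- or P-VOP at tau + L.
    release = [math.inf] * len(vops)
    previous_reference = None
    for index, (coding_type, _start, _end, tau, _mbs) in enumerate(vops):
        if coding_type == "B":
            release[index] = tau + latency
        else:
            if previous_reference is not None:
                release[previous_reference] = tau + latency
            previous_reference = index

    def allocated(index, time):
        """MBs of VOP `index` already stored at `time` (linear allocation)."""
        _type, start, end, _tau, mbs = vops[index]
        if time <= start:
            return 0.0
        if time >= end:
            return float(mbs)
        return mbs * (time - start) / (end - start)

    # Candidate instants: end of each decoding period, and each release
    # instant (evaluated just before the memory is freed).
    candidates = [vop[2] for vop in vops]
    candidates += [time for time in release if time != math.inf]
    return max(
        sum(allocated(i, time) for i in range(len(vops)) if release[i] >= time)
        for time in candidates
    )
```

A set of ESs would then conform to the VMV model if the returned peak does not exceed the Max. VMV buffer size of the profile@level. For instance, with one VOP period as time unit and L = 1, an I P P sequence of 99-MB VOPs decoded back to back peaks at 198 MBs, which is the VMV buffer size listed for Simple@L1 in Table A.1.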

A.1.4 Interaction between the VBV, VCV, and VMV Models

A given set of ESs building a visual scene is considered compliant with a given profile and level if it fulfills all the constraints defined by the several Video Buffering Verifier models. Bitstream compliance with a given profile@level guarantees that the resources required at the decoder do not exceed a certain pre-defined amount corresponding to the relevant profile@level. Moreover it defines strict timing for completion of decoding and composition of VOPs as explained in the following:

  1. The VBV model defines the time at which the coded bits for each VOP are available for decoding and the time at which they should be removed from the VBV buffer - the coded bits for each VOP should be removed from the VBV buffer at the VOP decoding times, ti, computed from the composition time information in the video ES or conveyed by systems decoding time stamps.
  2. The VCV model defines the decoding speed of the MB data, and, thus, the time at which each VOP is available for composition - a given VOP should be available for composition, at most, at the VOP composition time plus the VCV latency, i.e. at the time it is supposed to be available to the compositor.
  3. The VMV model defines the amount of picture memory allocated at each time instant and the time it should be released - a given VOP should be removed from the VMV buffer at its composition time plus the VCV latency (B-VOP) or at the composition time plus the VCV latency of the next P or I VOP (I or P-VOPs).

The various models are independent but interact with each other in the following way:

  • The Decoder Cannot Decode Arbitrarily in Advance
    From a decoding point of view, it could be advantageous to process the incoming data as far in advance as possible; this is however constrained by two factors:
    1. The decoder can only start decoding if the bits are available for decoding - constraint imposed by the VBV model.
    2. As the decoder decodes the incoming data, it generates macroblocks that consume picture memory; if the decoder decodes too fast, it may not have enough picture memory to store the decoded data - constraint imposed by the VMV model.

  • The Decoder Cannot Decode Too Late
    If the decoder starts decoding too late, then it may not be able to complete the decoding on time, and the following situations may occur:
    1. The VOP bits may be removed from the bitstream buffer before they could be decoded.
    2. The composition time for the current VOP may arrive without the VOP being completely decoded.
    3. The time to release the picture memory required for the prediction of the current VOP may arrive before the VOP could be decoded.

In order to avoid these situations, the Video Buffering Verifier mechanism imposes strict times for starting and ending any VOP decoding - constraint imposed by the VCV model.

The Video Buffering Verifier models provide the mechanism allowing any encoder to produce bitstreams that will be decodable by any decoder compliant with the selected profile@level. This mechanism makes it possible to limit the amount of decoding resources needed at the receiving terminals while also ensuring the timely reconstruction of the encoded information.

It is important to highlight that it is a major task of the encoder to simulate each of the Video Buffering Verifier models in order to produce bitstreams compliant with the intended profile and level. If any of these models tends to be violated, the encoder has to take appropriate countermeasures to avoid it. Although the Video Buffering Verifier is defined for the decoders, it is in fact a major module of any encoder generating compliant sets of bitstreams.

A.2 Definition of Levels for Video Profiles

Table A.1 describes the MPEG-4 Visual levels for the Version 1 and Version 2 profiles that include only natural visual (or video) data, the so-called MPEG-4 video profiles. Note that Level 0 for the Simple profile has been defined in the 2nd Extension to the 2nd Edition of the MPEG-4 Visual standard [MPEG01b].

Table A.1 Levels for the MPEG-4 video profiles

| Visual profile | Level | Typical visual session size | Max. number of objects ¹ | Max. number of objects per type | Max. unique quant. tables | Max. VMV buffer size (MB units) ² | Max. VCV buffer size (MB) ⁸ | VCV decoder rate (MB/s) ⁴ | VCV boundary MB decoder rate (MB/s) ⁹ | Max. total VBV buffer size (units of 16384 bits) ⁵ | Max. VOL VBV buffer size (units of 16384 bits) | Max. video packet length (bits) ⁶ | Max. sprite size (MB units) | Wavelet restrictions | Max. bitrate (kbit/s) | Max. enhancement layers per object |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Simple ¹⁰ | L0 | QCIF | 1 | 1 x Simple | 1 | 198 | 99 | 1485 | N.A. | 10 | 10 | 2048 | N.A. | N.A. | 64 | N.A. |
| Simple | L1 | QCIF | 4 | 4 x Simple | 1 | 198 | 99 | 1485 | N.A. | 10 | 10 | 2048 | N.A. | N.A. | 64 | N.A. |
| Simple | L2 | CIF | 4 | 4 x Simple | 1 | 792 | 396 | 5940 | N.A. | 40 | 40 | 4096 | N.A. | N.A. | 128 | N.A. |
| Simple | L3 | CIF | 4 | 4 x Simple | 1 | 792 | 396 | 11880 | N.A. | 40 | 40 | 8192 | N.A. | N.A. | 384 | N.A. |
| Advanced Real Time Simple | L1 | QCIF | 4 | 4 x Simple or Adv. Real Time Simple | 1 | 198 | 99 | 1485 | N.A. | 10 | 10 | 8192 | N.A. | N.A. | 64 | N.A. |
| Advanced Real Time Simple | L2 | CIF | 4 | 4 x Simple or Adv. Real Time Simple | 1 | 792 | 396 | 5940 | N.A. | 40 | 40 | 16384 | N.A. | N.A. | 128 | N.A. |
| Advanced Real Time Simple | L3 | CIF | 4 | 4 x Simple or Adv. Real Time Simple | 1 | 792 | 396 | 11880 | N.A. | 40 | 40 | 16384 | N.A. | N.A. | 384 | N.A. |
| Advanced Real Time Simple | L4 | CIF | 16 | 16 x Simple or Adv. Real Time Simple | 1 | 792 | 396 | 11880 | N.A. | 80 | 80 | 16384 | N.A. | N.A. | 2000 | N.A. |
| Simple Scalable | L1 | CIF | 4 | 4 x Simple or Simple Scalable | 1 | 1782 | 495 | 7425 | N.A. | 40 | 40 | 2048 | N.A. | N.A. | 128 | 1 spatial or temporal enhancement layer |
| Simple Scalable ³ | L2 | CIF | 4 | 4 x Simple or Simple Scalable | 1 | 3168 | 792 | 23760 | N.A. | 40 | 40 | 4096 | N.A. | N.A. | 256 | 1 spatial or temporal enhancement layer |
| Core | L1 | QCIF | 4 | 4 x Core or Simple | 4 | 594 | 198 | 5940 | 2970 | 16 | 16 | 4096 | N.A. | N.A. | 384 | 1 |
| Core | L2 | CIF | 16 | 16 x Core or Simple | 4 | 2376 | 792 | 23760 | 11880 | 80 | 80 | 8192 | N.A. | N.A. | 2000 | 1 |
| Advanced Core | L1 | QCIF | 4 | 4 x Core or Simple or Adv. Scalable Texture | 4 | 594 | 198 | 5940 | 2970 | 16 | 8 | 4096 | N.A. | See Table A.5 | 384 | 1 |
| Advanced Core | L2 | CIF | 16 | 16 x Core or Simple or Adv. Scalable Texture | 4 | 2376 | 792 | 23760 | 11880 | 80 | 40 | 8192 | N.A. | See Table A.5 | 2000 | 1 |
| Core Scalable | L1 | CIF | 4 | 4 x Core or Simple or Core Scalable or Simple Scalable | 4 | 2376 | 792 | 14850 | 7425 | 64 | 64 | 4096 | N.A. | N.A. | 768 | 1 |
| Core Scalable | L2 | CIF | 8 | 8 x Core or Simple or Core Scalable or Simple | 4 | 2970 | 990 | 29700 | 14850 | 80 | 80 | 4096 | N.A. | N.A. | 1500 | 1 |
| Core Scalable | L3 | CCIR 601 | 16 | 16 x Core or Simple or Core Scalable or Simple Scalable | 4 | 12906 | 4032 | 120960 | 60480 | 80 | 80 | 16384 | N.A. | N.A. | 4000 | 2 |
| Main | L2 | CIF | 16 | 16 x Main or Core or Simple | 4 | 3960 | 1188 | 23760 | 11880 | 80 | 80 | 8192 | 1584 | Scalable Texture Profile@L1 | 2000 | 1 |
| Main | L3 | CCIR 601 | 32 | 32 x Main or Core or Simple | 4 | 11304 | 3240 | 97200 | 48600 | 320 | 320 | 16384 | 6480 | Scalable Texture Profile@L1 | 15000 | 1 |
| Main | L4 | 1920 x 1088 | 32 | 32 x Main or Core or Simple | 4 | 65344 | 16320 | 489600 | 244800 | 760 | 760 | 16384 | 65280 | Scalable Texture Profile@L2 | 38400 | 1 |
| Advanced Coding Efficiency | L1 | CIF | 4 | 4 x Adv. Coding Efficiency or Core or Simple | 4 | 1188 | 792 | 11880 | 5940 | 40 | 40 | 8192 | N.A. | N.A. | 384 | 1 |
| Advanced Coding Efficiency | L2 | CIF | 16 | 16 x Adv. Coding Efficiency or Core or Simple | 4 | 2376 | 1188 | 23760 | 11880 | 80 | 80 | 8192 | N.A. | N.A. | 2000 | 1 |

Advanced Coding Efficiency

L3

CCIR 601

32

32 x Adv. Coding Efficiency or Core or Simple

4

9720

3240

97200

48600

320

320

16384

N. A.

N. A.

15000

1

Advanced Coding Efficiency

L4

1920 x 1088

32

32 x Adv. Coding Efficiency or Core or Simple

4

48960

16320

489600

244800

760

760

16384

N. A.

N. A.

38400

1

N-Bit

L2

CIF

16

16 x Core or Simple or N-Bit

4

2376

792

23760

11880

80

80

8192

N. A.7

N. A.

2000

1

Notes:
  1. Enhancement layers are not counted as separate objects.
  2. The maximum VMV (Video Memory Verifier) buffer size is the bound on the memory (in macroblock units) which can be used by the VMV algorithm. This algorithm (see [MPEG4-2; subclause D.5]) models the pixel memory needed by the entire visual decoding process. This includes the memory needed for reference VOPs in the prediction of P, B, and S(GMC)-VOPs and the storage of the reconstructed VOPs until the time they are released by the decoder, plus the memory required to queue B-VOPs until composition occurs. For the profiles that contain more than one layer, the memory requirements include all base and enhancement layers. When belonging to different, overlapping objects, some of these macroblocks may overlay on the display; however separate memory is required (prior to composition) in the VMV.
  3. The conformance point for the base layer of the Simple Scalable Visual profile is the Simple Profile@L1 when Simple Scalable Profile@L1 is used and the Simple Profile@L2 when Simple Scalable Profile@L2 is used.
  4. The VCV (Video Complexity Verifier) decoder rate is the vcv_decoder_rate (H) referred to in [MPEG4-2; subclause D.4]; this parameter is the number of macroblocks per second based on the typical spatial and temporal resolutions, as follows:
    • 1485 MB/s corresponds to QCIF at 15 Hz
    • 5940 MB/s corresponds to CIF at 15 Hz and also twice QCIF at 30 Hz
    • 11880 MB/s corresponds to CIF at 30 Hz
    • 7425 MB/s corresponds to 1.25 times CIF at 15 Hz
    • 23760 MB/s corresponds to twice CIF at 30 Hz
    • 97200 MB/s corresponds to twice ITU-R 601 at 30 Hz
    • 489600 MB/s corresponds to twice 1920x1088 at 30 Hz
  5. The total (aggregated) vbv_buffer_size is the sum of the individual VBV buffer occupancies at any given time (in units of 16384 bits) for all VOLs of all VOs. This total VBV size is limited according to the profile and level.
  6. The maximum video packet length is defined as the maximum number of bits of data_partitioned_motion_shape_texture() in one video packet. The constraint applies only when the data-partitioning tool is enabled in the bitstream. When data partitioning is disabled, there is no limit on the video packet length.
  7. N. A. means Not Applicable.
  8. The maximum VCV buffer size (cumulative over all layers of all VOs) is twice the maximum number of macroblocks per VOP in the profile and level combination except for the Simple Visual Profile, Simple Scalable profile (Level 1) and Advanced Real Time Simple Profile. For the Simple Visual Profile and the Advanced Real Time Simple Profile, this value is the maximum number of macroblocks per VOP. For the Simple Scalable profile (Level 1), it is 1.25 times the maximum number of macroblocks per VOP. The limit applies to both the VCV buffer and the boundary MB VCV buffer.
  9. The VCV boundary MB decoder rate column bounds the number of macroblocks containing non-trivial shape information (boundary macroblocks, i.e. neither transparent nor opaque). The VCV boundary MB decoder rate constrains the total number of boundary MBs in all VOLs, concurrently. Note that the boundary macroblocks are added to both the VCV and boundary MB VCV buffers.
  10. For the Simple Profile@Level 0, the following restrictions apply:
    • The maximum frame rate shall be 15 frames per second;
    • The maximum f_code shall be 1;
    • The intra_dc_vlc_threshold shall be 0;
    • The maximum horizontal luminance pixel resolution shall be 176 pels/line;
    • The maximum vertical luminance pixel resolution shall be 144 pels/VOP;
    • If AC prediction is used, the following restriction applies: the QP value shall not be changed within a VOP (or within a video packet if video packets are used in a VOP). If AC prediction is not used, there are no restrictions on changing the QP value.
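
The correspondences listed in note 4 and the buffer-size rule in note 8 can be checked with a few lines of arithmetic. The sketch below assumes 16x16 macroblocks and the usual picture sizes for the named formats (176x144 for QCIF, 352x288 for CIF, 720x576 for ITU-R 601); these sizes, like the helper names `macroblocks` and `vcv_buffer_size`, are assumptions made for illustration.

```python
import math


def macroblocks(width, height, mb_size=16):
    """Number of 16x16 macroblocks needed to cover a width x height picture."""
    return math.ceil(width / mb_size) * math.ceil(height / mb_size)


QCIF = macroblocks(176, 144)       # 11 x 9 macroblocks
CIF = macroblocks(352, 288)        # 22 x 18 macroblocks
ITU_R_601 = macroblocks(720, 576)  # assumption: 720x576 picture
HD = macroblocks(1920, 1088)

# Note 4: each VCV decoder rate (MB/s) next to the resolution it corresponds to.
vcv_rates = {
    1485: QCIF * 15,
    5940: CIF * 15,
    11880: CIF * 30,
    7425: 1.25 * CIF * 15,
    23760: 2 * CIF * 30,
    97200: 2 * ITU_R_601 * 30,
    489600: 2 * HD * 30,
}


def vcv_buffer_size(profile, level, max_mbs_per_vop):
    """Note 8: VCV buffer size as a multiple of the maximum MBs per VOP."""
    if profile in ("Simple", "Advanced Real Time Simple"):
        factor = 1.0
    elif profile == "Simple Scalable" and level == 1:
        factor = 1.25
    else:
        factor = 2.0
    return factor * max_mbs_per_vop


rates_match = all(rate == value for rate, value in vcv_rates.items())
```

For example, `vcv_buffer_size("Core", 1, QCIF)` gives 198 and `vcv_buffer_size("Simple Scalable", 1, CIF)` gives 495, the values listed for Core@L1 and Simple Scalable@L1 in Table A.1.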

Table A.2 describes the MPEG-4 Visual levels for the Studio profiles defined in the 1st Extension to the 2nd Edition of the MPEG-4 Visual standard [MPEG01a].

Table A.2 Levels for the Studio profiles

| Visual profile | Level | Typical visual session formats (note 1) | Max. pixel depth | Max. number of objects | Max. number per type | Max. VMV buffer size (samples) (note 2) | Max. VCV buffer size (samples) (note 3) | VCV decoder rate (samples/s) | VCV boundary MB decoder rate (samples/s) | Max. total VBV buffer size | Max. VOL VBV buffer size | Max. video packet length (bits) | Max. sprite size (samples) (note 4) | Wavelet restrictions | Max. bitrate (Mbit/s) | Max. enhancement layers per object |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Simple Studio | L1 | ITU-R601:4224; ITU-R601:444 | 10 | 1 | 1 x Simple Studio | 1313280 | 1313280 | 33177600 | 33177600 | 576 | 576 | N.A. | N.A. | N.A. | 180 | N.A. |
| Simple Studio | L2 | ITU-R709.60I:422; ITU-R601:444444 | 10 | 1 | 1 x Simple Studio | 4194304 | 4194304 | 125829120 | 125829120 | 1920 | 1920 | N.A. | N.A. | N.A. | 600 | N.A. |
| Simple Studio | L3 | ITU-R709.60I:444; ITU-R709.60I:4224 | 12 | 1 | 1 x Simple Studio | 6291456 | 6291456 | 188743680 | 188743680 | 2880 | 2880 | N.A. | N.A. | N.A. | 900 | N.A. |
| Simple Studio | L4 | ITU-R709.60P:444; ITU-R709.60I:444444; 2Kx2Kx30P:444 | 12 | 1 | 1 x Simple Studio | 12582912 | 12582912 | 377487360 | 377487360 | 4320 | 4320 | N.A. | N.A. | N.A. | 1800 | N.A. |
| Core Studio | L1 | ITU-R601:4224; ITU-R601:444 | 10 | 4 | 4 x Core Studio or Simple Studio | 5253120 | 2626560 | 66355200 | 66355200 | 576 | 576 | N.A. | 8294400 | N.A. | 90 | N.A. |
| Core Studio | L2 | ITU-R709.60I:422; ITU-R601:444444 | 10 | 4 | 4 x Core Studio or Simple Studio | 16777216 | 8388608 | 251658240 | 251658240 | 1920 | 1920 | N.A. | 50135040 | N.A. | 300 | N.A. |
| Core Studio | L3 | ITU-R709.60I:444; ITU-R709.60I:4224 | 10 | 8 | 8 x Core Studio or Simple Studio | 25165824 | 12582912 | 377487360 | 377487360 | 2880 | 2880 | N.A. | 75202560 | N.A. | 450 | N.A. |
| Core Studio | L4 | ITU-R709.60P:444; ITU-R709.60I:444444; 2Kx2Kx30P:444 | 10 | 16 | 16 x Core Studio or Simple Studio | 50331648 | 25165824 | 754974720 | 754974720 | 4320 | 4320 | N.A. | 150994944 | N.A. | 900 | N.A. |

Notes:

  1. ITU-R 709 is ITU-R BT.709 and ITU-R 601 is ITU-R BT.601; 444444 means 444 (RGB) + 3 auxiliary channels; 4224 means 422 (YUV) + 1 auxiliary channel.
  2. VMV is defined by the number of samples which belong to the bounding box of the texture, regardless of shape information. VMV also includes auxiliary channel samples.
  3. VCV is defined by the number of samples which belong to the bounding box of the texture, regardless of shape information. VCV also includes auxiliary channel samples.
  4. Maximum sprite size is defined by the number of samples for sprite memory.
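
Since the Studio profile limits are expressed in samples rather than macroblocks, it can help to see how a sample count follows from the format notation in note 1. The sketch below is an illustration only: it assumes that a 4:4:4 format carries 3 samples per pixel and a 4:2:2 format 2 samples per pixel on average, with one extra sample per pixel for each auxiliary channel. The table `SAMPLES_PER_PIXEL` and the function `bounding_box_samples` are hypothetical names.

```python
# Average samples per pixel for each format code used in Table A.2
# (assumption: 4:4:4 = 3, 4:2:2 = 2, plus 1 per auxiliary channel).
SAMPLES_PER_PIXEL = {
    "444": 3,     # three full-resolution components
    "422": 2,     # luminance plus two half-width chrominance components
    "4224": 3,    # 422 plus 1 auxiliary channel
    "444444": 6,  # 444 plus 3 auxiliary channels
}


def bounding_box_samples(width, height, fmt):
    """Samples inside a width x height bounding box, ignoring shape."""
    return width * height * SAMPLES_PER_PIXEL[fmt]


# A hypothetical 720x576 object in 444 format against the Simple Studio@L1
# maximum VCV buffer size from Table A.2.
SIMPLE_STUDIO_L1_MAX_VCV = 1313280
samples = bounding_box_samples(720, 576, "444")
fits = samples <= SIMPLE_STUDIO_L1_MAX_VCV
```

Because the count is taken over the bounding box, an arbitrarily shaped object costs as many samples as the rectangle that encloses it.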

Table A.3 describes the MPEG-4 Visual levels for the Advanced Simple and Fine Granularity Scalable profiles defined in the 2nd Extension to the 2nd Edition of the MPEG-4 Visual standard [MPEG01b].

Table A.3 Levels for the Advanced Simple and Fine Granularity Scalable (FGS) profiles

| Visual profile | Level | Typical visual session size | Max. number of objects | Max. number per type | Max. unique quant. tables | Max. VMV buffer size (MB units) | Max. VCV buffer size (MB) | VCV decoder rate (MB/s) | Max. percentage of intra MBs with AC prediction in VCV buffer | Max. total VBV buffer size (units of 16384 bits) | Max. VOL VBV buffer size (units of 16384 bits) | Max. video packet length (bits) | Max. bitrate (kbit/s) (note 2) | Max. number of coded VOP-bps (note 3) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Advanced Simple (AS) | L0 | 176x144 | 1 | 1 x AS or Simple | 1 | 297 | 99 | 2970 | 100 | 10 | 10 | 2048 | 128 | N.A. |
| Advanced Simple (AS) | L1 | 176x144 | 4 | 4 x AS or Simple | 1 | 297 | 99 | 2970 | 100 | 10 | 10 | 2048 | 128 | N.A. |
| Advanced Simple (AS) | L2 | 352x288 | 4 | 4 x AS or Simple | 1 | 1188 | 396 | 5940 | 100 | 40 | 40 | 4096 | 384 | N.A. |
| Advanced Simple (AS) | L3 | 352x288 | 4 | 4 x AS or Simple | 1 | 1188 | 396 | 11880 | 100 | 40 | 40 | 4096 | 768 | N.A. |
| Advanced Simple (AS) | L4 | 352x576 | 4 | 4 x AS or Simple | 1 | 2376 | 792 | 23760 | 50 | 80 | 80 | 8192 | 3000 | N.A. |
| Advanced Simple (AS) | L5 | 720x576 | 4 | 4 x AS or Simple | 1 | 4860 | 1620 | 48600 | 25 | 112 | 112 | 16384 | 8000 | N.A. |
| FGS | L0 | 176x144 | 1 | 1 x AS or FGS or Simple | 1 | 297 | 99 | 2970 | 100 | 10 | 10 | 2048 | 128 | 4 |
| FGS | L1 | 176x144 | 4 | 4 x AS or FGS or Simple | 1 | 297 | 99 | 2970 | 100 | 10 | 10 | 2048 | 128 | 4 |
| FGS | L2 | 352x288 | 4 | 4 x AS or Simple | 1 | 1188 | 396 | 5940 | 100 | 40 | 40 | 4096 | 384 | 4 |
| FGS | L3 | 352x288 | 4 | 4 x AS or FGS or Simple | 1 | 1188 | 396 | 11880 | 100 | 40 | 40 | 4096 | 768 | 4 |
| FGS | L4 | 352x576 | 4 | 4 x AS or FGS or Simple | 1 | 2376 | 792 | 23760 | 50 | 80 | 80 | 8192 | 3000 | 4 |
| FGS | L5 | 720x576 | 4 | 4 x AS or FGS or Simple | 1 | 4860 | 1620 | 48600 | 25 | 112 | 112 | 16384 | 8000 | 4 |

Notes:

  1. The following restriction applies to Level 0 of Advanced Simple profile and FGS profile: if AC prediction is used, the QP value shall not be changed within a VOP (or within a video packet if video packets are used in a VOP). If AC prediction is not used, there are no restrictions to changing the QP value.
  2. For the FGS profile, this column is the maximum base-layer bitrate.
  3. The maximum number of coded VOP-bps takes into consideration the shifted bits after applying frequency weighting and/or selective enhancement.
  4. The number of FGS, FGST, or FGS-FGST layers is always one. If the FGS layer and the FGST layer are separated, the number of total enhancement layers is two.
  5. The interlace tools are not used for levels L0, L1, L2, and L3 of the Advanced Simple and FGS profiles.
  6. It is inherent in the FGS profile that the base and enhancement layers are tightly coupled to each other. To avoid unnecessary memory storage, the following constraints apply to the decoding time relationship of the enhancement layer and the base layer:
    • Decoding and composition (or presentation in a no-compositor decoder) of each FGS or FGST VOP shall be performed in the same time unit.
    • Decoding of each FGS and FGST VOP shall be performed immediately after the reference base layer VOP(s) are decoded without violating the above constraint.
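
To show how the figures in Table A.3 might be used in practice, the sketch below picks the lowest Advanced Simple level whose limits accommodate a given session. It is a deliberate simplification and an assumption of this example: it treats the maximum VCV buffer size as a bound on the macroblocks per VOP and the VCV decoder rate as a bound on macroblocks per second, and it ignores the VBV, VMV, packet-length and AC-prediction constraints as well as the restrictions in the notes above. The normative check is the simulation of the verifier models. The function name `lowest_advanced_simple_level` is hypothetical.

```python
import math

# (level, max objects, max VCV buffer (MB), VCV decoder rate (MB/s),
#  max bitrate (kbit/s)), copied from the Advanced Simple rows of Table A.3.
AS_LEVELS = [
    ("L0", 1, 99, 2970, 128),
    ("L1", 4, 99, 2970, 128),
    ("L2", 4, 396, 5940, 384),
    ("L3", 4, 396, 11880, 768),
    ("L4", 4, 792, 23760, 3000),
    ("L5", 4, 1620, 48600, 8000),
]


def lowest_advanced_simple_level(width, height, vop_rate, bitrate_kbps, objects=1):
    """Return the lowest level label that fits, or None if none does.

    width and height describe the whole visual session (simplification).
    """
    mbs = math.ceil(width / 16) * math.ceil(height / 16)
    for level, max_objects, max_mbs, max_rate, max_bitrate in AS_LEVELS:
        if (objects <= max_objects
                and mbs <= max_mbs
                and mbs * vop_rate <= max_rate
                and bitrate_kbps <= max_bitrate):
            return level
    return None


qcif_30 = lowest_advanced_simple_level(176, 144, 30, 128)
cif_25 = lowest_advanced_simple_level(352, 288, 25, 384)
sd_25 = lowest_advanced_simple_level(720, 576, 25, 4000)
hd_25 = lowest_advanced_simple_level(1280, 720, 25, 4000)
```

Under these assumptions a CIF session at 25 Hz does not fit Level 2, whose decoder rate of 5940 MB/s corresponds to CIF at 15 Hz, and Level 3 is selected instead.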

A.3 Definition of Levels for Synthetic Profiles

This section describes the levels for the MPEG-4 visual profiles including only synthetic visual data. Note that the profiles including only texture object types are considered synthetic profiles here, since the visual texture coding tool was developed by the MPEG SNHC subgroup with the initial goal of coding textures to be mapped onto 2D and 3D models.

A.3.1 Scalable Texture Profile

This profile includes three levels defined in Table A.4.

Table A.4 Levels for the Scalable Texture profile

| Profile | Level | Default wavelet filter | Max. download filter length | Max. nº of decomposition levels | Typical visual session size (note 1) | Max. Qp value | Max. nº of pixels/session | VCV decoder rate (equivalent MB/s) (note 2) | Max. nº of bitplanes for DC values | Max. VCV buffer size (equivalent MB) (note 2) |
|---|---|---|---|---|---|---|---|---|---|---|
| Scalable Texture | L3 | Float, Integer | ON, 15 | 10 | 8192x8192 | 12 bits | 67108864 | 262144 | 18 | 262144 |
| Scalable Texture | L2 | Integer | ON, 15 | 8 | 2048x2048 | 10 bits | 4194304 | 16384 | 16 | 16384 |
| Scalable Texture | L1 | Integer | OFF | 5 | 704x576 | 8 bits | 405504 | 1584 | 13 | 1584 |

Notes:

  1. This column is for informative use only. It provides an example configuration of the maximum number of pixels per session.
  2. This still texture VCV model is separate from the global video VCV model. An equivalent MB corresponds to 256 pixels.
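
The relation between the pixel and equivalent-MB columns of Table A.4 follows directly from note 2. A minimal sketch of the conversion, using the hypothetical helper name `equivalent_mbs`:

```python
import math

PIXELS_PER_EQUIVALENT_MB = 256  # note 2: one equivalent MB is 256 pixels


def equivalent_mbs(width, height):
    """Equivalent macroblocks needed for a width x height still texture."""
    return math.ceil(width * height / PIXELS_PER_EQUIVALENT_MB)


level_1 = equivalent_mbs(704, 576)
level_2 = equivalent_mbs(2048, 2048)
level_3 = equivalent_mbs(8192, 8192)
```

For the typical session sizes in the table, the results match the maximum VCV buffer sizes listed for Levels 1, 2 and 3.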

A.3.2 Simple Face Animation Profile

All MPEG-4 facial animation decoders (for all object types) are required to generate at their output a facial model including all the feature points defined in MPEG-4 Visual, even if some of the feature points will not be affected by any information received from the encoder.

The Simple Face Animation object type is not required to implement the viseme_def/expression_def functionality [MPEG4-2]. The Simple Face Animation profile has two levels:

Level 1

  1. Number of objects: 1
  2. The total FAP decode frame rate in the bitstream shall not exceed 72 Hz
  3. The decoder shall be capable of a face model rendering update of at least 15 Hz
  4. Maximum bitrate: 16 kbit/s
Level 2
  1. Maximum number of objects: 4
  2. The FAP decode frame rate in the bitstream shall not exceed 72 Hz (this means that the FAP decode frame rate is to be shared among the objects)
  3. The decoder shall be capable of rendering the face models with the update rate of at least 60 Hz, sharable between faces, with the constraint that the update rate for each individual face is not required to exceed 30 Hz
  4. Maximum bitrate: 32 kbit/s

A.3.3 Simple FBA Profile

All MPEG-4 Visual FBA decoders (for all object types) are required to generate at their output a humanoid model including all the feature points and joints defined in MPEG-4 Visual [MPEG4-2], even if some of the feature points and joints will not be affected by any information received from the encoder. The Simple FBA profile has two levels:

Level 1

  1. Number of objects: 1
  2. The total FBA decode frame rate in the bitstream shall not exceed 72 Hz
  3. The decoder shall be capable of a humanoid model rendering update of, at least, 15 Hz
  4. Maximum bitrate: 32 kbit/s
  5. The decoder is not required to animate Spine3, Spine4 and Spine5 BAP groups [MPEG4-2]
Level 2
  1. Maximum number of objects: 4
  2. The FBA decode frame rate in the bitstream shall not exceed 72 Hz (this means that the FBA decode frame rate is to be shared among the objects)
  3. The decoder shall be capable of rendering the humanoid models with the update rate of, at least, 60 Hz, sharable between humanoids, with the constraint that the update rate for each individual humanoid is not required to exceed 30 Hz
  4. Maximum bitrate: 64 kbit/s
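
The bitstream-side limits of the Simple Face Animation and Simple FBA levels are simple enough to be checked mechanically. The sketch below covers only the number of objects, the total decode frame rate and the bitrate; the rendering-rate requirements on the decoder are not modelled. The names `ANIMATION_LIMITS` and `conforms` are hypothetical.

```python
# (max objects, max total decode frame rate in Hz, max bitrate in kbit/s),
# as listed for each profile and level in Sections A.3.2 and A.3.3.
ANIMATION_LIMITS = {
    ("Simple Face Animation", 1): (1, 72, 16),
    ("Simple Face Animation", 2): (4, 72, 32),
    ("Simple FBA", 1): (1, 72, 32),
    ("Simple FBA", 2): (4, 72, 64),
}


def conforms(profile, level, object_rates_hz, bitrate_kbps):
    """Check a set of animation objects against the bitstream limits.

    object_rates_hz holds one decode frame rate per object; the rate limit
    applies to their sum, since it is shared among the objects.
    """
    max_objects, max_total_rate, max_bitrate = ANIMATION_LIMITS[(profile, level)]
    return (len(object_rates_hz) <= max_objects
            and sum(object_rates_hz) <= max_total_rate
            and bitrate_kbps <= max_bitrate)


one_face = conforms("Simple Face Animation", 1, [25], 16)
three_faces = conforms("Simple Face Animation", 2, [25, 25, 25], 32)
four_bodies = conforms("Simple FBA", 2, [15, 15, 15, 15], 64)
```

Three faces animated at 25 Hz each exceed the shared 72 Hz limit of Level 2, which is why `three_faces` is false even though the object count and bitrate are within bounds.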

A.3.4 Advanced Core and Advanced Scalable Texture Profiles

The levels for the Advanced Core and Advanced Scalable Texture profiles are defined in Table A.5. Notice that Advanced Core includes video as well as still picture object types and thus has two types of level constraints (see Table A.1 and Table A.5).

Table A.5 Levels for the Advanced Core and Advanced Scalable Texture profiles

| Visual profile | Level | Default wavelet filter | Max. download filter length | Max. nº of decomposition levels | Typical visual session size (note 1) | Max. Qp value (bits) | Max. nº of pixels/session (note 2) | VCV decoder rate (equivalent MB/s) (note 3) | Max. nº of bitplanes for DC values | Max. VCV buffer size (equivalent MB) | Max. STO packet length (bits) | Max. nº of pixels/tile | Max. nº of tiles |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Advanced Core | L2 | Integer | ON, 15 | 8 | 8192x8192 | 10 | 67108864 | 262144 | 16 | 262144 | 8192 | 262144 | 2048 |
| Advanced Core | L1 | Integer | OFF | 5 | 2048x2048 | 8 | 4194304 | 16384 | 13 | 16384 | 4096 | 65536 | 1024 |
| Advanced Scalable Texture | L3 | Float, Integer | ON, 15 | 10 | 8192x8192 | 12 | 67108864 | 262144 | 18 | 262144 | 8192 | 67108864 | 4096 |
| Advanced Scalable Texture | L2 | Integer | ON, 15 | 8 | 2048x2048 | 10 | 4194304 | 16384 | 16 | 16384 | 4096 | 4194304 | 2048 |
| Advanced Scalable Texture | L1 | Integer | OFF | 5 | 704x576 | 8 | 405504 | 1584 | 13 | 1584 | 2048 | 405504 (4xCIF) | 1024 |

Notes:

  1. This column is for informative use only. It provides an example configuration of the maximum number of pixels per session.
  2. When the number of pixels per session is larger than the maximum number of pixels per tile, tiling_disable shall be 0.
  3. This still texture VCV model is separate from the global video VCV model. An equivalent MB corresponds to 256 pixels.
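
Note 2 implies a simple decision rule for tiling. The sketch below applies it to the Advanced Core@L2 limits from Table A.5. The helper names are hypothetical, and `min_tiles` is only a lower bound that assumes every tile may use the full per-tile pixel budget.

```python
import math


def tiling_required(image_pixels, max_pixels_per_tile):
    """Note 2: tiling must be enabled (tiling_disable = 0) when this is True."""
    return image_pixels > max_pixels_per_tile


def min_tiles(image_pixels, max_pixels_per_tile):
    """Lower bound on the tile count if every tile is as large as allowed."""
    return math.ceil(image_pixels / max_pixels_per_tile)


# Advanced Core@L2 limits from Table A.5.
MAX_PIXELS_PER_TILE = 262144
MAX_TILES = 2048

large = 8192 * 8192  # the typical (maximum) session size for this level
small = 512 * 512    # a hypothetical image that fits in a single tile

large_needs_tiling = tiling_required(large, MAX_PIXELS_PER_TILE)
small_needs_tiling = tiling_required(small, MAX_PIXELS_PER_TILE)
large_tile_count = min_tiles(large, MAX_PIXELS_PER_TILE)
```

For the Advanced Scalable Texture levels the per-tile and per-session pixel limits coincide, so the same rule never forces tiling there.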

A.4 Definition of Levels for Synthetic and Natural Hybrid Profiles

The levels for the profiles supporting both video object types and synthetic visual object types are specified by giving bounds for the natural objects and for the synthetic objects. Parameters such as bitrate can be combined across natural and synthetic objects.

A.4.1 Basic Animated Texture Profile

The Basic Animated Texture profile has two levels defined as:

Level 1

Equivalent to the Simple Face Animation Profile@Level 1 + Scalable Texture@Level 1 + the following restrictions on Basic Animated Texture object types:

  1. Maximum number of mesh objects (with uniform topology): 4
  2. Maximum total number of nodes (vertices) in mesh objects: 480 (equivalent to 4 x number of nodes of a uniform mesh covering a QCIF image with 16x16 pixel elements)
  3. Maximum frame rate of a mesh object: 30 Hz
  4. Maximum bitrate of mesh objects: 64 kbit/s

Level 2

Equivalent to the Simple Face Animation Profile@Level 2 + Scalable Texture@Level 2 + the following restrictions on Basic Animated Texture object types:

  1. Maximum number of mesh objects (with uniform topology): 8
  2. Maximum total number of nodes (vertices) in mesh objects: 1748 (equivalent to 4 x number of nodes of a uniform mesh covering a CIF image with 16x16 pixel elements)
  3. Maximum frame rate of a mesh object: 60 Hz
  4. Maximum bitrate of mesh objects: 128 kbit/s
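
The node limits of 480 and 1748 can be reproduced from the image sizes quoted in the restrictions. A minimal sketch, assuming 176x144 for QCIF and 352x288 for CIF and using the hypothetical helper name `uniform_mesh_nodes`:

```python
def uniform_mesh_nodes(width, height, element=16):
    """Nodes of a uniform mesh with element x element pixel elements.

    A grid of n x m elements has (n + 1) x (m + 1) nodes.
    """
    return (width // element + 1) * (height // element + 1)


qcif_nodes = uniform_mesh_nodes(176, 144)  # 12 x 10 nodes
cif_nodes = uniform_mesh_nodes(352, 288)   # 23 x 19 nodes

level_1_limit = 4 * qcif_nodes
level_2_limit = 4 * cif_nodes
```

The factor of 4 is taken from the text above; it yields the Level 1 and Level 2 limits respectively.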

A.4.2 Hybrid Profile

The Hybrid profile has two levels defined as:

Level 1

Equivalent to the Core Visual Profile@Level 1 + Basic Animated Texture Profile@Level 1 + the following restrictions on Animated 2D Mesh object types:

  1. Maximum number of mesh objects (with uniform or Delaunay topology): 4 (the same as the maximum number of objects in visual session)
  2. Maximum total number of nodes (vertices) in mesh objects: 480 (equivalent to 4 x number of nodes of a uniform mesh covering a QCIF image with 16x16 pixel elements)
  3. Maximum frame-rate of a mesh object: 30 Hz (the same as the maximum frame rate of video object)
  4. Maximum bitrate of mesh objects: 64 kbit/s

Level 2

Equivalent to the Core Visual Profile@Level 2 + Basic Animated Texture Profile@Level 2 + the following restrictions on Animated 2D Mesh object types:

  1. Maximum number of mesh objects (with uniform or Delaunay topology): 8 (the same as the maximum number of objects in visual session)
  2. Maximum total number of nodes (vertices) in mesh objects: 1748 (equivalent to 4 x number of nodes of a uniform mesh covering a CIF image with 16x16 pixel elements)
  3. Maximum frame rate of a mesh object: 60 Hz (2 x the maximum frame rate of video object)
  4. Maximum bitrate of mesh objects: 128 kbit/s

References

[MPEG01a] MPEG, “Studio Profiles”, Final Draft Amendment, Doc. ISO/MPEG N3898, Pisa MPEG Meeting, January 2001
[MPEG01b] MPEG, “Streaming Video Profiles”, Final Draft Amendment, Doc. ISO/MPEG N3904, Pisa MPEG Meeting, January 2001
[MPEG1-2] ISO/IEC 11172-2:1993, “Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1,5 Mbit/s - Part 2: Video”, 1993
[MPEG2-2] ISO/IEC 13818-2:2000, “Generic Coding of Moving Pictures and Associated Audio Information – Part 2: Video”, 2000
[MPEG4-1] ISO/IEC 14496-1:2001, “Coding of Audio-Visual Objects - Part 1: Systems”, 2nd Edition, 2001
[MPEG4-2] ISO/IEC 14496-2:2001, “Coding of Audio-Visual Objects - Part 2: Visual”, 2nd Edition, 2001
[MPEG4-4] ISO/IEC 14496-4:2001, “Coding of Audio-Visual Objects - Part 4: Conformance Testing”, 2nd Edition, 2001
[Nunes] Nunes, P., “Rate Control for Object-Based Video Coding Architectures”, Ph.D. Thesis, Instituto Superior Técnico, Lisboa – Portugal, to be submitted


[1] For an arbitrarily shaped video object, three types of MBs may exist: transparent, opaque, and boundary.
[2] Except for the short video header case.
[3] For basic sprites, the vbv_occupancy field specifies the initial VBV occupancy before decoding the first S-VOP in the elementary stream, i.e. not the very first VOP in a basic sprite, which must be an I-VOP, but the subsequent VOP, i.e. an S-VOP. Low-latency sprites, which allow the transmission of large image sprites progressively (both spatially and in terms of quality), are treated as any other VOL.
[4] VOLij corresponds to VOLj of VOi.
[5] To avoid accumulating errors, the MPEG-4 Visual standard specifies that real-valued arithmetic should be used to compute vbvi.
[6] For sprites, a hypothetical number of MBs is defined for each S-VOP [MPEG4-2].
[7] The B-VCV is only defined for profiles supporting arbitrarily shaped video objects.


©Copyright 2007 MPEG Industry Forum