

©MPEG-4 Industry Forum

Last modified: Wed Feb 06 11:35:43 EST 2002

Levels for MPEG-4 Visual Profiles

by Fernando Pereira and Paulo Nunes (Instituto Superior Técnico, Lisboa - Portugal)

As of October 2001, the MPEG-4 Visual standard defines 18 visual object types and 19 visual profiles. Nine visual profiles were defined in MPEG-4 Visual Version 1 [MPEG4-2]: Simple, Simple Scalable, Core, Main, N-bit, Scalable Texture, Simple Face Animation, Basic Animated Texture, and Hybrid.

Six additional visual profiles were defined in MPEG-4 Visual Version 2 [MPEG4-2]: Core Scalable, Advanced Core, Advanced Coding Efficiency, Advanced Real Time Simple, Advanced Scalable Texture, and Simple FBA.

Moreover, two additional profiles were defined in the 1st Extension to the 2nd Edition of the MPEG-4 Visual standard [MPEG01a]: Simple Studio and Core Studio. Two further profiles were defined in the 2nd Extension to the 2nd Edition of the MPEG-4 Visual standard [MPEG01b]: Advanced Simple and Fine Granularity Scalability.

The following sections present the mechanism specified to define video levels, the Video Buffering Verifier, as well as the visual levels defined for all visual profiles.



A.1 Video Buffering Verifier Mechanism

The idea of using a Video Buffering Verifier mechanism to bound the decoding complexity of a given set of bitstreams is not new, and was already adopted in previous MPEG video coding standards, MPEG-1 [MPEG1-2] and MPEG-2 [MPEG2-2]. In these standards, the major purpose of the Video Buffering Verifier mechanism was to set some restrictions on the maximum variability of the number of bits per picture, especially in the case of constant bitrate operation, and thus on the complexity of the encoded video streams.

Generically, the complexity of the encoded video is directly related to the encoded bitrate and to the decoded video data rate that the decoder generates, e.g. measured as the number of macroblocks per second (MB/s). For frame-based video coding, e.g. MPEG-1 and MPEG-2, the decoded video data rate is typically constant since the frames have fixed dimensions and are usually encoded at fixed frame rates. This is not the general case for object-based video coding, as in MPEG-4, since the several video objects composing a scene may vary in size over time and may be encoded at different VOP rates. Therefore, the amount and type[1] of MB/s that a given object-based video decoder has to process may vary widely over time compared with frame-based coding solutions [Nunes].

In the MPEG-4 context, limiting the decoding complexity of a set of bitstreams corresponding to a video scene therefore requires limits on the variability of the number of decoded MB/s and on their complexity, as well as on the picture memory required to store the decoded data. This constitutes the major novelty of the MPEG-4 Video Buffering Verifier mechanism relative to the previous MPEG standards: it bounds not only the bitstream buffer memory but also the MB decoding capacity and the MB picture memory.

The MPEG-4 Video Buffering Verifier mechanism [MPEG4-2; Annex D] consists of three normative models, see Figure A.1, each one defining a set of rules and limits to verify if the amount required for a specific type of decoding resource is within the values allowed by the corresponding profile and level specification, see Table A.1:

  1. Video Rate Buffer Verifier (VBV) – This model is used to verify that the bitstream memory required at the decoder(s) does not exceed the values specified for the corresponding profile and level. The model is defined in terms of the VBV buffer sizes for all the VOLs corresponding to the objects building the scene. Each VBV buffer size corresponds to the maximum amount of bits that the decoder can store in the bitstream memory for the corresponding VOL; there is, however, also a limitation on the sum of the VOL VBV buffer sizes. The bitstream memory is the memory where the decoder puts the bits received for a VOL while waiting to be decoded.
  2. Video Complexity Verifier (VCV) – This model is used to verify that the computational power (processing speed), defined in terms of MB/s, required at the decoder does not exceed the values specified for the corresponding profile and level. The model is defined in terms of the VCV MB/s decoding rate and VCV buffer size and is applied to all MBs in the scene. If arbitrarily shaped VOs exist in the scene, an additional VCV buffer and VCV decoding rate is also defined, to be applied only to the boundary MBs.
  3. Video Reference Memory Verifier (VMV) – This model is used to verify that the picture memory required at the decoder for the decoding of a given scene does not exceed the values specified for the corresponding profile and level. The model is defined in terms of the VMV buffer size, which is the maximum number of decoded MBs that the decoder can store during the decoding process of all VOLs corresponding to the scene.


Figure A.1 Video buffering verifier model [MPEG4-2]

The Video Presentation Model (VPM) is not a normative part of the MPEG-4 Visual specification [MPEG4-2]. It is an algorithm for checking that the set of bitstreams corresponding to a scene does not require more presentation memory than a given amount of memory expressed in units of MB. It is also used to constrain the speed of the compositor in terms of the maximum number of MB/s. The Video Presentation Verifier (VPV) operates in the same way as the VCV in terms of occupancy dynamics [MPEG4-2].

In order that the set of visual elementary streams corresponding to a given scene may be considered compliant with a given profile and level, the encoder must guarantee that none of the above mentioned buffers overflows and, additionally, it must also guarantee that, in certain circumstances, the VBV buffer never underflows.

A.1.1 Video Rate Buffer Verifier Definition

The MPEG-4 VBV model defines a set of rules and limits for examining a video elementary bitstream with a delivery rate function, R(t). This model simulates the occupancy of the decoder bitstream buffer in order to control the amount of bitstream memory required at the decoder. Its purpose is to guarantee that the bitstream memory required is less than the specified buffer size, i.e. to verify that the decoder bitstream buffer occupancy never goes beyond the limits of the specified buffer size for the relevant profile@level. In the case of visual scenes composed of multiple VOs, each with one or more VOLs, the MPEG-4 Visual standard specifies that the video rate buffer model shall be applied independently to each VOL (using a particular buffer size and rate function for each VOL). Additionally, the maximum total bitstream buffer size (defined as the sum of all VOL bitstream buffer sizes) for the given profile and level shall not be exceeded, see Table A.1. Notice that the allocation of bitrate and buffer size among the several VOs and, for each VO, among the several VOLs is a non-normative issue, although it can significantly determine the performance of object-based video encoders and thus deserves careful attention.
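
As a rough illustration of the two buffer-size rules above (a limit per VOL and a limit on the sum over all VOLs), the following Python sketch checks a buffer allocation against the limits of one level. The function name and the example allocations are hypothetical; only the limits for Advanced Core@L1 (in units of 16384 bits) are taken from Table A.1.

```python
VBV_UNIT_BITS = 16384  # Table A.1 expresses VBV buffer sizes in units of 16384 bits


def vbv_allocation_ok(vol_buffer_units, max_vol_units, max_total_units):
    """Return True if every VOL buffer and their sum respect the level limits.

    vol_buffer_units: hypothetical per-VOL VBV buffer sizes (units of 16384 bits).
    """
    each_ok = all(size <= max_vol_units for size in vol_buffer_units)
    total_ok = sum(vol_buffer_units) <= max_total_units
    return each_ok and total_ok


# Advanced Core@L1 in Table A.1: max. total 16 units, max. 8 units per VOL.
print(vbv_allocation_ok([8, 8], max_vol_units=8, max_total_units=16))     # True
print(vbv_allocation_ok([8, 8, 4], max_vol_units=8, max_total_units=16))  # False: sum is 20
print(vbv_allocation_ok([12, 4], max_vol_units=8, max_total_units=16))    # False: one VOL exceeds 8
print(8 * VBV_UNIT_BITS)  # 131072 bits for one 8-unit VOL buffer
```

How the total is split among VOLs remains an encoder decision; the check only tells whether a candidate split is admissible.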

The VBV applies to video data encoded as a combination of I-, P-, B-, and S-VOPs, using several coding tools organized in terms of video object types. Face animation, still texture, and mesh objects are not constrained by the VBV model. The coded video bitstreams shall be constrained to comply with the requirements of the VBV specified in the following sections.

A.1.1.1 VBV Model Parameters

The VBV model for a given elementary stream (ES) is defined by the following three parameters: vbv_buffer_size, vbv_occupancy, and bit_rate. These parameters have to be defined for all the ESs corresponding to the various objects in a scene. They can be specified at video level, that is, through the video ES, or by means of systems level configuration information [MPEG4-1]. In the first case, the VBV model parameters are specified in the VOL header, when the one-bit flag vbv_parameters is set to ‘1’. In the second case, the VBV model parameters are conveyed to the video decoder through the Object Description Information, more precisely through the DecoderConfigDescriptor field of the ES_Descriptor associated with the ES in question.

When the vbv_buffer_size and vbv_occupancy parameters are specified by systems level configuration information, the bitstream shall be constrained according to the specified values, and these values shall not be part of the video ES. It may happen, however, that these parameters are not explicitly specified; in this case, it is assumed that the ES is constrained according to the default values of the corresponding profile and level combination[2].

A.1.1.2 VBV Occupancy Dynamics

The VBV occupancy dynamics specifies when the bitstream bits enter the VBV buffer and when they are removed from it to be decoded, i.e. the process by which the VBV buffer is filled and drained. This process is mainly driven by the time instants at which the VOP bits are removed from the VBV.
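
The fill-and-drain process can be sketched in a few lines of Python. This is a simplified illustration, not the normative procedure: it assumes a constant delivery rate (R(t) equal to bit_rate), instantaneous removal of all the bits of a VOP at its decoding time, and hypothetical function and argument names.

```python
def simulate_vbv(vop_bits, decode_times, bit_rate, buffer_bits, initial_occupancy):
    """Simplified VBV occupancy trace.

    Assumes a constant delivery rate and instantaneous removal of each VOP's
    bits at its decoding time. Returns a list of
    (time, occupancy_before_removal, occupancy_after_removal) tuples and
    raises ValueError on overflow or underflow.
    """
    occupancy = float(initial_occupancy)
    previous_time = decode_times[0]
    trace = []
    for bits, t in zip(vop_bits, decode_times):
        # The buffer fills at the delivery rate between two removal instants.
        occupancy += bit_rate * (t - previous_time)
        if occupancy > buffer_bits:
            raise ValueError(f"VBV overflow at t={t}")
        if bits > occupancy:
            raise ValueError(f"VBV underflow at t={t}")
        trace.append((t, occupancy, occupancy - bits))
        occupancy -= bits
        previous_time = t
    return trace


# Hypothetical stream: 64 kbit/s, buffer of 10 units of 16384 bits.
trace = simulate_vbv(
    vop_bits=[40000, 20000, 30000],
    decode_times=[0.0, 0.5, 1.0],
    bit_rate=64000,
    buffer_bits=10 * 16384,
    initial_occupancy=80000,
)
print(trace[-1])  # (1.0, 84000.0, 54000.0)
```

With a constant fill rate the occupancy peaks just before each removal, which is why the overflow test is only evaluated at the VOP decoding times.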

A.1.1.3 VBV Model Constraints

This section applies to all the cases considered in the VBV model except for basic sprites, which have a special treatment. The first I-VOP of a sprite VO is divided into N sections of 396 MBs and each section is treated as a different VOP. The remaining S-VOPs are treated as any other VOP.

A.1.2 Video Complexity Verifier Definition

The MPEG-4 VCV model defines a set of rules and limits for examining a set of ESs building a visual scene to control if the required amount of decoder processing power is less than the maximum complexity specified for the given profile and level, both measured in MBs per second, see Table A.1. This model is applied to all MBs of all ESs of the scene together.

The VCV applies to video objects encoded as a combination of I-, P-, B- and S-VOPs[6]. A separate VCV model applies to still texture objects [MPEG4-2]. Face animation and mesh objects are not constrained by this model.

The coded video bitstreams for a certain scene shall be constrained to globally comply with the requirements of the VCV defined in the following sections.

A.1.2.1 VCV Model Parameters

The VCV model consists of two virtual buffers accumulating the number of MBs in the encoded data:

  1. The VCV Buffer accumulates all MBs of all VOLs for the scene.
  2. The Boundary MB VCV Buffer (B-VCV)[7] accumulates only boundary MBs.

Notice that boundary MBs (i.e. MBs including shape information which is not totally transparent or totally opaque) are included in both the VCV and the B-VCV buffers.

The VCV model is defined by the size of the buffers mentioned above, the corresponding draining rates (i.e. the VCV and B-VCV decoding rates), and the latency of the VCV model (which depends on the VCV buffer size and VCV decoding rate).
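
The text states only that the latency depends on the VCV buffer size and the VCV decoding rate. The Python sketch below assumes, for illustration, that the latency is the time needed to drain a full VCV buffer at the decoding rate (buffer size divided by rate); this formula and the function name are assumptions, while the numeric values come from Table A.1.

```python
from fractions import Fraction


def vcv_latency_seconds(vcv_buffer_mb, vcv_decoder_rate_mb_per_s):
    """Time needed to drain a full VCV buffer at the VCV decoding rate.

    Assumption for illustration: latency = buffer size / decoding rate.
    """
    return Fraction(vcv_buffer_mb, vcv_decoder_rate_mb_per_s)


print(vcv_latency_seconds(99, 1485))    # Simple@L1: 1/15 s
print(vcv_latency_seconds(792, 23760))  # Core@L2:   1/30 s
```

Under this assumption the latency equals one frame period at the typical frame rate of the level.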

A.1.2.2 VCV Occupancy Dynamics

The VCV dynamics simulates the VOP decoding process. At the VOP decoding times, the VOP encoded data is added to the VCV buffers and is removed from these buffers as the decoding process progresses. The time instant at which a given VOP is completely decoded depends on the amount and type of MBs to be decoded, the occupancy of the VCV buffers at the VOP decoding time, and the maximum decoding speed specified through the VCV decoding rates for the profile@level in question.
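
A minimal Python sketch of these dynamics follows. It is a simplification under stated assumptions: all MBs of a VOP enter the buffers at once at the VOP decoding time, both buffers drain continuously at their decoding rates, and a VOP is complete when both buffers have drained its MBs. The function names and the example VOPs are hypothetical; the rates and buffer size are those of Core@L1 in Table A.1.

```python
def drain_schedule(arrivals, rate, buffer_size):
    """arrivals: list of (time, mbs) sorted by time.

    Returns the time at which each arrival has been fully drained and raises
    ValueError if the occupancy ever exceeds buffer_size.
    """
    occupancy = 0.0
    last_time = arrivals[0][0]
    finish_times = []
    for t, mbs in arrivals:
        # Drain what was decoded since the previous VOP decoding time.
        occupancy = max(0.0, occupancy - rate * (t - last_time))
        occupancy += mbs
        if occupancy > buffer_size:
            raise ValueError(f"VCV overflow at t={t}")
        finish_times.append(t + occupancy / rate)
        last_time = t
    return finish_times


def simulate_vcv(vops, rate, buffer_size, boundary_rate=None):
    """vops: list of (decode_time, total_mbs, boundary_mbs).

    Boundary MBs are counted in both buffers; the same size limit is used
    for the VCV buffer and the boundary MB VCV buffer.
    """
    finish = drain_schedule([(t, total) for t, total, _ in vops], rate, buffer_size)
    if boundary_rate is None:
        return finish
    boundary = drain_schedule([(t, b) for t, _, b in vops], boundary_rate, buffer_size)
    return [max(a, b) for a, b in zip(finish, boundary)]


# Core@L1: 5940 MB/s, 2970 boundary MB/s, VCV buffer of 198 MB.
times = simulate_vcv([(0.0, 99, 30), (0.1, 99, 60)], 5940, 198, boundary_rate=2970)
print(times)
```

In the second VOP the boundary MBs take longer to drain than the full set of MBs, so the boundary buffer determines the completion time.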

A.1.2.3 VCV Model Constraints

Compliance regarding the VCV model can only be guaranteed if the set of ESs building a scene fulfills the constraints imposed by the VCV model relative to the occupancy of the VCV buffers and the VOP decoding duration defined as follows:

A.1.3 Video Reference Memory Verifier Definition

The MPEG-4 VMV model defines a set of rules and limits for examining the set of ESs building a visual scene to control if the required amount of decoder picture memory, measured in MB units, is less than the maximum memory specified for the chosen profile and level, see Table A.1. The VMV models the memory requirements of all VOLs of all VOs in the scene (this model assumes a common memory space, shared by all VOLs of all VOs).

The VMV applies to video objects encoded as a combination of I-, P-, B-, S-VOPs, and still texture objects. Face animation, mesh objects, and I-VOPs in basic sprite sequences are not constrained by this model.

The coded video bitstreams shall be constrained to comply with the requirements of the VMV defined in the following sections.

A.1.3.1 VMV Model Parameters

The VMV model consists of a MB buffer that accumulates all the decoded MBs of all VOPs and stores them until they are no longer needed for the prediction of other VOPs. The VMV model is defined by the size of this buffer, the vmv_buffer_size, defining the maximum amount of decoded MBs that the decoder can store at any time instant, see Table A.1.

A.1.3.2 VMV Occupancy Dynamics

The VMV dynamics simulates the decoded VOP memory allocation and de-allocation process. As each VOP is being processed, the decoder needs to allocate memory to store the decoded data. This data remains in the decoder memory until it is no longer needed, e.g. for prediction. At this point in time, the memory allocated to store this data is instantaneously released and can be used again.

A.1.3.3 VMV Model Constraints

A given set of visual ESs building a scene conforms with a given profile@level, with respect to the VMV model, if it never overflows the VMV model buffer.
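
The overflow test can be illustrated with a small Python sketch. It simplifies the model by allocating the memory of a VOP in one step at a given allocation time and releasing it instantaneously at a given release time; the function names and the example VOP timings are hypothetical, and the buffer size is the Simple@L1 value from Table A.1.

```python
def vmv_peak_occupancy(vops):
    """vops: list of (alloc_time, release_time, mbs). Returns the peak number of MBs held."""
    events = []
    for alloc, release, mbs in vops:
        events.append((alloc, 1, mbs))     # allocation
        events.append((release, 0, -mbs))  # release sorts before an allocation at the same time
    occupancy = peak = 0
    for _, _, delta in sorted(events):
        occupancy += delta
        peak = max(peak, occupancy)
    return peak


def vmv_conforms(vops, vmv_buffer_size):
    """True if the shared picture memory never holds more than vmv_buffer_size MBs."""
    return vmv_peak_occupancy(vops) <= vmv_buffer_size


# Simple@L1: VMV buffer of 198 MB, i.e. two QCIF VOPs of 99 MBs each.
print(vmv_conforms([(0, 2, 99), (1, 3, 99), (2, 4, 99)], 198))  # True
print(vmv_conforms([(0, 3, 99), (1, 3, 99), (2, 4, 99)], 198))  # False: three VOPs held at t=2
```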

A.1.4 Interaction between the VBV, VCV, and VMV Models

A given set of ESs building a visual scene is considered compliant with a given profile and level if it fulfills all the constraints defined by the several Video Buffering Verifier models. Bitstream compliance with a given profile@level guarantees that the resources required at the decoder do not exceed a certain pre-defined amount corresponding to the relevant profile@level. Moreover it defines strict timing for completion of decoding and composition of VOPs as explained in the following:

  1. The VBV model defines the time at which the coded bits for each VOP are available for decoding and the time at which they should be removed from the VBV buffer - the coded bits for each VOP should be removed from the VBV buffer at the VOP decoding times, ti, computed from the composition time information in the video ES or conveyed by systems decoding time stamps.
  2. The VCV model defines the decoding speed of the MB data, and, thus, the time at which each VOP is available for composition - a given VOP should be available for composition, at most, at the VOP composition time plus the VCV latency, i.e. at the time it is supposed to be available to the compositor.
  3. The VMV model defines the amount of picture memory allocated at each time instant and the time it should be released - a given VOP should be removed from the VMV buffer at its composition time plus the VCV latency (B-VOP) or at the composition time plus the VCV latency of the next P or I VOP (I or P-VOPs).
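
The release rule in item 3 can be written down as a short Python sketch. The function name and the example VOP sequence are hypothetical, the sketch covers only I-, P-, and B-VOPs, and it assumes that "the next P or I VOP" means the next one in decoding order.

```python
def vmv_release_times(vops, vcv_latency):
    """vops: list of (vop_type, composition_time) in decoding order.

    B-VOPs are released at their own composition time plus the VCV latency.
    I- and P-VOPs are released at the composition time plus the VCV latency
    of the next I- or P-VOP; None means no later I- or P-VOP is known yet.
    """
    release = [None] * len(vops)
    for i, (vop_type, comp_time) in enumerate(vops):
        if vop_type == "B":
            release[i] = comp_time + vcv_latency
        else:
            for next_type, next_comp in vops[i + 1:]:
                if next_type in ("I", "P"):
                    release[i] = next_comp + vcv_latency
                    break
    return release


# Decoding order I0 P3 B1 B2 P6, composition times in arbitrary time units.
sequence = [("I", 0.0), ("P", 3.0), ("B", 1.0), ("B", 2.0), ("P", 6.0)]
print(vmv_release_times(sequence, vcv_latency=0.5))  # [3.5, 6.5, 1.5, 2.5, None]
```

The I-VOP stays in memory until the following P-VOP is composed, which covers the B-VOPs that are predicted from both.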

The various models are independent but interact with each other in the following way:

In order to avoid these situations, the Video Buffering Verifier mechanism imposes strict times for starting and ending any VOP decoding - constraint imposed by the VCV model.

The Video Buffering Verifier models provide the mechanism allowing any encoder to produce bitstreams that will be decodable by any decoder compliant with the selected profile@level. This mechanism makes it possible to limit the amount of decoding resources needed at the receiving terminals while also ensuring the timely reconstruction of the encoded information.

It is important to highlight that it is a major task of the encoder to simulate each of the Video Buffering Verifier models in order to produce bitstreams compliant with the intended profile and level. If any of these models tends to be violated, the encoder has to take appropriate countermeasures to avoid it. Although the Video Buffering Verifier is defined for the decoders, it is in fact a major module of any encoder generating compliant sets of bitstreams.

A.2 Definition of Levels for Video Profiles

Table A.1 describes the MPEG-4 Visual levels for the Version 1 and Version 2 profiles that include only natural visual (or video) data, that is, the so-called MPEG-4 video profiles. Note that Level 0 for the Simple profile has been defined in the 2nd Extension to the 2nd Edition of the MPEG-4 Visual standard [MPEG01b].

Table A.1 Levels for the MPEG-4 video profiles

| Visual profile | Level | Typical visual session size | Max. number of objects (note 1) | Max. number of objects per type | Max. unique quant. tables | Max. VMV buffer size (MB units) (note 2) | Max. VCV buffer size (MB) (note 8) | VCV decoder rate (MB/s) (note 4) | VCV boundary MB decoder rate (MB/s) (note 9) | Max. total VBV buffer size (units of 16384 bits) (note 5) | Max. VOL VBV buffer size (units of 16384 bits) | Max. video packet length (bits) (note 6) | Max. sprite size (MB units) | Wavelet restrictions | Max. bitrate (kbit/s) | Max. enhancement layers per object |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Simple (note 10) | L0 | QCIF | 1 | 1 x Simple | 1 | 198 | 99 | 1485 | N.A. | 10 | 10 | 2048 | N.A. | N.A. | 64 | N.A. |
| Simple | L1 | QCIF | 4 | 4 x Simple | 1 | 198 | 99 | 1485 | N.A. | 10 | 10 | 2048 | N.A. | N.A. | 64 | N.A. |
| Simple | L2 | CIF | 4 | 4 x Simple | 1 | 792 | 396 | 5940 | N.A. | 40 | 40 | 4096 | N.A. | N.A. | 128 | N.A. |
| Simple | L3 | CIF | 4 | 4 x Simple | 1 | 792 | 396 | 11880 | N.A. | 40 | 40 | 8192 | N.A. | N.A. | 384 | N.A. |
| Advanced Real Time Simple | L1 | QCIF | 4 | 4 x Simple or Adv. Real Time Simple | 1 | 198 | 99 | 1485 | N.A. | 10 | 10 | 8192 | N.A. | N.A. | 64 | N.A. |
| Advanced Real Time Simple | L2 | CIF | 4 | 4 x Simple or Adv. Real Time Simple | 1 | 792 | 396 | 5940 | N.A. | 40 | 40 | 16384 | N.A. | N.A. | 128 | N.A. |
| Advanced Real Time Simple | L3 | CIF | 4 | 4 x Simple or Adv. Real Time Simple | 1 | 792 | 396 | 11880 | N.A. | 40 | 40 | 16384 | N.A. | N.A. | 384 | N.A. |
| Advanced Real Time Simple | L4 | CIF | 16 | 16 x Simple or Adv. Real Time Simple | 1 | 792 | 396 | 11880 | N.A. | 80 | 80 | 16384 | N.A. | N.A. | 2000 | N.A. |
| Simple Scalable | L1 | CIF | 4 | 4 x Simple or Simple Scalable | 1 | 1782 | 495 | 7425 | N.A. | 40 | 40 | 2048 | N.A. | N.A. | 128 | 1 spatial or temporal enhancement layer |
| Simple Scalable (note 3) | L2 | CIF | 4 | 4 x Simple or Simple Scalable | 1 | 3168 | 792 | 23760 | N.A. | 40 | 40 | 4096 | N.A. | N.A. | 256 | 1 spatial or temporal enhancement layer |
| Core | L1 | QCIF | 4 | 4 x Core or Simple | 4 | 594 | 198 | 5940 | 2970 | 16 | 16 | 4096 | N.A. | N.A. | 384 | 1 |
| Core | L2 | CIF | 16 | 16 x Core or Simple | 4 | 2376 | 792 | 23760 | 11880 | 80 | 80 | 8192 | N.A. | N.A. | 2000 | 1 |
| Advanced Core | L1 | QCIF | 4 | 4 x Core or Simple or Adv. Scalable Texture | 4 | 594 | 198 | 5940 | 2970 | 16 | 8 | 4096 | N.A. | See Table A.5 | 384 | 1 |
| Advanced Core | L2 | CIF | 16 | 16 x Core or Simple or Adv. Scalable Texture | 4 | 2376 | 792 | 23760 | 11880 | 80 | 40 | 8192 | N.A. | See Table A.5 | 2000 | 1 |
| Core Scalable | L1 | CIF | 4 | 4 x Core or Simple or Core Scalable or Simple Scalable | 4 | 2376 | 792 | 14850 | 7425 | 64 | 64 | 4096 | N.A. | N.A. | 768 | 1 |
| Core Scalable | L2 | CIF | 8 | 8 x Core or Simple or Core Scalable or Simple | 4 | 2970 | 990 | 29700 | 14850 | 80 | 80 | 4096 | N.A. | N.A. | 1500 | 1 |
| Core Scalable | L3 | CCIR 601 | 16 | 16 x Core or Simple or Core Scalable or Simple Scalable | 4 | 12906 | 4032 | 120960 | 60480 | 80 | 80 | 16384 | N.A. | N.A. | 4000 | 2 |
| Main | L2 | CIF | 16 | 16 x Main or Core or Simple | 4 | 3960 | 1188 | 23760 | 11880 | 80 | 80 | 8192 | 1584 | Scalable Texture Profile@L1 | 2000 | 1 |
| Main | L3 | CCIR 601 | 32 | 32 x Main or Core or Simple | 4 | 11304 | 3240 | 97200 | 48600 | 320 | 320 | 16384 | 6480 | Scalable Texture Profile@L1 | 15000 | 1 |
| Main | L4 | 1920 x 1088 | 32 | 32 x Main or Core or Simple | 4 | 65344 | 16320 | 489600 | 244800 | 760 | 760 | 16384 | 65280 | Scalable Texture Profile@L2 | 38400 | 1 |
| Advanced Coding Efficiency | L1 | CIF | 4 | 4 x Adv. Coding Efficiency or Core or Simple | 4 | 1188 | 792 | 11880 | 5940 | 40 | 40 | 8192 | N.A. | N.A. | 384 | 1 |
| Advanced Coding Efficiency | L2 | CIF | 16 | 16 x Adv. Coding Efficiency or Core or Simple | 4 | 2376 | 1188 | 23760 | 11880 | 80 | 80 | 8192 | N.A. | N.A. | 2000 | 1 |
| Advanced Coding Efficiency | L3 | CCIR 601 | 32 | 32 x Adv. Coding Efficiency or Core or Simple | 4 | 9720 | 3240 | 97200 | 48600 | 320 | 320 | 16384 | N.A. | N.A. | 15000 | 1 |
| Advanced Coding Efficiency | L4 | 1920 x 1088 | 32 | 32 x Adv. Coding Efficiency or Core or Simple | 4 | 48960 | 16320 | 489600 | 244800 | 760 | 760 | 16384 | N.A. | N.A. | 38400 | 1 |
| N-Bit | L2 | CIF | 16 | 16 x Core or Simple or N-Bit | 4 | 2376 | 792 | 23760 | 11880 | 80 | 80 | 8192 | N.A. (note 7) | N.A. | 2000 | 1 |

Notes:
  1. Enhancement layers are not counted as separate objects.
  2. The maximum VMV (Video Memory Verifier) buffer size is the bound on the memory (in macroblock units) which can be used by the VMV algorithm. This algorithm (see [MPEG4-2; subclause D.5]) models the pixel memory needed by the entire visual decoding process. This includes the memory needed for reference VOPs in the prediction of P, B, and S(GMC)-VOPs and the storage of the reconstructed VOPs until the time they are released by the decoder, plus the memory required to queue B-VOPs until composition occurs. For the profiles that contain more than one layer, the memory requirements include all base and enhancement layers. When belonging to different, overlapping objects, some of these macroblocks may overlay on the display; however separate memory is required (prior to composition) in the VMV.
  3. The conformance point for the base layer of the Simple Scalable Visual profile is the Simple Profile@L1 when Simple Scalable Profile@L1 is used and the Simple Profile@L2 when Simple Scalable Profile@L2 is used.
  4. The VCV (Video Complexity Verifier) decoder rate is the vcv_decoder_rate (H) referred to in [MPEG4-2; subclause D.4]; this parameter is the number of macroblocks per second based on the typical spatial and temporal resolutions, as follows:
    • 1485 MB/s corresponds to QCIF at 15 Hz
    • 5940 MB/s corresponds to CIF at 15 Hz and also twice QCIF at 30 Hz
    • 11880 MB/s corresponds to CIF at 30 Hz
    • 7425 MB/s corresponds to 1.25 times CIF at 15 Hz
    • 23760 MB/s corresponds to twice CIF at 30 Hz
    • 97200 MB/s corresponds to twice ITU-R 601 at 30 Hz
    • 489600 MB/s corresponds to twice 1920x1088 at 30 Hz
  5. The total (aggregated) vbv_buffer_size is the sum of the individual VBV buffer occupancies at any given time (in units of 16384 bits) for all VOLs of all VOs. This total VBV size is limited according to the profile and level.
  6. The maximum video packet length is defined as the maximum number of bits of data_partitioned_motion_shape_texture() in one video packet. The constraint applies only when the data-partitioning tool is enabled in the bitstream. When data partitioning is disabled, there is no limit on the size of video packet length.
  7. N.A. means Not Applicable.
  8. The maximum VCV buffer size (cumulative over all layers of all VOs) is twice the maximum number of macroblocks per VOP in the profile and level combination except for the Simple Visual Profile, Simple Scalable profile (Level 1) and Advanced Real Time Simple Profile. For the Simple Visual Profile and the Advanced Real Time Simple Profile, this value is the maximum number of macroblocks per VOP. For the Simple Scalable profile (Level 1), it is 1.25 times the maximum number of macroblocks per VOP. The limit applies to both the VCV buffer and the boundary MB VCV buffer.
  9. The VCV boundary MB decoder rate column bounds the number of macroblocks containing non trivial shape information (boundary, not transparent nor opaque). The VCV boundary MB decoder rate constrains the total number of boundary MBs in all VOLs, concurrently. Note that the boundary macroblocks are added to both the VCV and boundary MB VCV buffers.
  10. For the Simple Profile@Level 0, the following restrictions apply:
    • The maximum frame rate shall be 15 frames per second;
    • The maximum f_code shall be 1;
    • The intra_dc_vlc_threshold shall be 0;
    • The maximum horizontal luminance pixel resolution shall be 176 pels/line;
    • The maximum vertical luminance pixel resolution shall be 144 pels/VOP;
    • If AC prediction is used, the following restriction applies : QP value shall not be changed within a VOP (or within a video packet if video packets are used in a VOP). If AC prediction is not used, there are no restrictions to changing QP value.
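
The decoder rates listed in note 4 and the buffer-size rule of note 8 can be reproduced with simple arithmetic, as in the Python sketch below. The helper names are hypothetical, and the 720x576 frame size used for ITU-R 601 is an assumption made for this illustration; QCIF is taken as 176x144 and CIF as 352x288.

```python
def macroblocks_per_frame(width, height):
    """Number of 16x16 macroblocks covering a width x height luminance frame."""
    return ((width + 15) // 16) * ((height + 15) // 16)


QCIF = macroblocks_per_frame(176, 144)       # 99
CIF = macroblocks_per_frame(352, 288)        # 396
ITU_R_601 = macroblocks_per_frame(720, 576)  # 1620, assuming a 720x576 frame
HD = macroblocks_per_frame(1920, 1088)       # 8160


def decoder_rate(mbs_per_frame, frame_rate_hz, factor=1):
    """Macroblocks per second for `factor` frames' worth of MBs at the given rate."""
    return int(mbs_per_frame * frame_rate_hz * factor)


# Keys are the values listed in note 4, values are the recomputed rates.
note4 = {
    1485: decoder_rate(QCIF, 15),
    5940: decoder_rate(CIF, 15),
    11880: decoder_rate(CIF, 30),
    7425: decoder_rate(CIF, 15, 1.25),
    23760: decoder_rate(CIF, 30, 2),
    97200: decoder_rate(ITU_R_601, 30, 2),
    489600: decoder_rate(HD, 30, 2),
}
for listed, computed in note4.items():
    print(listed, computed, listed == computed)

# Note 8: VCV buffer sizes for Simple@L1, Core@L1, and Simple Scalable@L1.
print(QCIF, 2 * QCIF, 5 * CIF // 4)  # 99 198 495
```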

Table A.2 describes the MPEG-4 Visual levels for the Studio profiles defined in the 1st Extension to the 2nd Edition of the MPEG-4 Visual standard [MPEG01a].

Table A.2 Levels for the Studio profiles

| Visual profile | Level | Typical visual session formats (note 1) | Max. pixel depth | Max. number of objects | Max. number per type | Max. VMV buffer size (sample) (note 2) | Max. VCV buffer size (sample) (note 3) | VCV decoder rate (sample/s) | VCV boundary MB decoder rate (sample/s) | Max. total VBV buffer size | Max. VOL VBV buffer size | Max. video packet length (bits) | Max. sprite size (sample) (note 4) | Wavelet restrictions | Max. bitrate (Mbit/s) | Max. enhancement layers per object |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Simple Studio | L1 | ITU-R601:4224; ITU-R601:444 | 10 | 1 | 1 x Simple Studio | 1313280 | 1313280 | 33177600 | 33177600 | 576 | 576 | N.A. | N.A. | N.A. | 180 | N.A. |
| Simple Studio | L2 | ITU-R709.60I:422; ITU-R601:444444 | 10 | 1 | 1 x Simple Studio | 4194304 | 4194304 | 125829120 | 125829120 | 1920 | 1920 | N.A. | N.A. | N.A. | 600 | N.A. |
| Simple Studio | L3 | ITU-R709.60I:444; ITU-R709.60I:4224 | 12 | 1 | 1 x Simple Studio | 6291456 | 6291456 | 188743680 | 188743680 | 2880 | 2880 | N.A. | N.A. | N.A. | 900 | N.A. |
| Simple Studio | L4 | ITU-R709.60P:444; ITU-R709.60I:444444; 2Kx2Kx30P:444 | 12 | 1 | 1 x Simple Studio | 12582912 | 12582912 | 377487360 | 377487360 | 4320 | 4320 | N.A. | N.A. | N.A. | 1800 | N.A. |
| Core Studio | L1 | ITU-R601:4224; ITU-R601:444 | 10 | 4 | 4 x Core Studio or Simple Studio | 5253120 | 2626560 | 66355200 | 66355200 | 576 | 576 | N.A. | 8294400 | N.A. | 90 | N.A. |
| Core Studio | L2 | ITU-R709.60I:422; ITU-R601:444444 | 10 | 4 | 4 x Core Studio or Simple Studio | 16777216 | 8388608 | 251658240 | 251658240 | 1920 | 1920 | N.A. | 50135040 | N.A. | 300 | N.A. |
| Core Studio | L3 | ITU-R709.60I:444; ITU-R709.60I:4224 | 10 | 8 | 8 x Core Studio or Simple Studio | 25165824 | 12582912 | 377487360 | 377487360 | 2880 | 2880 | N.A. | 75202560 | N.A. | 450 | N.A. |
| Core Studio | L4 | ITU-R709.60P:444; ITU-R709.60I:444444; 2Kx2Kx30P:444 | 10 | 16 | 16 x Core Studio or Simple Studio | 50331648 | 25165824 | 754974720 | 754974720 | 4320 | 4320 | N.A. | 150994944 | N.A. | 900 | N.A. |

Notes:

  1. ITU-R 709 is ITU-R BT.709 and ITU-R 601 is ITU-R BT.601; 444444 means 444 (RGB) + 3 auxiliary channels; 4224 means 422 (YUV) + 1 auxiliary channel.
  2. VMV is defined by the number of samples which belong to the bounding box of the texture, regardless of shape information. VMV also includes auxiliary channel samples.
  3. VCV is defined by the number of samples which belong to the bounding box of the texture, regardless of shape information. VCV also includes auxiliary channel samples.
  4. Maximum sprite size is defined by the number of samples for sprite memory.

Table A.3 describes the MPEG-4 Visual levels for the Advanced Simple and Fine Granularity Scalable profiles defined in the 2nd Extension to the 2nd Edition of the MPEG-4 Visual standard [MPEG01b].

Table A.3 Levels for the Advanced Simple and Fine Granularity Scalable (FGS) profiles

| Visual profile | Level | Typical visual session size | Max. number of objects | Max. number per type | Max. unique quant. tables | Max. VMV buffer size (MB units) | Max. VCV buffer size (MB) | VCV decoder rate (MB/s) | Max. percentage of intra MBs with AC prediction in VCV buffer | Max. total VBV buffer size (units of 16384 bits) | Max. VOL VBV buffer size (units of 16384 bits) | Max. video packet length (bits) | Max. bitrate (kbit/s) (note 2) | Max. number of coded VOP-bps (note 3) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Advanced Simple | L0 | 176x144 | 1 | 1 x AS or Simple | 1 | 297 | 99 | 2970 | 100 | 10 | 10 | 2048 | 128 | N.A. |
| Advanced Simple | L1 | 176x144 | 4 | 4 x AS or Simple | 1 | 297 | 99 | 2970 | 100 | 10 | 10 | 2048 | 128 | N.A. |
| Advanced Simple | L2 | 352x288 | 4 | 4 x AS or Simple | 1 | 1188 | 396 | 5940 | 100 | 40 | 40 | 4096 | 384 | N.A. |
| Advanced Simple | L3 | 352x288 | 4 | 4 x AS or Simple | 1 | 1188 | 396 | 11880 | 100 | 40 | 40 | 4096 | 768 | N.A. |
| Advanced Simple | L4 | 352x576 | 4 | 4 x AS or Simple | 1 | 2376 | 792 | 23760 | 50 | 80 | 80 | 8192 | 3000 | N.A. |
| Advanced Simple | L5 | 720x576 | 4 | 4 x AS or Simple | 1 | 4860 | 1620 | 48600 | 25 | 112 | 112 | 16384 | 8000 | N.A. |
| FGS | L0 | 176x144 | 1 | 1 x AS or FGS or Simple | 1 | 297 | 99 | 2970 | 100 | 10 | 10 | 2048 | 128 | 4 |
| FGS | L1 | 176x144 | 4 | 4 x AS or FGS or Simple | 1 | 297 | 99 | 2970 | 100 | 10 | 10 | 2048 | 128 | 4 |
| FGS | L2 | 352x288 | 4 | 4 x AS or Simple | 1 | 1188 | 396 | 5940 | 100 | 40 | 40 | 4096 | 384 | 4 |
| FGS | L3 | 352x288 | 4 | 4 x AS or FGS or Simple | 1 | 1188 | 396 | 11880 | 100 | 40 | 40 | 4096 | 768 | 4 |
| FGS | L4 | 352x576 | 4 | 4 x AS or FGS or Simple | 1 | 2376 | 792 | 23760 | 50 | 80 | 80 | 8192 | 3000 | 4 |
| FGS | L5 | 720x576 | 4 | 4 x AS or FGS or Simple | 1 | 4860 | 1620 | 48600 | 25 | 112 | 112 | 16384 | 8000 | 4 |

Notes:

  1. The following restriction applies to Level 0 of Advanced Simple profile and FGS profile: if AC prediction is used, the QP value shall not be changed within a VOP (or within a video packet if video packets are used in a VOP). If AC prediction is not used, there are no restrictions to changing the QP value.
  2. For the FGS profile, this column is the maximum base-layer bitrate.
  3. The maximum number of coded VOP-bps takes into consideration the shifted bits after applying frequency weighting and/or selective enhancement.
  4. The number of FGS, FGST, or FGS-FGST layers is always one. If the FGS layer and the FGST layer are separated, the number of total enhancement layers is two.
  5. The interlace tools are not used for levels L0, L1, L2, and L3 of the Advanced Simple and FGS profiles.
  6. It is inherent in the FGS profile that the base and enhancement layers are tightly coupled to each other. To avoid unnecessary memory storage, the following constraints apply to the decoding time relationship of the enhancement layer and the base layer:
    • Decoding and composition (or presentation in a no-compositor decoder) of each FGS or FGST VOP shall be performed in the same time unit.
    • Decoding of each FGS and FGST VOP shall be performed immediately after the reference base layer VOP(s) are decoded without violating the above constraint.

A.3 Definition of Levels for Synthetic Profiles

This section describes the MPEG-4 visual profiles including only synthetic visual data. Note that the profiles including only texture object types are here considered synthetic profiles since the video texture coding tool was developed by the MPEG SNHC subgroup with the initial target to code textures to map over 2D and 3D models.

A.3.1 Scalable Texture Profile

This profile includes three levels defined in Table A.4.

Table A.4 Levels for the Scalable Texture profile

| Profile | Level | Default wavelet filter | Max. download filter length | Max. no. of decomposition levels | Typical visual session size (note 1) | Max. Qp value | Max. no. of pixels/session | VCV decoder rate (equivalent MB/s) (note 2) | Max. no. of bitplanes for DC values | Max. VCV buffer size (equivalent MB) (note 2) |
|---|---|---|---|---|---|---|---|---|---|---|
| Scalable Texture | L3 | Float, Integer | ON, 15 | 10 | 8192x8192 | 12 bits | 67108864 | 262144 | 18 | 262144 |
| Scalable Texture | L2 | Integer | ON, 15 | 8 | 2048x2048 | 10 bits | 4194304 | 16384 | 16 | 16384 |
| Scalable Texture | L1 | Integer | OFF | 5 | 704x576 | 8 bits | 405504 | 1584 | 13 | 1584 |

Notes:

  1. This column is for informative use only. It provides an example configuration of the maximum number of pixels per session.
  2. This still texture VCV model is separate from the global video VCV model. An equivalent MB corresponds to 256 pixels.
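
The relation in note 2 between pixels and equivalent MBs can be checked against the values of Table A.4 with a few lines of Python. The function and constant names are hypothetical; the pixel counts are the maximum numbers of pixels per session listed for each level.

```python
PIXELS_PER_EQUIVALENT_MB = 256  # note 2: an equivalent MB corresponds to 256 pixels


def equivalent_mbs(pixels):
    """Convert a pixel count into equivalent macroblocks."""
    return pixels // PIXELS_PER_EQUIVALENT_MB


# Maximum number of pixels per session for each Scalable Texture level.
for level, pixels in (("L1", 405504), ("L2", 4194304), ("L3", 67108864)):
    print(level, equivalent_mbs(pixels))
# L1 1584
# L2 16384
# L3 262144
```

For each level the result matches both the VCV decoder rate and the maximum VCV buffer size given in the table.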

A.3.2 Simple Face Animation Profile

All MPEG-4 facial animation decoders (for all object types) are required to generate at their output a facial model including all the feature points defined in MPEG-4 Visual, even if some of the feature points will not be affected by any information received from the encoder.

The Simple Face Animation object type is not required to implement the viseme_def/expression_def functionality [MPEG4-2]. The Simple Face Animation profile has two levels:

Level 1

  1. Number of objects: 1
  2. The total FAP decode frame rate in the bitstream shall not exceed 72 Hz
  3. The decoder shall be capable of a face model rendering update of at least 15 Hz
  4. Maximum bitrate: 16 kbit/s

Level 2

  1. Maximum number of objects: 4
  2. The FAP decode frame rate in the bitstream shall not exceed 72 Hz (this means that the FAP decode frame rate is to be shared among the objects)
  3. The decoder shall be capable of rendering the face models with the update rate of at least 60 Hz, sharable between faces, with the constraint that the update rate for each individual face is not required to exceed 30 Hz
  4. Maximum bitrate: 32 kbit/s
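
The bitstream-side bounds of the two levels (number of objects, shared FAP decode frame rate, and bitrate) can be sketched as a simple level check. This is only an illustration: the function and table names are hypothetical, and the sketch assumes that the maximum bitrate applies to the sum over all face objects, which the text above does not state explicitly.

```python
from typing import Optional, Sequence

# level -> (max objects, max total FAP decode frame rate in Hz, max bitrate in kbit/s)
# Values from the level definitions above; the structure itself is hypothetical.
SIMPLE_FACE_LEVELS = {
    1: (1, 72.0, 16.0),
    2: (4, 72.0, 32.0),
}


def lowest_face_level(fap_rates_hz: Sequence[float], bitrate_kbps: float) -> Optional[int]:
    """Return the lowest level whose bitstream bounds are respected, or None.

    fap_rates_hz holds one FAP decode frame rate per face object.
    Assumption: bitrate_kbps is the total bitrate of all face objects.
    """
    for level in sorted(SIMPLE_FACE_LEVELS):
        max_objects, max_total_rate, max_bitrate = SIMPLE_FACE_LEVELS[level]
        if (1 <= len(fap_rates_hz) <= max_objects
                and sum(fap_rates_hz) <= max_total_rate  # rate is shared among objects
                and bitrate_kbps <= max_bitrate):
            return level
    return None


print(lowest_face_level([30.0], 12.0))        # 1: one face at 30 Hz, 12 kbit/s
print(lowest_face_level([25.0, 25.0], 24.0))  # 2: two faces sharing 50 Hz
print(lowest_face_level([30.0] * 3, 24.0))    # None: 90 Hz in total exceeds 72 Hz
```

The renderer-side requirements (items 3 of each level) constrain the decoder rather than the bitstream, so they are deliberately left out of this check.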

A.3.3 Simple FBA Profile

All MPEG-4 Visual FBA decoders (for all object types) are required to generate at their output a humanoid model including all the feature points and joints defined in MPEG-4 Visual [MPEG4-2], even if some of the feature points and joints will not be affected by any information received from the encoder. The Simple FBA profile has two levels:

Level 1

  1. Number of objects: 1
  2. The total FBA decode frame rate in the bitstream shall not exceed 72 Hz
  3. The decoder shall be capable of a humanoid model rendering update of, at least, 15 Hz
  4. Maximum bitrate: 32 kbit/s
  5. The decoder is not required to animate Spine3, Spine4 and Spine5 BAP groups [MPEG4-2]

Level 2

  1. Maximum number of objects: 4
  2. The FBA decode frame rate in the bitstream shall not exceed 72 Hz (this means that the FBA decode frame rate is to be shared among the objects)
  3. The decoder shall be capable of rendering the humanoid models with the update rate of, at least, 60 Hz, sharable between humanoids, with the constraint that the update rate for each individual humanoid is not required to exceed 30 Hz
  4. Maximum bitrate: 64 kbit/s
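
To illustrate how the shared rendering budget of Level 2 plays out, the sketch below computes the per-humanoid update rate that a decoder must at least be able to sustain. It assumes that the 60 Hz budget is split equally among the humanoids, which is only one possible sharing policy and is not mandated by the text above; the function name is hypothetical.

```python
def guaranteed_update_rate_hz(level: int, num_humanoids: int) -> float:
    """Per-humanoid rendering update rate a conforming decoder must at least sustain.

    Assumption of this sketch: at Level 2 the shared 60 Hz budget is split
    equally among the humanoids, capped at 30 Hz per humanoid.
    """
    if level == 1:
        if num_humanoids != 1:
            raise ValueError("Level 1 allows exactly one object")
        return 15.0
    if level == 2:
        if not 1 <= num_humanoids <= 4:
            raise ValueError("Level 2 allows between 1 and 4 objects")
        # 60 Hz in total, but no single humanoid is required to exceed 30 Hz.
        return min(30.0, 60.0 / num_humanoids)
    raise ValueError("the Simple FBA profile defines only levels 1 and 2")


for n in range(1, 5):
    print(n, guaranteed_update_rate_hz(2, n))
```

Under this equal-split assumption, four humanoids at Level 2 are each guaranteed the same 15 Hz that Level 1 guarantees for a single humanoid.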

A.3.4 Advanced Core and Advanced Scalable Texture Profiles

The levels for the Advanced Core and Advanced Scalable Texture profiles are defined in Table A.5. Notice that Advanced Core includes video as well as still picture object types and thus has two types of level constraints (see Table A.1 and Table A.5).

Table A.5 Levels for the Advanced Core and Advanced Scalable Texture profiles

| Visual profile | Level | Default wavelet filter | Max. download filter length | Max. nº of decomposition levels | Typical visual session size (note 1) | Max. Qp value (bits) | Max. nº of pixels/session (note 2) | VCV decoder rate (equivalent MB/s) (note 3) | Max. nº of bitplanes for DC values | Max. VCV buffer size (equivalent MB) | Max. STO packet length (bits) | Max. nº of pixels/tile | Max. nº of tiles |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Advanced Core | L2 | Integer | ON, 15 | 8 | 8192x8192 | 10 | 67108864 | 262144 | 16 | 262144 | 8192 | 262144 | 2048 |
| Advanced Core | L1 | Integer | OFF | 5 | 2048x2048 | 8 | 4194304 | 16384 | 13 | 16384 | 4096 | 65536 | 1024 |
| Advanced Scalable Texture | L3 | Float, Integer | ON, 15 | 10 | 8192x8192 | 12 | 67108864 | 262144 | 18 | 262144 | 8192 | 67108864 | 4096 |
| Advanced Scalable Texture | L2 | Integer | ON, 15 | 8 | 2048x2048 | 10 | 4194304 | 16384 | 16 | 16384 | 4096 | 4194304 | 2048 |
| Advanced Scalable Texture | L1 | Integer | OFF | 5 | 704x576 | 8 | 405504 | 1584 | 13 | 1584 | 2048 | 405504 (4xCIF) | 1024 |

Notes:

  1. This column is for informative use only. It provides an example configuration of the maximum number of pixels per session.
  2. When the number of pixels per session is larger than the maximum number of pixels per tile, tiling_disable shall be 0.
  3. This still texture VCV model is separate from the global video VCV model. An equivalent MB corresponds to 256 pixels.
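
Note 2 can be read as a simple rule: whenever a still texture has more pixels than one tile may hold, it has to be tiled. The sketch below applies that rule to the limits of Table A.5 and also derives a lower bound on the number of tiles. The function and dictionary names are hypothetical, and the tile count ignores any constraints on tile shape, so it is a lower bound only.

```python
# (profile, level) -> (max pixels/session, max pixels/tile, max number of tiles)
# Values copied from Table A.5; the structure itself is illustrative.
TILE_LIMITS = {
    ("Advanced Core", "L1"): (4194304, 65536, 1024),
    ("Advanced Core", "L2"): (67108864, 262144, 2048),
    ("Advanced Scalable Texture", "L1"): (405504, 405504, 1024),
    ("Advanced Scalable Texture", "L2"): (4194304, 4194304, 2048),
    ("Advanced Scalable Texture", "L3"): (67108864, 67108864, 4096),
}


def tiling_plan(profile: str, level: str, width: int, height: int):
    """Return (tiling_required, min_tiles) for a still texture of width x height pixels."""
    max_session, max_tile, max_tiles = TILE_LIMITS[(profile, level)]
    pixels = width * height
    if pixels > max_session:
        raise ValueError("texture exceeds the max. number of pixels per session")
    # Note 2: more pixels than one tile may hold means tiling_disable shall be 0.
    tiling_required = pixels > max_tile
    # Ceiling division: lower bound on the tile count, ignoring tile shape.
    min_tiles = -(-pixels // max_tile)
    if min_tiles > max_tiles:
        raise ValueError("texture needs more tiles than the level allows")
    return tiling_required, min_tiles


print(tiling_plan("Advanced Core", "L2", 8192, 8192))
print(tiling_plan("Advanced Core", "L1", 2048, 2048))
print(tiling_plan("Advanced Scalable Texture", "L1", 704, 576))
```

For the Advanced Scalable Texture levels the tile bound equals the session bound, so tiling is never forced there; for Advanced Core the tile bound is much smaller, so large textures must always be tiled.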

A.4 Definition of Levels for Synthetic and Natural Hybrid Profiles

The levels for the profiles supporting both video object types and synthetic visual object types are specified by giving bounds for the natural objects and for the synthetic objects. Parameters like bitrate can be combined across natural and synthetic objects.

A.4.1 Basic Animated Texture Profile

The Basic Animated Texture profile has two levels defined as:

Level 1

Equivalent to the Simple Face Animation Profile@Level 1 + Scalable Texture Profile@Level 1 + the following restrictions on Basic Animated Texture object types:

  1. Maximum number of mesh objects (with uniform topology): 4
  2. Maximum total number of nodes (vertices) in mesh objects: 480 (equivalent to 4 x number of nodes of a uniform mesh covering a QCIF image with 16x16 pixel elements)
  3. Maximum frame rate of a mesh object: 30 Hz
  4. Maximum bitrate of mesh objects: 64 kbit/s

Level 2

Equivalent to the Simple Face Animation Profile@Level 2 + Scalable Texture Profile@Level 2 + the following restrictions on Basic Animated Texture object types:

  1. Maximum number of mesh objects (with uniform topology): 8
  2. Maximum total number of nodes (vertices) in mesh objects: 1748 (equivalent to 4 x number of nodes of a uniform mesh covering a CIF image with 16x16 pixel elements)
  3. Maximum frame rate of a mesh object: 60 Hz
  4. Maximum bitrate of mesh objects: 128 kbit/s
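
The node bounds of 480 and 1748 follow directly from the image formats quoted in the level definitions. A uniform mesh with 16x16 pixel elements has one node at every element corner, so an image of W x H pixels needs (W/16 + 1) x (H/16 + 1) nodes. The short sketch below reproduces both numbers; the function name is illustrative, and it assumes that the image dimensions are multiples of the element size, as is the case for QCIF (176x144) and CIF (352x288).

```python
def uniform_mesh_nodes(width: int, height: int, element: int = 16) -> int:
    """Nodes of a uniform mesh with element x element pixel cells covering the image.

    Assumes width and height are multiples of `element`.
    """
    columns = width // element + 1  # one more node column than element columns
    rows = height // element + 1    # one more node row than element rows
    return columns * rows


QCIF = (176, 144)
CIF = (352, 288)

print(4 * uniform_mesh_nodes(*QCIF))  # 480  (4 x 12 x 10), the Level 1 bound
print(4 * uniform_mesh_nodes(*CIF))   # 1748 (4 x 23 x 19), the Level 2 bound
```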

A.4.2 Hybrid Profile

The Hybrid profile has two levels defined as:

Level 1

Equivalent to the Core Visual Profile@Level 1 + Basic Animated Texture Profile@Level 1 + the following restrictions on Animated 2D Mesh object types:

  1. Maximum number of mesh objects (with uniform or Delaunay topology): 4 (the same as the maximum number of objects in a visual session)
  2. Maximum total number of nodes (vertices) in mesh objects: 480 (equivalent to 4 x number of nodes of a uniform mesh covering a QCIF image with 16x16 pixel elements)
  3. Maximum frame rate of a mesh object: 30 Hz (the same as the maximum frame rate of a video object)
  4. Maximum bitrate of mesh objects: 64 kbit/s

Level 2

Equivalent to the Core Visual Profile@Level 2 + Basic Animated Texture Profile@Level 2 + the following restrictions on Animated 2D Mesh object types:

  1. Maximum number of mesh objects (with uniform or Delaunay topology): 8 (the same as the maximum number of objects in a visual session)
  2. Maximum total number of nodes (vertices) in mesh objects: 1748 (equivalent to 4 x number of nodes of a uniform mesh covering a CIF image with 16x16 pixel elements)
  3. Maximum frame rate of a mesh object: 60 Hz (2 x the maximum frame rate of a video object)
  4. Maximum bitrate of mesh objects: 128 kbit/s

References

[MPEG01a] MPEG, “Studio Profiles”, Final Draft Amendment, Doc. ISO/MPEG N3898, Pisa MPEG Meeting, January 2001
[MPEG01b] MPEG, “Streaming Video Profiles”, Final Draft Amendment, Doc. ISO/MPEG N3904, Pisa MPEG Meeting, January 2001
[MPEG1-2] ISO/IEC 11172-2:1993, “Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1,5 Mbit/s - Part 2: Video”, 1993
[MPEG2-2] ISO/IEC 13818-2:2000, “Generic Coding of Moving Pictures and Associated Audio Information – Part 2: Video”, 2000
[MPEG4-1] ISO/IEC 14496-1:2001, “Coding of Audio-Visual Objects - Part 1: Systems”, 2nd Edition, 2001
[MPEG4-2] ISO/IEC 14496-2:2001, “Coding of Audio-Visual Objects - Part 2: Visual”, 2nd Edition, 2001
[MPEG4-4] ISO/IEC 14496-4:2001, “Coding of Audio-Visual Objects - Part 4: Conformance Testing”, 2nd Edition, 2001
[Nunes] Nunes, P., “Rate Control for Object-Based Video Coding Architectures”, Ph.D. Thesis, Instituto Superior Técnico, Lisboa – Portugal, to be submitted


[1] For an arbitrarily shaped video object, three types of MBs may exist: transparent, opaque, and boundary.
[2] Except for the short video header case.
[3] For basic sprites, the vbv_occupancy field specifies the initial VBV occupancy before decoding the first S-VOP in the elementary stream, i.e. not the very first VOP in a basic sprite, which must be an I-VOP, but the subsequent VOP, i.e. an S-VOP. Low-latency sprites, which allow the transmission of large image sprites progressively (both spatially and in terms of quality), are treated as any other VOL.
[4] VOLij corresponds to VOLj of VOi.
[5] To avoid accumulating errors, the MPEG-4 Visual standard specifies that real-valued arithmetic should be used to compute vbvi.
[6] For sprites, a hypothetical number of MBs is defined for each S-VOP [MPEG4-2].
[7] The B-VCV is only defined for profiles supporting arbitrarily shaped video objects.
