H.265/HEVC Video Compression

High Efficiency Video Coding with improved compression over H.264.

The High-Resolution Challenge: Why H.264 Needed a Successor

H.264/AVC was a monumental achievement in video compression. For nearly a decade, it was the undisputed king, powering everything from Blu-ray Discs to the explosion of online video on platforms like YouTube. However, technology never stands still. As display technology advanced, the world began to move beyond Full HD (1920×1080) and towards ultra-high-definition resolutions like 4K (3840×2160) and even 8K.

This leap in resolution presented a massive data problem. A 4K video frame contains four times as many pixels as a 1080p frame. An 8K frame contains sixteen times as many. While H.264 could technically handle these resolutions, the required bitrates would be enormous, making it impractical for streaming over existing internet infrastructure or for broadcast. The digital world needed a new compression standard, one that could deliver the stunning clarity of 4K and beyond without requiring a quadrupling of bandwidth.
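
To put rough numbers on this, here is a back-of-the-envelope calculation of uncompressed data rates. The assumptions (8-bit 4:2:0 sampling, roughly 12 bits per pixel, at 30 frames per second) are illustrative; the exact figures vary with bit depth and frame rate, but the ratios between resolutions are what matter.

```python
# Back-of-the-envelope raw (uncompressed) video bitrates.
# Assumptions: 8-bit 4:2:0 sampling (~12 bits per pixel) at 30 fps.
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "4K":    (3840, 2160),
    "8K":    (7680, 4320),
}
BITS_PER_PIXEL = 12   # 8-bit luma plus subsampled chroma (4:2:0)
FPS = 30

for name, (w, h) in RESOLUTIONS.items():
    pixels = w * h
    raw_mbps = pixels * BITS_PER_PIXEL * FPS / 1e6
    print(f"{name}: {pixels:>10,} px/frame, ~{raw_mbps:,.0f} Mbit/s uncompressed")

# 1080p:  2,073,600 px/frame,    ~746 Mbit/s
# 4K:     8,294,400 px/frame,  ~2,986 Mbit/s (4x the pixels)
# 8K:    33,177,600 px/frame, ~11,944 Mbit/s (16x the pixels)
```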

The Joint Collaborative Team on Video Coding (JCT-VC), a combined effort of the same ITU-T and ISO/IEC expert groups that created H.264, was formed to tackle this challenge. Their goal was to create a successor that could achieve the same perceptual quality as H.264 at roughly half the bitrate. The result, standardized in 2013, was High Efficiency Video Coding (HEVC), also known as H.265.

The Core Innovation: From Rigid Macroblocks to Flexible Coding Trees

H.265's dramatic efficiency improvement comes from a fundamental shift in how it analyzes and partitions a video frame. Where H.264 was based on the relatively rigid concept of the 16×16 pixel macroblock, H.265 introduces a far more flexible and powerful structure called the Coding Tree Unit (CTU). This is the single most important difference between the two standards.

The Coding Tree Unit (CTU): The New Foundation

Instead of fixed 16×16 blocks, the H.265 encoder works with much larger CTUs. The encoder can choose the CTU size, which can be 16×16, 32×32, or as large as 64×64 pixels. Larger block sizes are much more efficient for encoding large, flat or simple-textured areas common in high-resolution video, like a clear sky or a plain wall.

The true power of the CTU lies in its recursive, tree-like structure. The encoder can decide to code an entire 64×64 CTU as a single unit, or it can subdivide it into four smaller 32×32 units. Each of those can be further subdivided in turn, down to a minimum coding unit size of 8×8 (with transform blocks as small as 4×4). This structure is called a quadtree, and it allows the encoder to adapt the processing block size to the complexity of the image content with remarkable precision.
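
The sketch below illustrates the recursive splitting idea. A real HEVC encoder chooses splits by full rate-distortion optimisation; here a simple pixel-variance threshold stands in for that decision, so treat it as a conceptual toy rather than an encoder algorithm.

```python
import numpy as np

MIN_CU, CTU_SIZE = 8, 64   # HEVC: CTUs up to 64x64, coding units down to 8x8

def split_ctu(block, x=0, y=0, threshold=40.0):
    """Recursively partition one CTU into coding units (CUs).

    A real encoder picks the split that minimises rate-distortion cost;
    this toy splits whenever the block's pixel variance is high, i.e.
    whenever the content looks too complex for one large CU.
    """
    size = block.shape[0]
    if size > MIN_CU and np.var(block) > threshold:
        half = size // 2
        cus = []
        for dy in (0, half):
            for dx in (0, half):
                cus += split_ctu(block[dy:dy + half, dx:dx + half],
                                 x + dx, y + dy, threshold)
        return cus
    return [(x, y, size)]   # leaf of the quadtree: one CU

# A 64x64 CTU that is flat on the left half and noisy on the right half.
rng = np.random.default_rng(0)
ctu = np.full((CTU_SIZE, CTU_SIZE), 128.0)
ctu[:, 32:] += rng.normal(0, 20, size=(64, 32))

cus = split_ctu(ctu)
print(f"{len(cus)} CUs; sizes used: {sorted({s for _, _, s in cus}, reverse=True)}")
# The flat half stays as two 32x32 CUs; the noisy half splits down to 8x8.
```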

The Branches of the Tree: CU, PU, and TU

Within the CTU, the encoder makes decisions on three different levels of partitioning, each serving a specific purpose:

  • Coding Units (CUs): The CU is the result of the main quadtree partitioning of the CTU. The encoder decides how far to subdivide based on image content. A flat, uniform region might be encoded as a single 64×64 CU, while a highly detailed area with complex textures might be broken down into many smaller CUs. The size of the CU dictates the granularity of the coding decisions that follow.
  • Prediction Units (PUs): Within each CU, the encoder must decide how to perform prediction (either intra- or inter-prediction). The CU is partitioned into one or more Prediction Units. For motion prediction, a large CU can be partitioned into asymmetric PUs (e.g., two unequal rectangles) to better match the shape and movement of an object. This is more flexible than H.264's fixed sub-macroblock partitions.
  • Transform Units (TUs): After prediction, the residual (the difference between the original block and its prediction) must be encoded. This is done using a transform similar to the DCT. The CU is partitioned into a quadtree of Transform Units. HEVC allows larger transform sizes than H.264 (up to 32×32, versus a maximum of 8×8 in H.264). Larger transforms are more effective at compacting the energy of the residual in smooth image regions, leading to better compression.

This CTU structure gives the H.265 encoder immense flexibility to spend its data budget wisely, using large, efficient blocks for simple areas and devoting more resources to small, complex details. This is the single biggest contributor to its improved compression efficiency.
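
To see why the larger transform sizes mentioned above help, the sketch below applies 2-D DCTs of different sizes to a smooth luma gradient and counts how many coefficients are needed to capture almost all of the energy. It uses SciPy's floating-point DCT rather than HEVC's integer transforms, so it only demonstrates the energy-compaction principle.

```python
import numpy as np
from scipy.fft import dctn

def significant_coeffs(block, fraction=0.999):
    """How many DCT coefficients hold `fraction` of the block's energy."""
    coeffs = dctn(block, norm="ortho")
    energy = np.sort(np.abs(coeffs).ravel())[::-1] ** 2
    cumulative = np.cumsum(energy) / energy.sum()
    return int(np.searchsorted(cumulative, fraction) + 1)

# A smooth 32x32 gradient, the kind of content found in sky or plain walls.
xs, ys = np.meshgrid(np.arange(32), np.arange(32))
smooth = 100 + 0.8 * xs + 0.5 * ys

# One 32x32 transform vs. sixteen independent 8x8 transforms of the same area.
one_big = significant_coeffs(smooth)
many_small = sum(significant_coeffs(smooth[r:r + 8, c:c + 8])
                 for r in range(0, 32, 8) for c in range(0, 32, 8))

print(f"single 32x32 transform : {one_big} significant coefficients")
print(f"sixteen 8x8 transforms : {many_small} significant coefficients")
# The single large transform needs far fewer coefficients: each 8x8 block
# must spend at least one coefficient just to carry its own average value.
```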

Explore the interactive example below to compare rigid H.264 macroblocks with HEVC's adaptive CTU quadtree across different scene complexities.

CTU quadtree playground

[Interactive demo: compare rigid 16×16 macroblocks with adaptive Coding Tree Units across several scene patterns.] For a large uniform area, the encoder keeps the CU at 64×64 to avoid signalling overhead and lets only the transform tree split so it can quantise tiny gradients; H.264, by contrast, must code the same 64×64 area as sixteen separate 16×16 macroblocks, each incurring header overhead even though the region is perfectly flat.

Advanced Prediction and Coding Tools

Beyond the CTU structure, H.265/HEVC introduces a range of other enhancements that refine the prediction and coding processes, squeezing out even more redundancy.

  • 1. Enhanced Intra-Prediction

    For I-frames (and intra-coded blocks in P/B frames), H.264 offered 9 intra-prediction modes (for 4×4 luma blocks). H.265 dramatically expands this to 35 modes for intra-prediction. This includes the familiar DC mode (averaging), a new Planar mode (creating a smooth surface gradient), and 33 angular modes. Having a much finer selection of prediction angles allows the encoder to more accurately predict directional structures like edges and textures, resulting in a much smaller residual to encode (a toy sketch of a few of these modes follows this list).

  • 2. Improved Motion Prediction: Merge and AMVP

    For P- and B-frames, H.265 improves on how motion vectors themselves are coded. It introduces two sophisticated modes for each Prediction Unit:

    • Merge Mode: The encoder builds a list of candidate motion vectors from neighboring spatial and temporal blocks. If one of these candidates matches the motion of the current block, the encoder can simply send a short index pointing to that candidate in the list. It does not need to send any motion vector difference (and in the related skip mode, no residual either), resulting in extremely efficient coding for areas with consistent motion.
    • Advanced Motion Vector Prediction (AMVP): This is similar to Merge mode in that it uses neighboring vectors to predict the current motion vector. However, the encoder then codes a small motion vector difference (MVD) between the predicted vector and the final, optimal vector. This is used when the motion is similar to the neighbors' but not identical (see the signalling sketch after this list).
  • 3. Sample Adaptive Offset (SAO)

    This is a new filter applied within the prediction loop, after the deblocking filter. SAO is designed to reduce distortion by classifying pixels into categories (e.g., based on edge direction or brightness level) and then adding a small offset value to all pixels in a given category. It acts as a final clean-up step, correcting for small, consistent errors introduced by the quantization process, thereby improving both perceptual quality and the accuracy of the frame as a reference for future predictions (a minimal band-offset sketch follows this list).
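
Three small sketches follow, one per tool above. The first gives a rough feel for intra-mode selection: in HEVC's numbering, mode 0 is Planar, mode 1 is DC, and modes 2-34 are angular (10 is pure horizontal, 26 is pure vertical). The prediction rules below are deliberately simplified stand-ins, not HEVC's exact reference-sample filtering and interpolation.

```python
import numpy as np

def intra_predict(top, left, mode, n=8):
    """Very simplified intra prediction of an n x n block.

    top  : n reconstructed reference samples from the row above the block
    left : n reconstructed reference samples from the column to its left
    mode : 1 = DC, 10 = horizontal, 26 = vertical (HEVC mode numbering);
           the real standard defines 35 modes with sub-sample interpolation.
    """
    if mode == 1:                     # DC: average of the reference samples
        return np.full((n, n), (top.mean() + left.mean()) / 2)
    if mode == 10:                    # horizontal: extend the left column
        return np.tile(left.reshape(n, 1), (1, n))
    if mode == 26:                    # vertical: extend the top row
        return np.tile(top.reshape(1, n), (n, 1))
    raise NotImplementedError("only three modes are sketched here")

# A block dominated by vertical structure: each column follows the top row.
rng = np.random.default_rng(1)
top, left = np.linspace(90, 160, 8), np.full(8, 120.0)
block = np.tile(top.reshape(1, 8), (8, 1)) + rng.normal(0, 2, (8, 8))

costs = {m: np.abs(block - intra_predict(top, left, m)).sum() for m in (1, 10, 26)}
print("SAD per mode:", {m: round(float(c)) for m, c in costs.items()})
print("best mode:", min(costs, key=costs.get), "(26 = vertical, as expected)")
```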
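The second sketch contrasts the two motion-signalling paths: Merge sends only an index into the candidate list, while AMVP sends an index plus a motion vector difference (MVD). The candidate vectors and the matching rule here are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class MotionChoice:
    mode: str                       # "merge" or "amvp"
    index: int                      # position in the candidate list
    mvd: Optional[Tuple[int, int]]  # motion vector difference (AMVP only)

def code_motion(true_mv, candidates):
    """Toy encoder decision for one Prediction Unit.

    candidates: motion vectors gathered from spatial/temporal neighbours.
    If a candidate matches the true motion exactly, Merge mode reuses it
    and signals only its index; otherwise AMVP takes the closest candidate
    as a predictor and signals its index plus the remaining difference.
    """
    errors = [(abs(true_mv[0] - c[0]) + abs(true_mv[1] - c[1]), i, c)
              for i, c in enumerate(candidates)]
    err, idx, best = min(errors)
    if err == 0:
        return MotionChoice("merge", idx, None)
    mvd = (true_mv[0] - best[0], true_mv[1] - best[1])
    return MotionChoice("amvp", idx, mvd)

neighbours = [(4, 0), (5, 1), (4, -1)]       # candidate MVs (toy values)
print(code_motion((4, 0), neighbours))       # merge: index only, no MVD
print(code_motion((6, 2), neighbours))       # amvp: index 1 plus MVD (1, 1)
```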
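The third sketch mimics SAO's band-offset mode: the 8-bit value range is divided into 32 bands of width 8, and a small corrective offset is added to pixels falling in four consecutive bands chosen by the encoder. Edge-offset classification and the way offsets are actually derived and signalled are omitted.

```python
import numpy as np

def sao_band_offset(recon, start_band, offsets):
    """Apply SAO band offsets to a reconstructed 8-bit block (simplified).

    The 0-255 range is split into 32 bands of width 8.  Pixels whose band
    index lies in [start_band, start_band + 3] receive the corresponding
    offset; all other pixels are left untouched.
    """
    out = recon.astype(np.int16)
    bands = out >> 3                       # band index = value // 8
    for i, offset in enumerate(offsets):   # the four signalled offsets
        out[bands == start_band + i] += offset
    return np.clip(out, 0, 255).astype(np.uint8)

# Quantization has consistently darkened these mid-grey pixels by 3 levels;
# SAO nudges them back towards the original values.
original = np.full((4, 4), 130, dtype=np.uint8)
recon = original - 3
corrected = sao_band_offset(recon, start_band=15, offsets=[3, 3, 3, 3])

print("mean error before SAO:", float(np.mean(recon.astype(int) - original)))
print("mean error after SAO: ", float(np.mean(corrected.astype(int) - original)))
```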

Architected for Parallelism: Tiles and Wavefronts

One of the major criticisms of H.264 was its highly sequential nature, which made it difficult to efficiently encode or decode on modern multi-core processors. H.265 was designed from the ground up to address this, introducing features specifically to enable massive parallel processing.

  • Tiles: The encoder can divide a picture into a grid of rectangular regions called tiles. Each tile can be encoded and decoded independently of the others. This is a powerful feature: a video player on a computer with an 8-core CPU can assign each tile to a separate core and decode them all in parallel, dramatically speeding up playback. In-loop filters such as the deblocking filter can be restricted so that they do not cross tile boundaries, keeping each tile fully self-contained.
  • Wavefront Parallel Processing (WPP): This is another, more granular form of parallelism. Encoding and decoding of rows of CTUs can proceed in parallel, with a slight delay. The first row of CTUs is processed normally. A subsequent row can begin as soon as the first two CTUs of the row above it are finished, because the entropy-coding context is inherited from that point. Each row therefore starts before the row above it is fully complete, creating a "wavefront" of processing that moves down and across the picture. This allows for parallelization even when tiles are not used (see the scheduling sketch below).

These features make H.265 far better suited to the hardware of today and tomorrow, enabling real-time encoding and decoding of 4K and 8K video on everything from high-end servers to energy-efficient mobile devices.
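
Because tiles share no decoding state, parallel decoding is conceptually just one worker per tile. The sketch below carves a CTU grid into a 2×2 tile layout and hands each tile to a thread pool; the grid dimensions and the decode_tile placeholder are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

CTU_COLS, CTU_ROWS = 6, 4        # picture size in CTUs (e.g. 6x4 of 64x64 CTUs)
TILE_COLS, TILE_ROWS = 2, 2      # a 2x2 tile grid -> four independent tiles

def tile_ctus(tc, tr):
    """List the CTU coordinates covered by tile (tc, tr) in a uniform grid."""
    x0, x1 = tc * CTU_COLS // TILE_COLS, (tc + 1) * CTU_COLS // TILE_COLS
    y0, y1 = tr * CTU_ROWS // TILE_ROWS, (tr + 1) * CTU_ROWS // TILE_ROWS
    return [(x, y) for y in range(y0, y1) for x in range(x0, x1)]

def decode_tile(ctus):
    """Placeholder for real tile decoding; tiles share no state, so no locks."""
    return len(ctus)             # pretend we decoded this many CTUs

tiles = [tile_ctus(tc, tr) for tr in range(TILE_ROWS) for tc in range(TILE_COLS)]
with ThreadPoolExecutor(max_workers=len(tiles)) as pool:
    decoded = list(pool.map(decode_tile, tiles))

print(f"{len(tiles)} tiles decoded in parallel; CTUs per tile: {decoded}")
```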
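And to see where the "wavefront" gets its shape, the sketch below schedules a 6×4 CTU grid (the same size as in the planner below) under the WPP dependency rule: a CTU can start once its left neighbour and the CTU above-and-to-the-right are finished, which is exactly the two-CTU lag between rows.

```python
# Earliest parallel step at which each CTU can start under WPP.
# Dependency rule: a CTU needs its left neighbour plus the CTU that sits
# above and one column to the right (this encodes the two-CTU row lag).
COLS, ROWS = 6, 4

step = [[0] * COLS for _ in range(ROWS)]
for r in range(ROWS):
    for c in range(COLS):
        left = step[r][c - 1] if c > 0 else 0
        above_right = step[r - 1][min(c + 1, COLS - 1)] if r > 0 else 0
        step[r][c] = max(left, above_right) + 1

for row in step:
    print(" ".join(f"{s:2d}" for s in row))
# With enough cores the whole 6x4 grid finishes in 6 + 2*(4-1) = 12 steps,
# instead of the 24 steps a strictly sequential decoder would need.
```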

The interactive visualisation below shows how tile layouts and wavefront scheduling translate into real decoder parallelism.

Tiles & wavefront planner

[Interactive demo: see how HEVC unlocks multi-core decoding with tile layouts and wavefront scheduling.] The example grid represents a 6×4 picture region of 64×64 CTUs. Splitting the frame in half vertically gives two tiles of 12 CTUs each, a good balance for a 2-core decoder: reference sharing stays within each half while decoder throughput doubles.

Conclusion: The Price and Promise of Efficiency

H.265/HEVC successfully met its design goals. It delivers a remarkable improvement in compression efficiency, typically offering a 40-50% bitrate reduction compared to H.264 for the same perceptual video quality. This is the magic that makes 4K streaming a reality for millions of people. It allows broadcasters to transmit UHD channels, and enables us to store hours of high-resolution video on our devices.

However, this efficiency comes at a cost. The flexible partitioning and advanced prediction tools require the encoder to make vastly more decisions for every block. It has to evaluate a huge number of candidate CU, PU, and TU partitions, and up to 35 intra-prediction modes, to find an efficient combination. This makes H.265 encoding significantly more computationally expensive than H.264, sometimes requiring up to 10 times the processing power. Decoding is also more complex, though the difference is less dramatic thanks to the parallelism features.

H.265/HEVC represents a critical step in the evolution of video technology, providing the efficiency needed to usher in the era of Ultra High Definition video. While even newer standards like VVC (H.266) continue to push the boundaries, H.265 remains a dominant and essential codec for high-quality video delivery across the globe.