H.265/HEVC Video Compression

High Efficiency Video Coding with improved compression over H.264.

The High-Resolution Challenge: Why H.264 Needed a Successor

H.264/AVC was a monumental achievement in video compression. For nearly a decade, it was the undisputed king, powering everything from Blu-ray Discs to the explosion of online video on platforms like YouTube. However, technology never stands still. As display technology advanced, the world began to move beyond Full HD (1920×1080) and towards ultra-high-definition resolutions like 4K (3840×2160) and even 8K.

This leap in resolution presented a massive data problem. A 4K video frame contains four times as many pixels as a 1080p frame. An 8K frame contains sixteen times as many. While H.264 could technically handle these resolutions, the required bitrates would be enormous, making it impractical for streaming over existing internet infrastructure or for broadcast. The digital world needed a new compression standard, one that could deliver the stunning clarity of 4K and beyond without requiring a quadrupling of bandwidth.
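
To put rough numbers on this, here is a back-of-the-envelope calculation of uncompressed data rates. The assumptions (8-bit 4:2:0 sampling, roughly 12 bits per pixel, at 30 frames per second) are illustrative; the exact figures vary with bit depth and frame rate, but the ratios between resolutions are what matter.

```python
# Back-of-the-envelope raw (uncompressed) video bitrates.
# Assumptions: 8-bit 4:2:0 sampling (~12 bits per pixel) at 30 fps.
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "4K":    (3840, 2160),
    "8K":    (7680, 4320),
}
BITS_PER_PIXEL = 12   # 8-bit luma plus subsampled chroma (4:2:0)
FPS = 30

for name, (w, h) in RESOLUTIONS.items():
    pixels = w * h
    raw_mbps = pixels * BITS_PER_PIXEL * FPS / 1e6
    print(f"{name}: {pixels:>10,} px/frame, ~{raw_mbps:,.0f} Mbit/s uncompressed")

# 1080p:  2,073,600 px/frame,    ~746 Mbit/s
# 4K:     8,294,400 px/frame,  ~2,986 Mbit/s (4x the pixels)
# 8K:    33,177,600 px/frame, ~11,944 Mbit/s (16x the pixels)
```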

The Joint Collaborative Team on Video Coding (JCT-VC), a combined effort of the same ITU-T and ISO/IEC expert groups that created H.264, was formed to tackle this challenge. Their goal was to create a successor that could achieve the same perceptual quality as H.264 at roughly half the bitrate. The result, standardized in 2013, was High Efficiency Video Coding (HEVC), also known as H.265.

The Core Innovation: From Rigid Macroblocks to Flexible Coding Trees

H.265's dramatic efficiency improvement comes from a fundamental shift in how it analyzes and partitions a video frame. Where H.264 was based on the relatively rigid concept of the 16×16 pixel macroblock, H.265 introduces a far more flexible and powerful structure called the Coding Tree Unit (CTU). This is the single most important difference between the two standards.

The Coding Tree Unit (CTU): The New Foundation

Instead of fixed 16×16 blocks, the H.265 encoder works with much larger CTUs. The encoder can choose the CTU size, which can be 16×16, 32×32, or as large as 64×64 pixels. Larger block sizes are much more efficient for encoding large, flat or simple-textured areas common in high-resolution video, like a clear sky or a plain wall.

The true power of the CTU lies in its recursive, tree-like structure. The encoder can decide to code an entire 64×64 CTU as a single unit, or it can subdivide it into four smaller 32×32 units. Each of those can be further subdivided in turn, down to a minimum coding unit size of 8×8 (with transform blocks as small as 4×4). This structure is called a quadtree, and it allows the encoder to adapt the processing block size to the complexity of the image content with remarkable precision.
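
The sketch below illustrates the recursive splitting idea. A real HEVC encoder chooses splits by full rate-distortion optimisation; here a simple pixel-variance threshold stands in for that decision, so treat it as a conceptual toy rather than an encoder algorithm.

```python
import numpy as np

MIN_CU, CTU_SIZE = 8, 64   # HEVC: CTUs up to 64x64, coding units down to 8x8

def split_ctu(block, x=0, y=0, threshold=40.0):
    """Recursively partition one CTU into coding units (CUs).

    A real encoder picks the split that minimises rate-distortion cost;
    this toy splits whenever the block's pixel variance is high, i.e.
    whenever the content looks too complex for one large CU.
    """
    size = block.shape[0]
    if size > MIN_CU and np.var(block) > threshold:
        half = size // 2
        cus = []
        for dy in (0, half):
            for dx in (0, half):
                cus += split_ctu(block[dy:dy + half, dx:dx + half],
                                 x + dx, y + dy, threshold)
        return cus
    return [(x, y, size)]   # leaf of the quadtree: one CU

# A 64x64 CTU that is flat on the left half and noisy on the right half.
rng = np.random.default_rng(0)
ctu = np.full((CTU_SIZE, CTU_SIZE), 128.0)
ctu[:, 32:] += rng.normal(0, 20, size=(64, 32))

cus = split_ctu(ctu)
print(f"{len(cus)} CUs; sizes used: {sorted({s for _, _, s in cus}, reverse=True)}")
# The flat half stays as two 32x32 CUs; the noisy half splits down to 8x8.
```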

The Branches of the Tree: CU, PU, and TU

Within the CTU, the encoder makes decisions on three different levels of partitioning, each serving a specific purpose:

  • Coding Units (CUs): The CU is the result of the main quadtree partitioning of the CTU. The encoder decides how far to subdivide based on image content. A flat, uniform region might be encoded as a single 64×64 CU, while a highly detailed area with complex textures might be broken down into many smaller CUs. The size of the CU dictates the granularity of the coding decisions that follow.
  • Prediction Units (PUs): Within each CU, the encoder must decide how to perform prediction (either intra- or inter-prediction). The CU is partitioned into one or more Prediction Units. For motion prediction, a large CU can be partitioned into asymmetric PUs (e.g., two unequal rectangles) to better match the shape and movement of an object. This is more flexible than H.264's fixed sub-macroblock partitions.
  • Transform Units (TUs): After prediction, the residual (the difference between the original block and its prediction) must be encoded. This is done using a transform similar to the DCT. The CU is partitioned into a quadtree of Transform Units. HEVC allows larger transform sizes than H.264 (up to 32×32, versus a maximum of 8×8 in H.264). Larger transforms are more effective at compacting the energy of the residual in smooth image regions, leading to better compression.

This CTU structure gives the H.265 encoder immense flexibility to spend its data budget wisely, using large, efficient blocks for simple areas and devoting more resources to small, complex details. This is the single biggest contributor to its improved compression efficiency.
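
To see why the larger transform sizes mentioned above help, the sketch below applies 2-D DCTs of different sizes to a smooth luma gradient and counts how many coefficients are needed to capture almost all of the energy. It uses SciPy's floating-point DCT rather than HEVC's integer transforms, so it only demonstrates the energy-compaction principle.

```python
import numpy as np
from scipy.fft import dctn

def significant_coeffs(block, fraction=0.999):
    """How many DCT coefficients hold `fraction` of the block's energy."""
    coeffs = dctn(block, norm="ortho")
    energy = np.sort(np.abs(coeffs).ravel())[::-1] ** 2
    cumulative = np.cumsum(energy) / energy.sum()
    return int(np.searchsorted(cumulative, fraction) + 1)

# A smooth 32x32 gradient, the kind of content found in sky or plain walls.
xs, ys = np.meshgrid(np.arange(32), np.arange(32))
smooth = 100 + 0.8 * xs + 0.5 * ys

# One 32x32 transform vs. sixteen independent 8x8 transforms of the same area.
one_big = significant_coeffs(smooth)
many_small = sum(significant_coeffs(smooth[r:r + 8, c:c + 8])
                 for r in range(0, 32, 8) for c in range(0, 32, 8))

print(f"single 32x32 transform : {one_big} significant coefficients")
print(f"sixteen 8x8 transforms : {many_small} significant coefficients")
# The single large transform needs far fewer coefficients: each 8x8 block
# must spend at least one coefficient just to carry its own average value.
```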

Explore the interactive example below to compare rigid H.264 macroblocks with HEVC's adaptive CTU quadtree across different scene complexities.

CTU quadtree playground

[Interactive demo: compare rigid 16×16 macroblocks with adaptive Coding Tree Units across several scene patterns.] For a large uniform area, the encoder keeps the CU at 64×64 to avoid signalling overhead and lets only the transform tree split so it can quantise tiny gradients; H.264, by contrast, must code the same 64×64 area as sixteen separate 16×16 macroblocks, each incurring header overhead even though the region is perfectly flat.

Advanced Prediction and Coding Tools

Beyond the CTU structure, H.265/HEVC introduces a range of other enhancements that refine the prediction and coding processes, squeezing out even more redundancy.

  • 1. Enhanced Intra-Prediction

    For I-frames (and intra-coded blocks in P/B frames), H.264 offered 9 intra-prediction modes (for 4×4 luma blocks). H.265 dramatically expands this to 35 modes for intra-prediction. This includes the familiar DC mode (averaging), a new Planar mode (creating a smooth surface gradient), and 33 angular modes. Having a much finer selection of prediction angles allows the encoder to more accurately predict directional structures like edges and textures, resulting in a much smaller residual to encode (a toy sketch of a few of these modes follows this list).

  • 2. Improved Motion Prediction: Merge and AMVP

    For P- and B-frames, H.265 improves on how motion vectors themselves are coded. It introduces two sophisticated modes for each Prediction Unit:

    • Merge Mode: The encoder builds a list of candidate motion vectors from neighboring spatial and temporal blocks. If one of these candidates matches the motion of the current block, the encoder can simply send a short index pointing to that candidate in the list. It does not need to send any motion vector difference (and in the related skip mode, no residual either), resulting in extremely efficient coding for areas with consistent motion.
    • Advanced Motion Vector Prediction (AMVP): This is similar to Merge mode in that it uses neighboring vectors to predict the current motion vector. However, the encoder then codes a small motion vector difference (MVD) between the predicted vector and the final, optimal vector. This is used when the motion is similar to the neighbors' but not identical (see the signalling sketch after this list).
  • 3. Sample Adaptive Offset (SAO)

    This is a new filter applied within the prediction loop, after the deblocking filter. SAO is designed to reduce distortion by classifying pixels into categories (e.g., based on edge direction or brightness level) and then adding a small offset value to all pixels in a given category. It acts as a final clean-up step, correcting for small, consistent errors introduced by the quantization process, thereby improving both perceptual quality and the accuracy of the frame as a reference for future predictions (a minimal band-offset sketch follows this list).
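
Three small sketches follow, one per tool above. The first gives a rough feel for intra-mode selection: in HEVC's numbering, mode 0 is Planar, mode 1 is DC, and modes 2-34 are angular (10 is pure horizontal, 26 is pure vertical). The prediction rules below are deliberately simplified stand-ins, not HEVC's exact reference-sample filtering and interpolation.

```python
import numpy as np

def intra_predict(top, left, mode, n=8):
    """Very simplified intra prediction of an n x n block.

    top  : n reconstructed reference samples from the row above the block
    left : n reconstructed reference samples from the column to its left
    mode : 1 = DC, 10 = horizontal, 26 = vertical (HEVC mode numbering);
           the real standard defines 35 modes with sub-sample interpolation.
    """
    if mode == 1:                     # DC: average of the reference samples
        return np.full((n, n), (top.mean() + left.mean()) / 2)
    if mode == 10:                    # horizontal: extend the left column
        return np.tile(left.reshape(n, 1), (1, n))
    if mode == 26:                    # vertical: extend the top row
        return np.tile(top.reshape(1, n), (n, 1))
    raise NotImplementedError("only three modes are sketched here")

# A block dominated by vertical structure: each column follows the top row.
rng = np.random.default_rng(1)
top, left = np.linspace(90, 160, 8), np.full(8, 120.0)
block = np.tile(top.reshape(1, 8), (8, 1)) + rng.normal(0, 2, (8, 8))

costs = {m: np.abs(block - intra_predict(top, left, m)).sum() for m in (1, 10, 26)}
print("SAD per mode:", {m: round(float(c)) for m, c in costs.items()})
print("best mode:", min(costs, key=costs.get), "(26 = vertical, as expected)")
```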
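The second sketch contrasts the two motion-signalling paths: Merge sends only an index into the candidate list, while AMVP sends an index plus a motion vector difference (MVD). The candidate vectors and the matching rule here are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class MotionChoice:
    mode: str                       # "merge" or "amvp"
    index: int                      # position in the candidate list
    mvd: Optional[Tuple[int, int]]  # motion vector difference (AMVP only)

def code_motion(true_mv, candidates):
    """Toy encoder decision for one Prediction Unit.

    candidates: motion vectors gathered from spatial/temporal neighbours.
    If a candidate matches the true motion exactly, Merge mode reuses it
    and signals only its index; otherwise AMVP takes the closest candidate
    as a predictor and signals its index plus the remaining difference.
    """
    errors = [(abs(true_mv[0] - c[0]) + abs(true_mv[1] - c[1]), i, c)
              for i, c in enumerate(candidates)]
    err, idx, best = min(errors)
    if err == 0:
        return MotionChoice("merge", idx, None)
    mvd = (true_mv[0] - best[0], true_mv[1] - best[1])
    return MotionChoice("amvp", idx, mvd)

neighbours = [(4, 0), (5, 1), (4, -1)]       # candidate MVs (toy values)
print(code_motion((4, 0), neighbours))       # merge: index only, no MVD
print(code_motion((6, 2), neighbours))       # amvp: index 1 plus MVD (1, 1)
```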
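The third sketch mimics SAO's band-offset mode: the 8-bit value range is divided into 32 bands of width 8, and a small corrective offset is added to pixels falling in four consecutive bands chosen by the encoder. Edge-offset classification and the way offsets are actually derived and signalled are omitted.

```python
import numpy as np

def sao_band_offset(recon, start_band, offsets):
    """Apply SAO band offsets to a reconstructed 8-bit block (simplified).

    The 0-255 range is split into 32 bands of width 8.  Pixels whose band
    index lies in [start_band, start_band + 3] receive the corresponding
    offset; all other pixels are left untouched.
    """
    out = recon.astype(np.int16)
    bands = out >> 3                       # band index = value // 8
    for i, offset in enumerate(offsets):   # the four signalled offsets
        out[bands == start_band + i] += offset
    return np.clip(out, 0, 255).astype(np.uint8)

# Quantization has consistently darkened these mid-grey pixels by 3 levels;
# SAO nudges them back towards the original values.
original = np.full((4, 4), 130, dtype=np.uint8)
recon = original - 3
corrected = sao_band_offset(recon, start_band=15, offsets=[3, 3, 3, 3])

print("mean error before SAO:", float(np.mean(recon.astype(int) - original)))
print("mean error after SAO: ", float(np.mean(corrected.astype(int) - original)))
```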

Architected for Parallelism: Tiles and Wavefronts

One of the major criticisms of H.264 was its highly sequential nature, which made it difficult to efficiently encode or decode on modern multi-core processors. H.265 was designed from the ground up to address this, introducing features specifically to enable massive parallel processing.

  • Tiles: The encoder can divide a picture into a grid of rectangular regions called tiles. Each tile can be encoded and decoded independently of the others. This is a powerful feature: a video player on a computer with an 8-core CPU can assign each tile to a separate core and decode them all in parallel, dramatically speeding up playback. In-loop filters such as the deblocking filter can be restricted so that they do not cross tile boundaries, keeping each tile fully self-contained.
  • Wavefront Parallel Processing (WPP): This is another, more granular form of parallelism. Encoding and decoding of rows of CTUs can proceed in parallel, with a slight delay. The first row of CTUs is processed normally. A subsequent row can begin as soon as the first two CTUs of the row above it are finished, because the entropy-coding context is inherited from that point. Each row therefore starts before the row above it is fully complete, creating a "wavefront" of processing that moves down and across the picture. This allows for parallelization even when tiles are not used (see the scheduling sketch below).

These features make H.265 far better suited to the hardware of today and tomorrow, enabling real-time encoding and decoding of 4K and 8K video on everything from high-end servers to energy-efficient mobile devices.
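
Because tiles share no decoding state, parallel decoding is conceptually just one worker per tile. The sketch below carves a CTU grid into a 2×2 tile layout and hands each tile to a thread pool; the grid dimensions and the decode_tile placeholder are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

CTU_COLS, CTU_ROWS = 6, 4        # picture size in CTUs (e.g. 6x4 of 64x64 CTUs)
TILE_COLS, TILE_ROWS = 2, 2      # a 2x2 tile grid -> four independent tiles

def tile_ctus(tc, tr):
    """List the CTU coordinates covered by tile (tc, tr) in a uniform grid."""
    x0, x1 = tc * CTU_COLS // TILE_COLS, (tc + 1) * CTU_COLS // TILE_COLS
    y0, y1 = tr * CTU_ROWS // TILE_ROWS, (tr + 1) * CTU_ROWS // TILE_ROWS
    return [(x, y) for y in range(y0, y1) for x in range(x0, x1)]

def decode_tile(ctus):
    """Placeholder for real tile decoding; tiles share no state, so no locks."""
    return len(ctus)             # pretend we decoded this many CTUs

tiles = [tile_ctus(tc, tr) for tr in range(TILE_ROWS) for tc in range(TILE_COLS)]
with ThreadPoolExecutor(max_workers=len(tiles)) as pool:
    decoded = list(pool.map(decode_tile, tiles))

print(f"{len(tiles)} tiles decoded in parallel; CTUs per tile: {decoded}")
```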
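And to see where the "wavefront" gets its shape, the sketch below schedules a 6×4 CTU grid (the same size as in the planner below) under the WPP dependency rule: a CTU can start once its left neighbour and the CTU above-and-to-the-right are finished, which is exactly the two-CTU lag between rows.

```python
# Earliest parallel step at which each CTU can start under WPP.
# Dependency rule: a CTU needs its left neighbour plus the CTU that sits
# above and one column to the right (this encodes the two-CTU row lag).
COLS, ROWS = 6, 4

step = [[0] * COLS for _ in range(ROWS)]
for r in range(ROWS):
    for c in range(COLS):
        left = step[r][c - 1] if c > 0 else 0
        above_right = step[r - 1][min(c + 1, COLS - 1)] if r > 0 else 0
        step[r][c] = max(left, above_right) + 1

for row in step:
    print(" ".join(f"{s:2d}" for s in row))
# With enough cores the whole 6x4 grid finishes in 6 + 2*(4-1) = 12 steps,
# instead of the 24 steps a strictly sequential decoder would need.
```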

The interactive visualisation below shows how tile layouts and wavefront scheduling translate into real decoder parallelism.

Tiles & wavefront planner

[Interactive demo: see how HEVC unlocks multi-core decoding with tile layouts and wavefront scheduling.] The example grid represents a 6×4 picture region of 64×64 CTUs. Splitting the frame in half vertically gives two tiles of 12 CTUs each, a good balance for a 2-core decoder: reference sharing stays within each half while decoder throughput doubles.

Conclusion: The Price and Promise of Efficiency

H.265/HEVC successfully met its design goals. It delivers a remarkable improvement in compression efficiency, typically offering a 40-50% bitrate reduction compared to H.264 for the same perceptual video quality. This is the magic that makes 4K streaming a reality for millions of people. It allows broadcasters to transmit UHD channels, and enables us to store hours of high-resolution video on our devices.

However, this efficiency comes at a cost. The flexible partitioning and advanced prediction tools require the encoder to make vastly more decisions for every block. It has to evaluate a huge number of candidate CU, PU, and TU partitions, and up to 35 intra-prediction modes, to find an efficient combination. This makes H.265 encoding significantly more computationally expensive than H.264, sometimes requiring up to 10 times the processing power. Decoding is also more complex, though the difference is less dramatic thanks to the parallelism features.

H.265/HEVC represents a critical step in the evolution of video technology, providing the efficiency needed to usher in the era of Ultra High Definition video. While even newer standards like VVC (H.266) continue to push the boundaries, H.265 remains a dominant and essential codec for high-quality video delivery across the globe.