JPEG Compression
Lossy, DCT-based compression for photographic images with adjustable quality.
The Digital Photograph Problem: Why We Need JPEG
In the age of smartphones and high-resolution cameras, we take digital photographs for granted. We snap pictures, share them instantly, and store thousands of them on our devices. However, behind every crisp, vibrant photo lies a massive amount of raw data. A single, uncompressed, high-quality photograph from a modern camera can easily exceed 20 to 30 megabytes in size. Storing a vacation album of a few hundred photos would consume gigabytes of space, and trying to email even one of these files would be a slow and frustrating experience.
This data-heavy nature of digital images presented a significant barrier to the growth of digital photography and the internet. The solution that emerged and became the undisputed world standard is JPEG. JPEG is not just a file format; it is a sophisticated method of lossy compression specifically designed for continuous-tone images like photographs. Its genius lies in its ability to dramatically reduce file sizes, often by a factor of 10 or more, while keeping the visible loss in quality to a minimum. It achieves this by cleverly exploiting the known limitations and characteristics of the human visual system.
The JPEG Compression Pipeline: A Step-by-Step Journey
The JPEG compression process is best understood as a multi-stage pipeline, where the image data is progressively transformed, reduced, and packed. Each step plays a crucial role in the final compression ratio and quality. Let us walk through the journey of an image as it gets compressed into a JPEG file.
Step 1: Color Space Transformation
Most digital images are captured and initially stored in the RGB (Red, Green, Blue) color space, where each pixel's color is defined by the intensity of these three primary colors. While this is intuitive for displays, it is not ideal for compression. The first step in the JPEG process is to convert the image from RGB to a different color space, typically YCbCr.
This new space separates the image information into three components:
- Y (Luminance): This component represents the brightness or grayscale information of the image. It is what you would see if you were looking at a black-and-white version of the photograph.
- Cb (Chrominance-blue): This component represents the blue-yellow color difference.
- Cr (Chrominance-red): This component represents the red-green color difference.
Why do this? This separation is critical because the human visual system is far more sensitive to changes in brightness (luma) than it is to subtle variations in color (chroma). We can perceive fine details in grayscale with high acuity, but our perception of color is much less precise. By isolating the brightness from the color, the JPEG algorithm can treat them differently in subsequent steps, a key aspect of its psycho-visual approach.
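The JFIF standard used for JPEG files specifies BT.601 coefficients for this conversion. A minimal sketch in Python with NumPy (the function name is illustrative, not from any particular library):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an RGB image (H, W, 3; values 0-255) to full-range
    YCbCr using the BT.601 coefficients specified by JFIF."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)
```

Note how a neutral gray pixel such as (128, 128, 128) maps to Y = 128 with both chroma channels at their neutral midpoint of 128: brightness and color are now cleanly separated, which is exactly what the next step exploits.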
Step 2: Chroma Subsampling (Downsampling)
Now that the color information is separate from the brightness, the algorithm takes advantage of our visual system's weakness. Chroma subsampling is the process of reducing the resolution of the color (Cb and Cr) channels. In essence, instead of storing a unique color value for every single pixel, the algorithm stores a single color value to be shared across a small block of pixels.
This is often described using a three-part ratio, such as 4:4:4, 4:2:2, or 4:2:0. For a block of 4 horizontal pixels:
- 4:4:4: No subsampling. Every one of the 4 pixels has its own distinct Y, Cb, and Cr value. This preserves the maximum color quality.
- 4:2:2: Horizontal subsampling. Each pair of two horizontal pixels shares the same Cb and Cr value, while each has its own Y value. The color resolution is halved horizontally.
- 4:2:0: This is the most common scheme for consumer video and JPEG. Both horizontal and vertical color resolution are halved. A 2x2 block of pixels shares a single Cb and Cr value.
This step provides a significant amount of compression before any other complex algorithms are even applied, and for most photographs, the change is perceptually invisible. It is a very effective and purely psycho-visual trick.
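A minimal sketch of 4:2:0 subsampling in Python with NumPy (the function name is illustrative, and the simple 2x2 averaging filter is an assumption; real encoders may use other downsampling filters):

```python
import numpy as np

def subsample_420(chroma):
    """Apply 4:2:0 subsampling: average each 2x2 block of a chroma
    channel into one shared value, halving both dimensions.
    Assumes the channel's height and width are even."""
    h, w = chroma.shape
    blocks = chroma.reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))
```

After this step each chroma channel holds a quarter of its original samples, so Cb and Cr together shrink from two-thirds of the raw data to one-sixth, while the full-resolution Y channel is untouched.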
Step 3: Discrete Cosine Transform (DCT)
This is the mathematical core of the JPEG algorithm. After color transformation and subsampling, each channel (Y, Cb, and Cr) is divided into 8x8 blocks of pixels. Each of these blocks is then processed by the Discrete Cosine Transform (DCT).
Instead of representing the block as 64 pixel values, the DCT represents it as 64 frequency coefficients. You can think of this as breaking down a complex visual pattern into a sum of simple, standard patterns (cosine waves of different frequencies). The output of the DCT is another block, but its values now represent frequencies, not pixel colors:
- The DC Coefficient: The top-left value in the block (position (0, 0)) is the DC (Direct Current) coefficient. It represents the average value of all 64 pixels in the original block, essentially its overall color and brightness. This is the most important piece of visual information.
- The AC Coefficients: The remaining 63 coefficients are the AC (Alternating Current) coefficients. They represent the details, textures, and edges within the block. Coefficients near the top-left represent low-frequency changes (gentle gradients), while coefficients towards the bottom-right represent high-frequency changes (fine details and sharp edges).
The key insight is that for a typical block from a photograph, most of the visual "energy" is concentrated in the DC coefficient and a few of the low-frequency AC coefficients. The high-frequency AC coefficients are often very close to zero. The DCT's job is to achieve this "energy compaction," which makes the next step possible.
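The transform itself can be sketched directly from the JPEG standard's DCT-II formula (Python with NumPy; the function name is illustrative):

```python
import numpy as np

def dct2_8x8(block):
    """2-D DCT-II of an 8x8 block, as defined in the JPEG standard.
    Pixel values are shifted from [0, 255] to [-128, 127] first,
    as JPEG does before the transform."""
    f = block.astype(float) - 128.0
    n = np.arange(8)
    # Basis matrix: cos((2x+1) * u * pi / 16) for sample x, frequency u.
    basis = np.cos((2 * n[:, None] + 1) * n[None, :] * np.pi / 16)
    c = np.where(n == 0, 1 / np.sqrt(2), 1.0)  # normalisation factors
    return 0.25 * np.outer(c, c) * (basis.T @ f @ basis)

# A flat 8x8 block of value 200: after the -128 level shift every sample
# is 72, so all the energy lands in the DC coefficient (8 * 72 = 576)
# and every AC coefficient is (numerically) zero.
flat = np.full((8, 8), 200.0)
coeffs = dct2_8x8(flat)
```

The flat-block example shows energy compaction at its extreme; a real photographic block produces a large DC value, a handful of meaningful low-frequency AC values, and near-zero entries everywhere else.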
Step 4: Quantization: The Lossy Step
This is where the magic of lossy compression truly happens, and where information is permanently discarded. Each of the 64 coefficients in the DCT block is divided by a corresponding value from a 64-element quantization table, and the result is rounded to the nearest integer.
This table is not random; it is designed with psycho-visual principles in mind.
- The value in the top-left of the quantization table (for the DC coefficient) is typically small, preserving the crucial average color information with high precision.
- Values get progressively larger for higher-frequency coefficients. This means fine details (high-frequency AC coefficients) are divided by large numbers, which causes their resulting values to be rounded to zero or very small integers.
The "quality" setting when you save a JPEG file (e.g., from 1 to 100) is essentially a scaling factor for this quantization table. A high quality setting (like 95) uses smaller numbers in the table, preserving more detail and resulting in a larger file. A low quality setting (like 20) uses very large numbers in the table, discarding a lot of detail and producing a very small file. After this step, many of the 63 AC coefficients have become zero, especially for high frequencies.
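A sketch of this step in Python with NumPy, using the standard luminance table from Annex K of the JPEG specification. The quality-to-scale mapping follows the widely used IJG (libjpeg) convention; the standard itself does not mandate any particular scaling, so treat that part as one common choice rather than the definitive formula:

```python
import numpy as np

# Standard luminance quantization table from Annex K of the JPEG spec.
LUMA_TABLE = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def scaled_table(quality):
    """Scale the base table for a 1-100 quality setting (IJG convention):
    quality 50 leaves the table unchanged, lower settings inflate it."""
    q = max(1, min(100, quality))
    scale = 5000 // q if q < 50 else 200 - 2 * q
    return np.clip((LUMA_TABLE * scale + 50) // 100, 1, 255)

def quantize(coeffs, quality=50):
    """Divide DCT coefficients by the scaled table and round --
    the one irreversible step in the pipeline."""
    return np.round(coeffs / scaled_table(quality)).astype(int)
```

Note how the small 10-16 values in the top-left corner preserve the DC and low-frequency coefficients, while divisors above 100 in the bottom-right drive most high-frequency coefficients straight to zero.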
Step 5: Entropy Coding: Lossless Tidying Up
After quantization, the block of coefficients is a sparse matrix, with lots of zeros. The final step is to arrange and encode this data efficiently using lossless methods.
- Zig-Zag Scan: The 64 quantized coefficients are read out in a zig-zag pattern, starting from the DC coefficient in the top-left corner and moving towards the bottom-right. The purpose of this scan is to group the many zero-valued high-frequency coefficients into long, consecutive runs.
- Run-Length Encoding (RLE): The long runs of zeros are then efficiently encoded. Instead of writing "0, 0, 0, 0, 0", RLE simply stores a code that means "five zeros."
- Huffman Coding: Finally, the resulting stream of RLE codes and non-zero AC coefficients, along with the separately handled DC coefficients, is compressed using Huffman coding. This classic lossless algorithm assigns shorter binary codes to values that appear more frequently and longer codes to those that appear less frequently, achieving the final packing of the data into the JPEG bitstream.
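The first two of these steps can be sketched in a few lines of Python. This is a simplification: real JPEG entropy coding combines the run length and the coefficient's bit size into a single Huffman-coded symbol, and the `"EOB"` marker below stands in for the actual end-of-block code. Function names are illustrative:

```python
def zigzag(block):
    """Read an 8x8 block in JPEG's zig-zag order: traverse the
    anti-diagonals, alternating direction, so low frequencies come
    first and the zero-heavy high frequencies cluster at the end."""
    out = []
    for s in range(15):                 # anti-diagonals 0..14
        idx = [(i, s - i) for i in range(s + 1)
               if i < 8 and 0 <= s - i < 8]
        if s % 2 == 0:                  # even diagonals run bottom-left to top-right
            idx.reverse()
        out.extend(block[i][j] for i, j in idx)
    return out

def run_length(values):
    """Collapse runs of zeros into (run_length, value) pairs; a
    trailing all-zero run becomes a single end-of-block marker."""
    pairs, zeros = [], 0
    for v in values:
        if v == 0:
            zeros += 1
        else:
            pairs.append((zeros, v))
            zeros = 0
    if zeros:
        pairs.append(("EOB", 0))        # everything left is zero
    return pairs
```

For a heavily quantized block, `run_length(zigzag(block))` typically shrinks 64 values to a handful of pairs plus one end-of-block marker, which is precisely the sparsity the zig-zag scan was designed to expose.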
The Journey Back: JPEG Decompression
To view a JPEG image, the device must perform the entire compression process in reverse. The decoder reads the compressed bitstream and:
- Entropy Decoding: Decompresses the Huffman codes and reconstructs the RLE-encoded data to get back the zig-zag sequence of quantized DCT coefficients.
- Dequantization: Re-orders the sequence back into an 8x8 block and multiplies each coefficient by the corresponding value from the same quantization table used during encoding. This restores the approximate magnitude of the DCT coefficients, but the information lost during rounding is gone forever.
- Inverse Discrete Cosine Transform (IDCT): Applies the inverse DCT to the dequantized block of coefficients, converting the data from the frequency domain back into the spatial domain, resulting in an 8x8 block of pixel values.
- Reconstruction: Reassembles all the blocks, upsamples the color channels if chroma subsampling was used, and finally converts the image from YCbCr back to RGB so it can be displayed on a screen.
The result is an image that looks very close to the original, but is constructed from a fraction of the data.
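The dequantization and inverse-transform stages can be sketched as follows (Python with NumPy; function names are illustrative, and the final clip-and-round back to the 0-255 range is one common choice):

```python
import numpy as np

def idct2_8x8(coeffs):
    """Inverse 2-D DCT of an 8x8 coefficient block, returning pixel
    values shifted back into the [0, 255] range."""
    n = np.arange(8)
    # Same cosine basis and normalisation factors as the forward DCT.
    basis = np.cos((2 * n[:, None] + 1) * n[None, :] * np.pi / 16)
    c = np.where(n == 0, 1 / np.sqrt(2), 1.0)
    pixels = basis @ (0.25 * np.outer(c, c) * coeffs) @ basis.T + 128.0
    return np.clip(np.round(pixels), 0, 255).astype(int)

def decode_block(quantized, table):
    """Dequantize (multiply back by the table), then inverse-transform.
    The rounding loss from quantization is not recovered."""
    return idct2_8x8(quantized * table)
```

Running a block through quantize, dequantize, and the IDCT reproduces smooth regions almost exactly, while heavily quantized high frequencies come back as the familiar blocking and ringing artifacts.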