Run-Length Encoding

Simple compression technique for data with many consecutive repeated values.

The Essence of Simplicity in Compression

Before diving into complex statistical models or adaptive dictionaries, it is essential to understand one of the oldest and most intuitive forms of lossless data compression: . RLE is a foundational algorithm whose core principle is incredibly simple: instead of listing every item in a repetitive sequence, just state what the item is and how many times it repeats.

This technique is a prime example of exploiting a very specific type of redundancy. It does not concern itself with the frequency of symbols across an entire file or with building a dictionary of phrases. Its sole focus is on finding contiguous blocks of identical data. While this makes it a specialized tool, its simplicity, speed, and effectiveness on the right kind of data have made it a valuable component in the history of computing, from early fax machines to modern image formats. Understanding RLE provides a solid foundation for appreciating the more complex compression methods that followed.

How Run-Length Encoding Works: The Core Principle

The mechanism behind RLE can be understood by anyone who has ever had to write down a long, repetitive list. Imagine you need to record the following sequence of colored tiles:

RED, RED, RED, RED, RED, BLUE, BLUE, BLUE, GREEN, GREEN, GREEN, GREEN, GREEN, GREEN, GREEN

Writing this out is tedious. A much more efficient, human-readable description would be:

5 RED, 3 BLUE, 7 GREEN

This is precisely what Run-Length Encoding does with digital data. The algorithm scans a stream of data, and when it encounters a "run," which is a sequence of two or more identical, consecutive data values, it replaces that run with two pieces of information:

The Count: The number of times the value is repeated in the run.
The Value: The data value itself that was repeated.

Interactive encoding walkthrough

Switch presets or type your own sequence, then drag the slider to watch each run collapse into a count + value pair.

Run-length encoding playground

Move the slider to see each run become a pair of count and value.

Preset sequences

A textbook sentence with three long runs and a short break.

Custom sequence

Type up to 64 symbols. Identical symbols in a row form one run.13/64

Construction stepStep 0 - raw input

Runs are processed from left to right.

Raw stream

Each cell shows one original symbol.

AAAAABBBCCDAA

Encoded pairs

RLE writes each run as a count followed by the symbol.

Move the slider to emit runs.

Current run

Highlight shows the run selected on the slider.

Move the slider to inspect a run.

Compression metrics

Original symbols

Runs detected

Encoded fields

Saved symbols

Compression ratio

1.30x

Track the number of runs and encoded fields to understand when RLE delivers real savings.

A Detailed RLE Example: Compressing a Simple Bitmap Image

RLE is particularly well-suited for simple bitmap images, especially those with limited color palettes, like icons, logos, or technical drawings. Let's walk through the compression of a small, 10x8 pixel monochrome image of an arrow.

Step 1: The Uncompressed Data

In a computer, an image is just a sequence of numbers, where each number represents the color of a pixel. For our simple black-and-white image, let's say 'W' represents a white pixel and 'B' represents a black pixel. The image data, when read from left to right, top to bottom, forms a single, linear stream of data:

WWWWWWWWWW WWWWBBWWWW WWWWBBWWWW WWWWBBWWWW BBBBWWBBBB WWWWBBWWWW WWWWBBWWWW WWWWBBWWWW

Uncompressed, this image requires storing 80 individual pixel values (10 pixels/row * 8 rows). If each pixel took 1 byte, that would be 80 bytes.

Step 2: The RLE Encoding Process

The RLE encoder scans this stream and identifies the runs of identical, consecutive values:

The first run is 10 white pixels. The encoder writes the pair (10, W).
The next run is 4 white pixels. It writes (4, W).
Next is a run of 2 black pixels. It writes (2, B).
Then, a run of 4 white pixels. It writes (4, W).
The process continues row by row until the end of the data stream.

Step 3: The Compressed Output and Result Analysis

The final RLE-encoded representation of our image would be the following sequence of (count, value) pairs:

(10, W), (4, W), (2, B), (4, W), (4, W), (2, B), (4, W), (4, W), (2, B), (4, W), (4, B), (2, W), (4, B), (4, W), (2, B), (4, W), (4, W), (2, B), (4, W), (4, W), (2, B), (4, W)

There seems to be a redundancy in the output itself (e.g., (4, W) repeats), which more advanced compressors could handle, but for RLE, this is the result. Let's analyze the size. The original data had 80 values. Our compressed output has 22 pairs, which means 44 values (22 counts + 22 pixel values). In this specific, slightly varied example, the compression is not as significant as in a perfectly uniform image. A more efficient RLE implementation might merge consecutive runs of the same type if possible or use special flags, but the basic principle holds. Let's consider a better RLE-suited example: the first line WWWWWWWWWW is encoded as (10,W). 10 symbols become 2.

The compression ratio would be approximately $80 / 44 \approx 1.82:1$ . This shows that even for a simple drawing, the savings are noticeable. For an image with even larger uniform areas, the ratio would be much higher.

The RLE Decompression Process: Simple and Fast

One of the major strengths of Run-Length Encoding is the simplicity and speed of decompression. The decompressor receives the sequence of (count, value) pairs. Its task is trivially simple: for each pair, it just writes out the 'value' character 'count' times.

Read the first pair: (10, W). Output: "WWWWWWWWWW".
Read the second pair: (4, W). Output: "WWWW".
Read the third pair: (2, B). Output: "BB".
...and so on, until the end of the compressed data is reached.

This process is computationally inexpensive and lightning-fast, as it involves no complex calculations, just simple loops. This is a significant advantage in applications where fast decoding is critical.

The Achilles' Heel of RLE: The Worst-Case Scenario

RLE is a highly specialized tool. Its extreme efficiency on data with long runs is matched by its extreme inefficiency on data with no runs at all. This worst-case scenario is sometimes referred to as a "compression catastrophe," where the compressed file becomes significantly larger than the original.

Example: Compressing "ABCDEFG"

Let's try to compress the string ABCDEFG using a basic RLE algorithm. The encoder scans the data:

It sees 'A'. The run of 'A's is 1. It must encode this as (1, A).
It sees 'B'. The run of 'B's is 1. It must encode this as (1, B).
...and so on for every character.

The "compressed" output becomes:

(1, A), (1, B), (1, C), (1, D), (1, E), (1, F), (1, G)

The original string had 7 characters. The output has 14 data items (7 counts + 7 values). The RLE algorithm has actually doubled the size of the data! This demonstrates that RLE should only be applied when we are reasonably sure that the data will contain enough repetition to offset this expansion.

Practical Implementations and Variations

To overcome the worst-case scenario, practical implementations of RLE are often more sophisticated than the basic model described above.

Using Control Flags

A common technique is to use a special control byte or flag. The encoder makes a decision for each segment of data:

If a run is found that is long enough to be worthwhile (e.g., 3 or more identical characters), it outputs a control flag indicating a "run is next," followed by the count and the value.
If the data is non-repetitive (like our "ABCDEFG" example), it outputs a different control flag indicating "raw data is next," followed by a count of the raw bytes, and then the raw bytes themselves.

This hybrid approach ensures that RLE is only applied when it's beneficial, preventing the data from expanding. Many file formats, including TIFF and PCX, use variants of this technique.

Real-World Applications

Despite its simplicity, RLE remains relevant today, both on its own and as a component in more complex systems.

Fax Machines: The international standards for fax transmission (CCITT Group 3 and Group 4) heavily rely on a modified form of RLE, as scanned documents typically consist of large areas of white and black.
Image File Formats: As mentioned, it is a compression option in formats like BMP (Windows Bitmap), PCX, and TIFF (Tagged Image File Format).
Intermediate Compression Stage: In some more advanced algorithms, RLE is used as an intermediate step. For example, the Burrows-Wheeler Transform (used in bzip2) is an algorithm that does not compress data but rearranges it so that identical characters are often clustered together. The output of this transform is an ideal candidate for a subsequent RLE compression stage.

Conclusion: A Fundamental Building Block

Run-Length Encoding is the embodiment of a simple idea executed well. It provides an excellent illustration of the core concept of compression: find a predictable pattern and describe it more efficiently than simply listing it out. While its scope is narrow and it is ill-suited for complex or random data, its blazing speed and high efficiency on the right kind of data ensure its place in the compression toolkit. For any student of computer science or telecommunications, understanding RLE is a vital first step toward grasping the more powerful and complex algorithms that govern our digital world.