Kolmogorov Complexity
Theoretical foundations of data compression and algorithmic information theory.
Beyond Probabilities: A New Perspective on Information
In our journey through the fundamentals of compression, we have extensively explored Claude Shannon's Information Theory. This powerful framework defines the limits of compression based on the statistical properties of a data source. Shannon's entropy tells us the average amount of information per symbol, which in turn sets the theoretical limit for how well we can compress a long message from that source. This approach is incredibly useful, but it has a key prerequisite: it deals with averages and requires a probabilistic model of the data source.
This raises a fascinating question: What if we do not have a probabilistic model? What if we are not interested in the average case, but in the specific complexity of a single, individual piece of data? For instance, how much information is contained in the first chapter of Moby Dick, not as a sample of English text, but as that specific, unique string of characters? How complex is a single digital photograph of a cat, independent of all other photographs? To answer this, we need a different, more absolute way of thinking about complexity. This leads us to the profound and elegant concept of Algorithmic Information Theory, and its central idea: Kolmogorov Complexity.
Defining Kolmogorov Complexity
Developed independently in the 1960s by Andrey Kolmogorov, Ray Solomonoff, and Gregory Chaitin, the idea is deceptively simple yet powerful. It proposes that the true complexity of an object is the length of its shortest possible description. In the world of digital data, the ultimate form of description is a computer program.
Thus, the formal definition is as follows: the Kolmogorov complexity K(s) of a string s is the length of the shortest computer program that outputs s and then halts.
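In symbols, fixing a universal computer $U$ that runs programs $p$:

$$K(s) = \min\{\, |p| \;:\; U(p) = s \text{ and } U \text{ halts on } p \,\}$$

where $|p|$ is the length of the program $p$ in bits.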
Deconstructing the Definition
To fully grasp this concept, let us break down each part of the definition:
- A String : In this context, a "string" can be any digital object. It can be a string of text like a sentence, a binary file representing an image, the sequence of bits that make up an executable program, or even the first trillion digits of pi. It is the object whose complexity we want to measure.
- A Computer Program : This is a set of instructions written in a clearly defined, universal programming language. The "length" of the program is simply the number of characters or bits in its source code.
- Outputs the String : When the program is executed, it must print the string and nothing else.
- And Then Halts: This is a crucial condition. The program cannot run forever in a loop. It must complete its task and stop. This connects the concept to computability theory and the famous Halting Problem.
- The Shortest Program: This is the most important part of the definition. There are infinitely many programs that could produce a given string. For example, a program could contain a large database and a lookup function, or it could compute the string algorithmically (the sketch after this list shows two such programs with identical output). We are only interested in the single, most concise program possible. The length of this minimal program is the string's Kolmogorov complexity.
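To make the "many programs, one string" point concrete, here is a small illustrative pair of Python snippets (the specific output is arbitrary) that print exactly the same thing; Kolmogorov complexity cares only about the shorter one:

```python
# Program 1: a lookup table baked directly into the source code.
print([1, 4, 9, 16, 25, 36, 49, 64, 81, 100,
       121, 144, 169, 196, 225, 256, 289, 324, 361, 400])

# Program 2: compute the identical output algorithmically.
print([n * n for n in range(1, 21)])
```

Both programs print the first 20 square numbers; the second is a far shorter description of the same string.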
A Note on the Choice of Language: The Invariance Theorem
One might wonder if the complexity of a string depends on the programming language chosen. A program written in Python might be shorter than an equivalent program in Assembly language. Does this mean the string is "simpler" in Python?
The Invariance Theorem addresses this. It states that for any two universal programming languages L1 and L2, the difference in the Kolmogorov complexity of any string is bounded by a constant that depends only on the languages themselves, not on the string. Essentially, one language can be compiled into another, and the compiler itself is a program of a fixed, constant length. So while the exact value of K(s) might shift by at most an additive constant when we switch languages, its fundamental magnitude remains stable regardless of the language. For this reason, we can typically assume a fixed, optimal "description language" for all theoretical purposes.
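Formally, for any two universal languages $L_1$ and $L_2$ there is a constant $c_{L_1,L_2}$ such that, for every string $s$:

$$|K_{L_1}(s) - K_{L_2}(s)| \le c_{L_1,L_2}$$

Intuitively, $c_{L_1,L_2}$ is roughly the length of a compiler (or interpreter) for one language written in the other.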
Illustrative Examples: The Simple, the Complex, and the Deceptive
The best way to understand Kolmogorov complexity is through examples that highlight the difference between apparent complexity and true, algorithmic complexity.
Example 1: The Simple, Repetitive String
Let's consider a highly structured, simple string, S1, which consists of the character 'A' repeated 1,000 times.
How could we generate this? A naive approach would be a program that literally contains the string:
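```python
print("AAAAAAAAAA...AAAAAAAAAA")  # abbreviated here; the actual source spells out all 1,000 'A's
```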
The length of this program is over 1,000 characters. It offers no compression. However, we can immediately see a more concise description. A much shorter program would be:
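```python
print("A" * 1000)  # a constant-size recipe: "print 'A' one thousand times"
```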
This program is extremely short. Its length is essentially constant: only the digits that encode the repeat count grow (and only logarithmically) as the number of 'A's increases. The length of the shortest program is therefore very small. This tells us that the string has a very low Kolmogorov complexity: K(S1) is tiny compared to the 1,000-character length of S1 itself. The string is simple because it has a short algorithmic description.
Example 2: The Incompressible, Random String
Now, consider a different string, S2, of the same length (1,000 characters), but one generated by flipping a fair coin 1,000 times and writing down the results (e.g., 'HTTHHTH...').
What is the shortest program that could produce this string? Since the string is genuinely random, it has no discernible patterns or rules. There is no simple loop or formula that can generate it. The most concise description we can find is simply to state the string itself:
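```python
print("HTTHHTH...")  # the full 1,000-character coin-flip record, written out verbatim
```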
In this case, the shortest program has a length roughly equal to the length of the string itself, so K(S2) ≈ |S2| (plus the small constant overhead of the print statement). Therefore, the string has a very high Kolmogorov complexity. A string whose Kolmogorov complexity is close to its actual length is said to be algorithmically random, or incompressible. It contains no reducible structure.
Example 3: The Deceptively Simple String (Digits of Pi)
Let's consider a third string, S3, which contains the first one million digits of the mathematical constant Pi.
To the naked eye, this string of digits looks completely random. Statistically, the digits appear with roughly equal frequency, and there are no simple repeating patterns. A statistical measure like Shannon entropy would suggest this string has high information content, similar to a random string.
But what is its Kolmogorov complexity? We know that Pi is not random; it is a precisely defined mathematical constant. There exist algorithms, such as the Chudnovsky algorithm or the Bailey-Borwein-Plouffe formula, that can calculate the digits of Pi. We could write a relatively short computer program that implements one of these algorithms.
```python
# A sketch using the mpmath arbitrary-precision library; any correct
# Pi algorithm (Chudnovsky, BBP, ...) would serve equally well.
from mpmath import mp

def calculate_pi_digits(n):
    mp.dps = n + 2                # n digits of working precision, plus guards
    return mp.nstr(mp.pi, n + 1)  # "3." followed by n decimal digits

print(calculate_pi_digits(1000000))
```
This program, while complex in its mathematics, has a fixed, short length that is essentially independent of the number of digits we want to calculate (only the numeral encoding that count changes). Its length might be a few kilobytes at most, vastly shorter than the one million characters of the string itself. Therefore, despite its random appearance, S3 has an extremely low Kolmogorov complexity. It is a highly structured, simple object from an algorithmic perspective.
Kolmogorov Complexity vs. Shannon Entropy
At this point, it is crucial to clearly contrast these two powerful measures of information.
| Aspect | Shannon Entropy | Kolmogorov Complexity |
|---|---|---|
| Applies To | An entire source of data (an ensemble of messages). It is an average property. | A single, specific, individual string. |
| Basis of Measure | Based on the probability distribution of symbols from the source. | Based on the length of an algorithmic description (a program). |
| Randomness Concept | Measures statistical randomness (unpredictability of the next symbol). | Measures algorithmic randomness (incompressibility). |
| Computability | Computable. Given a model of the source, entropy can be calculated. | Uncomputable. It cannot be calculated by any algorithm. |
The example of the digits of Pi perfectly highlights the difference. Shannon entropy sees the digits as statistically random. Kolmogorov complexity sees the underlying mathematical rule and recognizes the string as highly ordered. Kolmogorov complexity is a more powerful and fundamental measure, but it comes with a major catch.
The Uncomputability Problem: The Great Limitation
If Kolmogorov complexity is the ultimate measure of compressibility, why do we not simply build a compressor that finds the shortest program for any given file? The reason is that it is mathematically impossible: the Kolmogorov complexity of a string is uncomputable. No algorithm can take an arbitrary string s and return K(s).
This fact is deeply connected to the Halting Problem, one of the most important results in computer science, proven by Alan Turing. The Halting Problem states that it is impossible to create a general algorithm that can analyze any arbitrary computer program and its input and determine whether that program will eventually finish (halt) or run forever.
To find K(s), we would need to systematically search for the shortest program that produces s. This would involve generating and testing every possible program, starting with the shortest ones. For each program, we would run it and see what it outputs. However, if a program runs for a minute, an hour, a year... we can never be certain whether it is just taking a long time to compute the answer or is stuck in an infinite loop. Due to the Halting Problem, there is no way to automate this decision. We cannot reliably filter out the non-halting programs from our search, so we can never be certain we have found the shortest halting program.
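A sketch of this doomed search makes the obstacle concrete. The helpers `all_programs_of_length` and `run_with_limit` below are hypothetical stand-ins (no such general-purpose functions can fully exist), and the step limit is an arbitrary cutoff; the point is that any real implementation must impose such a cutoff, and may thereby discard the true shortest program:

```python
def find_K(s, step_limit=10**6):
    """A naive -- and provably incorrect -- attempt to compute K(s)."""
    length = 1
    while True:
        # Hypothetical enumerator of all programs of a given source length.
        for program in all_programs_of_length(length):
            # Hypothetical runner that aborts after step_limit steps, since
            # we cannot decide in advance whether `program` halts at all.
            output = run_with_limit(program, step_limit)
            if output == s:
                return length  # first hit at this length... but is it really shortest?
            # A program that exceeded the limit is skipped -- yet it might have
            # been the shortest program for s, merely slow. The Halting Problem
            # guarantees that no finite step_limit is ever provably sufficient.
        length += 1
```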
Practical Significance: The Theoretical Gold Standard
The fact that Kolmogorov complexity is uncomputable does not render it a mere philosophical curiosity. It has profound practical significance.
- Defining the Limit: It provides a theoretical "gold standard" for lossless compression. We know that the compressed size of a file can never be smaller than its Kolmogorov complexity. This gives us a firm boundary on what is achievable.
- Guiding Algorithm Design: Practical compression algorithms like ZIP, PNG, and FLAC can be viewed as attempts to approximate the Kolmogorov complexity of a file. They do not search all possible programs, which is impossible. Instead, they use a limited, computable set of techniques (such as finding repeated strings or statistical patterns) to find a short algorithmic description. A "better" compression algorithm is one that, for a wider range of typical files, finds a shorter description, producing a file size closer to the true (but unknown) Kolmogorov complexity; the sketch after this list shows how a compressor's output doubles as a computable upper bound.
- Formalizing Randomness: It gives us a formal, rigorous definition of randomness. A string is algorithmically random if it is incompressible, meaning its shortest description is the string itself. This concept is fundamental in many areas of science and philosophy.
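Although K(s) itself cannot be computed, the output size of any real compressor gives a computable upper bound on it, up to the fixed size of the decompressor. A minimal sketch using Python's standard zlib module illustrates the idea:

```python
import os
import zlib

def complexity_upper_bound(data: bytes) -> int:
    # The compressed size is a computable upper bound on the Kolmogorov
    # complexity of `data`, up to the constant size of the decompressor.
    return len(zlib.compress(data, level=9))

print(complexity_upper_bound(b"A" * 1000))       # tiny: highly structured input
print(complexity_upper_bound(os.urandom(1000)))  # ~1,000+: incompressible input
```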
In conclusion, Kolmogorov Complexity shifts the focus from the statistical properties of data to its inherent algorithmic structure. It defines the true information content of an individual object as the size of the most elegant and concise set of instructions needed to create it. While we can never perfectly calculate this value, the pursuit of finding ever-shorter descriptions for our data remains the central goal of all lossless compression.