HTTP/2

Binary framing, multiplexing, and server push in HTTP/2.

Why We Needed a New HTTP: The Limitations of HTTP/1.1

For nearly two decades, HTTP/1.1 was the workhorse protocol that powered the growth of the World Wide Web. It introduced persistent connections, allowing multiple requests and responses to be sent over a single TCP connection, which was a huge improvement over its predecessor, HTTP/1.0. However, as websites evolved from simple text documents into complex, resource-heavy applications with dozens of scripts, stylesheets, and images, the limitations of HTTP/1.1 became a significant performance bottleneck.

The single most critical problem in HTTP/1.1 is known as head-of-line (HOL) blocking. Despite using a persistent connection, a browser could only send one request at a time on that connection and had to wait for the complete response before sending the next one. A single slow request, perhaps for a large image or a slow API endpoint, would block all other requests queued behind it, leaving the connection idle and delaying the rendering of the rest of the page.

Developers devised clever but complex workarounds to mitigate this, such as using multiple TCP connections per hostname (typically limited to 6 per browser), domain sharding, and concatenating files. These techniques added complexity and overhead. It became clear that the protocol itself needed a fundamental redesign to handle the demands of the modern web. This led to the development of HTTP/2, which was largely based on Google's experimental SPDY protocol and officially standardized in 2015.

Core Concept 1: The Binary Framing Layer

The most fundamental and transformative change introduced in HTTP/2 is the binary framing layer. HTTP/1.1 was a plaintext protocol. Its messages, including request lines, headers, and bodies, were sent as human-readable text separated by newlines. While this was convenient for debugging, it was inefficient for machines to parse and prone to errors.

HTTP/2 replaces this with a binary protocol. All communication is broken down into smaller, manageable, and machine-optimized units called frames. This shift from text to binary has several profound implications:

  • Efficiency: Binary formats are far more compact and faster for computers to parse than text-based formats, reducing processing overhead.
  • Robustness: Binary framing is less ambiguous and error-prone than parsing text with variable whitespace and line endings.
  • Enabling New Features: This structured, framed approach is the key that unlocks the most powerful features of HTTP/2, such as multiplexing, stream prioritization, and server push.

The Structure of an HTTP/2 Frame

Each frame in HTTP/2 has a well-defined structure. It consists of a fixed 9-byte header followed by a variable-length payload.

Length (24 bits)
An unsigned integer indicating the length of the frame payload in bytes.
Type (8 bits)
Defines the type of the frame. This determines how the payload should be interpreted.
Flags (8 bits)
A set of eight single-bit flags specific to the frame type, used to modify the frame's behavior.
Stream Identifier (31 bits)
A unique identifier for the stream to which this frame belongs, preceded by a single reserved bit. This is crucial for multiplexing.
Frame Payload (Variable)
The actual content of the frame, whose structure is determined by the frame's Type field.
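
To make the layout concrete, here is a minimal sketch in Python that decodes this 9-byte header. It is illustrative only; parse_frame_header is a hypothetical helper written for this article, not part of any HTTP/2 library.

```python
def parse_frame_header(data: bytes) -> tuple[int, int, int, int]:
    """Decode the fixed 9-byte HTTP/2 frame header described above."""
    if len(data) < 9:
        raise ValueError("frame header is exactly 9 bytes")
    length = int.from_bytes(data[0:3], "big")       # 24-bit payload length
    frame_type = data[3]                            # 8-bit type
    flags = data[4]                                 # 8-bit flags
    stream_id = int.from_bytes(data[5:9], "big") & 0x7FFFFFFF  # mask the reserved bit
    return length, frame_type, flags, stream_id

# A SETTINGS frame (type 0x4) with an empty payload on stream 0.
print(parse_frame_header(bytes([0, 0, 0, 0x04, 0x00, 0, 0, 0, 0])))
# -> (0, 4, 0, 0)
```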

Common Frame Types

Some of the most important frame types include:

  • DATA: Carries the message body for a request or response.
  • HEADERS: Contains the HTTP headers for a request or response.
  • PRIORITY: Used to specify the priority of a stream.
  • SETTINGS: Conveys configuration parameters for the connection.
  • PUSH_PROMISE: Used by the server to initiate a server push.
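
For reference, these names correspond to small numeric type codes assigned in RFC 7540, section 6; the dictionary below is a minimal sketch of that mapping (the FRAME_TYPE_NAMES name is ours, for illustration).

```python
# Frame type codes from RFC 7540, section 6 (subset matching the list above).
FRAME_TYPE_NAMES = {
    0x0: "DATA",
    0x1: "HEADERS",
    0x2: "PRIORITY",
    0x4: "SETTINGS",
    0x5: "PUSH_PROMISE",
}

print(FRAME_TYPE_NAMES[0x4])  # SETTINGS
```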

Core Concept 2: Streams and Multiplexing

The binary framing layer directly enables the headline feature of HTTP/2: multiplexing. This feature solves the head-of-line blocking problem of HTTP/1.1 at the HTTP layer (blocking can still occur at the TCP layer when packets are lost).

Streams: The Virtual Channels

Within a single HTTP/2 connection, the client and server can establish multiple independent, bidirectional sequences of frames called streams. You can think of a single TCP connection as a main highway, and each stream as a dedicated lane on that highway. One lane might be for the HTML document, another for a CSS file, and a third for an image. All these lanes exist on the same highway and can be used at the same time.

  • Each stream is assigned a unique Stream Identifier.
  • A request and its corresponding response occur within the same stream.
  • Client-initiated streams always have odd-numbered IDs, while server-initiated streams (used for Server Push) have even-numbered IDs.
  • A single connection can handle hundreds or thousands of concurrent streams.
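
The odd/even numbering rule is simple enough to sketch in a few lines; stream_origin is a hypothetical helper for illustration only.

```python
def stream_origin(stream_id: int) -> str:
    """Classify a stream by the odd/even rule described above."""
    if stream_id == 0:
        return "connection control (stream 0)"
    return "client-initiated" if stream_id % 2 == 1 else "server-initiated (push)"

for sid in (0, 1, 2, 3, 4):
    print(sid, "->", stream_origin(sid))
```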

How Multiplexing Solves HOL Blocking

Because communication is broken down into small, independent frames, each tagged with its stream ID, the client and server can interleave frames from multiple streams on the same connection. If a response for a large image (on Stream 5) is delayed, the server can still send frames for the CSS file (on Stream 3) and the JavaScript file (on Stream 7) without waiting. When the frames arrive at the client, they are reassembled into their respective streams using the Stream Identifier.
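
As a rough sketch of that reassembly, the toy snippet below interleaves chunks from streams 3, 5, and 7 and groups them by Stream Identifier; the (stream_id, chunk) tuples are a simplification standing in for real DATA frames.

```python
# Frames from three streams arrive interleaved on one connection;
# grouping by Stream Identifier rebuilds each message independently.
interleaved = [
    (3, b"body { color"), (7, b"console."), (5, b"\x89PNG..."),
    (3, b": red; }"),     (7, b"log('hi');"),
]

streams: dict[int, bytearray] = {}
for stream_id, chunk in interleaved:
    streams.setdefault(stream_id, bytearray()).extend(chunk)

for sid, body in sorted(streams.items()):
    print(f"stream {sid}: {bytes(body)!r}")
```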

This parallel processing within a single connection means that a slow response no longer blocks faster ones. It eliminates the need for workarounds like multiple connections, leading to faster page loads, less resource consumption on both the client and server, and a more efficient use of the network.

Core Concept 3: Server Push

Traditionally, HTTP communication is strictly initiated by the client. The browser requests an HTML file, parses it, finds references to CSS and JavaScript files, and then sends new requests for those resources. This creates a waterfall of requests, with inherent delays between each step.

HTTP/2 introduces a powerful new mechanism called Server Push, which allows the server to break this request-response cycle. An intelligent server can anticipate which resources a client will need and proactively "push" them to the client's cache before they are even requested.

How Server Push Works

  1. Client Request: The browser sends a normal request for a resource, for example, index.html.
  2. Server Analysis: The server receives the request. It knows that any browser requesting index.html will immediately also need style.css and app.js to render the page.
  3. Server Pushes Resources: Before sending the response for index.html, the server sends special PUSH_PROMISE frames to the client. These frames say, in effect, "I am about to send you style.css and app.js, so you do not need to request them". It then sends the data for these resources on new, server-initiated streams.
  4. Original Response: The server then sends the original response for index.html.
  5. Client Receives and Caches: The browser receives all three resources nearly simultaneously. When it starts parsing the HTML and finds the links to the CSS and JS files, it realizes they are already in its cache and can use them immediately.
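
At the wire level, the promise in step 3 travels in a PUSH_PROMISE frame whose payload begins with the 31-bit ID of the new server-initiated stream. A minimal sketch of constructing one follows; build_push_promise is a hypothetical helper, and a real implementation would pass an HPACK-encoded header block rather than empty bytes.

```python
def build_push_promise(stream_id: int, promised_id: int, header_block: bytes) -> bytes:
    """Build a PUSH_PROMISE frame (type 0x5) per RFC 7540, section 6.6."""
    # Payload: reserved bit + 31-bit promised stream ID, then the
    # HPACK-encoded headers of the request being promised.
    payload = (promised_id & 0x7FFFFFFF).to_bytes(4, "big") + header_block
    header = (
        len(payload).to_bytes(3, "big")                 # 24-bit length
        + bytes([0x05])                                 # type: PUSH_PROMISE
        + bytes([0x04])                                 # flags: END_HEADERS
        + (stream_id & 0x7FFFFFFF).to_bytes(4, "big")   # stream carrying the promise
    )
    return header + payload

# Promise style.css (as server stream 2) on the client's request stream 1.
frame = build_push_promise(stream_id=1, promised_id=2, header_block=b"")
print(frame.hex())
```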

Server Push can significantly reduce page load times by eliminating the round-trip latency of the browser having to request each resource individually. However, it must be used carefully. Pushing resources that the client already has cached is wasteful. Modern best practices generally favor preload hints (`<link rel="preload">`) over Server Push due to complexities in its implementation and caching behavior, and major browsers have since deprecated or removed support for it.

Other Major Improvements in HTTP/2

Beyond the three core concepts, HTTP/2 introduced several other critical improvements that contribute to its superior performance.

Header Compression (HPACK)
In HTTP/1.1, headers are sent as plaintext with every request and response, often with significant redundancy. For example, a browser might send identical User-Agent and Accept headers dozens of times for a single page load. This adds up to significant overhead. HTTP/2 introduces a highly effective header compression format called HPACK. It uses a static table of common headers and a dynamic table that is built up over the course of a connection to encode headers in a much more compact form. This dramatically reduces the amount of data that needs to be sent, especially on mobile networks.
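
To see why indexing is so compact, consider that RFC 7541's static table assigns small integers to common header fields, so a fully indexed field fits in a single byte. The sketch below is deliberately simplified: real HPACK also handles multi-byte indices, literal fields, the dynamic table, and Huffman coding.

```python
# A few entries of the HPACK static table (RFC 7541, Appendix A).
STATIC_TABLE = {
    2: (":method", "GET"),
    3: (":method", "POST"),
    4: (":path", "/"),
    7: (":scheme", "https"),
    8: (":status", "200"),
}

def decode_indexed(byte: int) -> tuple[str, str]:
    """Decode a one-byte indexed header field (high bit set, RFC 7541, 6.1)."""
    assert byte & 0x80, "not an indexed header field"
    return STATIC_TABLE[byte & 0x7F]

print(decode_indexed(0x82))  # (':method', 'GET') -- an entire header in one byte
```
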
Stream Prioritization
With multiplexing, a browser can make many requests at once, but not all resources are equally important for rendering a page. For instance, a render-blocking CSS file is more critical than an image at the bottom of the page. HTTP/2 allows the client to assign a priority and dependencies to each stream. It can tell the server, "This stream depends on another stream" or "This stream is more important". This allows the server to more intelligently allocate resources, such as CPU and bandwidth, to deliver the most critical resources first, further optimizing page load times.
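
On the wire, this information travels in a PRIORITY frame carrying a 31-bit stream dependency, an exclusive bit, and a weight. A hypothetical builder following RFC 7540, section 6.3:

```python
def build_priority(stream_id: int, depends_on: int, weight: int,
                   exclusive: bool = False) -> bytes:
    """Build a PRIORITY frame (type 0x2) per RFC 7540, section 6.3."""
    dep = (depends_on & 0x7FFFFFFF) | (0x80000000 if exclusive else 0)
    payload = dep.to_bytes(4, "big") + bytes([weight - 1])  # weight sent as value - 1
    header = (
        len(payload).to_bytes(3, "big")                 # always 5 for PRIORITY
        + bytes([0x02, 0x00])                           # type: PRIORITY, no flags defined
        + (stream_id & 0x7FFFFFFF).to_bytes(4, "big")
    )
    return header + payload

# "Stream 5 (an image) depends on stream 3 (the CSS), with a modest weight."
print(build_priority(stream_id=5, depends_on=3, weight=16).hex())
```
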
One Connection Per Origin
By solving HOL blocking through multiplexing, HTTP/2 eliminates the need for multiple TCP connections to a single origin. A single connection is more efficient. It reduces the overhead of TCP and TLS handshakes, uses fewer system resources (sockets, memory), and allows for better network congestion control, as the TCP algorithm has a more accurate view of the network conditions on a single, long-lived connection.
Secure by Default (in practice)
While the HTTP/2 specification itself does not mandate the use of encryption, all major browser implementations (Chrome, Firefox, Safari, Edge) require HTTP/2 to be run over a TLS-encrypted connection (HTTPS). This has made the modern web significantly more secure. The negotiation to use HTTP/2 happens during the TLS handshake via a mechanism called ALPN (Application-Layer Protocol Negotiation).
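
Python's standard ssl module exposes ALPN directly. A minimal client-side sketch (example.com is a placeholder host; the snippet needs network access to run):

```python
import socket
import ssl

ctx = ssl.create_default_context()
ctx.set_alpn_protocols(["h2", "http/1.1"])  # offer HTTP/2 first, HTTP/1.1 as fallback

with socket.create_connection(("example.com", 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
        # "h2" here means the server agreed to speak HTTP/2 on this connection.
        print(tls.selected_alpn_protocol())
```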