Congestion Control

Network-wide congestion avoidance and management strategies.

1. What is Network Congestion?

In any network, from a small home LAN to the global internet, resources are finite. Network links have a maximum bandwidth, and devices like routers have a limited amount of memory and processing power. Congestion happens when the demand for these resources exceeds their supply.

When a router receives packets faster than it can forward them out of the appropriate interface, it stores the excess packets in a temporary memory buffer called a queue. If traffic continues to arrive at this high rate, the queue will grow. This is the root of congestion. Its symptoms are immediately apparent to users and applications:

  • Increased Latency (Delay): Packets spend more time waiting in the queue, so it takes them longer to reach their destination.
  • Increased Jitter: The length of the queue can fluctuate rapidly, causing the delay to vary from packet to packet, which is detrimental to real-time applications.
  • Packet Loss: Eventually, the queue becomes completely full. Any new packets that arrive cannot be stored and are simply dropped (discarded) by the router.
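
A minimal simulation sketch of how these symptoms arise in a fixed-capacity FIFO queue (the rates and limits here are hypothetical, and real routers are far more complex):

```python
from collections import deque

FORWARD_RATE = 10   # packets the router can forward per tick
QUEUE_LIMIT = 50    # buffer capacity in packets

queue = deque()
dropped = 0

def tick(arrivals):
    """One time slice: enqueue arriving packets, then forward what we can."""
    global dropped
    for _ in range(arrivals):
        if len(queue) < QUEUE_LIMIT:
            queue.append(object())
        else:
            dropped += 1                      # tail-drop: buffer full
    for _ in range(min(FORWARD_RATE, len(queue))):
        queue.popleft()

# Demand (15/tick) exceeds capacity (10/tick): the queue -- and with it
# the queueing delay -- grows each tick until the buffer fills and loss begins.
for t in range(15):
    tick(arrivals=15)
    print(f"tick {t:2d}: queued={len(queue):2d}, dropped={dropped}")
```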

If left unmanaged, severe congestion can lead to a state known as congestion collapse, where the network becomes gridlocked. Senders, noticing their packets are lost, retransmit them, adding even more traffic to an already overloaded network. The majority of the network's bandwidth is spent retransmitting packets that will likely be lost again, and very little useful data gets through. Congestion control is the set of mechanisms designed to prevent this scenario and manage network resources gracefully.

2. Congestion Control vs. Flow Control

It is essential to distinguish congestion control from a related but distinct concept: flow control.

  • Flow Control

    is a local mechanism that operates between a single sender and a single receiver. Its purpose is to ensure that the sender does not transmit data faster than the receiver can process it. The receiver advertises its available buffer space (the receive window in TCP), and the sender adjusts its transmission rate to avoid overflowing that buffer. Flow control protects the receiver.

  • Congestion Control

    is a global, network-wide issue. Its purpose is to protect the network itself from being overloaded. A sender might be sending data to a very fast receiver with a huge buffer, so flow control would allow it to send at a high rate. However, if the path between them contains a slow or congested link, congestion control mechanisms will force the sender to slow down to avoid contributing to the network's congestion. Congestion control protects the network.

3. The Goals of Congestion Control

Effective congestion control algorithms strive to balance several competing objectives:

  • Efficiency: The network should have high throughput, meaning a high rate of successful data delivery. Resources should not be underutilized.
  • Fairness: The algorithm should allocate a fair share of the network bandwidth to competing data flows. The definition of "fair" can vary, but typically it means that flows sharing the same bottleneck link should receive a roughly equal portion of the capacity; one common quantitative measure is sketched after this list.
  • Stability: The network's performance should be stable and predictable, avoiding wild oscillations in throughput and delay.
  • Low Delay: Especially for interactive and real-time applications, keeping queueing delays low is a primary goal.
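
As an illustration of the fairness goal above, one widely used quantitative measure (not defined in this text, shown here as an aside) is Jain's fairness index, which equals 1.0 when all flows receive equal throughput and approaches 1/n as a single flow dominates:

```python
def jains_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    n = len(throughputs)
    total = sum(throughputs)
    return total * total / (n * sum(x * x for x in throughputs))

print(jains_index([10, 10, 10, 10]))  # 1.0   -- equal shares
print(jains_index([37, 1, 1, 1]))     # ~0.29 -- one flow hogs the bottleneck
```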

4. TCP Congestion Control: A Cooperative Approach

The most widely deployed and studied congestion control mechanisms are part of the Transmission Control Protocol (TCP). TCP's approach is an end-to-end strategy, where the sender infers the state of the network and adjusts its behavior accordingly, without explicit signals from the routers themselves. The central concept in TCP congestion control is the congestion window (`cwnd`).

The `cwnd` represents the sender's current estimate of the available capacity on the network path. The sender is not allowed to send more than the minimum of the `cwnd` and the receiver's advertised flow control window (`rwnd`). TCP's algorithms dynamically adjust the size of this `cwnd` based on feedback received from the network in the form of acknowledgments (ACKs) and evidence of packet loss. This dynamic adjustment generally follows a principle known as AIMD (Additive Increase, Multiplicative Decrease).
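
The rules just described can be condensed into a small sketch (MSS units, heavily simplified relative to a real TCP stack):

```python
MSS = 1  # work in units of one Maximum Segment Size

def send_window(cwnd, rwnd):
    # The sender may never have more unacknowledged data in flight
    # than either the network (cwnd) or the receiver (rwnd) allows.
    return min(cwnd, rwnd)

def on_ack(cwnd):
    # Additive increase: about +1 MSS per RTT, spread over the
    # ACKs for one window's worth of segments.
    return cwnd + MSS * MSS / cwnd

def on_loss(cwnd):
    # Multiplicative decrease: halve the window on congestion evidence.
    return cwnd / 2
```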

The lifecycle of TCP congestion control can be broken down into distinct phases.

Phase 1: Slow Start

When a TCP connection first begins, the sender knows nothing about the available bandwidth of the network path. The Slow Start phase is designed to rapidly probe the network to find an approximate capacity. The name is somewhat misleading, as the window growth is actually exponential.

  • The session starts with a very small `cwnd`, typically between 1 and 10 Maximum Segment Sizes (MSS). Let's assume it starts at `cwnd = 1 MSS`.
  • The sender transmits one segment. When the acknowledgment (ACK) for that segment is received, the sender increases its `cwnd` by 1 MSS.
  • Since this happens for every ACK, the `cwnd` effectively doubles for every round-trip time (RTT). (Sending 1 packet yields 1 ACK, `cwnd` becomes 2. Sending 2 packets yields 2 ACKs, `cwnd` becomes 4, and so on).
  • This exponential growth continues until either a packet is lost or the `cwnd` reaches a value called the slow start threshold (`ssthresh`); this growth pattern is sketched below.
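
Counting in whole segments per round trip, a sketch of this doubling (with hypothetical starting values):

```python
cwnd, ssthresh = 1, 64  # hypothetical values, in MSS units

rtt = 0
while cwnd < ssthresh:
    print(f"RTT {rtt}: cwnd = {cwnd} MSS")
    # Every segment sent this RTT returns an ACK, and every ACK adds
    # 1 MSS, so the window doubles once per round trip.
    cwnd *= 2
    rtt += 1

print(f"RTT {rtt}: cwnd = {cwnd} MSS -> ssthresh reached, "
      "switch to congestion avoidance")
```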

Phase 2: Congestion Avoidance

Once the `cwnd` reaches the `ssthresh`, the sender assumes it is approaching the network's capacity and must switch to a more cautious mode to avoid causing congestion. This is the Congestion Avoidance phase.

  • Instead of exponential growth, the sender now uses an additive increase.
  • The `cwnd` is increased by approximately 1 MSS for every full round-trip time. This means for every `cwnd`'s worth of ACKs received, the window grows by one segment.
  • This represents a much slower, linear growth phase, allowing the sender to gently probe for additional available bandwidth.
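
Picking up where the slow start sketch left off, the per-ACK increment of `MSS/cwnd` works out to roughly linear growth per round trip:

```python
cwnd = 64.0  # continuing the earlier hypothetical values, in MSS units

for rtt in range(1, 6):
    # ~cwnd ACKs arrive per RTT, each adding MSS/cwnd, so the window
    # grows by about one segment per round trip.
    for _ in range(int(cwnd)):
        cwnd += 1.0 / cwnd
    print(f"after RTT {rtt}: cwnd ~= {cwnd:.2f} MSS")
```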

Phase 3: Congestion Detection and Reaction

The Congestion Avoidance phase continues until the sender detects packet loss, which it takes as a clear signal of congestion in the network. TCP has two primary ways of detecting packet loss, and its reaction differs significantly for each.

  • Reaction to a Timeout: This is the most severe indication of congestion. A timeout occurs when the sender does not receive an ACK for a transmitted segment within a calculated retransmission timeout (RTO) period. It implies that the segment was lost and that nothing sent afterwards produced even a duplicate ACK, suggesting the path is severely congested.
    • The `ssthresh` is set to half of the current `cwnd`.
    • The `cwnd` is drastically reduced, resetting to `1 MSS`.
    • The sender re-enters the Slow Start phase. This is the classic behavior of TCP Tahoe.
  • Reaction to Triple Duplicate ACKs (Fast Retransmit/Fast Recovery): TCP can also infer a single packet loss much earlier. If a sender transmits segments 1 through 6 and segment 3 is lost, the receiver gets 1, 2, 4, 5, 6. It ACKs segment 2, and then, as segments 4, 5, and 6 arrive, it sends three duplicate ACKs for segment 2 (each indicating it is still waiting for 3). When the sender receives three duplicate ACKs for the same segment, it assumes the segment following it was lost and performs a Fast Retransmit without waiting for a timeout. This indicates a less severe congestion event, and the reaction, characteristic of TCP Reno, is accordingly less severe (sketched after this list):
    • This is the "multiplicative decrease" part of AIMD. The `ssthresh` is set to half of the current `cwnd`.
    • The `cwnd` is also reduced to the new `ssthresh` value (not back to 1).
    • The sender immediately enters the Congestion Avoidance phase, skipping the aggressive Slow Start.
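
Both reactions in one simplified sketch (MSS units; real stacks track considerably more state, and the fast recovery details are omitted):

```python
def on_timeout(cwnd):
    # Severe signal (TCP Tahoe behavior): halve the threshold and
    # restart probing from scratch with slow start.
    ssthresh = max(cwnd / 2, 2)
    return 1, ssthresh            # new cwnd = 1 MSS

def on_triple_dup_ack(cwnd):
    # Milder signal (TCP Reno behavior): halve threshold and window,
    # then continue in congestion avoidance.
    ssthresh = max(cwnd / 2, 2)
    return ssthresh, ssthresh     # new cwnd = ssthresh
```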

5. Modern TCP Congestion Control Algorithms

While TCP Reno's AIMD approach is foundational, modern networks with very high bandwidth and long delays ("long fat networks") exposed weaknesses in its linear probing. This led to the development of more advanced algorithms.

  • TCP CUBIC

    CUBIC is the default congestion control algorithm in Linux and many other modern systems. Instead of linear growth during Congestion Avoidance, CUBIC uses a cubic function of the time elapsed since the last loss to govern window growth. After a packet loss, the window is reduced, and then grows rapidly at first. As it approaches the window size at which the loss occurred, growth slows down significantly to be cautious; after passing that point, it accelerates again to aggressively probe for newly available bandwidth. This approach is more stable and performs much better on high-speed, long-latency networks. The window function is sketched after this list.

  • BBR (Bottleneck Bandwidth and Round-trip propagation time)

    Developed by Google, BBR takes a fundamentally different approach. Instead of using packet loss as the primary signal for congestion, BBR actively tries to measure the two key parameters of the network path: the bottleneck bandwidth and the round-trip propagation time. It then paces its sending rate to match the measured bottleneck bandwidth. By doing so, BBR aims to operate at the network's sweet spot, achieving high throughput without filling up router buffers, thus avoiding the high latency and packet loss (bufferbloat) that loss-based algorithms like Reno and CUBIC can cause. A sketch of BBR's core arithmetic also follows below.
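
A sketch of the CUBIC window curve, using the constants published in RFC 8312 (C = 0.4, β = 0.7); `w_max` is the window size at the last loss event, and the numbers below are hypothetical:

```python
def cubic_window(t, w_max, C=0.4, beta=0.7):
    """Approximate CUBIC window (in MSS) t seconds after a loss.

    K is the time at which the curve returns to w_max: growth is fast
    at first, flattens near w_max, then accelerates past it.
    """
    K = ((w_max * (1 - beta)) / C) ** (1 / 3)
    return C * (t - K) ** 3 + w_max

w_max = 100  # hypothetical window at the last loss, in MSS
for t in [0, 2, 4, 6, 8]:
    print(f"t = {t}s: cwnd ~= {cubic_window(t, w_max):6.1f} MSS")
```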
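
And a sketch of BBR's core arithmetic: the product of the two measured parameters is the path's bandwidth-delay product (BDP), which bounds how much data should be in flight. The gain values follow the published BBR design for steady state, but this is an illustration, not the real state machine:

```python
def bbr_targets(btl_bw_bps, rt_prop_s, pacing_gain=1.0, cwnd_gain=2.0):
    """Steady-state BBR targets from the two estimated path parameters.

    btl_bw_bps : bottleneck bandwidth estimate (bits/s)
    rt_prop_s  : round-trip propagation time estimate (s)
    """
    bdp_bits = btl_bw_bps * rt_prop_s        # bandwidth-delay product
    pacing_rate = pacing_gain * btl_bw_bps   # send at the bottleneck rate
    inflight_cap = cwnd_gain * bdp_bits / 8  # bytes allowed in flight
    return pacing_rate, inflight_cap

# Hypothetical path: 100 Mbit/s bottleneck, 40 ms propagation delay.
rate, cap = bbr_targets(100e6, 0.040)
print(f"pace at {rate / 1e6:.0f} Mbit/s, keep <= {cap / 1e3:.0f} kB in flight")
```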

6. Network-Assisted Congestion Control

While end-to-end mechanisms like TCP's are powerful, network devices themselves can play a more active role in managing congestion.

  • Active Queue Management (AQM)

    AQM refers to a class of router-based algorithms that manage the length of queues proactively. Instead of waiting for a queue to become completely full and then dropping all incoming packets (a behavior called tail-drop), AQM algorithms start dropping packets earlier. The most well-known AQM algorithm is Random Early Detection (RED). RED monitors the average queue length and, as it grows, starts randomly dropping packets with increasing probability. This sends an early warning signal to TCP senders, prompting them to reduce their sending rates before severe congestion occurs, leading to a more stable network. The RED decision is sketched after this list.

  • Explicit Congestion Notification (ECN)

    ECN is a further refinement of AQM. Instead of dropping a packet to signal congestion, an ECN-aware router can set a special "Congestion Experienced" flag in the IP header of the packet and forward it normally. The receiving device sees this flag and echoes the congestion notification back to the sender in its TCP acknowledgment. The TCP sender then reacts as if a triple duplicate ACK had been received (reducing its window) but without the need to retransmit a lost packet. ECN allows for congestion signaling without inducing packet loss, which is more efficient. A sketch of the ECN codepoints follows the RED sketch below.
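
A simplified sketch of the classic RED decision (the weight and thresholds are hypothetical tuning values; real RED implementations also track the count of packets since the last drop):

```python
import random

W = 0.002                # EWMA weight for the average queue length
MIN_TH, MAX_TH = 5, 15   # thresholds in packets (hypothetical tuning)
MAX_P = 0.1              # maximum early-drop probability

avg = 0.0

def red_should_drop(queue_len):
    """Decide whether RED drops the packet that is arriving now."""
    global avg
    avg = (1 - W) * avg + W * queue_len   # smoothed average queue length
    if avg < MIN_TH:
        return False                      # short queue: never drop
    if avg >= MAX_TH:
        return True                       # long queue: always drop
    # In between: drop probability rises linearly toward MAX_P.
    p = MAX_P * (avg - MIN_TH) / (MAX_TH - MIN_TH)
    return random.random() < p
```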
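
The ECN signal itself lives in the two low-order bits of the IP traffic class field; the codepoints below are the ones defined in RFC 3168, while the router logic is deliberately simplified:

```python
NOT_ECT = 0b00   # sender does not support ECN
ECT_1   = 0b01   # ECN-Capable Transport
ECT_0   = 0b10   # ECN-Capable Transport
CE      = 0b11   # Congestion Experienced (set by a congested router)

def forward(ecn_bits, congested):
    """Router decision: mark instead of dropping when the packet allows it.

    Returns (new ECN bits, dropped?).
    """
    if congested and ecn_bits in (ECT_0, ECT_1):
        return CE, False        # mark and forward normally
    if congested:
        return ecn_bits, True   # non-ECN traffic must be dropped instead
    return ecn_bits, False

bits, dropped = forward(ECT_0, congested=True)
assert bits == CE and not dropped   # receiver echoes ECE back to the sender
```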

In conclusion, congestion control is a complex and dynamic field. It is a cooperative dance between end systems, which try to adapt to the network's state, and the network devices themselves, which manage their resources and can provide signals to help end systems behave well. These mechanisms are the hidden engine that allows the shared, public internet to function efficiently at a global scale.
