Stop-and-Wait Protocol
Simple acknowledgment-based flow control mechanism with timeout and retransmission.
The Fundamental Challenge: Reliable Communication over an Unreliable Channel
The core mission of the Data Link Layer is to provide reliable data transfer between two directly connected devices. This sounds simple, but the physical medium connecting them, whether a copper cable, a fiber-optic strand, or the open air, is inherently imperfect. Signals can be corrupted by electrical noise, attenuated over distance, or simply lost due to transient interference. This creates two fundamental problems:
- Corrupted Frames: How does the receiver know if the data it received is the same data the sender transmitted?
- Lost Frames: If a frame (or its acknowledgment) vanishes completely, how does the sender know it needs to retransmit?
The simplest answer to both problems is the Stop-and-Wait Protocol. Although it is rarely used in its pure form today because of its inefficiency, it is worth understanding well: modern reliable protocols such as TCP build on the same basic principles of acknowledgments and timeouts.
The Core Rule of Stop-and-Wait
The protocol's logic can be distilled into a single rule for the sender: send one frame, then stop and wait for its acknowledgment before sending the next.
This simple rule provides both flow control (preventing a fast sender from overwhelming a slow receiver) and error control (ensuring lost or damaged frames are retransmitted). It’s like having an extremely careful conversation where you speak one sentence, then wait until the other person explicitly confirms they heard it correctly before you proceed.
Stop-and-Wait in Action: The "Happy Path"
Let's first examine the ideal scenario where the channel behaves perfectly and no data is lost or corrupted.
- Step 1: Sender Transmits Frame: The sender (Node A) takes the first piece of data, encapsulates it in a frame with sequence number 0, and sends it to the receiver (Node B).
- Step 2: Sender Starts a Timer: Immediately after sending the frame, Node A starts a timeout timer. This timer acts as a failsafe.
- Step 3: Receiver Gets the Frame: After a propagation delay, Node B receives Frame 0. It checks the Frame Check Sequence (FCS) to ensure the frame is not corrupted.
- Step 4: Receiver Sends Acknowledgment: Since the frame is valid, Node B sends back an acknowledgment frame (ACK). This ACK specifies the sequence number of the next frame it expects to receive. In this case, since it correctly received Frame 0, it sends ACK 1.
- Step 5: Sender Receives ACK: Node A receives ACK 1 before its timer expires. This confirms that Frame 0 arrived safely. Node A stops the timer.
- Step 6: Repeat the Cycle: Now that Frame 0 is confirmed, Node A is free to send the next frame in the sequence, Frame 1, and the cycle begins anew.
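The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a real link-layer implementation: the frame layout (a 1-byte sequence number, the payload, then a 4-byte CRC-32 used as the FCS) and the helper names `make_frame`, `check_frame`, and `receiver_ack` are assumptions made for this example.

```python
import zlib

def make_frame(seq: int, payload: bytes) -> bytes:
    """Build a frame: 1-byte sequence number, payload, 4-byte CRC-32 as the FCS.

    The layout is an assumption for illustration, not a real link-layer format.
    """
    body = bytes([seq]) + payload
    fcs = zlib.crc32(body).to_bytes(4, "big")
    return body + fcs

def check_frame(frame: bytes):
    """Return (seq, payload) if the FCS matches, otherwise None."""
    body, fcs = frame[:-4], frame[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != fcs:
        return None
    return body[0], body[1:]

def receiver_ack(frame: bytes):
    """Return the ACK number (the next expected sequence number).

    Returns None for a corrupted frame: the receiver stays silent and
    lets the sender's timer expire.
    """
    parsed = check_frame(frame)
    if parsed is None:
        return None
    seq, _payload = parsed
    return (seq + 1) % 2

# Happy path: Frame 0 arrives intact and is answered with ACK 1.
frame0 = make_frame(0, b"hello")
ack = receiver_ack(frame0)

# Flip one bit in transit: the FCS check fails and no ACK is produced.
corrupted = bytearray(frame0)
corrupted[1] ^= 0x01
corrupted_ack = receiver_ack(bytes(corrupted))
```

Here `ack` is 1 and `corrupted_ack` is `None`, which mirrors Steps 3 and 4: only a frame that passes the FCS check is acknowledged.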
Handling Failure: The Timeout Mechanism
The real world is not perfect. Frames can get lost. The timeout timer is the sender's only way of detecting this type of failure.
Scenario 1: Lost Data Frame
- The sender transmits Frame 0 and starts its timer.
- The frame is lost in transit due to a burst of noise on the channel.
- The receiver, Node B, never receives anything. It continues to wait passively.
- The sender, Node A, waits. No ACK arrives.
- The sender's timeout timer expires. From the sender's perspective, this is the only signal that something went wrong, although it cannot tell exactly what.
- Assuming the frame was lost, the sender retrieves the copy of Frame 0 from its buffer and retransmits it, starting the timer again.
The timer is the core of the Automatic Repeat reQuest (ARQ) mechanism: its expiry is what triggers the automatic retransmission of lost data.
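The retransmission loop can be sketched as follows. To keep the example deterministic, no real clock is used: the hypothetical `channel` callable returns `None` when no ACK would arrive before the timer expires, and `make_lossy_channel` is an illustrative stand-in that drops a fixed number of transmissions. The `max_attempts` cap is also an assumption; the text above does not specify a retry limit.

```python
def send_with_retransmit(frame, channel, max_attempts=5):
    """Send one frame and retransmit it after each simulated timeout.

    `channel(frame)` returns the ACK number, or None when nothing comes
    back before the timer would have expired.
    Returns (number_of_transmissions, ack).
    """
    for attempt in range(1, max_attempts + 1):
        ack = channel(frame)          # transmit and "start the timer"
        if ack is not None:
            return attempt, ack       # ACK arrived in time: stop the timer
        # Timeout: loop around and resend the buffered copy of the frame.
    raise TimeoutError(f"no ACK after {max_attempts} attempts")

def make_lossy_channel(drop_first):
    """Simulated channel that loses the first `drop_first` transmissions."""
    state = {"sent": 0}

    def channel(frame):
        state["sent"] += 1
        if state["sent"] <= drop_first:
            return None               # frame lost in transit, no ACK
        seq, _payload = frame
        return (seq + 1) % 2          # receiver acknowledges with next expected number

    return channel

# Frame 0 is lost once, then the retransmission gets through.
attempts, ack = send_with_retransmit((0, b"data"), make_lossy_channel(drop_first=1))
```

In this run `attempts` is 2 and `ack` is 1: the first copy is lost, the simulated timeout fires, and the second copy is acknowledged.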
Scenario 2: Lost Acknowledgment (ACK) Frame
This scenario is more subtle but equally important to handle. What if the data frame arrives safely, but the acknowledgment gets lost on its way back?
- The sender transmits Frame 0 and starts its timer.
- The receiver correctly receives Frame 0 and sends back ACK 1.
- The ACK 1 frame is lost in transit.
- The sender, unaware that the data arrived, continues to wait.
- The sender's timeout timer expires. Just as before, the sender has no way of knowing why the ACK didn't arrive. It must assume the worst: that the original data frame was lost.
- The sender retransmits Frame 0.
Solving the Duplicate Problem: The Power of a Single Bit
The "Lost ACK" scenario creates a new problem: the receiver will now receive a duplicate copy of Frame 0. If it simply accepted this duplicate and passed it to the network layer, the data stream would be corrupted (e.g., a sentence in a file might appear twice).
Stop-and-Wait solves this with the simplest possible form of sequence numbering. It only needs a single bit to alternate between frame identifiers, typically 0 and 1.
How 1-bit Sequence Numbers Work
The conversation now includes sequence checks:
- Sender sends Frame 0.
- Receiver gets Frame 0, sends back ACK 1, and now expects Frame 1.
- The ACK 1 is lost. Sender's timer expires.
- Sender retransmits Frame 0.
- Receiver gets the duplicate Frame 0. It checks the sequence number. It was expecting Frame 1, but it received Frame 0.
- The receiver now knows this is a duplicate. It discards the duplicate frame but, as a helpful measure, it re-sends the last successful acknowledgment, ACK 1, because the sender is clearly still waiting for it.
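The receiver side of this exchange can be sketched as a small state machine. The `Receiver` class and its `on_frame` method are hypothetical names chosen for this illustration; the only state it keeps is the 1-bit sequence number it expects next.

```python
class Receiver:
    """Stop-and-Wait receiver with a 1-bit sequence number (illustrative sketch)."""

    def __init__(self):
        self.expected = 0      # sequence number of the next new frame
        self.delivered = []    # payloads passed up to the network layer

    def on_frame(self, seq, payload):
        """Handle an uncorrupted frame and return the ACK number to send back."""
        if seq == self.expected:
            # New frame: deliver it and flip the expected bit.
            self.delivered.append(payload)
            self.expected ^= 1
        # For a duplicate, `expected` is unchanged, so returning it
        # simply re-sends the last acknowledgment.
        return self.expected

rx = Receiver()
ack_first = rx.on_frame(0, "A")   # Frame 0 accepted; this ACK 1 is "lost"
ack_dup = rx.on_frame(0, "A")     # retransmitted Frame 0: discarded, ACK 1 re-sent
ack_next = rx.on_frame(1, "B")    # Frame 1 accepted, ACK 0 sent
```

After these three calls, `rx.delivered` is `["A", "B"]`: the duplicate Frame 0 was acknowledged again but never delivered a second time.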
This tiny 1-bit sequence number is enough to make the protocol robust against both lost frames and lost ACKs, ensuring that each data frame is delivered to the higher layer exactly once.
Analysis: The Crushing Inefficiency of Stop-and-Wait
While simple and reliable, Stop-and-Wait is incredibly inefficient, especially on modern, high-speed, long-distance networks. Its performance is limited not by the link's bandwidth, but by its propagation delay.
A Real-World Example: Trans-Atlantic Fiber
Let's consider a realistic scenario: sending data over a dedicated fiber-optic link between New York and London.
- Link Bandwidth (R): 1 Gigabit per second ($10^9$ bps).
- Distance (d): Approximately 5,600 km.
- Propagation Speed (v): The speed of light in fiber optic glass is about $2 \times 10^8$ meters/sec.
- Frame Size (L): A standard 1,500 byte (12,000 bit) frame.
First, let's calculate the critical time components:
- Transmission Time ($T_t$): How long it takes to push all the bits of one frame onto the wire. $T_t = L / R = 12{,}000 \text{ bits} / 10^9 \text{ bps} = 12\ \mu\text{s}$.
- Propagation Delay ($T_p$): How long it takes for the first bit to travel from one end to the other. $T_p = d / v = (5.6 \times 10^6 \text{ m}) / (2 \times 10^8 \text{ m/s}) = 28 \text{ ms}$.
- Round-Trip Time (RTT): The time for a frame to go and an ACK to come back. We assume the ACK is very small, so its transmission time is negligible. The RTT is dominated by the two-way travel time: $RTT = 2 \times T_p = 56 \text{ ms}$.
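These quantities can be checked with a short calculation. The variable names follow the symbols R, d, v, and L from the list above. The propagation speed of 2 × 10^8 m/s is an assumption: it is a typical figure for light in fiber (about two-thirds of its speed in a vacuum), and it is the value that reproduces the 0.0214% utilization quoted in this section.

```python
R = 1e9            # link bandwidth in bits per second (1 Gbps)
d = 5_600_000      # distance in meters (5,600 km)
v = 2e8            # propagation speed in m/s (assumed typical value for fiber)
L = 12_000         # frame size in bits (1,500 bytes)

t_t = L / R        # transmission time: pushing one frame onto the wire
t_p = d / v        # propagation delay: one-way travel time of a bit
rtt = 2 * t_p      # round-trip time, ignoring the tiny ACK's transmission time

print(f"transmission time: {t_t * 1e6:.1f} microseconds")
print(f"propagation delay: {t_p * 1e3:.1f} ms")
print(f"round-trip time:   {rtt * 1e3:.1f} ms")
```

The round trip is several thousand times longer than the time spent actually transmitting the frame, which is the root of the inefficiency analyzed next.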
Calculating Link Utilization
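The sender spends $T_t$ transmitting and one RTT waiting. The link utilization is the fraction of time the link is actually used for sending data: $U = T_t / (T_t + RTT) = 12\ \mu\text{s} / (12\ \mu\text{s} + 56 \text{ ms}) \approx 0.000214$.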
Link Utilization: 0.0214%
This result is staggering. We have a 1 Gbps fiber optic link, a data superhighway, but the Stop-and-Wait protocol forces it to be idle for 99.9786% of the time. Our effective throughput is not 1 Gbps, but a mere 214 kbps or so, slower than many early-2000s home broadband connections.
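The utilization figure can be reproduced with a small helper. The function name `stop_and_wait_utilization` is hypothetical, the propagation speed of 2 × 10^8 m/s is an assumed typical value for fiber, and the 100-meter link is an invented comparison case added only to show how strongly the result depends on distance.

```python
def stop_and_wait_utilization(frame_bits, bandwidth_bps, distance_m, speed_mps):
    """Fraction of time the link carries data under Stop-and-Wait.

    Assumes the ACK's transmission time is negligible, so each cycle
    lasts one transmission time plus one round-trip time.
    """
    t_t = frame_bits / bandwidth_bps
    rtt = 2 * distance_m / speed_mps
    return t_t / (t_t + rtt)

# Trans-Atlantic example: 1,500-byte frame, 1 Gbps, 5,600 km.
u = stop_and_wait_utilization(12_000, 1e9, 5_600_000, 2e8)
throughput_bps = u * 1e9

# Hypothetical comparison: the same frame and bandwidth over a 100 m link.
u_short = stop_and_wait_utilization(12_000, 1e9, 100, 2e8)

print(f"long link:  {u:.4%} utilization, about {throughput_bps / 1e3:.0f} kbps")
print(f"short link: {u_short:.1%} utilization")
```

On the short link the round trip takes only 1 microsecond, so utilization rises to roughly 92%. The protocol itself is unchanged; only the propagation delay differs.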
Conclusion: A Vital but Insufficient Stepping Stone
The Stop-and-Wait protocol is an excellent teaching tool for introducing the concepts of reliability at the link layer. It provides straightforward flow control and, with the addition of sequence numbers, robust error control. It is simple, easy to understand, and reliable in the face of the lost frames and lost acknowledgments described above.
However, its practical performance is unacceptable for nearly all modern applications. The "wait" period inherent in its design is a bottleneck that cannot be overcome without changing the fundamental rule. This profound inefficiency is precisely why more advanced techniques, namely the Sliding Window Protocols like Go-Back-N and Selective Repeat, were developed. They build on the solid foundation of Stop-and-Wait's acknowledgments and timeouts but add the crucial ability to "fill the pipe," keeping the channel busy and unlocking the true potential of modern networks.