Hypertext Transfer Protocol (HTTP)

The protocol of the World Wide Web for requesting and serving web pages.

Introduction to HTTP: The Language of the Web

Imagine you are at a library. You walk up to the librarian's desk (the server) and make a request for a specific book (a resource like a webpage). You, the person making the request, are the client. The set of rules you use to ask for the book and for the librarian to respond is a protocol. On the internet, this fundamental protocol is called HTTP, or Hypertext Transfer Protocol. It is the foundation of data communication for the World Wide Web, enabling browsers to fetch documents from servers.

At its heart, HTTP is a request-response protocol that operates on a client-server model. The client, typically your web browser, initiates communication by sending a request message to a server. The server, a computer hosting the website's files, processes this request and sends back a response message. This response might contain the requested HTML document, an image, or simply a status update about the request. Every time you click a link, type a web address, or view an image online, this silent, high-speed conversation is happening in the background.

Dissecting a Web Address: The Uniform Resource Locator (URL)

Before a client can make a request, it needs to know the exact address of the resource it wants. This address is known as a URL (Uniform Resource Locator). A URL is a specific type of URI (Uniform Resource Identifier) that not only names a resource but also specifies how to locate it. Let's break down a typical URL into its constituent parts, using an example from your notes: http://www.cisco.com/web-server.htm.

Scheme (Protocol): http:
This is the first part of the URL, ending with a colon. It tells the browser which protocol to use for the communication. In this case, it is http. Another very common scheme is https, which indicates a secure connection. The scheme is followed by a colon and two forward slashes (://).
Host (Server Name): www.cisco.com
This component identifies the server where the resource is located. It is the user-friendly domain name that corresponds to the server's numerical IP address (like 198.133.219.25). Your computer uses a system called DNS (Domain Name System) to translate this human-readable name into the machine-readable IP address.
Path: /web-server.htm
The path indicates the specific file or resource being requested on the server. It's like specifying a folder and file name on your own computer. The path always begins with a single forward slash. In this example, we are requesting the file named web-server.htm from the root directory of the server.
Optional Components (Not in this example):
- Port: A specific "gate" on the server for the connection. HTTP defaults to port 80, and HTTPS to 443. If a different port is used, it's specified after the host, separated by a colon, like :8080.
- Query String: Starts with a question mark (?) and is used to send additional data to the server, often as key-value pairs (e.g., ?search=networking&page=2).
- Fragment: Starts with a hash (#) and points to a specific section within the requested resource (e.g., #section2), which the browser can jump to after loading the page.
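
The breakdown above can be checked with Python's standard urllib.parse module. This is a minimal sketch; the URL is a hypothetical one that extends the example from the notes with the optional port, query string, and fragment, and DEFAULT_PORTS is an illustrative name, not part of any library.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL: the example from the notes plus the optional components.
url = "http://www.cisco.com:8080/web-server.htm?search=networking&page=2#section2"

parts = urlsplit(url)

scheme = parts.scheme          # protocol to use
host = parts.hostname          # server name, resolved to an IP address via DNS
port = parts.port              # None when the URL does not name a port
path = parts.path              # resource on the server
query = parse_qs(parts.query)  # key-value pairs; each value is a list
fragment = parts.fragment      # handled by the browser, not sent to the server

# Fall back to the scheme's default port when the URL omits one.
DEFAULT_PORTS = {"http": 80, "https": 443}
effective_port = port if port is not None else DEFAULT_PORTS[scheme]

print(scheme, host, effective_port, path, query, fragment)
```

Parsing the original example, http://www.cisco.com/web-server.htm, the same way gives a port of None, which is why the fallback to port 80 is needed.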

The HTTP Request-Response Cycle in Detail

The communication between a client and server always follows a strict request-response cycle. The client sends a request, and the server answers with a response; the server never sends a response that was not requested. Let's examine the structure of these messages.

The HTTP Request Message

An HTTP request message consists of a request line, a series of headers, an empty line, and an optional message body.

GET /index.html HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 ...
Accept-Language: en-US
Accept-Encoding: gzip, deflate
Connection: keep-alive

Request Line: The first line specifies the method (e.g., GET), the path to the resource, and the HTTP protocol version.
Headers: These are key-value pairs that provide additional information about the request, such as the host, browser type (User-Agent), and acceptable languages.
Body: Normally absent in GET requests, but for methods like POST, it contains the data being sent to the server (e.g., form data).
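
To make the layout concrete, here is a small sketch that assembles a request by hand. The function name build_request and the /contact path are illustrative assumptions; real clients also handle details such as header validation that are skipped here.

```python
def build_request(method, path, headers, body=b""):
    """Serialize a request: request line, headers, empty line, optional body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Lines end with CRLF; an empty line separates the headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


# A GET request like the one shown above: no body.
request = build_request(
    "GET",
    "/index.html",
    {"Host": "www.example.com", "Accept-Language": "en-US", "Connection": "keep-alive"},
)

# A POST request carries its data in the body (hypothetical form fields).
form = b"name=Ada&topic=networking"
post = build_request(
    "POST",
    "/contact",
    {
        "Host": "www.example.com",
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(form)),
    },
    form,
)

print(request.decode("ascii"))
print(post.decode("ascii"))
```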

The HTTP Response Message

A response from the server follows a similar structure, with a status line, headers, an empty line, and a message body containing the requested resource.

HTTP/1.1 200 OK
Date: Fri, 23 May 2025 22:38:34 GMT
Server: Apache/2.4.1 (Unix)
Content-Type: text/html; charset=UTF-8
Content-Length: 122

<!DOCTYPE html>
<html>
...
</html>

Status Line: The first line includes the HTTP version, a numerical status code (e.g., 200), and a textual reason phrase (e.g., OK).
Headers: Key-value pairs describing the response, like the date, server type, and the type of content in the body (Content-Type).
Body: The actual resource requested by the client, in this case, the HTML code of the webpage.
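
The same structure can be taken apart in code. The following is a simplified sketch, not a full parser: parse_response is a hypothetical helper that assumes a well-formed response with a non-empty reason phrase and no repeated headers, and the sample response is shortened so that Content-Length matches the body.

```python
def parse_response(raw: bytes):
    """Split a raw response into status line parts, headers, and body."""
    # The first empty line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: version, numeric status code, textual reason phrase.
    version, code, reason = status_line.split(" ", 2)

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive

    return version, int(code), reason, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)

version, code, reason, headers, body = parse_response(raw)
print(version, code, reason, headers, body)
```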

HTTP Methods: The Verbs of the Web

HTTP methods, sometimes called HTTP verbs, define the action that the client wants the server to perform on a resource. While there are several methods, the most common ones are fundamental to nearly all web interactions.

GET: This is the most common method. A GET request is used to retrieve data from a server. When you type a URL into your browser, you are initiating a GET request. These requests should only retrieve data and should not have any other effect on the server (a property known as safety). Any data sent with a GET request is appended to the URL as a query string, making it visible and limiting its length.
POST: A POST request is used to send data to the server, typically to create a new resource. When you fill out a contact form or submit your login credentials, your browser sends a POST request. The data is included in the body of the request, not in the URL, which keeps it out of browser history and server logs and allows much larger amounts of data to be sent. The body is still readable on the network unless the connection uses HTTPS. POST requests are not safe, as they modify data on the server, and they are not idempotent, meaning repeated identical requests may create multiple resources.
PUT: The PUT method is used to update an existing resource or create it if it doesn't exist. It replaces the entire target resource with the data provided in the request body. The key difference between POST and PUT is that PUT is idempotent. Sending the same PUT request multiple times will always result in the same state on the server.
DELETE: As the name suggests, the DELETE method requests the removal of a specific resource from the server. Like PUT, DELETE operations are idempotent; deleting a resource multiple times has the same effect as deleting it once.
HEAD: The HEAD method is identical to a GET request, but the server does not send the message body in the response. It's used to retrieve the headers for a resource, which can be useful for checking if a resource exists, its size, or when it was last modified, without having to download the entire content.
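
The difference between safe, idempotent, and non-idempotent methods is easier to see with a toy model. The sketch below uses a hypothetical in-memory NoteStore class in place of a real server; its method names mirror the HTTP methods but no network traffic is involved.

```python
import itertools


class NoteStore:
    """Hypothetical in-memory collection standing in for a server's resources."""

    def __init__(self):
        self.notes = {}
        self._next_id = itertools.count(1)

    def get(self, note_id):
        # Safe: reads state without changing it.
        return self.notes.get(note_id)

    def head(self, note_id):
        # Like GET, but reports only metadata (here, just the size).
        note = self.notes.get(note_id)
        return None if note is None else {"length": len(note)}

    def post(self, text):
        # Not idempotent: every call creates a new resource with a new id.
        note_id = next(self._next_id)
        self.notes[note_id] = text
        return note_id

    def put(self, note_id, text):
        # Idempotent: replaces the whole resource, so a repeat changes nothing.
        self.notes[note_id] = text

    def delete(self, note_id):
        # Idempotent: deleting an already-missing note leaves the same state.
        self.notes.pop(note_id, None)


store = NoteStore()
first = store.post("buy milk")
second = store.post("buy milk")    # identical request, second resource
store.put(first, "buy oat milk")
store.put(first, "buy oat milk")   # repeat has no further effect
store.delete(second)
store.delete(second)               # repeat has no further effect
print(store.notes)
```

Repeating the POST produced two notes, while repeating the PUT and the DELETE left the store in the same state as a single call.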

HTTP Status Codes: The Server's Reply

Every HTTP response includes a three-digit status code that informs the client about the result of its request. These codes are grouped into five classes, indicated by their first digit. Understanding them is crucial for troubleshooting.

1xx (Informational): The request was received, and the process is continuing. These are rarely seen by end-users. Example: 100 Continue.
2xx (Success): The request was successfully received, understood, and accepted.
- 200 OK: The standard response for a successful request. The body will contain the requested resource.
- 201 Created: The request was successful, and a new resource was created as a result (e.g., after a POST request).
3xx (Redirection): Further action needs to be taken by the client to complete the request. The client needs to go to a different URL.
- 301 Moved Permanently: The requested resource has been permanently moved to a new URL. The browser should update its bookmarks.
- 302 Found: The resource has been temporarily moved to a different URL. The browser should continue using the original URL for future requests.
4xx (Client Error): The request contains bad syntax or cannot be fulfilled. The error is on the client's side.
- 400 Bad Request: The server cannot or will not process the request due to something that is perceived to be a client error (e.g., malformed request syntax).
- 403 Forbidden: The client does not have access rights to the content. It's authenticated, but not authorized.
- 404 Not Found: The server cannot find the requested resource. This is one of the most famous status codes on the web.
5xx (Server Error): The server failed to fulfill an apparently valid request. The error is on the server's side.
- 500 Internal Server Error: A generic error message, given when an unexpected condition was encountered and no more specific message is suitable.
- 503 Service Unavailable: The server is not ready to handle the request, often because it's down for maintenance or is overloaded.
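
Because the class is carried by the first digit, a client can react sensibly even to a code it does not recognize. A minimal sketch using the standard http.HTTPStatus enum; the CLASSES table and describe function are illustrative names.

```python
from http import HTTPStatus

# The five classes, keyed by the first digit of the status code.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Return the code, its reason phrase if known, and its class."""
    status_class = CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Not a code the standard library knows; the class still applies.
        phrase = "Unrecognized"
    return f"{code} {phrase} ({status_class})"


for code in (100, 200, 201, 301, 302, 400, 403, 404, 500, 503, 299):
    print(describe(code))
```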

The Evolution of HTTP

HTTP was not created in a single day. It has evolved significantly since its inception to meet the growing demands of the web, focusing on improving performance and efficiency.

HTTP/0.9 - The One-Line Protocol (1991): The very first version. It was extremely simple: a request consisted of a single line, the GET method followed by the resource path. The response was just the HTML file itself, with no headers, status codes, or other metadata.
HTTP/1.0 - Building Extensibility (1996): This version introduced key features we still use today: headers for both requests and responses, status codes in responses, and the ability to transfer files other than HTML (like images and video). A major limitation was that a new connection had to be established for every single resource, which was inefficient for pages with many images.
HTTP/1.1 - The Internet Standard (1997): For over 15 years, HTTP/1.1 was the cornerstone of the web. It introduced crucial performance optimizations. The most significant was persistent connections (an opt-in extension in HTTP/1.0 via the `Connection: keep-alive` header, made the default in HTTP/1.1), which allowed a browser to download multiple resources over a single TCP connection, drastically reducing latency. It also introduced request pipelining, chunked transfers, and the mandatory `Host` header.

Securing the Web: What the 'S' in HTTPS Means

In the early days of the web, all HTTP traffic was sent in plain text. This meant that anyone snooping on the network could easily read all the data being exchanged between a browser and a server, including passwords and credit card numbers. To fix this massive security hole, HTTPS (Hypertext Transfer Protocol Secure) was developed.

HTTPS is not a separate protocol; it is simply HTTP running over an encrypted connection. This encryption is handled by a protocol called TLS (Transport Layer Security), the successor to SSL (Secure Sockets Layer).

Here's how it works in a simplified way:

Certificate Verification: When your browser connects to a website using HTTPS, the server first presents its SSL/TLS certificate. This digital certificate acts like an ID card, proving the server's identity to your browser. It is issued and verified by a trusted third party called a Certificate Authority (CA).
The Handshake: The client and server then perform a process called a TLS handshake. During this handshake, they use the information in the certificate (specifically, a public key) to securely agree upon a shared, secret session key.
Encrypted Communication: Once the handshake is complete, all subsequent HTTP traffic between the client and server is encrypted using this session key. Anyone intercepting the data will only see a scrambled, unreadable stream of characters. This provides confidentiality, integrity, and authentication for your web communications.
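
The three steps above map onto Python's standard ssl module roughly as follows. This is a sketch under assumptions: fetch_over_tls is a hypothetical helper, it needs network access when called, and it reads only the first chunk of the reply.

```python
import socket
import ssl

# The default context loads the system's trusted CA certificates and
# requires the server to present a certificate that matches its hostname.
context = ssl.create_default_context()


def fetch_over_tls(host: str, port: int = 443):
    """Open a TCP connection, run the TLS handshake, and send one request."""
    with socket.create_connection((host, port), timeout=10) as sock:
        # wrap_socket performs the handshake, including certificate checks.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode("ascii"))  # encrypted on the wire
            return tls.version(), tls.recv(4096)


print(context.verify_mode, context.check_hostname)
```

Calling fetch_over_tls("www.example.com") would return the negotiated TLS version and the start of the response; if the certificate cannot be verified, the handshake raises ssl.SSLCertVerificationError before any HTTP data is sent.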

Modern browsers such as Chrome flag sites that do not use HTTPS as "Not Secure." The URL of a secure site always starts with https://, and many browsers also show a padlock icon next to it in the address bar.

The Challenge of State in a Stateless Protocol

One of the core design principles of HTTP is that it is a stateless protocol. This means each request-response cycle is completely independent. When the server receives a request, it has no memory of any previous requests from that same client.

This simplicity is powerful for scalability, but it poses a problem for creating interactive web experiences. How does a shopping website remember the items in your cart? How does a site keep you logged in as you navigate from page to page?

The solution is the use of cookies. A cookie is a small piece of data that the server sends to the browser in a response header (Set-Cookie). The browser stores this cookie and then sends it back to the server with each subsequent request to that site in a request header (Cookie). This cookie can contain a unique session identifier, allowing the server to look up the user's state (like their shopping cart or login status) and provide a continuous, stateful experience across multiple stateless requests.
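
The round trip can be sketched with the standard http.cookies module. The sessions dictionary, the session_id cookie name, and both helper functions are assumptions made for illustration; a real server would also expire sessions and set further cookie attributes.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side store: session identifier -> user state.
sessions = {}


def start_session(cart):
    """Create a session and return the value for a Set-Cookie header."""
    session_id = secrets.token_hex(16)  # unguessable identifier
    sessions[session_id] = {"cart": cart}

    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["path"] = "/"
    cookie["session_id"]["httponly"] = True  # hide the cookie from page scripts
    return cookie["session_id"].OutputString()


def load_session(cookie_header):
    """Look up the state for the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    return sessions.get(morsel.value) if morsel else None


# First response: the server sets the cookie.
set_cookie = start_session(["book"])

# Later request: the browser returns only the name=value pair.
cookie_header = set_cookie.split(";", 1)[0]
print(load_session(cookie_header))
```

Only the identifier travels in the cookie; the cart itself stays on the server, so each request remains stateless at the protocol level.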