Client-Server Architectures and RESTful APIs

What REST actually means (and why most ‘REST APIs’ aren’t)

Author

Karsten Naert

Published

February 9, 2026

Introduction

In the first three lectures we worked our way up from the bottom of the stack: processes, threads, synchronization primitives, queues, sockets, TCP, and even a bare-bones HTTP server built from raw sockets. We know how bytes get from point A to point B.

Now it’s time to zoom out and think about architecture. This lecture is heavier on ideas than on code—but the ideas are what make the code make sense. By the end you’ll understand:

  • What the client-server model is and why it dominates the web.
  • What RPC is and where it falls short.
  • What REST actually means (spoiler: not what most job postings think it means).
  • How a threaded web server works under the hood.
  • How the WSGI protocol separates the server from your Flask/Django application.

This is the fourth lecture in a five-part series:

  1. Processes and Threads
  2. Multiprocessing and Multithreading in Practice
  3. Interprocess Communication and Sockets
  4. Client-Server Architectures and RESTful APIs (you are here)
  5. Async Programming, Event Loops, and ASGI

The Client-Server Model

The Basic Idea

A client-server architecture is deceptively simple: one process (the server) provides a service, and one or more other processes (the clients) consume that service. The server sits around waiting for requests; the client initiates contact when it needs something.

You’ve already built one. The chat server from Lecture 3 was a textbook client-server application:

  • The server was long-lived—it started up and waited indefinitely for connections.
  • Clients came and went, connecting and disconnecting at will.
  • The server was the authority: it managed the shared client list and decided who received which messages.

This pattern is everywhere. When you open a website, your browser (client) talks to a web server. When you send a Slack message, the Slack app (client) talks to Slack’s servers. When git push sends your code to GitHub, your git client talks to GitHub’s server.

Client-Server vs Peer-to-Peer

The main alternative is peer-to-peer (P2P), where every participant is both a client and a server simultaneously. BitTorrent is the classic example: every machine downloading a file is also uploading pieces to others.

              Client-Server                          Peer-to-Peer
Authority     Server is the single source of truth   No central authority
Scalability   Server can become a bottleneck         Scales naturally with more peers
Simplicity    Simple mental model                    Complex coordination
Examples      The web, email, databases              BitTorrent, blockchain, some chat protocols

For web APIs—our focus—client-server is the universal choice. The server owns the data and the business logic; clients just ask for things and display results.

Thin Clients vs Thick Clients

Not all clients are equal. Think of a spectrum:

  • Thin client: does almost nothing; the server does all the work. Example: a 1990s terminal displaying text from a mainframe. Or a server-rendered web page where the server produces the full HTML.
  • Thick client: does significant processing locally. Example: a modern single-page application (React, Vue) that fetches JSON from the server and renders everything client-side. Or a mobile app with offline capabilities.

Most modern web applications fall somewhere in between. The server provides data (usually as JSON), and the client (a browser running JavaScript, or a mobile app) handles presentation and some logic. Understanding this split matters because it influences API design: a thick client needs a fine-grained, data-oriented API; a thin client might prefer the server to do more work per request.

Remote Procedure Calls (RPC)

The Idea

Once you have a client and a server, the next question is: how do they talk? The most intuitive answer is RPC: Remote Procedure Call. The idea is seductive in its simplicity—call a function, but on someone else’s computer.

Your code looks like a normal function call:

result = remote_server.add(3, 4)

But under the hood, this serializes the arguments, sends them over the network, the server deserializes and executes the function, serializes the result, sends it back, and the client deserializes it. All the network plumbing is hidden behind what looks like a local function call.

A Quick Demo with xmlrpc

Python ships with a batteries-included RPC implementation: xmlrpc. It’s ancient (uses XML over HTTP), but it demonstrates the concept perfectly with minimal code.

rpc_server.py — exposes Python functions over the network:

# rpc_server.py
from xmlrpc.server import SimpleXMLRPCServer

def add(x, y):
    return x + y

def multiply(x, y):
    return x * y

def greeting(name):
    return f"Hello, {name}! Greetings from the server."

server = SimpleXMLRPCServer(("127.0.0.1", 8000))
server.register_function(add)
server.register_function(multiply)
server.register_function(greeting)
print("RPC server listening on http://127.0.0.1:8000 ...")
server.serve_forever()

rpc_client.py — calls those functions as if they were local:

# rpc_client.py
from xmlrpc.client import ServerProxy

server = ServerProxy("http://127.0.0.1:8000")

print(server.add(3, 4))           # 7
print(server.multiply(6, 7))      # 42
print(server.greeting("Alice"))   # Hello, Alice! Greetings from the server.

Run the server in one CMD window, the client in another:

# Window 1
python rpc_server.py

# Window 2
python rpc_client.py

Look at the client code. It reads like normal Python. server.add(3, 4) looks like calling a local method—but it’s actually sending an HTTP request with an XML payload to another process. The ServerProxy object intercepts attribute access and turns it into network calls. Neat.
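
The attribute-interception trick is easy to demystify. Here is a toy sketch (purely illustrative, not how ServerProxy is actually implemented): a `__getattr__` that turns any unknown attribute into a callable. Where the real proxy serializes arguments and performs an HTTP round-trip, this toy version just dispatches through a local registry.

```python
# Toy illustration of the proxy idea: __getattr__ turns unknown
# attribute access into a callable that "sends" the call somewhere.
# (Hypothetical sketch -- the real ServerProxy builds an XML payload
# and POSTs it over HTTP instead of using a local registry.)
class ToyProxy:
    def __init__(self, dispatch):
        # dispatch: stand-in for the network; maps names to functions
        self._dispatch = dispatch

    def __getattr__(self, name):
        def remote_call(*args):
            # In a real proxy, this is where serialization and the
            # HTTP round-trip would happen.
            return self._dispatch[name](*args)
        return remote_call

registry = {"add": lambda x, y: x + y}
proxy = ToyProxy(registry)
print(proxy.add(3, 4))  # looks like a local call, dispatches through the registry
```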

Under the Hood

If you’re curious what the wire format looks like, add verbose=True to the ServerProxy:

server = ServerProxy("http://127.0.0.1:8000", verbose=True)

You’ll see the raw XML request and response printed to the console. It’s… not pretty. This is why XML-RPC fell out of fashion—JSON is much more readable. But the concept of RPC lives on in modern systems like gRPC (Google’s protocol-buffer-based RPC framework).

The Problem with RPC

RPC is elegant, but it has a fundamental flaw: it pretends the network doesn’t exist.

A local function call is:

  • Fast — nanoseconds.
  • Reliable — if the function exists, it will run.
  • Atomic — it either completes or throws an exception.

A network call is:

  • Slow — milliseconds at best, seconds at worst.
  • Unreliable — the server might be down, the network might be congested, packets might be lost.
  • Ambiguous — if you don’t get a response, did the server process your request or not? (Did transfer_money(1000) execute? Do you retry and risk a double transfer?)

By making network calls look like local calls, RPC lures you into ignoring these realities. You write code that works fine on your laptop and breaks catastrophically in production when latency spikes or a server restarts mid-request.

This critique—articulated in the famous 1994 paper A Note on Distributed Computing by Waldo et al.—was one of the motivations for a different approach: REST.

RPC Is Not Dead

Despite the critique, RPC is alive and well. gRPC (used heavily at Google, Netflix, and many microservice architectures) is a modern, high-performance RPC framework that uses Protocol Buffers for serialization. It’s explicit about the network (with features like deadlines, cancellation, and streaming) and doesn’t pretend calls are local.

The lesson isn’t “don’t use RPC.” It’s “understand the tradeoffs.”

REST: The Idea

Origin Story

In 2000, Roy Fielding—one of the principal authors of the HTTP specification—published his PhD dissertation. Chapter 5 introduced REST: Representational State Transfer. It wasn’t a protocol, a library, or a specification. It was an architectural style—a set of constraints that, when applied together, produce systems with desirable properties like scalability, simplicity, and evolvability.

Fielding was describing the architecture of the web itself. The web already worked this way; he was just naming and formalizing the principles that made it work so well. REST is not something you install. It’s a set of design decisions.

The Six Constraints

REST defines six constraints. An API that satisfies all of them is truly RESTful. Let’s walk through each one.

1. Client-Server

The client and server are separate concerns. The server doesn’t know about the UI; the client doesn’t know about the database. They communicate only through a defined interface (HTTP, in practice). This separation allows them to evolve independently—you can redesign your entire frontend without touching the server, and vice versa.

We’ve been living this constraint since the beginning of this lecture.

2. Statelessness

Each request from the client must contain all the information the server needs to process it. The server doesn’t remember previous requests. There’s no “session” on the server side that tracks where you are in a multi-step process.

This sounds restrictive, but it’s liberating for scalability. If the server holds no per-client state, any server in a cluster can handle any request. You can add more servers behind a load balancer without worrying about sticky sessions or shared state.

“But What About Login Sessions?”

Good question. In practice, authentication tokens (like JWTs or API keys) are sent with every request in a header. The server validates the token each time—it doesn’t “remember” that you logged in. The client holds the state (the token); the server just verifies it. This is statelessness in action.
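
A minimal sketch of stateless authentication, assuming a hypothetical token table and a plain dict of request headers (a real API would verify a signed JWT or look up an API key in a database):

```python
# Sketch of stateless auth: the server keeps no session table; it just
# validates whatever token arrives with each request. VALID_TOKENS is
# an illustrative assumption, not a real framework feature.
VALID_TOKENS = {"secret-token-abc": "alice"}

def authenticate(headers):
    """Return the username for this request, or None.
    Called on every request -- nothing is remembered between calls."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return None
    token = auth[len("Bearer "):]
    return VALID_TOKENS.get(token)

print(authenticate({"Authorization": "Bearer secret-token-abc"}))  # alice
print(authenticate({}))                                            # None
```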

3. Cacheability

Responses must declare whether they’re cacheable or not. If a response says “this data is valid for the next 5 minutes,” the client (or an intermediary like a CDN) can reuse it without hitting the server again. This reduces load and improves performance dramatically.

HTTP has a rich caching system built in: Cache-Control, ETag, Last-Modified, Expires headers. We’ll touch on these later in this lecture.

4. Uniform Interface

This is the big one—the constraint that gives REST its distinctive flavor. It has four sub-constraints:

a) Resources identified by URIs. Everything the API exposes is a resource, and each resource has a unique address (URI). A book, a user, a list of orders—each gets its own URL.

GET /books/42          ← The book with ID 42
GET /users/alice       ← The user "alice"
GET /orders            ← The collection of all orders

b) Manipulation through representations. You don’t interact with the resource directly—you interact with representations of it. When you GET /books/42, you don’t get the database row. You get a JSON (or HTML, or XML) representation of that book. When you PUT /books/42, you send a representation of what the book should look like, and the server updates its internal state accordingly.

c) Self-descriptive messages. Every message (request or response) contains enough information to understand itself. The Content-Type header says “this body is JSON.” The Allow header says “you can GET or DELETE this resource.” You don’t need out-of-band documentation to parse a single message.

d) HATEOAS — Hypermedia As The Engine Of Application State. This is the most ignored and most misunderstood constraint. The idea: the server’s responses should contain links that tell the client what it can do next.

Think about how you browse a website. You go to the homepage, and it has links to “Products,” “About,” “Contact.” You don’t need a manual telling you which URLs exist—the pages themselves guide you. That’s HATEOAS.

Applied to an API, a response might look like:

{
  "id": 42,
  "title": "The Pragmatic Programmer",
  "author": "Hunt & Thomas",
  "links": {
    "self": "/books/42",
    "author": "/authors/7",
    "reviews": "/books/42/reviews",
    "delete": "/books/42"
  }
}

The client doesn’t hardcode URLs. It follows links from responses. If the server changes its URL structure, clients adapt automatically—just like your browser adapts when a website redesigns its navigation.
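
A sketch of what a link-following client looks like, with a fake `fetch()` standing in for an HTTP GET (the URLs and payloads here are made up for illustration):

```python
# Sketch of a HATEOAS-style client: it hardcodes only the entry point
# and reads the "links" object from each response. RESPONSES and
# fetch() are stand-ins for a real server and requests.get().json().
RESPONSES = {
    "/books/42": {
        "id": 42,
        "title": "The Pragmatic Programmer",
        "links": {"reviews": "/books/42/reviews"},
    },
    "/books/42/reviews": {"items": ["Great book!"]},
}

def fetch(url):
    return RESPONSES[url]  # pretend this performs an HTTP GET

book = fetch("/books/42")
reviews = fetch(book["links"]["reviews"])  # follow the link, don't hardcode it
print(reviews["items"])
```

If the server later moves reviews to a different URL, this client keeps working as long as the link is updated in the response.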

5. Layered System

The client doesn’t need to know whether it’s talking directly to the server, or to a load balancer, or to a caching proxy in front of the server. Each layer only knows about its immediate neighbor. This allows you to insert reverse proxies, CDNs, API gateways, and other infrastructure without changing clients or servers.

6. Code on Demand (Optional)

The server can send executable code to the client. JavaScript in web pages is the prime example: the server sends HTML containing <script> tags, and the browser executes the code. This is the only optional constraint—most APIs don’t use it.

REST Is Not a Protocol

REST doesn’t mandate HTTP. It doesn’t mandate JSON. It doesn’t mention status codes or URL patterns. REST is a set of architectural constraints that happen to map naturally onto HTTP—because Fielding designed them by analyzing the web, which runs on HTTP.

When someone says “REST API,” what they usually mean is “HTTP API with JSON responses and resource-oriented URLs.” That’s fine as a shorthand, but it’s not the full picture. Let’s unpack the gap.

What Most People Call “REST” (and Why It’s Not)

The Colloquial Meaning

In everyday developer conversation, “REST API” means something like:

  • Uses HTTP.
  • Has “nice” URLs like /users/42 instead of /getUser?id=42.
  • Sends and receives JSON.
  • Uses HTTP methods (GET, POST, PUT, DELETE) somewhat appropriately.

This is a perfectly reasonable way to build an API. But it’s not REST in the Fielding sense—it’s missing statelessness enforcement, cacheability, self-descriptive messages, and almost certainly HATEOAS. It’s more accurately called an HTTP API or a resource-oriented API.

Does the distinction matter in practice? Sometimes. Understanding real REST helps you design better APIs, ask better questions in architecture discussions, and avoid cargo-culting patterns without understanding why they exist.

The Richardson Maturity Model

Leonard Richardson proposed a handy model for classifying HTTP APIs on a scale from 0 to 3. Think of it as a ladder toward full REST.

Level 0: The Swamp of POX

“POX” = Plain Old XML (or JSON). One URL, one HTTP method (usually POST), and the operation is encoded in the request body.

POST /api

{"action": "getBook", "id": 42}
POST /api

{"action": "deleteBook", "id": 42}

This is basically RPC tunneled through HTTP. The URL and HTTP method carry no meaning—everything is in the body. Many SOAP services live here, and so does our xmlrpc example from earlier.

Level 1: Resources

Each “thing” gets its own URL, but you still use a single HTTP method (usually POST) for everything.

POST /books/42

{"action": "get"}
POST /books/42

{"action": "delete"}

Better—at least the URL tells you what you’re talking about. But the HTTP method doesn’t tell you what you’re doing to it.

Level 2: HTTP Verbs

Now we use HTTP methods meaningfully:

GET    /books/42          ← Read the book
POST   /books             ← Create a new book
PUT    /books/42          ← Replace the book
DELETE /books/42          ← Delete the book

This is where the vast majority of production APIs live. It’s clean, intuitive, and well-supported by tools and frameworks. Most developers call this “RESTful” and stop here.

Level 3: Hypermedia Controls (HATEOAS)

The response includes links that tell the client what to do next:

{
  "id": 42,
  "title": "The Pragmatic Programmer",
  "author": "Hunt & Thomas",
  "_links": {
    "self": {"href": "/books/42"},
    "update": {"href": "/books/42", "method": "PUT"},
    "delete": {"href": "/books/42", "method": "DELETE"},
    "reviews": {"href": "/books/42/reviews"},
    "collection": {"href": "/books"}
  }
}

The client discovers the API by following links, just like a human browsing a website. This is true REST. And almost nobody does it for JSON APIs, because:

  • There’s no universally adopted standard for hypermedia in JSON (HAL, JSON-LD, and JSON:API exist but none dominates).
  • Most API consumers are controlled by the same team that controls the server, so hardcoding URLs is easier.
  • The tooling support isn’t there yet—OpenAPI/Swagger, the dominant API documentation format, doesn’t model HATEOAS well.

The Pragmatic Takeaway

Level 2 is the sweet spot for most teams. Use resources, use HTTP verbs correctly, return proper status codes, and document your API well. That gets you 90% of REST’s benefits.

But know that Level 3 exists. For public APIs consumed by many independent clients, HATEOAS is genuinely valuable—it lets you evolve your API without breaking clients that follow links instead of hardcoding URLs.

Side by Side: RPC-Style vs REST-Style

Let’s compare two API designs for the same problem—managing a library of books. Same functionality, different philosophies.

RPC-Style (Level 0–1):

POST /api/getBook         {"id": 42}
POST /api/listBooks       {"genre": "fiction", "page": 1}
POST /api/createBook      {"title": "Dune", "author": "Herbert"}
POST /api/updateBook      {"id": 42, "title": "Dune (revised)"}
POST /api/deleteBook      {"id": 42}
POST /api/searchBooks     {"query": "Python"}

Every operation is a POST to a unique “action” endpoint. The URL is a verb describing what to do. This is natural if you’re thinking in terms of function calls.

REST-Style (Level 2):

GET    /books?genre=fiction&page=1
GET    /books/42
POST   /books              {"title": "Dune", "author": "Herbert"}
PUT    /books/42           {"title": "Dune (revised)", "author": "Herbert"}
DELETE /books/42
GET    /books?q=Python

The URL is a noun (the resource). The HTTP method is the verb (what you’re doing to it). Filtering and searching are query parameters on the collection URL.

Neither is “wrong.” But the REST-style version is:

  • More predictable — once you know the resource URL, you can guess the CRUD operations.
  • More cacheable — GET requests can be cached; POST requests generally can’t. The RPC-style version makes everything a POST, defeating caching entirely.
  • More aligned with HTTP — proxies, load balancers, and browsers understand GET vs POST. A proxy can cache a GET response or retry it safely. It can’t do that with a POST.

Exercise: Classify an API

Pick a public API you’ve used (GitHub, Spotify, OpenWeatherMap, or any other). Look at its documentation and classify it on the Richardson Maturity Model:

  1. Does each “thing” have its own URL? (Level 1)
  2. Does it use HTTP methods meaningfully? (Level 2)
  3. Do responses include links to related resources? (Level 3)

Most APIs you’ll find are Level 2, possibly with some Level 3 sprinkled in (GitHub’s API is a good example of partial HATEOAS—responses include URL fields for related resources).

HTTP Methods, Status Codes, and Headers

Now that we understand the philosophy, let’s get practical. If you’re building a Level 2 API (and you probably are), you need to know the HTTP toolkit inside out.

HTTP Methods (Verbs)

HTTP defines several methods. Five of them map cleanly to CRUD operations:

Method   CRUD              Meaning                      Request Body?  Response Body?
GET      Read              Retrieve a resource          No             Yes
POST     Create            Create a new resource        Yes            Usually
PUT      Update (full)     Replace a resource entirely  Yes            Optional
PATCH    Update (partial)  Modify part of a resource    Yes            Optional
DELETE   Delete            Remove a resource            Rarely         Optional

A few others you’ll encounter:

  • HEAD — identical to GET, but the server returns only headers (no body). Useful for checking if a resource exists or getting its metadata without downloading the whole thing.
  • OPTIONS — asks the server what methods are allowed for a given URL. Used heavily in CORS (Cross-Origin Resource Sharing) preflight requests by browsers.

Safety and Idempotency

Two properties that matter more than you’d think:

Safe methods don’t change anything on the server. GET and HEAD are safe—calling them a million times has no side effects. This is why search engines can crawl the web without breaking things: they only send GET requests.

Idempotent methods produce the same result whether you call them once or ten times. GET, PUT, and DELETE are idempotent. If you DELETE /books/42 twice, the second call is a no-op (the book is already gone). If you PUT /books/42 with the same body twice, the result is the same.

POST is neither safe nor idempotent. Calling POST /books twice creates two books. This is why your browser warns you about “resubmitting form data” when you refresh after a POST.

Method   Safe?  Idempotent?
GET      ✅      ✅
HEAD     ✅      ✅
POST     ❌      ❌
PUT      ❌      ✅
PATCH    ❌      ❌*
DELETE   ❌      ✅

*PATCH can be idempotent (e.g., “set the title to X”) but doesn’t have to be (e.g., “append Y to the description”). It depends on the operation.
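
A tiny demonstration of the difference, using plain dict updates as stand-ins for the two PATCH semantics:

```python
# Two hypothetical PATCH semantics applied twice to the same resource.
book = {"title": "Dune", "description": "Classic."}

def set_title(resource, value):
    # Idempotent: repeating the call changes nothing further.
    resource["title"] = value

def append_description(resource, text):
    # Not idempotent: repeating the call compounds the effect.
    resource["description"] += text

set_title(book, "Dune (revised)")
set_title(book, "Dune (revised)")          # same result as calling once

append_description(book, " A must-read.")
append_description(book, " A must-read.")  # appended twice!
print(book["description"])
```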

Why This Matters

Idempotency is your best friend in unreliable networks. If your client sends a PUT and doesn’t get a response (network timeout), it can safely retry—the result will be the same. If it sends a POST and doesn’t get a response, it has a problem: did the server create the resource or not? Retrying might create a duplicate.

This is exactly the ambiguity problem we discussed in the RPC section. REST’s use of idempotent methods mitigates it.
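
One way this plays out in client code is a retry policy keyed on idempotency. This is a hand-rolled sketch (the IDEMPOTENT set and the fake send function are assumptions for illustration, not a real HTTP library):

```python
# Sketch of a retry policy: retry only when repeating the request
# cannot change the outcome.
IDEMPOTENT = {"GET", "HEAD", "PUT", "DELETE"}

def request_with_retry(send, method, url, attempts=3):
    """Retry idempotent methods on timeout; give non-idempotent
    methods exactly one attempt."""
    last_error = None
    for _ in range(attempts if method in IDEMPOTENT else 1):
        try:
            return send(method, url)
        except TimeoutError as exc:
            last_error = exc  # idempotent: safe to try again
    raise last_error

# Fake transport that times out once, then succeeds.
calls = []
def flaky_send(method, url):
    calls.append(method)
    if len(calls) < 2:
        raise TimeoutError("no response")
    return "200 OK"

result = request_with_retry(flaky_send, "PUT", "/books/42")
print(result)  # the PUT was retried and succeeded
```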

Status Codes

Every HTTP response includes a status code—a three-digit number that summarizes what happened. They’re grouped by the first digit:

2xx — Success

The request worked.

Code  Name        When to use
200   OK          General success. GET returns data; PUT/PATCH returns the updated resource.
201   Created     A new resource was created (typically after POST). Include a Location header with the new resource’s URL.
204   No Content  Success, but nothing to return (common for DELETE).

3xx — Redirection

The resource moved.

Code  Name               When to use
301   Moved Permanently  The resource has a new URL. Clients should update their bookmarks.
304   Not Modified       Used with caching. “Your cached version is still good, no need to re-download.”

4xx — Client Error

The client did something wrong.

Code  Name                  When to use
400   Bad Request           Malformed request (invalid JSON, missing required field, etc.).
401   Unauthorized          “Who are you?” — authentication required but not provided (or invalid).
403   Forbidden             “I know who you are, but you’re not allowed.” — authenticated but not authorized.
404   Not Found             The resource doesn’t exist.
405   Method Not Allowed    The URL exists, but not for that method (e.g., DELETE on a read-only resource).
409   Conflict              The request conflicts with the current state (e.g., creating a resource that already exists).
422   Unprocessable Entity  The request is well-formed but semantically wrong (e.g., age = -5). Popular with APIs using JSON validation.
429   Too Many Requests     Rate limiting. “Slow down.”

5xx — Server Error

The server messed up.

Code  Name                   When to use
500   Internal Server Error  Something unexpected went wrong. The catch-all.
502   Bad Gateway            A proxy or gateway received an invalid response from the upstream server.
503   Service Unavailable    The server is overloaded or down for maintenance.

The Short Version

If you remember nothing else: 2xx = good, 4xx = your fault, 5xx = my fault. Beyond that, use the most specific code that fits. Clients (and debugging tools) appreciate the precision.

Important Headers

We introduced headers in Lecture 3 as “the envelope” of an HTTP message. Here are the ones you’ll use most when building and consuming APIs:

Content Negotiation

Content-Type (request and response): declares the format of the body.

Content-Type: application/json
Content-Type: text/html; charset=utf-8
Content-Type: multipart/form-data

Accept (request): tells the server what formats the client can handle.

Accept: application/json
Accept: text/html, application/json;q=0.9

The q=0.9 is a quality factor—it says “I prefer HTML, but JSON is acceptable.” The server should honor this or return 406 Not Acceptable.
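
Quality-factor negotiation can be sketched as a small parser. This toy version handles only the simple `type;q=x` case; real servers also deal with wildcards like `*/*` and media-type specificity rules:

```python
# Toy Accept-header parser: pick the supported media type with the
# highest quality factor. Illustrative sketch, not a full RFC 9110
# implementation.
def pick_format(accept_header, supported):
    candidates = []
    for part in accept_header.split(","):
        piece = part.strip().split(";")
        media = piece[0].strip()
        q = 1.0  # default quality when no q parameter is given
        for param in piece[1:]:
            key, _, value = param.strip().partition("=")
            if key == "q":
                q = float(value)
        if media in supported:
            candidates.append((q, media))
    return max(candidates)[1] if candidates else None

header = "text/html, application/json;q=0.9"
print(pick_format(header, {"application/json", "text/html"}))  # text/html
print(pick_format(header, {"application/json"}))               # application/json
```

If nothing matches, the function returns None, which is where a real server would send 406 Not Acceptable.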

Authentication

Authorization (request): carries credentials.

Authorization: Bearer eyJhbGciOiJIUzI1NiIs...
Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=

Bearer tokens (like JWTs) are the most common pattern in modern APIs. Basic auth (base64-encoded username:password) is simpler but less secure unless used over HTTPS.

Caching

Cache-Control (response): tells the client (and intermediary caches) how long the response is valid.

Cache-Control: max-age=3600          (valid for 1 hour)
Cache-Control: no-cache              (always revalidate with the server)
Cache-Control: no-store              (don't cache at all — sensitive data)

ETag (response) + If-None-Match (request): a fingerprint of the resource. The client can send the ETag back; if the resource hasn’t changed, the server returns 304 Not Modified instead of re-sending the whole body.
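
The server side of that handshake can be sketched in a few lines, assuming an MD5-of-body ETag (one common choice, not something HTTP mandates):

```python
# Sketch of the ETag revalidation handshake on the server side.
import hashlib

def etag_for(body):
    # One common convention: a hash of the representation, quoted.
    return '"' + hashlib.md5(body).hexdigest() + '"'

def respond(body, if_none_match):
    tag = etag_for(body)
    if if_none_match == tag:
        return 304, b""   # client's cached copy is still good
    return 200, body      # send the full body (plus an ETag header)

body = b'{"id": 42, "title": "Dune"}'
status, _ = respond(body, None)            # first request: full response
print(status)                              # 200
status, _ = respond(body, etag_for(body))  # revalidation: nothing to resend
print(status)                              # 304
```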

Other Useful Headers

Header        Direction  Purpose
Location      Response   URL of a newly created resource (with 201) or a redirect target (with 301/302)
Allow         Response   Lists the HTTP methods a resource supports (often sent with 405)
X-Request-Id  Both       A unique ID for tracing a request through logs and systems
Retry-After   Response   How long to wait before retrying (sent with 429 or 503)

Query Parameters vs Path Parameters vs Request Body

A common source of confusion: where does data go in a request?

Path parameters identify a specific resource:

GET /books/42              ← 42 is a path parameter (identifies the book)
GET /users/alice/orders    ← alice is a path parameter (identifies the user)

Query parameters filter, sort, or paginate a collection:

GET /books?genre=fiction&sort=title&page=2
GET /users?active=true&limit=10

Request body carries the data for creation or update:

POST /books
Content-Type: application/json

{"title": "Dune", "author": "Herbert", "year": 1965}

The rule of thumb: path = which resource, query = how to filter/sort, body = what to create/update. Don’t put creation data in query parameters (POST /books?title=Dune is wrong), and don’t put resource IDs in the body when they belong in the path.
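
The standard library can illustrate the first two pieces: urllib.parse splits a URL into its path (which resource) and its query string (how to filter):

```python
# Where data lives in a request URL, using the standard library.
from urllib.parse import urlsplit, parse_qs

url = "/books/42/reviews?sort=rating&page=2"
parts = urlsplit(url)
print(parts.path)             # /books/42/reviews -- which resource
print(parse_qs(parts.query))  # {'sort': ['rating'], 'page': ['2']} -- how to filter
# Creation/update data would travel in the request body, not in the URL.
```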

How a Threaded Web Server Works

Let’s connect the dots back to what we built in Lectures 1–3. A web server is, at its core, the threaded TCP echo server from Lecture 3—but instead of echoing bytes back, it parses HTTP requests and generates HTTP responses.

The Request Lifecycle

When you type http://localhost:8000/books/42 in your browser, here’s what happens:

┌──────────┐                              ┌──────────┐
│  Browser  │                              │  Server   │
└─────┬────┘                              └─────┬────┘
      │                                         │
      │  1. DNS lookup: localhost → 127.0.0.1   │
      │                                         │
      │  2. TCP connect to 127.0.0.1:8000       │
      │  ──────────────────────────────────────► │
      │           (three-way handshake)          │
      │  ◄────────────────────────────────────── │
      │                                         │
      │  3. Send HTTP request:                  │
      │     GET /books/42 HTTP/1.1              │
      │     Host: localhost:8000                │
      │     Accept: application/json            │
      │  ──────────────────────────────────────► │
      │                                         │
      │         4. Server processes request:     │
      │            - Parse HTTP headers          │
      │            - Route /books/42 → handler   │
      │            - Handler queries database    │
      │            - Build JSON response         │
      │                                         │
      │  5. Send HTTP response:                 │
      │     HTTP/1.1 200 OK                     │
      │     Content-Type: application/json      │
      │     {"id": 42, "title": "Dune"}         │
      │  ◄────────────────────────────────────── │
      │                                         │
      │  6. TCP close (or keep-alive)           │
      │  ──────────────────────────────────────► │

Steps 2, 3, 5, and 6 are pure TCP—we covered those in Lecture 3. Step 1 is DNS, which we won’t dive into. Step 4 is where all the interesting web server logic lives.

Two Separate Concerns

Notice that step 4 has two very different kinds of work:

  1. Network plumbing: accepting TCP connections, reading raw bytes, parsing the HTTP request line and headers, sending the response bytes back over the socket, managing timeouts, handling keep-alive connections.

  2. Application logic: looking at the URL and method, deciding which Python function should handle it, running that function (which might query a database, validate input, compute a result), and building the response body.

These are fundamentally different responsibilities. The network plumbing is the same for every web application—whether you’re building a bookstore API, a social network, or a weather service. The application logic is what makes your app yours.

This separation is the key insight that leads to WSGI.

A Threaded HTTP Server (Sketch)

Let’s sketch what a simple threaded web server looks like, building on our Lecture 3 knowledge. This isn’t production code—it’s a mental model.

# Conceptual sketch — not a real server, but shows the structure
import socket
import threading

def handle_client(conn, addr):
    """Handle one HTTP request from one client."""
    # 1. Read the raw HTTP request
    raw_request = conn.recv(4096).decode()

    # 2. Parse it (in reality, this is much more complex)
    request_line = raw_request.split("\r\n")[0]
    method, path, _ = request_line.split(" ")
    
    # 3. Route to the right handler (application logic)
    if method == "GET" and path == "/":
        status = "200 OK"
        body = "<h1>Welcome!</h1>"
    elif method == "GET" and path.startswith("/books/"):
        book_id = path.split("/")[-1]
        status = "200 OK"
        body = f'{{"id": {book_id}, "title": "Some Book"}}'
    else:
        status = "404 Not Found"
        body = '{"error": "Not found"}'

    # 4. Build and send the HTTP response
    response = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        f"\r\n"
        f"{body}"
    )
    conn.send(response.encode())
    conn.close()

def run_server(host="127.0.0.1", port=8000):
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(5)
    print(f"Serving on http://{host}:{port} ...")

    while True:
        conn, addr = server.accept()
        t = threading.Thread(target=handle_client, args=(conn, addr))
        t.start()

if __name__ == "__main__":
    run_server()

This is essentially the threaded echo server from Lecture 3, but with HTTP parsing and routing bolted on. The main thread sits in accept(), and each connection gets its own thread.

The problem? Everything is tangled together. The network code (socket handling, HTTP parsing) and the application code (routing, generating responses) are all in handle_client. If you want to:

  • Switch from threads to processes → you need to rewrite the server loop.
  • Add URL pattern matching → you need to modify handle_client.
  • Use a different web server (say, one that handles keep-alive properly) → you need to rewrite everything.

This is exactly the problem the web development world faced in the early 2000s. Every Python web framework came with its own server, and they were all incompatible. You couldn’t plug a Django app into a CherryPy server, or vice versa.

WSGI: Separating Server from Application

The Problem

In the early days of Python web development, the landscape was fragmented:

  • Zope had its own server.
  • CherryPy had its own server.
  • Twisted had its own server.
  • mod_python tied everything to Apache.

If you wanted to deploy a CherryPy app on a different server, tough luck. If a new, faster server came along, every framework had to write an adapter for it individually. It was a mess of N × M combinations.

The Solution: PEP 3333

In 2003 the Python community defined WSGI, the Web Server Gateway Interface, in PEP 333 (updated in 2010 as PEP 3333 for Python 3). It’s a simple contract between two parties:

  • The server (also called the “gateway”) handles all the network plumbing.
  • The application (also called the “framework”) handles the business logic.

The entire interface is this:

def application(environ, start_response):
    # environ: a dict with request data (method, path, headers, body, ...)
    # start_response: a callback to set the status and response headers
    
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, World!"]

That’s it. A WSGI application is a callable (a function or an object with __call__) that takes two arguments and returns an iterable of byte strings. The server calls this function for every request, passing in the parsed request data and a callback for setting headers.
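The “object with __call__” variant can be sketched like this (the class name HelloApp is just for illustration, not part of the spec):

```python
# A WSGI application as a class with __call__, equivalent to the function form.
class HelloApp:
    def __call__(self, environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Hello, World!"]

application = HelloApp()

# Drive it by hand with a dummy start_response; no server needed.
captured = {}

def start_response(status, headers):
    captured["status"] = status

body = b"".join(application({}, start_response))
print(captured["status"], body)  # 200 OK b'Hello, World!'
```

Class-based apps are handy when the application needs to carry state (configuration, a database handle) between requests.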

This turned the N × M problem into N + M: any WSGI-compliant server can run any WSGI-compliant application. Write your server once, write your framework once, and they just work together.

A Minimal WSGI Application

Let’s write the simplest possible WSGI app and run it with Python’s built-in wsgiref.simple_server:

# minimal_wsgi.py
from wsgiref.simple_server import make_server

def application(environ, start_response):
    # Extract request info from environ
    method = environ["REQUEST_METHOD"]
    path = environ["PATH_INFO"]
    
    # Simple routing
    if path == "/" and method == "GET":
        status = "200 OK"
        body = b"Welcome to the minimal WSGI app!"
    elif path == "/hello" and method == "GET":
        status = "200 OK"
        body = b"Hello from WSGI!"
    else:
        status = "404 Not Found"
        body = b"Not found."

    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]

if __name__ == "__main__":
    server = make_server("127.0.0.1", 8000, application)
    print("WSGI server on http://127.0.0.1:8000 ...")
    server.serve_forever()

Run it and test with curl:

python minimal_wsgi.py

# In another CMD window:
curl http://127.0.0.1:8000/
curl http://127.0.0.1:8000/hello
curl http://127.0.0.1:8000/nope

What’s in environ?

The environ dictionary is the heart of WSGI. It contains everything the server knows about the request, plus some CGI-inherited variables:

Key                  Example               Meaning
REQUEST_METHOD       "GET"                 The HTTP method
PATH_INFO            "/books/42"           The URL path
QUERY_STRING         "page=2&limit=10"     Everything after the ? in the URL
CONTENT_TYPE         "application/json"    The Content-Type header
CONTENT_LENGTH       "128"                 The Content-Length header
HTTP_ACCEPT          "application/json"    The Accept header (all HTTP headers are prefixed with HTTP_)
HTTP_AUTHORIZATION   "Bearer abc..."       The Authorization header
SERVER_NAME          "127.0.0.1"           The server’s hostname
SERVER_PORT          "8000"                The server’s port
wsgi.input           (file-like object)    The request body (for POST/PUT)
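To see the HTTP_ prefix convention in action, here is a small illustrative helper (the function name is ours, not part of WSGI) that rebuilds ordinary header names from environ keys:

```python
def headers_from_environ(environ):
    """Rebuild HTTP header names from WSGI's CGI-style environ keys."""
    headers = {}
    for key, value in environ.items():
        if key.startswith("HTTP_"):
            # HTTP_ACCEPT_ENCODING -> Accept-Encoding
            name = key[len("HTTP_"):].replace("_", "-").title()
            headers[name] = value
    # Content-Type and Content-Length are special-cased: no HTTP_ prefix
    for special in ("CONTENT_TYPE", "CONTENT_LENGTH"):
        if special in environ:
            headers[special.replace("_", "-").title()] = environ[special]
    return headers

environ = {
    "REQUEST_METHOD": "GET",            # not a header, so it is skipped
    "HTTP_ACCEPT": "application/json",
    "HTTP_X_REQUEST_ID": "abc123",
    "CONTENT_TYPE": "application/json",
}
print(headers_from_environ(environ))
```

This is essentially what frameworks do to give you a friendly `request.headers` object.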

Let’s write a WSGI app that dumps the entire environ so you can see what’s available:

# environ_dump.py
from wsgiref.simple_server import make_server

def application(environ, start_response):
    lines = []
    for key, value in sorted(environ.items()):
        lines.append(f"{key}: {value!r}")
    
    body = "\n".join(lines).encode()
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

if __name__ == "__main__":
    server = make_server("127.0.0.1", 8000, application)
    print("Environ dump on http://127.0.0.1:8000 ...")
    server.serve_forever()

Visit http://127.0.0.1:8000/some/path?key=value in your browser and you’ll see every piece of request data the server passes to your application. This is the raw material that frameworks like Flask use to build their friendly request objects.
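One piece of that friendliness is parsing QUERY_STRING. The standard library’s urllib.parse.parse_qs does the heavy lifting; roughly this is what Flask does to build request.args:

```python
from urllib.parse import parse_qs

# environ["QUERY_STRING"] for a request to /books?page=2&limit=10&tag=a&tag=b
query = parse_qs("page=2&limit=10&tag=a&tag=b")
print(query)  # {'page': ['2'], 'limit': ['10'], 'tag': ['a', 'b']}

# Values are always lists, because a key can repeat in a query string
page = int(query.get("page", ["1"])[0])
print(page)  # 2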

Real WSGI Servers

wsgiref.simple_server is fine for development but terrible for production. It’s single-threaded, slow, and doesn’t handle edge cases well. Real WSGI servers include:

Server     Notes
Gunicorn   The most popular choice on Linux. Pre-fork worker model. Doesn’t run on Windows natively.
Waitress   Pure Python, works on Windows. Great for development and light production.
uWSGI      High-performance, many features, complex configuration.

Installing and using Waitress (since we’re on Windows):

pip install waitress

# Serve our minimal app with Waitress instead of wsgiref
# waitress_example.py
from minimal_wsgi import application

if __name__ == "__main__":
    from waitress import serve
    print("Waitress serving on http://127.0.0.1:8000 ...")
    serve(application, host="127.0.0.1", port=8000)

Or from the command line:

waitress-serve --host 127.0.0.1 --port 8000 minimal_wsgi:application

Same application, different server. That’s the power of WSGI.

Flask Is a WSGI Application

Here’s the punchline for students who already know Flask from the Framework Python section: Flask is just a WSGI application. When you write:

from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello from Flask!"

@app.route("/books/<int:book_id>")
def get_book(book_id):
    return {"id": book_id, "title": "Some Book"}

The app object is a callable that conforms to the WSGI interface. Internally, app.__call__(environ, start_response) does all the work you’d otherwise do by hand: parsing environ, matching URL patterns to your decorated functions, serializing your return values to HTTP responses, and calling start_response with the right headers.
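You can see this directly by constructing a fake environ and calling the app yourself, with no server involved (wsgiref.util.setup_testing_defaults fills in the keys WSGI requires):

```python
from wsgiref.util import setup_testing_defaults
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello from Flask!"

# Build a minimal WSGI environ by hand and let wsgiref fill in the rest.
environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/"}
setup_testing_defaults(environ)

captured = {}

def start_response(status, headers):
    captured["status"] = status

# Call the Flask app exactly as a WSGI server would.
body = b"".join(app(environ, start_response))
print(captured["status"])  # 200 OK
print(body)                # b'Hello from Flask!'
```

No sockets, no HTTP parsing: just a dict in, bytes out. That is the entire contract a WSGI server holds Flask to.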

You can verify this yourself:

from flask import Flask

app = Flask(__name__)

# Flask's WSGI callable is app.wsgi_app
print(type(app.wsgi_app))  # <class 'method'>
print(callable(app))        # True — app itself is also callable

When you run flask run or app.run(), Flask uses its built-in development server (provided by Werkzeug). In production, you’d use Waitress or Gunicorn instead:

# Development (Flask’s built-in server — debug mode, not for production)
flask run

# Production (Waitress — multithreaded, no debug)
waitress-serve --host 0.0.0.0 --port 8000 myapp:app

The application code stays exactly the same. Only the server changes. That’s WSGI doing its job.

The Layered Architecture

Let’s zoom out and see the full picture:

┌─────────────────────────────────────────────────────┐
│  Client (browser, curl, requests, mobile app, ...)  │
└──────────────────────┬──────────────────────────────┘
                       │  HTTP over TCP
┌──────────────────────▼──────────────────────────────┐
│  WSGI Server (Waitress, Gunicorn, uWSGI)            │
│  • Accepts TCP connections                          │
│  • Parses HTTP requests → builds environ dict       │
│  • Calls application(environ, start_response)       │
│  • Sends HTTP response bytes back to client         │
│  • Manages threads/workers for concurrency          │
└──────────────────────┬──────────────────────────────┘
                       │  WSGI interface
┌──────────────────────▼──────────────────────────────┐
│  WSGI Application (Flask, Django, your raw function)│
│  • Reads environ to understand the request          │
│  • Routes to the right handler                      │
│  • Runs business logic (DB queries, validation)     │
│  • Returns response body as bytes                   │
└─────────────────────────────────────────────────────┘

This is the same layered system that REST’s fifth constraint describes. The client doesn’t know (or care) whether the server is Waitress or Gunicorn. The application doesn’t know (or care) which server is calling it. Each layer only talks to its neighbor through a defined interface.

WSGI Is Synchronous

One important limitation: WSGI is inherently synchronous. The server calls application(environ, start_response) and blocks until it returns. Each request ties up a thread (or process) for its entire duration. This is fine for most applications, but it becomes a problem when you have many slow clients or long-lived connections (like WebSockets).

This is the exact scalability limitation we flagged in Lecture 3 with the thread-per-client model. The solution—ASGI and async programming—is the topic of Lecture 5.

Putting It All Together: A Flask Bookstore API

Let’s build a small but complete API that demonstrates everything from this lecture: resource-oriented URLs, proper HTTP methods, meaningful status codes, and JSON request/response bodies. We’ll use Flask since you already know it from the Framework Python section—but now you understand what Flask is under the hood.

The Application

# bookstore.py
from flask import Flask, jsonify, request, abort

app = Flask(__name__)

# In-memory "database" — a dict keyed by book ID
books = {
    1: {"id": 1, "title": "Dune", "author": "Frank Herbert", "year": 1965},
    2: {"id": 2, "title": "Neuromancer", "author": "William Gibson", "year": 1984},
    3: {"id": 3, "title": "Snow Crash", "author": "Neal Stephenson", "year": 1992},
}
next_id = 4


@app.route("/books", methods=["GET"])
def list_books():
    """GET /books — return all books, with optional filtering."""
    author = request.args.get("author")  # query parameter
    if author:
        filtered = [b for b in books.values() if b["author"] == author]
        return jsonify(filtered)
    return jsonify(list(books.values()))


@app.route("/books/<int:book_id>", methods=["GET"])
def get_book(book_id):
    """GET /books/:id — return a single book."""
    book = books.get(book_id)
    if book is None:
        abort(404)
    return jsonify(book)


@app.route("/books", methods=["POST"])
def create_book():
    """POST /books — create a new book from JSON body."""
    global next_id
    data = request.get_json()
    if not data or "title" not in data or "author" not in data:
        return jsonify({"error": "Missing 'title' or 'author'"}), 400

    book = {
        "id": next_id,
        "title": data["title"],
        "author": data["author"],
        "year": data.get("year"),
    }
    books[next_id] = book
    next_id += 1

    return jsonify(book), 201, {"Location": f"/books/{book['id']}"}


@app.route("/books/<int:book_id>", methods=["PUT"])
def replace_book(book_id):
    """PUT /books/:id — replace a book entirely."""
    if book_id not in books:
        abort(404)
    data = request.get_json()
    if not data or "title" not in data or "author" not in data:
        return jsonify({"error": "Missing 'title' or 'author'"}), 400

    book = {
        "id": book_id,
        "title": data["title"],
        "author": data["author"],
        "year": data.get("year"),
    }
    books[book_id] = book
    return jsonify(book)


@app.route("/books/<int:book_id>", methods=["DELETE"])
def delete_book(book_id):
    """DELETE /books/:id — remove a book."""
    if book_id not in books:
        abort(404)
    del books[book_id]
    return "", 204


if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8000, debug=True)

Notice how each endpoint maps to the REST patterns we discussed:

Endpoint    Method   Status           Richardson Level 2 action
/books      GET      200              List the collection
/books/42   GET      200 / 404        Read a single resource
/books      POST     201 + Location   Create a new resource
/books/42   PUT      200 / 404        Replace a resource
/books/42   DELETE   204 / 404        Delete a resource

Running It

Start the Flask development server:

python bookstore.py

Or with Waitress for something closer to production:

pip install waitress
waitress-serve --host 127.0.0.1 --port 8000 bookstore:app

Testing with curl

Open another CMD window and try these commands:

:: List all books
curl http://127.0.0.1:8000/books

:: Get a specific book
curl http://127.0.0.1:8000/books/1

:: Get a book that doesn't exist (expect 404)
curl -v http://127.0.0.1:8000/books/999

:: Create a new book
curl -X POST http://127.0.0.1:8000/books -H "Content-Type: application/json" -d "{\"title\": \"The Hitchhiker's Guide\", \"author\": \"Douglas Adams\", \"year\": 1979}"

:: Replace a book
curl -X PUT http://127.0.0.1:8000/books/1 -H "Content-Type: application/json" -d "{\"title\": \"Dune (Revised)\", \"author\": \"Frank Herbert\", \"year\": 1965}"

:: Delete a book
curl -X DELETE http://127.0.0.1:8000/books/3

:: Verify it's gone
curl http://127.0.0.1:8000/books

curl Quick Reference

Flag       Meaning
-X POST    Set the HTTP method (default is GET)
-H "..."   Add a header
-d "..."   Send a request body (implies POST if no -X)
-v         Verbose — show request and response headers
-i         Show response headers along with body

Testing with Python’s requests

If you prefer Python over curl (and who wouldn’t?), the requests library is the standard tool for making HTTP requests:

pip install requests

import requests

BASE = "http://127.0.0.1:8000"

# List all books
response = requests.get(f"{BASE}/books")
print(response.status_code)  # 200
print(response.json())       # list of book dicts

# Get one book
response = requests.get(f"{BASE}/books/1")
print(response.json())       # {"id": 1, "title": "Dune", ...}

# Create a book
response = requests.post(
    f"{BASE}/books",
    json={"title": "Foundation", "author": "Isaac Asimov", "year": 1951},
)
print(response.status_code)             # 201
print(response.headers["Location"])     # /books/4
print(response.json())                  # the new book

# Delete a book
response = requests.delete(f"{BASE}/books/2")
print(response.status_code)  # 204

# Try to get the deleted book
response = requests.get(f"{BASE}/books/2")
print(response.status_code)  # 404

Notice how requests mirrors the HTTP concepts we’ve discussed:

  • requests.get() / .post() / .put() / .delete() → HTTP methods.
  • json=... → sets the body and the Content-Type: application/json header automatically.
  • response.status_code → the status code.
  • response.json() → parses the JSON body.
  • response.headers → a dict-like object with response headers.

Under the hood, requests opens a TCP socket, sends a formatted HTTP request (exactly like we did by hand in Lecture 3), reads the response, and wraps it in a convenient Python object. No magic—just layers of abstraction.
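To make that concrete, here is a sketch (the helper name build_request is ours) of the raw bytes a request to our bookstore turns into — the same kind of message we hand-wrote over sockets in Lecture 3:

```python
def build_request(method, path, host, body=b""):
    """Assemble the raw HTTP/1.1 request bytes a client like requests sends."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
    ]
    if body:
        lines.append("Content-Type: application/json")
        lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode() + body

print(build_request("GET", "/books", "127.0.0.1:8000").decode())
```

Send those bytes down a TCP socket to the bookstore server and you get back the same JSON that requests.get() hands you, minus the convenience.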

Connecting It All Back

This small Flask app ties together everything from the lecture series:

  • Lecture 1: Flask uses threads to handle concurrent requests (via Werkzeug’s threaded server, or Waitress’s thread pool).
  • Lecture 3: Under the hood, it’s TCP sockets exchanging HTTP-formatted bytes.
  • This lecture: The API follows REST conventions (Level 2), uses proper HTTP methods and status codes, and the WSGI interface lets us swap the server without changing the application.

In Lecture 5, we’ll see how FastAPI + Uvicorn achieves the same thing but with async/await, enabling much higher concurrency for IO-bound workloads.

Summary

We’ve covered a lot of conceptual ground. Here’s the cheat sheet:

Concept            What It Is                            Key Insight
Client-Server      One provides, many consume            Server owns the data; clients come and go
RPC                Call functions on remote machines     Convenient but hides network realities
REST               Architectural style (6 constraints)   Resources, representations, statelessness, HATEOAS
Richardson Model   Maturity levels 0–3                   Most APIs are Level 2; that’s usually fine
HTTP Methods       GET, POST, PUT, PATCH, DELETE         Map to CRUD; safety and idempotency matter
Status Codes       2xx/3xx/4xx/5xx                       Be specific: 201, 204, 404, 422 — not just 200 and 500
WSGI               Server ↔ Application interface        application(environ, start_response) — decouples server from framework
Flask              A WSGI application                    Routes + decorators + environ parsing, all behind a friendly API

The story arc from Lectures 1–4:

Raw sockets (L3)
    → HTTP is just text over TCP (L3)
        → Client-server architecture (L4)
            → RPC: function calls over the network (L4)
            → REST: resources over the network (L4)
                → WSGI: decouple server from app (L4)
                    → Flask/Django: friendly wrappers around WSGI (L4)

And the thread of concurrency:

Processes and threads (L1)
    → Thread-per-client servers (L3)
        → Threaded WSGI servers like Waitress (L4)
            → But threads don't scale to 10k connections...
                → Async programming and ASGI (L5)

Exercises & Project Ideas

Exercise 1: Raw WSGI JSON API

Build a WSGI application (no Flask!) that serves a simple JSON API for a to-do list:

  • GET /todos → list all to-dos
  • POST /todos → create a to-do (read JSON from wsgi.input)
  • DELETE /todos/<id> → delete a to-do

You’ll need to:

  • Parse PATH_INFO manually to extract the ID.
  • Read the request body from environ["wsgi.input"] (it’s a file-like object; use environ.get("CONTENT_LENGTH") to know how many bytes to read).
  • Use json.dumps() and json.loads() for serialization.
  • Return proper status codes (200, 201, 404, 405).

Run it with wsgiref.simple_server. Then try running the exact same application with Waitress. The application code shouldn’t change at all.

This exercise makes you appreciate what Flask does for you—and understand that it’s all just environ parsing at the end of the day.
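As a starting point for the body-reading step, here is a sketch (the function name is ours; error handling is kept minimal):

```python
import io
import json

def read_json_body(environ):
    """Read CONTENT_LENGTH bytes from wsgi.input and parse them as JSON."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    if length == 0:
        return None
    raw = environ["wsgi.input"].read(length)
    return json.loads(raw)

# Simulate what a WSGI server would pass in for a POST /todos request
environ = {
    "CONTENT_LENGTH": "17",
    "wsgi.input": io.BytesIO(b'{"task": "write"}'),
}
print(read_json_body(environ))  # {'task': 'write'}
```

Note the pattern: CONTENT_LENGTH tells you how many bytes to read, because wsgi.input has no end-of-request marker of its own.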

Exercise 2: Explore a Public API

Use requests to interact with a public API. Some good candidates:

  • httpbin.org — an echo service that returns details about your request. Great for testing.
  • JSONPlaceholder — a fake REST API for testing and prototyping.
  • GitHub API — a real production API with partial HATEOAS.

Tasks:

  1. Make GET, POST, PUT, and DELETE requests.
  2. Inspect the response status codes and headers.
  3. For the GitHub API: follow the url links in responses to navigate between resources. How does this compare to HATEOAS?
  4. Classify each API on the Richardson Maturity Model.

import requests

# Example: httpbin echoes your request back
r = requests.post(
    "https://httpbin.org/post",
    json={"message": "Hello!"},
    headers={"X-Custom-Header": "test"},
)
print(r.json())  # See your request echoed back

Exercise 3: Flask Dev Server vs Waitress

Write a Flask app with a single endpoint that simulates a slow database query:

import time
from flask import Flask

app = Flask(__name__)

@app.route("/slow")
def slow():
    time.sleep(2)  # Simulate a slow DB query
    return {"status": "done"}

if __name__ == "__main__":
    # threaded=False forces sequential handling; recent Flask versions
    # enable threading by default, which would hide the contrast
    app.run(host="127.0.0.1", port=8000, threaded=False)

  1. Run it with python app.py (the dev server, forced single-threaded here).
  2. In another terminal, send 5 concurrent requests using a script:
import requests
import threading
import time

def make_request(i):
    t0 = time.perf_counter()
    r = requests.get("http://127.0.0.1:8000/slow")
    print(f"Request {i}: {time.perf_counter() - t0:.2f}s")

threads = [threading.Thread(target=make_request, args=(i,)) for i in range(5)]
t0 = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Total: {time.perf_counter() - t0:.2f}s")

  3. Now run the same app with Waitress (waitress-serve --threads 4 app:app). Repeat the concurrent test. What’s the difference in total time? Why?

This connects directly to the threading discussion from Lectures 1 and 2: Waitress’s thread pool lets multiple requests be handled concurrently (each sleeping thread releases the GIL), while a single-threaded dev server processes them sequentially.

Project Idea: Bookstore API with Persistent Storage

Extend the bookstore example from this lecture:

  1. Replace the in-memory dict with an SQLite database (use sqlite3 from the standard library).
  2. Add a GET /books?author=...&year_min=...&year_max=... with filtering via query parameters.
  3. Add pagination: GET /books?page=2&per_page=10 should return a page of results plus a total count.
  4. Add basic HATEOAS: include _links in each book response with links to self, collection, and author.
  5. Deploy with Waitress instead of Flask’s dev server.
  6. Write a Python client script using requests that exercises every endpoint.

This is a great portfolio-ready project that demonstrates understanding of REST, HTTP, WSGI, and Flask.
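For step 1, one possible starting point (the schema and names here are suggestions, not part of the lecture code):

```python
import sqlite3

# In-memory DB for the sketch; use sqlite3.connect("books.db") for a real file.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE IF NOT EXISTS books (
           id     INTEGER PRIMARY KEY AUTOINCREMENT,
           title  TEXT NOT NULL,
           author TEXT NOT NULL,
           year   INTEGER
       )"""
)
conn.execute(
    "INSERT INTO books (title, author, year) VALUES (?, ?, ?)",
    ("Dune", "Frank Herbert", 1965),
)
conn.commit()

# Filtering via query parameters maps naturally onto a WHERE clause.
row = conn.execute(
    "SELECT id, title, author, year FROM books WHERE author = ?",
    ("Frank Herbert",),
).fetchone()
print(row)  # (1, 'Dune', 'Frank Herbert', 1965)
```

Always use parameterized queries (the ? placeholders) when the values come from request data; string formatting invites SQL injection.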

Additional Resources


Next: Lecture 5 — Async Programming, Event Loops, and ASGI, where we tackle the thread-per-client scalability wall. We’ll learn async/await, understand event loops, and see how Uvicorn + FastAPI replace the WSGI stack with something that handles thousands of concurrent connections on a single thread.