Client-Server Architectures and RESTful APIs
What REST actually means (and why most ‘REST APIs’ aren’t)
Introduction
In the first three lectures we worked our way up from the bottom of the stack: processes, threads, synchronization primitives, queues, sockets, TCP, and even a bare-bones HTTP server built from raw sockets. We know how bytes get from point A to point B.
Now it’s time to zoom out and think about architecture. This lecture is heavier on ideas than on code—but the ideas are what make the code make sense. By the end you’ll understand:
- What the client-server model is and why it dominates the web.
- What RPC is and where it falls short.
- What REST actually means (spoiler: not what most job postings think it means).
- How a threaded web server works under the hood.
- How the WSGI protocol separates the server from your Flask/Django application.
This is the fourth lecture in a five-part series:
- Processes and Threads
- Multiprocessing and Multithreading in Practice
- Interprocess Communication and Sockets
- Client-Server Architectures and RESTful APIs (you are here)
- Async Programming, Event Loops, and ASGI
The Client-Server Model
The Basic Idea
A client-server architecture is deceptively simple: one process (the server) provides a service, and one or more other processes (the clients) consume that service. The server sits around waiting for requests; the client initiates contact when it needs something.
You’ve already built one. The chat server from Lecture 3 was a textbook client-server application:
- The server was long-lived—it started up and waited indefinitely for connections.
- Clients came and went, connecting and disconnecting at will.
- The server was the authority: it managed the shared client list and decided who received which messages.
This pattern is everywhere. When you open a website, your browser (client) talks to a web server. When you send a Slack message, the Slack app (client) talks to Slack’s servers. When git push sends your code to GitHub, your git client talks to GitHub’s server.
Client-Server vs Peer-to-Peer
The main alternative is peer-to-peer (P2P), where every participant is both a client and a server simultaneously. BitTorrent is the classic example: every machine downloading a file is also uploading pieces to others.
| | Client-Server | Peer-to-Peer |
|---|---|---|
| Authority | Server is the single source of truth | No central authority |
| Scalability | Server can become a bottleneck | Scales naturally with more peers |
| Simplicity | Simple mental model | Complex coordination |
| Examples | The web, email, databases | BitTorrent, blockchain, some chat protocols |
For web APIs—our focus—client-server is the universal choice. The server owns the data and the business logic; clients just ask for things and display results.
Thin Clients vs Thick Clients
Not all clients are equal. Think of a spectrum:
- Thin client: does almost nothing; the server does all the work. Example: a 1990s terminal displaying text from a mainframe. Or a server-rendered web page where the server produces the full HTML.
- Thick client: does significant processing locally. Example: a modern single-page application (React, Vue) that fetches JSON from the server and renders everything client-side. Or a mobile app with offline capabilities.
Most modern web applications fall somewhere in between. The server provides data (usually as JSON), and the client (a browser running JavaScript, or a mobile app) handles presentation and some logic. Understanding this split matters because it influences API design: a thick client needs a fine-grained, data-oriented API; a thin client might prefer the server to do more work per request.
Remote Procedure Calls (RPC)
The Idea
Once you have a client and a server, the next question is: how do they talk? The most intuitive answer is RPC: Remote Procedure Call. The idea is seductive in its simplicity—call a function, but on someone else’s computer.
Your code looks like a normal function call:
result = remote_server.add(3, 4)

But under the hood, this call serializes the arguments and sends them over the network; the server deserializes them, executes the function, serializes the result, and sends it back; the client deserializes the answer. All the network plumbing is hidden behind what looks like a local function call.
A Quick Demo with xmlrpc
Python ships with a batteries-included RPC implementation: xmlrpc. It’s ancient (uses XML over HTTP), but it demonstrates the concept perfectly with minimal code.
rpc_server.py — exposes Python functions over the network:

# rpc_server.py
from xmlrpc.server import SimpleXMLRPCServer

def add(x, y):
    return x + y

def multiply(x, y):
    return x * y

def greeting(name):
    return f"Hello, {name}! Greetings from the server."

server = SimpleXMLRPCServer(("127.0.0.1", 8000))
server.register_function(add)
server.register_function(multiply)
server.register_function(greeting)
print("RPC server listening on http://127.0.0.1:8000 ...")
server.serve_forever()
rpc_client.py — calls those functions as if they were local:
# rpc_client.py
from xmlrpc.client import ServerProxy
server = ServerProxy("http://127.0.0.1:8000")
print(server.add(3, 4)) # 7
print(server.multiply(6, 7)) # 42
print(server.greeting("Alice")) # Hello, Alice! Greetings from the server.

Run the server in one CMD window, the client in another:
# Window 1
python rpc_server.py
# Window 2
python rpc_client.py

Look at the client code. It reads like normal Python. server.add(3, 4) looks like calling a local method—but it’s actually sending an HTTP request with an XML payload to another process. The ServerProxy object intercepts attribute access and turns it into network calls. Neat.
If you’re curious what the wire format looks like, add verbose=True to the ServerProxy:
server = ServerProxy("http://127.0.0.1:8000", verbose=True)

You’ll see the raw XML request and response printed to the console. It’s… not pretty. This is why XML-RPC fell out of fashion—JSON is much more readable. But the concept of RPC lives on in modern systems like gRPC (Google’s protocol-buffer-based RPC framework).
The Problem with RPC
RPC is elegant, but it has a fundamental flaw: it pretends the network doesn’t exist.
A local function call is:
- Fast — nanoseconds.
- Reliable — if the function exists, it will run.
- Atomic — it either completes or throws an exception.
A network call is:
- Slow — milliseconds at best, seconds at worst.
- Unreliable — the server might be down, the network might be congested, packets might be lost.
- Ambiguous — if you don’t get a response, did the server process your request or not? (Did transfer_money(1000) execute? Do you retry and risk a double transfer?)
By making network calls look like local calls, RPC lures you into ignoring these realities. You write code that works fine on your laptop and breaks catastrophically in production when latency spikes or a server restarts mid-request.
This critique—articulated in the famous 1994 paper A Note on Distributed Computing by Waldo et al.—was one of the motivations for a different approach: REST.
Despite the critique, RPC is alive and well. gRPC (used heavily at Google, Netflix, and many microservice architectures) is a modern, high-performance RPC framework that uses Protocol Buffers for serialization. It’s explicit about the network (with features like deadlines, cancellation, and streaming) and doesn’t pretend calls are local.
The lesson isn’t “don’t use RPC.” It’s “understand the tradeoffs.”
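To make the tradeoff concrete, here is a small sketch (not part of the lecture code) of calling an XML-RPC server while treating the network as real: a global socket timeout plus explicit error handling. It assumes nothing is listening on 127.0.0.1 port 9, so the call fails fast.

```python
import socket
from xmlrpc.client import ServerProxy

# Acknowledge the network: bound every remote call with a timeout.
socket.setdefaulttimeout(2.0)

# Assumption: no server is listening on port 9, so this call will fail.
server = ServerProxy("http://127.0.0.1:9")
try:
    print(server.add(3, 4))
except OSError as exc:  # covers refused connections, timeouts, unreachable hosts
    print(f"remote call failed: {exc}")
```

Compare this with the happy-path client above: the code is uglier, but the failure mode is explicit instead of being an unhandled surprise in production.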
REST: The Idea
Origin Story
In 2000, Roy Fielding—one of the principal authors of the HTTP specification—published his PhD dissertation. Chapter 5 introduced REST: Representational State Transfer. It wasn’t a protocol, a library, or a specification. It was an architectural style—a set of constraints that, when applied together, produce systems with desirable properties like scalability, simplicity, and evolvability.
Fielding was describing the architecture of the web itself. The web already worked this way; he was just naming and formalizing the principles that made it work so well. REST is not something you install. It’s a set of design decisions.
The Six Constraints
REST defines six constraints. An API that satisfies all of them is truly RESTful. Let’s walk through each one.
1. Client-Server
The client and server are separate concerns. The server doesn’t know about the UI; the client doesn’t know about the database. They communicate only through a defined interface (HTTP, in practice). This separation allows them to evolve independently—you can redesign your entire frontend without touching the server, and vice versa.
We’ve been living this constraint since the beginning of this lecture.
2. Statelessness
Each request from the client must contain all the information the server needs to process it. The server doesn’t remember previous requests. There’s no “session” on the server side that tracks where you are in a multi-step process.
This sounds restrictive, but it’s liberating for scalability. If the server holds no per-client state, any server in a cluster can handle any request. You can add more servers behind a load balancer without worrying about sticky sessions or shared state.
A natural objection: if the server remembers nothing, how do logins work? In practice, authentication tokens (like JWTs or API keys) are sent with every request in a header. The server validates the token each time—it doesn’t “remember” that you logged in. The client holds the state (the token); the server just verifies it. This is statelessness in action.
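A minimal sketch of that flow, with a hypothetical in-memory token table standing in for real JWT verification:

```python
# Hypothetical token table; a real server would verify a signed JWT instead.
VALID_TOKENS = {"abc123": "alice"}

def handle_request(headers):
    """Authenticate from scratch on every request — no server-side session."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return (401, "missing credentials")
    user = VALID_TOKENS.get(auth[len("Bearer "):])
    if user is None:
        return (401, "invalid token")
    return (200, f"hello, {user}")

print(handle_request({"Authorization": "Bearer abc123"}))  # (200, 'hello, alice')
print(handle_request({}))                                  # (401, 'missing credentials')
```

Any server in a cluster could run this handler for any client, because nothing here depends on having seen a previous request.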
3. Cacheability
Responses must declare whether they’re cacheable or not. If a response says “this data is valid for the next 5 minutes,” the client (or an intermediary like a CDN) can reuse it without hitting the server again. This reduces load and improves performance dramatically.
HTTP has a rich caching system built in: Cache-Control, ETag, Last-Modified, Expires headers. We’ll touch on these later in this lecture.
4. Uniform Interface
This is the big one—the constraint that gives REST its distinctive flavor. It has four sub-constraints:
a) Resources identified by URIs. Everything the API exposes is a resource, and each resource has a unique address (URI). A book, a user, a list of orders—each gets its own URL.
GET /books/42 ← The book with ID 42
GET /users/alice ← The user "alice"
GET /orders ← The collection of all orders
b) Manipulation through representations. You don’t interact with the resource directly—you interact with representations of it. When you GET /books/42, you don’t get the database row. You get a JSON (or HTML, or XML) representation of that book. When you PUT /books/42, you send a representation of what the book should look like, and the server updates its internal state accordingly.
c) Self-descriptive messages. Every message (request or response) contains enough information to understand itself. The Content-Type header says “this body is JSON.” The Allow header says “you can GET or DELETE this resource.” You don’t need out-of-band documentation to parse a single message.
d) HATEOAS — Hypermedia As The Engine Of Application State. This is the most ignored and most misunderstood constraint. The idea: the server’s responses should contain links that tell the client what it can do next.
Think about how you browse a website. You go to the homepage, and it has links to “Products,” “About,” “Contact.” You don’t need a manual telling you which URLs exist—the pages themselves guide you. That’s HATEOAS.
Applied to an API, a response might look like:
{
  "id": 42,
  "title": "The Pragmatic Programmer",
  "author": "Hunt & Thomas",
  "links": {
    "self": "/books/42",
    "author": "/authors/7",
    "reviews": "/books/42/reviews",
    "delete": "/books/42"
  }
}

The client doesn’t hardcode URLs. It follows links from responses. If the server changes its URL structure, clients adapt automatically—just like your browser adapts when a website redesigns its navigation.
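A sketch of what link-following looks like on the client side (the link relations mirror the example above):

```python
def follow(resource, rel):
    """Look up the URL for a link relation instead of hardcoding paths."""
    return resource["links"][rel]

book = {
    "id": 42,
    "title": "The Pragmatic Programmer",
    "links": {
        "self": "/books/42",
        "author": "/authors/7",
        "reviews": "/books/42/reviews",
    },
}

# The client knows only relation names; the server owns the URL structure.
print(follow(book, "reviews"))  # /books/42/reviews
```

If the server later moves reviews to a different path, a client written this way keeps working as long as the relation name stays the same.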
5. Layered System
The client doesn’t need to know whether it’s talking directly to the server, or to a load balancer, or to a caching proxy in front of the server. Each layer only knows about its immediate neighbor. This allows you to insert reverse proxies, CDNs, API gateways, and other infrastructure without changing clients or servers.
6. Code on Demand (Optional)
The server can send executable code to the client. JavaScript in web pages is the prime example: the server sends HTML containing <script> tags, and the browser executes the code. This is the only optional constraint—most APIs don’t use it.
REST doesn’t mandate HTTP. It doesn’t mandate JSON. It doesn’t mention status codes or URL patterns. REST is a set of architectural constraints that happen to map naturally onto HTTP—because Fielding designed them by analyzing the web, which runs on HTTP.
When someone says “REST API,” what they usually mean is “HTTP API with JSON responses and resource-oriented URLs.” That’s fine as a shorthand, but it’s not the full picture. Let’s unpack the gap.
What Most People Call “REST” (and Why It’s Not)
The Colloquial Meaning
In everyday developer conversation, “REST API” means something like:
- Uses HTTP.
- Has “nice” URLs like /users/42 instead of /getUser?id=42.
- Sends and receives JSON.
- Uses HTTP methods (GET, POST, PUT, DELETE) somewhat appropriately.
This is a perfectly reasonable way to build an API. But it’s not REST in the Fielding sense—it’s missing statelessness enforcement, cacheability, self-descriptive messages, and almost certainly HATEOAS. It’s more accurately called an HTTP API or a resource-oriented API.
Does the distinction matter in practice? Sometimes. Understanding real REST helps you design better APIs, ask better questions in architecture discussions, and avoid cargo-culting patterns without understanding why they exist.
The Richardson Maturity Model
Leonard Richardson proposed a handy model for classifying HTTP APIs on a scale from 0 to 3. Think of it as a ladder toward full REST.
Level 0: The Swamp of POX
“POX” = Plain Old XML (or JSON). One URL, one HTTP method (usually POST), and the operation is encoded in the request body.
POST /api
{"action": "getBook", "id": 42}
POST /api
{"action": "deleteBook", "id": 42}
This is basically RPC tunneled through HTTP. The URL and HTTP method carry no meaning—everything is in the body. Many SOAP services live here, and so does our xmlrpc example from earlier.
Level 1: Resources
Each “thing” gets its own URL, but you still use a single HTTP method (usually POST) for everything.
POST /books/42
{"action": "get"}
POST /books/42
{"action": "delete"}
Better—at least the URL tells you what you’re talking about. But the HTTP method doesn’t tell you what you’re doing to it.
Level 2: HTTP Verbs
Now we use HTTP methods meaningfully:
GET /books/42 ← Read the book
POST /books ← Create a new book
PUT /books/42 ← Replace the book
DELETE /books/42 ← Delete the book
This is where the vast majority of production APIs live. It’s clean, intuitive, and well-supported by tools and frameworks. Most developers call this “RESTful” and stop here.
Level 3: Hypermedia Controls (HATEOAS)
The response includes links that tell the client what to do next:
{
  "id": 42,
  "title": "The Pragmatic Programmer",
  "author": "Hunt & Thomas",
  "_links": {
    "self": {"href": "/books/42"},
    "update": {"href": "/books/42", "method": "PUT"},
    "delete": {"href": "/books/42", "method": "DELETE"},
    "reviews": {"href": "/books/42/reviews"},
    "collection": {"href": "/books"}
  }
}

The client discovers the API by following links, just like a human browsing a website. This is true REST. And almost nobody does it for JSON APIs, because:
- There’s no universally adopted standard for hypermedia in JSON (HAL, JSON-LD, and JSON:API exist but none dominates).
- Most API consumers are controlled by the same team that controls the server, so hardcoding URLs is easier.
- The tooling support isn’t there yet—OpenAPI/Swagger, the dominant API documentation format, doesn’t model HATEOAS well.
Level 2 is the sweet spot for most teams. Use resources, use HTTP verbs correctly, return proper status codes, and document your API well. That gets you 90% of REST’s benefits.
But know that Level 3 exists. For public APIs consumed by many independent clients, HATEOAS is genuinely valuable—it lets you evolve your API without breaking clients that follow links instead of hardcoding URLs.
Side by Side: RPC-Style vs REST-Style
Let’s compare two API designs for the same problem—managing a library of books. Same functionality, different philosophies.
RPC-Style (Level 0–1):
POST /api/getBook {"id": 42}
POST /api/listBooks {"genre": "fiction", "page": 1}
POST /api/createBook {"title": "Dune", "author": "Herbert"}
POST /api/updateBook {"id": 42, "title": "Dune (revised)"}
POST /api/deleteBook {"id": 42}
POST /api/searchBooks {"query": "Python"}
Every operation is a POST to a unique “action” endpoint. The URL is a verb describing what to do. This is natural if you’re thinking in terms of function calls.
REST-Style (Level 2):
GET /books?genre=fiction&page=1
GET /books/42
POST /books {"title": "Dune", "author": "Herbert"}
PUT /books/42 {"title": "Dune (revised)", "author": "Herbert"}
DELETE /books/42
GET /books?q=Python
The URL is a noun (the resource). The HTTP method is the verb (what you’re doing to it). Filtering and searching are query parameters on the collection URL.
Neither is “wrong.” But the REST-style version is:
- More predictable — once you know the resource URL, you can guess the CRUD operations.
- More cacheable — GET requests can be cached; POST requests generally can’t. The RPC-style version makes everything a POST, defeating caching entirely.
- More aligned with HTTP — proxies, load balancers, and browsers understand GET vs POST. A proxy can cache a GET response or retry it safely. It can’t do that with a POST.
HTTP Methods, Status Codes, and Headers
Now that we understand the philosophy, let’s get practical. If you’re building a Level 2 API (and you probably are), you need to know the HTTP toolkit inside out.
HTTP Methods (Verbs)
HTTP defines several methods. Five of them map cleanly to CRUD operations:
| Method | CRUD | Meaning | Request Body? | Response Body? |
|---|---|---|---|---|
| GET | Read | Retrieve a resource | No | Yes |
| POST | Create | Create a new resource | Yes | Usually |
| PUT | Update (full) | Replace a resource entirely | Yes | Optional |
| PATCH | Update (partial) | Modify part of a resource | Yes | Optional |
| DELETE | Delete | Remove a resource | Rarely | Optional |
A few others you’ll encounter:
- HEAD — identical to GET, but the server returns only headers (no body). Useful for checking if a resource exists or getting its metadata without downloading the whole thing.
- OPTIONS — asks the server what methods are allowed for a given URL. Used heavily in CORS (Cross-Origin Resource Sharing) preflight requests by browsers.
Safety and Idempotency
Two properties that matter more than you’d think:
Safe methods don’t change anything on the server. GET and HEAD are safe—calling them a million times has no side effects. This is why search engines can crawl the web without breaking things: they only send GET requests.
Idempotent methods produce the same result whether you call them once or ten times. GET, PUT, and DELETE are idempotent. If you DELETE /books/42 twice, the second call is a no-op (the book is already gone). If you PUT /books/42 with the same body twice, the result is the same.
POST is neither safe nor idempotent. Calling POST /books twice creates two books. This is why your browser warns you about “resubmitting form data” when you refresh after a POST.
| Method | Safe? | Idempotent? |
|---|---|---|
| GET | ✅ | ✅ |
| HEAD | ✅ | ✅ |
| POST | ❌ | ❌ |
| PUT | ❌ | ✅ |
| PATCH | ❌ | ❌* |
| DELETE | ❌ | ✅ |
\* PATCH can be idempotent (e.g., “set the title to X”) but doesn’t have to be (e.g., “append Y to the description”). It depends on the operation.
Idempotency is your best friend in unreliable networks. If your client sends a PUT and doesn’t get a response (network timeout), it can safely retry—the result will be the same. If it sends a POST and doesn’t get a response, it has a problem: did the server create the resource or not? Retrying might create a duplicate.
This is exactly the ambiguity problem we discussed in the RPC section. REST’s use of idempotent methods mitigates it.
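The difference is easy to demonstrate with a toy in-memory store (a sketch, not the lecture’s code):

```python
import itertools

books = {}                 # toy in-memory "database"
_ids = itertools.count(1)  # ID generator for POST-created resources

def put_book(book_id, data):
    """PUT semantics: replace the resource at a known ID — retrying changes nothing."""
    books[book_id] = data

def post_book(data):
    """POST semantics: create a new resource — retrying creates duplicates."""
    books[next(_ids)] = data

put_book(42, {"title": "Dune"})
put_book(42, {"title": "Dune"})   # safe retry: same end state
print(len(books))                 # 1

post_book({"title": "Dune"})
post_book({"title": "Dune"})      # accidental duplicate on retry
print(len(books))                 # 3
```

This is why clients can blindly retry a timed-out PUT or DELETE, but must be careful (e.g., use idempotency keys) when retrying a POST.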
Status Codes
Every HTTP response includes a status code—a three-digit number that summarizes what happened. They’re grouped by the first digit:
2xx — Success
The request worked.
| Code | Name | When to use |
|---|---|---|
| 200 | OK | General success. GET returns data, PUT/PATCH returns the updated resource. |
| 201 | Created | A new resource was created (typically after POST). Include a Location header with the new resource’s URL. |
| 204 | No Content | Success, but nothing to return (common for DELETE). |
3xx — Redirection
The resource moved.
| Code | Name | When to use |
|---|---|---|
| 301 | Moved Permanently | The resource has a new URL. Clients should update their bookmarks. |
| 304 | Not Modified | Used with caching. “Your cached version is still good, no need to re-download.” |
4xx — Client Error
The client did something wrong.
| Code | Name | When to use |
|---|---|---|
| 400 | Bad Request | Malformed request (invalid JSON, missing required field, etc.). |
| 401 | Unauthorized | “Who are you?” — authentication required but not provided (or invalid). |
| 403 | Forbidden | “I know who you are, but you’re not allowed.” — authenticated but not authorized. |
| 404 | Not Found | The resource doesn’t exist. |
| 405 | Method Not Allowed | The URL exists, but not for that method (e.g., DELETE on a read-only resource). |
| 409 | Conflict | The request conflicts with the current state (e.g., creating a resource that already exists). |
| 422 | Unprocessable Entity | The request is well-formed but semantically wrong (e.g., age = -5). Popular with APIs using JSON validation. |
| 429 | Too Many Requests | Rate limiting. “Slow down.” |
5xx — Server Error
The server messed up.
| Code | Name | When to use |
|---|---|---|
| 500 | Internal Server Error | Something unexpected went wrong. The catch-all. |
| 502 | Bad Gateway | A proxy or gateway received an invalid response from the upstream server. |
| 503 | Service Unavailable | The server is overloaded or down for maintenance. |
If you remember nothing else: 2xx = good, 4xx = your fault, 5xx = my fault. Beyond that, use the most specific code that fits. Clients (and debugging tools) appreciate the precision.
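Because the class is encoded in the first digit, client code can triage any status with integer division (a small illustrative helper, not a standard-library function):

```python
def category(status):
    """Classify an HTTP status code by its first digit."""
    names = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}
    return names.get(status // 100, "other")

print(category(204))  # success
print(category(404))  # client error
print(category(503))  # server error
```

Generic retry and error-handling logic usually branches on exactly this: retry on 5xx, give up (and fix the request) on 4xx.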
Important Headers
We introduced headers in Lecture 3 as “the envelope” of an HTTP message. Here are the ones you’ll use most when building and consuming APIs:
Content Negotiation
Content-Type (request and response): declares the format of the body.
Content-Type: application/json
Content-Type: text/html; charset=utf-8
Content-Type: multipart/form-data
Accept (request): tells the server what formats the client can handle.
Accept: application/json
Accept: text/html, application/json;q=0.9
The q=0.9 is a quality factor—it says “I prefer HTML, but JSON is acceptable.” The server should honor this or return 406 Not Acceptable.
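Here is a sketch of how a server might honor those quality factors (simplified: it ignores wildcards like */* and other Accept parameters):

```python
def pick_format(accept, supported):
    """Pick the client's most-preferred media type that the server supports."""
    prefs = []
    for part in accept.split(","):
        fields = part.strip().split(";")
        q = 1.0  # quality factor defaults to 1.0 when omitted
        for param in fields[1:]:
            key, _, value = param.strip().partition("=")
            if key == "q":
                try:
                    q = float(value)
                except ValueError:
                    pass
        prefs.append((q, fields[0].strip()))
    # Highest q wins; fall through to the next preference if unsupported.
    for _, mtype in sorted(prefs, key=lambda p: -p[0]):
        if mtype in supported:
            return mtype
    return None  # caller should answer 406 Not Acceptable

print(pick_format("text/html, application/json;q=0.9",
                  ["application/json", "text/html"]))
# text/html — its implicit q=1.0 beats JSON's q=0.9
```

Real frameworks (and libraries like Werkzeug) do a more complete version of this negotiation for you.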
Authentication
Authorization (request): carries credentials.
Authorization: Bearer eyJhbGciOiJIUzI1NiIs...
Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
Bearer tokens (like JWTs) are the most common pattern in modern APIs. Basic auth (base64-encoded username:password) is simpler but less secure unless used over HTTPS.
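You can reproduce the Basic credential shown above yourself — it is just base64, which is exactly why it is only acceptable over HTTPS:

```python
import base64

# Basic auth is base64("username:password") — encoding, not encryption.
cred = base64.b64encode(b"username:password").decode()
print(cred)  # dXNlcm5hbWU6cGFzc3dvcmQ=

# Anyone who can read the header can trivially reverse it:
print(base64.b64decode(cred))  # b'username:password'
```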
Caching
Cache-Control (response): tells the client (and intermediary caches) how long the response is valid.
Cache-Control: max-age=3600 (valid for 1 hour)
Cache-Control: no-cache (always revalidate with the server)
Cache-Control: no-store (don't cache at all — sensitive data)
ETag (response) + If-None-Match (request): a fingerprint of the resource. The client can send the ETag back; if the resource hasn’t changed, the server returns 304 Not Modified instead of re-sending the whole body.
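Here is a sketch of the server side of that exchange, using a content hash as the ETag (an assumption for illustration — real servers may derive ETags from version numbers or modification times):

```python
import hashlib

def make_etag(body):
    """Fingerprint the representation; the quotes are part of ETag syntax."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def conditional_get(body, if_none_match):
    """Return (status, payload) for a GET with an optional If-None-Match header."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, b""    # client's cached copy is still valid — skip the body
    return 200, body

body = b'{"id": 42, "title": "Dune"}'
status, _ = conditional_get(body, None)               # first fetch: full response
status_again, _ = conditional_get(body, make_etag(body))  # revalidation
print(status, status_again)  # 200 304
```

The second request costs a round trip but not the body transfer, which matters for large representations.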
Other Useful Headers
| Header | Direction | Purpose |
|---|---|---|
| Location | Response | URL of a newly created resource (with 201) or a redirect target (with 301/302) |
| Allow | Response | Lists the HTTP methods a resource supports (often sent with 405) |
| X-Request-Id | Both | A unique ID for tracing a request through logs and systems |
| Retry-After | Response | How long to wait before retrying (sent with 429 or 503) |
Query Parameters vs Path Parameters vs Request Body
A common source of confusion: where does data go in a request?
Path parameters identify a specific resource:
GET /books/42 ← 42 is a path parameter (identifies the book)
GET /users/alice/orders ← alice is a path parameter (identifies the user)
Query parameters filter, sort, or paginate a collection:
GET /books?genre=fiction&sort=title&page=2
GET /users?active=true&limit=10
Request body carries the data for creation or update:
POST /books
Content-Type: application/json
{"title": "Dune", "author": "Herbert", "year": 1965}
The rule of thumb: path = which resource, query = how to filter/sort, body = what to create/update. Don’t put creation data in query parameters (POST /books?title=Dune is wrong), and don’t put resource IDs in the body when they belong in the path.
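Python’s standard library can split these pieces apart, which is roughly what web frameworks do for you before your handler runs:

```python
from urllib.parse import urlparse, parse_qs

url = "/books?genre=fiction&sort=title&page=2"
parts = urlparse(url)

print(parts.path)               # /books      (which resource)
params = parse_qs(parts.query)  # how to filter/sort/paginate
print(params["genre"])          # ['fiction']
print(params["page"])           # ['2']
```

Note that parse_qs returns lists, because a query parameter may legally repeat (?tag=a&tag=b).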
How a Threaded Web Server Works
Let’s connect the dots back to what we built in Lectures 1–3. A web server is, at its core, the threaded TCP echo server from Lecture 3—but instead of echoing bytes back, it parses HTTP requests and generates HTTP responses.
The Request Lifecycle
When you type http://localhost:8000/books/42 in your browser, here’s what happens:
┌──────────┐ ┌──────────┐
│ Browser │ │ Server │
└─────┬────┘ └─────┬────┘
│ │
│ 1. DNS lookup: localhost → 127.0.0.1 │
│ │
│ 2. TCP connect to 127.0.0.1:8000 │
│ ──────────────────────────────────────► │
│ (three-way handshake) │
│ ◄────────────────────────────────────── │
│ │
│ 3. Send HTTP request: │
│ GET /books/42 HTTP/1.1 │
│ Host: localhost:8000 │
│ Accept: application/json │
│ ──────────────────────────────────────► │
│ │
│ 4. Server processes request: │
│ - Parse HTTP headers │
│ - Route /books/42 → handler │
│ - Handler queries database │
│ - Build JSON response │
│ │
│ 5. Send HTTP response: │
│ HTTP/1.1 200 OK │
│ Content-Type: application/json │
│ {"id": 42, "title": "Dune"} │
│ ◄────────────────────────────────────── │
│ │
│ 6. TCP close (or keep-alive) │
│ ──────────────────────────────────────► │
Steps 2, 3, 5, and 6 are pure TCP—we covered those in Lecture 3. Step 1 is DNS, which we won’t dive into. Step 4 is where all the interesting web server logic lives.
Two Separate Concerns
Notice that step 4 has two very different kinds of work:
Network plumbing: accepting TCP connections, reading raw bytes, parsing the HTTP request line and headers, sending the response bytes back over the socket, managing timeouts, handling keep-alive connections.
Application logic: looking at the URL and method, deciding which Python function should handle it, running that function (which might query a database, validate input, compute a result), and building the response body.
These are fundamentally different responsibilities. The network plumbing is the same for every web application—whether you’re building a bookstore API, a social network, or a weather service. The application logic is what makes your app yours.
This separation is the key insight that leads to WSGI.
A Threaded HTTP Server (Sketch)
Let’s sketch what a simple threaded web server looks like, building on our Lecture 3 knowledge. This isn’t production code—it’s a mental model.
# Conceptual sketch — not a real server, but shows the structure
import socket
import threading

def handle_client(conn, addr):
    """Handle one HTTP request from one client."""
    # 1. Read the raw HTTP request
    raw_request = conn.recv(4096).decode()

    # 2. Parse it (in reality, this is much more complex)
    request_line = raw_request.split("\r\n")[0]
    method, path, _ = request_line.split(" ")

    # 3. Route to the right handler (application logic)
    if method == "GET" and path == "/":
        status = "200 OK"
        body = "<h1>Welcome!</h1>"
    elif method == "GET" and path.startswith("/books/"):
        book_id = path.split("/")[-1]
        status = "200 OK"
        body = f'{{"id": {book_id}, "title": "Some Book"}}'
    else:
        status = "404 Not Found"
        body = '{"error": "Not found"}'

    # 4. Build and send the HTTP response
    response = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        f"\r\n"
        f"{body}"
    )
    conn.sendall(response.encode())
    conn.close()

def run_server(host="127.0.0.1", port=8000):
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(5)
    print(f"Serving on http://{host}:{port} ...")
    while True:
        conn, addr = server.accept()
        t = threading.Thread(target=handle_client, args=(conn, addr))
        t.start()

if __name__ == "__main__":
    run_server()

This is essentially the threaded echo server from Lecture 3, but with HTTP parsing and routing bolted on. The main thread sits in accept(), and each connection gets its own thread.
The problem? Everything is tangled together. The network code (socket handling, HTTP parsing) and the application code (routing, generating responses) are all in handle_client. If you want to:
- Switch from threads to processes → you need to rewrite the server loop.
- Add URL pattern matching → you need to modify handle_client.
- Use a different web server (say, one that handles keep-alive properly) → you need to rewrite everything.
This is exactly the problem the web development world faced in the early 2000s. Every Python web framework came with its own server, and they were all incompatible. You couldn’t plug a Django app into a CherryPy server, or vice versa.
WSGI: Separating Server from Application
The Problem
In the early days of Python web development, the landscape was fragmented:
- Zope had its own server.
- CherryPy had its own server.
- Twisted had its own server.
- mod_python tied everything to Apache.
If you wanted to deploy a CherryPy app on a different server, tough luck. If a new, faster server came along, every framework had to write an adapter for it individually. It was a mess of N × M combinations.
The Solution: PEP 3333
In 2003 (updated in 2010 as PEP 3333 for Python 3), the Python community defined WSGI: the Web Server Gateway Interface. It’s a simple contract between two parties:
- The server (also called the “gateway”) handles all the network plumbing.
- The application (also called the “framework”) handles the business logic.
The entire interface is this:
def application(environ, start_response):
    # environ: a dict with request data (method, path, headers, body, ...)
    # start_response: a callback to set the status and response headers
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, World!"]

That’s it. A WSGI application is a callable (a function or an object with __call__) that takes two arguments and returns an iterable of byte strings. The server calls this function for every request, passing in the parsed request data and a callback for setting headers.
This turned the N × M problem into N + M: any WSGI-compliant server can run any WSGI-compliant application. Write your server once, write your framework once, and they just work together.
A Minimal WSGI Application
Let’s write the simplest possible WSGI app and run it with Python’s built-in wsgiref.simple_server:
```python
# minimal_wsgi.py
from wsgiref.simple_server import make_server

def application(environ, start_response):
    # Extract request info from environ
    method = environ["REQUEST_METHOD"]
    path = environ["PATH_INFO"]

    # Simple routing
    if path == "/" and method == "GET":
        status = "200 OK"
        body = b"Welcome to the minimal WSGI app!"
    elif path == "/hello" and method == "GET":
        status = "200 OK"
        body = b"Hello from WSGI!"
    else:
        status = "404 Not Found"
        body = b"Not found."

    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]

if __name__ == "__main__":
    server = make_server("127.0.0.1", 8000, application)
    print("WSGI server on http://127.0.0.1:8000 ...")
    server.serve_forever()
```

Run it and test with curl:
```
python minimal_wsgi.py

# In another CMD window:
curl http://127.0.0.1:8000/
curl http://127.0.0.1:8000/hello
curl http://127.0.0.1:8000/nope
```

What’s in `environ`?
The environ dictionary is the heart of WSGI. It contains everything the server knows about the request, plus some CGI-inherited variables:
| Key | Example | Meaning |
|---|---|---|
| `REQUEST_METHOD` | `"GET"` | The HTTP method |
| `PATH_INFO` | `"/books/42"` | The URL path |
| `QUERY_STRING` | `"page=2&limit=10"` | Everything after the `?` in the URL |
| `CONTENT_TYPE` | `"application/json"` | The Content-Type header |
| `CONTENT_LENGTH` | `"128"` | The Content-Length header |
| `HTTP_ACCEPT` | `"application/json"` | The Accept header (all HTTP headers are prefixed with `HTTP_`) |
| `HTTP_AUTHORIZATION` | `"Bearer abc..."` | The Authorization header |
| `SERVER_NAME` | `"127.0.0.1"` | The server’s hostname |
| `SERVER_PORT` | `"8000"` | The server’s port |
| `wsgi.input` | (file-like object) | The request body (for POST/PUT) |
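Note that `QUERY_STRING` arrives as one raw string; turning it into a dict is your job (or your framework’s). The standard library’s `urllib.parse.parse_qs` does exactly that, which is essentially what Flask does to build `request.args`:

```python
from urllib.parse import parse_qs

# A query string exactly as it would appear in environ["QUERY_STRING"]
query = parse_qs("page=2&limit=10")
print(query)  # {'page': ['2'], 'limit': ['10']}

# Each value is a list, because a key may legally repeat: ?tag=a&tag=b
print(parse_qs("tag=a&tag=b"))  # {'tag': ['a', 'b']}
```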
Let’s write a WSGI app that dumps the entire environ so you can see what’s available:
```python
# environ_dump.py
from wsgiref.simple_server import make_server

def application(environ, start_response):
    lines = []
    for key, value in sorted(environ.items()):
        lines.append(f"{key}: {value!r}")
    body = "\n".join(lines).encode()
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

if __name__ == "__main__":
    server = make_server("127.0.0.1", 8000, application)
    print("Environ dump on http://127.0.0.1:8000 ...")
    server.serve_forever()
```

Visit http://127.0.0.1:8000/some/path?key=value in your browser and you’ll see every piece of request data the server passes to your application. This is the raw material that frameworks like Flask use to build their friendly request objects.
Real WSGI Servers
wsgiref.simple_server is fine for development but terrible for production. It’s single-threaded, slow, and doesn’t handle edge cases well. Real WSGI servers include:
| Server | Notes |
|---|---|
| Gunicorn | The most popular choice on Linux. Pre-fork worker model. Doesn’t run on Windows natively. |
| Waitress | Pure Python, works on Windows. Great for development and light production. |
| uWSGI | High-performance, many features, complex configuration. |
Installing and using Waitress (since we’re on Windows):
```
pip install waitress
```

```python
# waitress_example.py
# Serve our minimal app with Waitress instead of wsgiref
from minimal_wsgi import application

if __name__ == "__main__":
    from waitress import serve
    print("Waitress serving on http://127.0.0.1:8000 ...")
    serve(application, host="127.0.0.1", port=8000)
```

Or from the command line:
```
waitress-serve --host 127.0.0.1 --port 8000 minimal_wsgi:application
```

Same application, different server. That’s the power of WSGI.
Flask Is a WSGI Application
Here’s the punchline for students who already know Flask from the Framework Python section: Flask is just a WSGI application. When you write:
```python
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello from Flask!"

@app.route("/books/<int:book_id>")
def get_book(book_id):
    return {"id": book_id, "title": "Some Book"}
```

The `app` object is a callable that conforms to the WSGI interface. Internally, `app.__call__(environ, start_response)` does all the work you’d otherwise do by hand: parsing `environ`, matching URL patterns to your decorated functions, serializing your return values to HTTP responses, and calling `start_response` with the right headers.
You can verify this yourself:
```python
from flask import Flask

app = Flask(__name__)

# Flask's WSGI callable is app.wsgi_app
print(type(app.wsgi_app))  # <class 'method'>
print(callable(app))       # True — app itself is also callable
```

When you run `flask run` or `app.run()`, Flask uses its built-in development server, provided by Werkzeug. In production, you’d use Waitress or Gunicorn instead:
```
# Development (Flask's built-in server — single-threaded, debug mode)
flask run

# Production (Waitress — multithreaded, no debug)
waitress-serve --host 0.0.0.0 --port 8000 myapp:app
```

The application code stays exactly the same. Only the server changes. That’s WSGI doing its job.
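Because a WSGI application is just a callable, you can even invoke one with no server at all. A sketch using the standard library’s `wsgiref.util.setup_testing_defaults` (the inline `application` here is a stand-in for any WSGI app, Flask included):

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    body = f"You requested {environ['PATH_INFO']}".encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]

# Build an environ by hand instead of parsing a real HTTP request
environ = {"PATH_INFO": "/books/1"}
setup_testing_defaults(environ)  # fills in the rest of a minimal valid environ

captured = {}
def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers

body = b"".join(application(environ, fake_start_response))
print(captured["status"])  # 200 OK
print(body)                # b'You requested /books/1'
```

This is exactly how unit tests for WSGI apps work: no sockets, no ports, just a dict and a callback.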
Let’s zoom out and see the full picture:
┌─────────────────────────────────────────────────────┐
│ Client (browser, curl, requests, mobile app, ...) │
└──────────────────────┬──────────────────────────────┘
│ HTTP over TCP
┌──────────────────────▼──────────────────────────────┐
│ WSGI Server (Waitress, Gunicorn, uWSGI) │
│ • Accepts TCP connections │
│ • Parses HTTP requests → builds environ dict │
│ • Calls application(environ, start_response) │
│ • Sends HTTP response bytes back to client │
│ • Manages threads/workers for concurrency │
└──────────────────────┬──────────────────────────────┘
│ WSGI interface
┌──────────────────────▼──────────────────────────────┐
│ WSGI Application (Flask, Django, your raw function)│
│ • Reads environ to understand the request │
│ • Routes to the right handler │
│ • Runs business logic (DB queries, validation) │
│ • Returns response body as bytes │
└─────────────────────────────────────────────────────┘
This is the same layered system that REST’s fifth constraint describes. The client doesn’t know (or care) whether the server is Waitress or Gunicorn. The application doesn’t know (or care) which server is calling it. Each layer only talks to its neighbor through a defined interface.
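The same interface also makes it easy to slip a layer in between: a piece of *middleware* is itself a WSGI application that wraps another one. A minimal logging middleware sketch (illustrative names, not from the lecture code):

```python
class LoggingMiddleware:
    """Wraps any WSGI app and prints one line per request."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        print(f"{environ['REQUEST_METHOD']} {environ['PATH_INFO']}")
        # Delegate to the wrapped application unchanged
        return self.app(environ, start_response)

def inner_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

# The server sees a normal WSGI callable; it never knows a layer was added.
application = LoggingMiddleware(inner_app)
```

Authentication, compression, and request-ID tracing are all commonly implemented this way, stacked as many layers deep as you like.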
One important limitation: WSGI is inherently synchronous. The server calls application(environ, start_response) and blocks until it returns. Each request ties up a thread (or process) for its entire duration. This is fine for most applications, but it becomes a problem when you have many slow clients or long-lived connections (like WebSockets).
This is the exact scalability limitation we flagged in Lecture 3 with the thread-per-client model. The solution—ASGI and async programming—is the topic of Lecture 5.
Putting It All Together: A Flask Bookstore API
Let’s build a small but complete API that demonstrates everything from this lecture: resource-oriented URLs, proper HTTP methods, meaningful status codes, and JSON request/response bodies. We’ll use Flask since you already know it from the Framework Python section—but now you understand what Flask is under the hood.
The Application
```python
# bookstore.py
from flask import Flask, jsonify, request, abort

app = Flask(__name__)

# In-memory "database" — a dict keyed by book ID
books = {
    1: {"id": 1, "title": "Dune", "author": "Frank Herbert", "year": 1965},
    2: {"id": 2, "title": "Neuromancer", "author": "William Gibson", "year": 1984},
    3: {"id": 3, "title": "Snow Crash", "author": "Neal Stephenson", "year": 1992},
}
next_id = 4

@app.route("/books", methods=["GET"])
def list_books():
    """GET /books — return all books, with optional filtering."""
    author = request.args.get("author")  # query parameter, e.g. /books?author=Frank+Herbert
    if author:
        filtered = [b for b in books.values() if b["author"] == author]
        return jsonify(filtered)
    return jsonify(list(books.values()))

@app.route("/books/<int:book_id>", methods=["GET"])
def get_book(book_id):
    """GET /books/:id — return a single book."""
    book = books.get(book_id)
    if book is None:
        abort(404)
    return jsonify(book)

@app.route("/books", methods=["POST"])
def create_book():
    """POST /books — create a new book from JSON body."""
    global next_id
    data = request.get_json()
    if not data or "title" not in data or "author" not in data:
        return jsonify({"error": "Missing 'title' or 'author'"}), 400
    book = {
        "id": next_id,
        "title": data["title"],
        "author": data["author"],
        "year": data.get("year"),
    }
    books[next_id] = book
    next_id += 1
    return jsonify(book), 201, {"Location": f"/books/{book['id']}"}

@app.route("/books/<int:book_id>", methods=["PUT"])
def replace_book(book_id):
    """PUT /books/:id — replace a book entirely."""
    if book_id not in books:
        abort(404)
    data = request.get_json()
    if not data or "title" not in data or "author" not in data:
        return jsonify({"error": "Missing 'title' or 'author'"}), 400
    book = {
        "id": book_id,
        "title": data["title"],
        "author": data["author"],
        "year": data.get("year"),
    }
    books[book_id] = book
    return jsonify(book)

@app.route("/books/<int:book_id>", methods=["DELETE"])
def delete_book(book_id):
    """DELETE /books/:id — remove a book."""
    if book_id not in books:
        abort(404)
    del books[book_id]
    return "", 204

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8000, debug=True)
```

Notice how each endpoint maps to the REST patterns we discussed:
| Endpoint | Method | Status | Richardson Level 2 |
|---|---|---|---|
| `/books` | GET | 200 | List the collection |
| `/books/42` | GET | 200 / 404 | Read a single resource |
| `/books` | POST | 201 + Location | Create a new resource |
| `/books/42` | PUT | 200 / 404 | Replace a resource |
| `/books/42` | DELETE | 204 / 404 | Delete a resource |
Running It
Start the Flask development server:
```
python bookstore.py
```

Or with Waitress for something closer to production:
```
pip install waitress
waitress-serve --host 127.0.0.1 --port 8000 bookstore:app
```
Open another CMD window and try these commands:
```
:: List all books
curl http://127.0.0.1:8000/books

:: Get a specific book
curl http://127.0.0.1:8000/books/1

:: Get a book that doesn't exist (expect 404)
curl -v http://127.0.0.1:8000/books/999

:: Create a new book
curl -X POST http://127.0.0.1:8000/books -H "Content-Type: application/json" -d "{\"title\": \"The Hitchhiker's Guide\", \"author\": \"Douglas Adams\", \"year\": 1979}"

:: Replace a book
curl -X PUT http://127.0.0.1:8000/books/1 -H "Content-Type: application/json" -d "{\"title\": \"Dune (Revised)\", \"author\": \"Frank Herbert\", \"year\": 1965}"

:: Delete a book
curl -X DELETE http://127.0.0.1:8000/books/3

:: Verify it's gone
curl http://127.0.0.1:8000/books
```

curl Quick Reference
| Flag | Meaning |
|---|---|
| `-X POST` | Set the HTTP method (default is GET) |
| `-H "..."` | Add a header |
| `-d "..."` | Send a request body (implies POST if no `-X`) |
| `-v` | Verbose — show request and response headers |
| `-i` | Show response headers along with body |
Testing with Python’s requests
If you prefer Python over curl (and who wouldn’t?), the requests library is the standard tool for making HTTP requests:
```
pip install requests
```

```python
import requests

BASE = "http://127.0.0.1:8000"

# List all books
response = requests.get(f"{BASE}/books")
print(response.status_code)  # 200
print(response.json())       # list of book dicts

# Get one book
response = requests.get(f"{BASE}/books/1")
print(response.json())  # {"id": 1, "title": "Dune", ...}

# Create a book
response = requests.post(
    f"{BASE}/books",
    json={"title": "Foundation", "author": "Isaac Asimov", "year": 1951},
)
print(response.status_code)          # 201
print(response.headers["Location"])  # /books/4
print(response.json())               # the new book

# Delete a book
response = requests.delete(f"{BASE}/books/2")
print(response.status_code)  # 204

# Try to get the deleted book
response = requests.get(f"{BASE}/books/2")
print(response.status_code)  # 404
```

Notice how `requests` mirrors the HTTP concepts we’ve discussed:
- `requests.get()` / `.post()` / `.put()` / `.delete()` → HTTP methods
- `json=...` → sets the body and the `Content-Type: application/json` header automatically
- `response.status_code` → the status code
- `response.json()` → parses the JSON body
- `response.headers` → a dict-like object with response headers
Under the hood, requests opens a TCP socket, sends a formatted HTTP request (exactly like we did by hand in Lecture 3), reads the response, and wraps it in a convenient Python object. No magic—just layers of abstraction.
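To make that concrete, here is roughly what `requests.post(..., json=...)` puts on the wire, assembled by hand. This is a sketch of the wire format, not a capture; the exact headers and their order vary by requests version:

```python
import json

payload = json.dumps({"title": "Foundation", "author": "Isaac Asimov"})

# An HTTP/1.1 request is just lines of text separated by \r\n,
# a blank line, then the body.
raw_request = (
    "POST /books HTTP/1.1\r\n"
    "Host: 127.0.0.1:8000\r\n"
    "Content-Type: application/json\r\n"
    f"Content-Length: {len(payload)}\r\n"
    "\r\n"
    f"{payload}"
).encode()

print(raw_request.decode())
```

Sending those bytes over a plain TCP socket, as in Lecture 3, would produce the same 201 response as the `requests.post()` call above.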
Connecting It All Back
This small Flask app ties together everything from the lecture series:
- Lecture 1: Flask uses threads to handle concurrent requests (via Werkzeug’s threaded server, or Waitress’s thread pool).
- Lecture 3: Under the hood, it’s TCP sockets exchanging HTTP-formatted bytes.
- This lecture: The API follows REST conventions (Level 2), uses proper HTTP methods and status codes, and the WSGI interface lets us swap the server without changing the application.
In Lecture 5, we’ll see how FastAPI + Uvicorn achieves the same thing but with async/await, enabling much higher concurrency for IO-bound workloads.
Summary
We’ve covered a lot of conceptual ground. Here’s the cheat sheet:
| Concept | What It Is | Key Insight |
|---|---|---|
| Client-Server | One provides, many consume | Server owns the data; clients come and go |
| RPC | Call functions on remote machines | Convenient but hides network realities |
| REST | Architectural style (6 constraints) | Resources, representations, statelessness, HATEOAS |
| Richardson Model | Maturity levels 0–3 | Most APIs are Level 2; that’s usually fine |
| HTTP Methods | GET, POST, PUT, PATCH, DELETE | Map to CRUD; safety and idempotency matter |
| Status Codes | 2xx/3xx/4xx/5xx | Be specific: 201, 204, 404, 422 — not just 200 and 500 |
| WSGI | Server ↔ Application interface | application(environ, start_response) — decouples server from framework |
| Flask | A WSGI application | Routes + decorators + environ parsing, all behind a friendly API |
The story arc from Lectures 1–4:
Raw sockets (L3)
→ HTTP is just text over TCP (L3)
→ Client-server architecture (L4)
→ RPC: function calls over the network (L4)
→ REST: resources over the network (L4)
→ WSGI: decouple server from app (L4)
→ Flask/Django: friendly wrappers around WSGI (L4)
And the thread of concurrency:
Processes and threads (L1)
→ Thread-per-client servers (L3)
→ Threaded WSGI servers like Waitress (L4)
→ But threads don't scale to 10k connections...
→ Async programming and ASGI (L5)
Exercises & Project Ideas
Additional Resources
- Roy Fielding’s dissertation (Chapter 5 — REST) — the original source
- PEP 3333 — Python Web Server Gateway Interface v1.0.1 — the WSGI specification
- Richardson Maturity Model — Martin Fowler’s excellent overview
- A Note on Distributed Computing — the classic critique of hiding the network
- Flask documentation — the framework you know, now understood from the inside
- Waitress documentation — a production-ready WSGI server for Windows
- HTTP status codes (MDN) — the definitive reference
- httpbin.org — a handy service for testing HTTP clients
Next: Lecture 5 — Async Programming, Event Loops, and ASGI, where we tackle the thread-per-client scalability wall. We’ll learn async/await, understand event loops, and see how Uvicorn + FastAPI replace the WSGI stack with something that handles thousands of concurrent connections on a single thread.