Introduction to Processes and Threads
Workers, warehouses, and why your computer can do many things at once
Introduction
Before we can build anything resembling a REST API, we need to understand what’s happening underneath—how your operating system juggles multiple programs, and how Python can participate in that juggling act.
This is the first lecture in a five-part series:
- Processes and Threads (you are here)
- Multiprocessing and Multithreading in practice
- Interprocess Communication and Sockets
- Client-Server Architectures and RESTful APIs
- Async Programming, Event Loops, and ASGI
We’ll start at the very bottom—what is a process?—and work our way up to building real networked applications.
What Is a Process?
Open Task Manager on Windows (Ctrl+Shift+Escape). You’ll see a list of everything your computer is currently running: your browser, Spotify, VS Code, maybe a few dozen background services. Each of those is a process.
A process is, in simplified terms:
- A chunk of memory where the program’s code lives
- Another chunk of memory that the program is allowed to use
- A set of resources managed by the operating system (file handles, network connections, etc.)
Every process is isolated. If one process tries to touch memory it doesn’t own, the operating system kills it. This is the infamous segmentation fault. In Python, you’re protected from this unless you’re doing something exotic with C extensions.
Every process gets a unique PID (Process ID). You can find your current Python process’s PID with os.getpid():
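For example (the printed number will differ on every run):

```python
import os

my_pid = os.getpid()
print(f"My PID is {my_pid}")
```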
But os only tells us so much. For the real detective work, we bring in psutil.
Exploring Processes with psutil
psutil (process and system utilities) is a third-party library that gives you access to a wealth of information about running processes. Install it with:
```shell
pip install psutil
```

Let’s see what we can learn about our own process:

```python
import psutil
import os

my_pid = os.getpid()
me = psutil.Process(my_pid)

print(f"PID: {me.pid}")
print(f"Name: {me.name()}")
print(f"Executable: {me.exe()}")
print(f"Cmdline: {me.cmdline()}")
print(f"Memory: {me.memory_info().rss / 1024 / 1024:.1f} MB")
```

Run this code from different environments—a plain CMD terminal, VS Code’s integrated terminal, Jupyter Lab—and compare the output. You’ll notice that exe() and cmdline() differ depending on how Python was launched.
Parent–Child Relationships
Processes don’t just appear out of thin air. Every process is created by another process—its parent. You can walk up the family tree:
```python
import psutil
import os

current = psutil.Process(os.getpid())
while current is not None:
    print(f"PID {current.pid:>6}  {current.name()}")
    current = current.parent()
```

On Windows, you’ll typically see something like:

```
PID  12345  python.exe
PID   6789  cmd.exe
PID   1234  explorer.exe
PID      0  System Idle Process
```
This tells a story: Explorer spawned CMD, CMD spawned Python. The operating system is a tree of processes, all the way down to the root.
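The tree reads both ways: besides walking up via parent(), a process can enumerate everything below it. A small sketch, assuming psutil is installed as above (the output depends entirely on what your process has spawned, so it may well be empty):

```python
import psutil

me = psutil.Process()

parent = me.parent()
if parent is not None:
    print(f"Our parent is {parent.name()} (PID {parent.pid})")

# recursive=True walks the whole subtree rooted at this process
for child in me.children(recursive=True):
    print(f"Child PID {child.pid}: {child.name()}")
```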
What Else Can psutil Do?
Quite a lot. Here’s a sampler:
```python
import psutil

me = psutil.Process()

# Files this process has open
for f in me.open_files():
    print(f.path)

# Memory maps (which DLLs/shared libraries are loaded)
for m in me.memory_maps()[:5]:
    print(m.path)
```

You can also iterate over all processes on the system:
```python
import psutil

# Top 5 processes by memory usage
procs = []
for p in psutil.process_iter(['pid', 'name', 'memory_percent']):
    try:
        procs.append(p.info)
    except psutil.NoSuchProcess:
        pass

# "or 0" guards against None for processes we can't inspect
top5 = sorted(procs, key=lambda x: x['memory_percent'] or 0, reverse=True)[:5]
print("Top 5 memory hogs:")
for p in top5:
    print(f"  PID {p['pid']:<6} {p['name']:<25} {p['memory_percent']:.1f}%")
```

And yes, you can terminate() or kill() processes too—if you have the permissions. Handle with care.
Starting New Processes with subprocess
Now that we understand what processes are, let’s make some. The subprocess module lets you launch a new process from Python and interact with it.
The Basics
The workhorse is subprocess.run():
```python
import subprocess
import sys

# Run a simple Python one-liner in a *separate* process
result = subprocess.run(
    [sys.executable, "-c", "print('Hello from the child process!')"],
    capture_output=True,
    text=True,
)

print(f"Return code: {result.returncode}")
print(f"Output: {result.stdout.strip()}")
```

A few things to note:
- We use sys.executable for the full path to the current Python interpreter. This avoids accidentally invoking a different python.exe (or worse, a malicious one on your PATH).
- capture_output=True captures both stdout and stderr.
- text=True decodes the output as a string (otherwise you get raw bytes).
- A return code of 0 means success. Anything else signals trouble.
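If you would rather have failures raise an exception than inspect return codes yourself, subprocess.run also accepts check=True, which raises CalledProcessError on any nonzero return code. A short sketch:

```python
import subprocess
import sys

# With check=True, a nonzero return code raises CalledProcessError
# instead of being silently stored on the result object
try:
    subprocess.run(
        [sys.executable, "-c", "import sys; sys.exit(3)"],
        check=True,
        capture_output=True,
    )
    code = 0
except subprocess.CalledProcessError as e:
    code = e.returncode
    print(f"Child failed with return code {code}")
```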
Handling Errors
What happens when the child process crashes?
```python
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-c", "1 / 0"],
    capture_output=True,
    text=True,
)

print(f"Return code: {result.returncode}")
print(f"Stderr:\n{result.stderr}")
```

The parent process doesn’t crash—it just sees a nonzero return code and can read the traceback from stderr. This is one of the benefits of process isolation.
The Warehouse Analogy
Time for a mental model that will serve us throughout this series.
Processes = Separate Warehouses
Think of each process as a separate warehouse. Each warehouse has:
- Its own building (memory space)
- Its own tools and equipment (resources)
- Its own inventory (data)
- One or more workers inside
Workers in different warehouses can’t share tools directly. If warehouse A needs to send something to warehouse B, they have to use a delivery truck (interprocess communication—Lecture 3). Setting up a new warehouse is expensive: you need to construct a building, buy tools, hire workers.
Threads = Workers in the Same Warehouse
A thread is a worker inside a warehouse. Multiple workers (threads) share the same building, the same tools, and the same inventory. This makes communication trivially easy—they can just talk to each other, or grab the same tool from the shelf.
But there’s a catch: if two workers reach for the same tool at the same time, things go wrong. One might grab it while the other is mid-swing. This is a race condition, and it’s the central challenge of multithreaded programming.
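Threads proper are next lecture’s topic, but the race condition is worth seeing once now. In this deliberately constructed sketch, the unsafe version splits the read-modify-write into separate steps so two threads can interleave and lose updates (how often depends on the interpreter’s thread switching, so the unsafe total may sometimes come out correct); the locked version is always exact:

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    """Read-modify-write with no protection: updates can be lost."""
    global counter
    for _ in range(n):
        tmp = counter   # read the shared value...
        tmp += 1        # ...modify it...
        counter = tmp   # ...write it back, clobbering concurrent updates

def safe_increment(n):
    """Same work, but the lock makes each increment atomic."""
    global counter
    for _ in range(n):
        with lock:
            counter += 1

def run(target, n=100_000):
    global counter
    counter = 0
    threads = [threading.Thread(target=target, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(f"Unsafe: {run(unsafe_increment)}  (2 x 100000 = 200000 expected)")
print(f"Safe:   {run(safe_increment)}")
```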
The Tradeoff
| | Processes (warehouses) | Threads (workers) |
|---|---|---|
| Memory | Isolated (safe) | Shared (fast but risky) |
| Startup cost | High | Low |
| Communication | Requires explicit IPC | Direct (shared memory) |
| Crash isolation | One crash doesn’t affect others | One crash can take down all threads |
This analogy maps directly to how operating systems work, and we’ll keep coming back to it.
IO-Bound vs CPU-Bound
Not all work is the same. Understanding the type of work matters enormously for choosing the right concurrency strategy.
CPU-Bound Work
The worker is busy hammering all day. Number crunching, image processing, compression, machine learning training—the CPU is the bottleneck.
```python
# CPU-bound: the processor is working hard
def crunch_numbers(n):
    total = 0
    for i in range(n):
        total += i * i
    return total
```

IO-Bound Work
The worker is standing around waiting for a delivery truck. Downloading a file, querying a database, waiting for user input—the CPU is mostly idle, waiting for something external.
```python
import time

# IO-bound: simulated wait (in real life: network request, file read, etc.)
def wait_for_delivery(seconds):
    time.sleep(seconds)
    return "Package arrived!"
```

Why This Matters
Here’s the punchline:
- IO-bound → Threads help. While one worker waits for a truck, another can do useful work. Multiple threads sharing one warehouse is perfect for this.
- CPU-bound → Processes help. You need more warehouses (= more CPU cores) to actually hammer faster. More workers in the same warehouse doesn’t help if there’s only one hammer.
Python (CPython, specifically) has the Global Interpreter Lock (GIL). In warehouse terms: only one worker per warehouse can swing the hammer at any given moment. Waiting around (IO) doesn’t count as hammering, so threads still help for IO-bound work. But for CPU-bound work, threads in Python are essentially useless—you need multiple processes.
Since Python 3.13, there’s a free-threaded build of CPython (PEP 703) that removes the GIL entirely. It started out experimental and became officially supported in Python 3.14, but the default build still has the GIL.
The multiprocessing Library
We’ve used subprocess to launch arbitrary commands. But when we specifically want to run Python functions in parallel across multiple processes, the multiprocessing module is the tool for the job.
multiprocessing.Process
The fundamental building block. Create a process, give it a function to run, start it, and wait for it:
```python
from multiprocessing import Process
import os

def worker(name):
    print(f"Worker '{name}' reporting from PID {os.getpid()}")

if __name__ == '__main__':
    p = Process(target=worker, args=("Alice",))
    p.start()  # Launch the new process
    p.join()   # Wait for it to finish
    print(f"Main process (PID {os.getpid()}) done.")
```

The if __name__ == '__main__': Guard
On Windows, multiprocessing uses the spawn method to create new processes. This means it starts a fresh Python interpreter and imports your script from scratch. Without the guard, the child process would try to spawn another child, which would spawn another, and so on. Infinite recursion. Always use the guard.
(On Linux, the default used to be fork, which copies the parent process; Python 3.14 changes the Linux default to forkserver, and macOS has defaulted to spawn since Python 3.8. The guard is good practice regardless.)
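You can check which start method your interpreter uses; multiprocessing.get_start_method() is part of the standard API:

```python
import multiprocessing as mp

# 'spawn' on Windows and macOS; 'fork' or 'forkserver' on Linux,
# depending on the Python version
print(mp.get_start_method())
```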
You can spawn multiple processes and run them concurrently:
```python
from multiprocessing import Process
import os
import time

def compute_sum(label, start, end):
    t0 = time.perf_counter()
    total = sum(range(start, end))
    dt = time.perf_counter() - t0
    print(f"[PID {os.getpid()}] {label}: sum({start}..{end}) = {total} ({dt:.3f}s)")

if __name__ == '__main__':
    t0 = time.perf_counter()
    p1 = Process(target=compute_sum, args=("A", 0, 5_000_000))
    p2 = Process(target=compute_sum, args=("B", 5_000_000, 10_000_000))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print(f"Total wall time: {time.perf_counter() - t0:.3f}s")
```

multiprocessing.Pool
For the common pattern of “apply this function to each item in a list, using N worker processes,” Pool is the high-level convenience:
```python
from multiprocessing import Pool
import os
import time

def heavy_task(i):
    """A small CPU-bound task."""
    t0 = time.perf_counter()
    total = sum(range(1_000_000))  # busywork
    dt = time.perf_counter() - t0
    print(f"  Task {i} (PID {os.getpid()}) took {dt:.3f}s")
    return total

if __name__ == '__main__':
    t0 = time.perf_counter()
    with Pool(3) as pool:                          # 3 worker processes
        results = pool.map(heavy_task, range(6))   # 6 tasks
    print(f"\nTotal wall time: {time.perf_counter() - t0:.3f}s")
    print(f"Total results: {sum(results)}")
```

With 6 tasks and 3 workers, the total wall time should be roughly twice a single task’s duration—a 3× speedup over running all 6 sequentially: the first 3 tasks run in parallel, then the next 3 fill the freed-up workers.
Pool() without arguments defaults to os.cpu_count(), which is usually the number of logical cores. For CPU-bound work, that’s a reasonable default. For IO-bound work, you might want more workers than cores.
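To see what that default would be on your machine, ask the os module directly:

```python
import os

# The worker count a bare Pool() would use here
cores = os.cpu_count()  # logical cores, or None if undetermined
print(f"Logical cores: {cores}")
```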
Synchronization Primitives
Lock: One at a Time
A Lock is the simplest synchronization primitive. It’s a key to a room—only one process can hold the key at a time. Everyone else waits at the door.
```python
from multiprocessing import Process, Lock
import time

def polite_worker(name, lines, lock):
    """A worker that acquires the lock before printing."""
    for i in range(lines):
        time.sleep(0.01)  # some work
        with lock:        # acquire lock, print, release lock
            print(f"[{name}] Line {i}: My important results!")

if __name__ == '__main__':
    lock = Lock()
    workers = [
        Process(target=polite_worker, args=(f"Worker-{i}", 5, lock))
        for i in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

Now the output is clean: each worker’s print statement completes fully before another worker can start printing. The with lock: block is a context manager that acquires the lock on entry and releases it on exit—even if an exception occurs.
As we saw in the Architecture series, context managers (with statements) are Python’s way of ensuring cleanup happens. Lock supports this protocol: with lock: calls lock.acquire() at the start and lock.release() at the end. Always prefer the with syntax over manual acquire/release.
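For reference, here is the manual pattern that with lock: saves you from writing (a sketch; the print stands in for any critical section):

```python
from multiprocessing import Lock

lock = Lock()

# What `with lock:` does under the hood:
lock.acquire()
try:
    print("Critical section: only one holder at a time")
finally:
    lock.release()  # runs even if the critical section raises
```

The try/finally is exactly why the context manager exists: forget the release on one code path and every other process waits at the door forever.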
A Quick Tour of Other Primitives
Locks are just the beginning. The multiprocessing module (and threading) provides several more synchronization tools:
- Event
- A simple flag. One process sets it, others wait for it. Useful for signaling: “the data is ready” or “time to shut down.”
- Semaphore
- Like a lock, but allows up to N holders simultaneously. Think of it as a room with N keys: up to N processes can enter, the rest wait. Useful when you want to limit concurrent access to a resource (e.g., max 5 simultaneous database connections).
- Queue
- A thread/process-safe mailbox. One or more producers put items in; one or more consumers take them out. This is the workhorse of inter-process data passing, and we’ll use it extensively in Lecture 2 to send results from worker processes back to the main GUI thread.
- Barrier
- Forces N processes to wait until all of them have reached the same point, then lets them all proceed. Useful for synchronized phases of computation.
We won’t deep-dive into all of these now—Lock and Queue are the two you’ll use most often. But it’s good to know the toolkit exists.
Summary
Let’s recap with our warehouse analogy:
| Concept | Analogy | Python |
|---|---|---|
| Process | A separate warehouse | multiprocessing.Process, subprocess |
| Thread | A worker inside a warehouse | threading.Thread (next lecture) |
| PID | The warehouse’s address | os.getpid(), psutil.Process |
| CPU-bound | Workers hammering all day | Use processes for parallelism |
| IO-bound | Workers waiting for deliveries | Use threads (or async—Lecture 5) |
| Lock | A key to a room—one at a time | multiprocessing.Lock |
| Queue | A mailbox between warehouses | multiprocessing.Queue |
Key takeaways:
- A process is an isolated running program with its own memory space.
- psutil lets you explore the process tree from Python.
- subprocess lets you launch new programs; multiprocessing lets you run Python functions in parallel.
- The GIL means threads don’t help for CPU-bound Python code—use processes instead.
- Synchronization primitives (locks, semaphores, queues) prevent chaos when multiple processes or threads share resources.
Exercises & Project Ideas
Here’s a consolidated list of exercises and a project idea:
Additional Resources
- psutil documentation
- subprocess documentation
- multiprocessing documentation
- PEP 703 — Making the GIL Optional
- Python 3.13 What’s New — Free-threaded CPython
Next: Lecture 2 — Multiprocessing and Multithreading, where we’ll combine threads and processes in a live Tkinter application that approximates π with real-time visual feedback.