Multiprocessing and Multithreading in Practice

Threads, queues, and a live π estimator

Author

Karsten Naert

Published

February 9, 2026

Introduction

In Lecture 1 we laid the groundwork: processes are isolated warehouses, threads are workers inside those warehouses, and the GIL means Python threads can’t hammer simultaneously. We explored psutil, subprocess, and multiprocessing.Process/Pool.

This lecture puts all of that into practice. We’ll:

  1. Learn the threading module hands-on.
  2. Understand how threads and processes communicate via queues.
  3. Build a complete TKinter application that uses a background thread to coordinate worker processes—all to approximate π in real time.

This is the second lecture in a five-part series:

  1. Processes and Threads
  2. Multiprocessing and Multithreading in Practice (you are here)
  3. Interprocess Communication and Sockets
  4. Client-Server Architectures and RESTful APIs
  5. Async Programming, Event Loops, and ASGI

The threading Module

The API mirrors multiprocessing.Process almost exactly. If you can spawn a process, you can spawn a thread.

Your First Thread

import threading
import time

def worker(name, duration):
    print(f"[{name}] Starting (thread {threading.current_thread().name})")
    time.sleep(duration)
    print(f"[{name}] Done after {duration}s")

t1 = threading.Thread(target=worker, args=("Alice", 2))
t2 = threading.Thread(target=worker, args=("Bob", 1))

t1.start()
t2.start()

t1.join()
t2.join()
print("Both threads finished.")

This looks almost identical to multiprocessing.Process—and that’s intentional. Python’s concurrency APIs are designed to be swappable. The critical difference: these threads run inside the same process, sharing the same memory space.

Run this a few times. Bob finishes before Alice despite starting second, simply because he sleeps for only 1 second. Notice also that the order of the two "Starting" lines can vary between runs: the OS decides when each thread gets CPU time. This is preemptive scheduling, and your code has no say in the exact interleaving.

Threads Share Memory

In Lecture 1 we said threads are workers in the same warehouse. They can grab the same tools from the same shelf. Let’s see what that means concretely.

import threading

counter = 0

def increment(n):
    global counter
    for _ in range(n):
        counter += 1

t1 = threading.Thread(target=increment, args=(1_000_000,))
t2 = threading.Thread(target=increment, args=(1_000_000,))

t1.start()
t2.start()
t1.join()
t2.join()

print(f"Expected: 2000000, Got: {counter}")

Run this several times. Sometimes you get 2000000. Sometimes you don’t. This is a race condition: two workers reaching for the same tool simultaneously. The operation counter += 1 is not atomic—it’s a read, an increment, and a write. The OS can interrupt a thread between any of those steps.
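You can see the read/increment/write split for yourself with the standard library's dis module: a single counter += 1 statement compiles to several bytecode instructions, and a thread switch can land between any two of them.

```python
import dis

counter = 0

def bump():
    global counter
    counter += 1  # one statement, several bytecode instructions

# Disassemble bump() and look for the separate load, add, and store steps
dis.dis(bump)
```

The exact opcode names vary a little between Python versions, but the load/add/store structure is always there.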

The fix is the same as in Lecture 1: a Lock.

import threading

counter = 0
lock = threading.Lock()

def increment_safe(n):
    global counter
    for _ in range(n):
        with lock:
            counter += 1

t1 = threading.Thread(target=increment_safe, args=(1_000_000,))
t2 = threading.Thread(target=increment_safe, args=(1_000_000,))

t1.start()
t2.start()
t1.join()
t2.join()

print(f"Expected: 2000000, Got: {counter}")

Now it’s always correct—but slower, because threads take turns. This is the fundamental tradeoff of shared-memory concurrency: correctness versus performance.

threading.Lock vs multiprocessing.Lock

Same concept, different scope. threading.Lock synchronizes threads within one process. multiprocessing.Lock synchronizes across processes. The with lock: context-manager syntax works for both (see the Architecture series on context managers).

The GIL in Action: A Benchmark

Let’s make this concrete. We’ll time the Monte Carlo π estimation from Lecture 1 using three approaches: sequential, threaded, and multiprocessed.

import random
import time

def monte_carlo_pi(num_points):
    inside = 0
    for _ in range(num_points):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside

Sequential:

N = 4_000_000

t0 = time.perf_counter()
total_inside = sum(monte_carlo_pi(1_000_000) for _ in range(4))
pi_est = 4 * total_inside / N
print(f"Sequential: π ≈ {pi_est:.6f}  ({time.perf_counter() - t0:.2f}s)")

Threaded (4 threads):

import threading

results = [0] * 4

def threaded_worker(index, n):
    results[index] = monte_carlo_pi(n)

t0 = time.perf_counter()
threads = [threading.Thread(target=threaded_worker, args=(i, 1_000_000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

pi_est = 4 * sum(results) / N
print(f"Threaded:   π ≈ {pi_est:.6f}  ({time.perf_counter() - t0:.2f}s)")

Multiprocessed (4 processes):

from multiprocessing import Pool

if __name__ == '__main__':
    t0 = time.perf_counter()
    with Pool(4) as pool:
        results = pool.map(monte_carlo_pi, [1_000_000] * 4)
    pi_est = 4 * sum(results) / N
    print(f"Multiproc:  π ≈ {pi_est:.6f}  ({time.perf_counter() - t0:.2f}s)")

On a typical 4-core machine you’ll see something like:

Sequential: π ≈ 3.141247  (3.12s)
Threaded:   π ≈ 3.141528  (3.15s)    ← no speedup!
Multiproc:  π ≈ 3.141692  (1.05s)    ← ~3× speedup

Threads didn’t help at all. The GIL ensures only one thread executes Python bytecode at a time, so four threads doing CPU-bound work are effectively sequential. Processes bypass the GIL entirely—each has its own interpreter.

The Rule of Thumb
  • CPU-bound → use multiprocessing (or wait for the free-threaded build to mature)
  • IO-bound (network, file, user input) → use threading (or asyncio, see Lecture 5)
  • Both → combine them, which is exactly what we’ll do in the capstone project
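The IO-bound side of this rule is easy to demonstrate. During a blocking call like time.sleep (a stand-in for a network or disk wait), the GIL is released, so threads genuinely overlap:

```python
import threading
import time

def fake_io(duration):
    time.sleep(duration)  # the GIL is released while sleeping

t0 = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(0.5,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - t0

# Four 0.5s waits overlap: total is roughly 0.5s, not 2s.
print(f"4 x 0.5s of fake IO took {elapsed:.2f}s")
```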

Daemon Threads

By default, Python waits for all threads to finish before exiting. A daemon thread is a background worker that gets killed automatically when the main thread exits.

import threading
import time

def background_task():
    while True:
        print("Still working...")
        time.sleep(1)

t = threading.Thread(target=background_task, daemon=True)
t.start()
time.sleep(3)
print("Main thread done — daemon dies with us.")

This is useful for monitoring or heartbeat tasks. For our capstone, we’ll use a regular (non-daemon) thread and control its lifecycle with a threading.Event.

import threading
import time

stop_event = threading.Event()

def polite_background():
    while not stop_event.is_set():
        print("Working...")
        time.sleep(1)
    print("Received stop signal. Shutting down.")

t = threading.Thread(target=polite_background)
t.start()

time.sleep(3)
stop_event.set()
t.join()
print("Clean shutdown complete.")

threading.Event is a simple flag: one thread sets it, others check it. Much cleaner than killing a thread mid-work.
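One refinement worth knowing: stop_event.wait(timeout) can replace the time.sleep call in the loop above. It sleeps for up to the timeout but returns immediately when the event is set, so shutdown no longer has to wait out the remainder of a sleep. A sketch of the same polite worker with that change:

```python
import threading
import time

stop_event = threading.Event()

def responsive_background():
    # wait() returns True the moment the event is set, False after the timeout
    while not stop_event.wait(timeout=1):
        print("Working...")
    print("Received stop signal. Shutting down.")

t = threading.Thread(target=responsive_background)
t.start()

time.sleep(0.3)
t0 = time.perf_counter()
stop_event.set()
t.join()
shutdown = time.perf_counter() - t0
print(f"Shutdown took {shutdown:.3f}s")  # well under the 1s wait timeout
```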

Queues: Thread-Safe and Process-Safe Mailboxes

We know threads share memory and processes don’t. But even with shared memory, direct access leads to race conditions. The clean solution is a queue: a thread-safe (or process-safe) FIFO pipe where one side puts data in and the other takes data out.

queue.Queue — For Threads

The queue module from the standard library provides Queue, a thread-safe mailbox. The classic use case is the producer–consumer pattern.

import queue
import threading
import time

def producer(q, n):
    for i in range(n):
        time.sleep(0.1)
        q.put(f"item-{i}")
    q.put(None)

def consumer(q):
    while True:
        item = q.get()
        if item is None:
            break
        print(f"Consumed: {item}")

q = queue.Queue()

t_prod = threading.Thread(target=producer, args=(q, 5))
t_cons = threading.Thread(target=consumer, args=(q,))

t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()

The None sentinel signals the consumer to stop. q.get() blocks until an item is available—no busy waiting, no race conditions. This is the bread and butter of concurrent Python.

A few useful methods:

  • q.put(item) — add an item (blocks if the queue is full, when a maxsize is set)
  • q.get() — remove and return an item (blocks until one is available)
  • q.get_nowait() — like get() but raises queue.Empty instead of blocking
  • q.qsize() — approximate size (don’t rely on this for synchronization)
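Two more methods, q.task_done() and q.join(), let the producing side wait until every item has actually been processed. This is an alternative to the None sentinel when the consumer runs as a daemon thread. A sketch:

```python
import queue
import threading

q = queue.Queue()
processed = []

def worker():
    while True:
        item = q.get()
        processed.append(item * 2)
        q.task_done()  # tell the queue this item is fully handled

threading.Thread(target=worker, daemon=True).start()

for i in range(5):
    q.put(i)

q.join()  # blocks until task_done() has been called once per put()
print(processed)  # [0, 2, 4, 6, 8]
```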

multiprocessing.Queue — For Processes

Same idea, but data crosses process boundaries. Under the hood, Python serializes (pickles) the data, sends it through a pipe, and deserializes it on the other side.

from multiprocessing import Process, Queue
import os

def worker(q, n):
    total = sum(range(n))
    q.put((os.getpid(), total))

if __name__ == '__main__':
    q = Queue()
    processes = [Process(target=worker, args=(q, 10_000_000)) for _ in range(4)]

    for p in processes:
        p.start()

    for _ in range(4):
        pid, result = q.get()
        print(f"PID {pid}: {result}")

    for p in processes:
        p.join()

Pickling Requirement

Everything you put on a multiprocessing.Queue must be picklable. That means basic types, most standard library objects, and your own classes (as long as they’re defined at module level). Lambda functions, open file handles, and database connections are not picklable.
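When in doubt, pickle.dumps is a quick way to check whether an object will survive the trip. A function defined at module level pickles fine (it is pickled by reference to its name); a lambda does not, because it has no importable name:

```python
import pickle

def module_level(x):
    return x + 1

pickle.dumps(module_level)  # fine: pickled by qualified name

try:
    pickle.dumps(lambda x: x + 1)
except Exception as e:  # fails: a lambda has no importable name
    print(f"Lambda failed to pickle: {type(e).__name__}")
```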

Why Two Different Queues?

            queue.Queue                  multiprocessing.Queue
Scope       Threads within one process   Across processes
Speed       Fast (shared memory)         Slower (serialization + IPC)
Data        Any Python object            Must be picklable
Use case    Thread coordination          Process coordination

In our capstone project, we’ll use both: a multiprocessing.Queue for worker processes to send results back, and a queue.Queue for the coordinator thread to feed updates to the UI thread.

TKinter: A Minimal GUI Toolkit

TKinter ships with Python—no pip install needed. We won’t do a deep dive here; we just need enough to build our capstone.

A Minimal Window

import tkinter as tk

root = tk.Tk()
root.title("Hello TKinter")

label = tk.Label(root, text="Nothing happening yet.", font=("Consolas", 16))
label.pack(padx=20, pady=20)

button = tk.Button(root, text="Click me", command=lambda: label.config(text="Clicked!"))
button.pack(pady=10)

root.mainloop()

Save this as hello_tk.py and run it from CMD:

python hello_tk.py

A window appears. Click the button. The label changes. Close the window to exit.

The key line is root.mainloop(). This hands control to TKinter’s event loop: it listens for user actions (clicks, key presses, window resizes) and dispatches them to your callbacks. The main thread is now occupied running this loop. If your callback takes a long time, the event loop can’t process other events—the UI freezes.

The Blocking Problem

Let’s simulate a long computation in a button callback:

import tkinter as tk
import time

def slow_task():
    label.config(text="Computing...")
    time.sleep(5)
    label.config(text="Done!")

root = tk.Tk()
root.title("Blocking Demo")

label = tk.Label(root, text="Ready.", font=("Consolas", 16))
label.pack(padx=20, pady=20)

button = tk.Button(root, text="Start slow task", command=slow_task)
button.pack(pady=10)

root.mainloop()

Run this and click the button. The window becomes unresponsive for 5 seconds—you can’t move it, resize it, or close it. The “Computing…” text might not even appear until after the sleep, because TKinter hasn’t had a chance to repaint.

This is why long-running work must happen off the main thread.

Updating the UI from Another Thread

TKinter is not thread-safe. You cannot call widget methods (like label.config(...)) directly from a background thread—it will work sometimes and crash unpredictably other times.

The safe pattern is:

  1. Background thread puts results on a queue.Queue.
  2. Main thread periodically polls that queue using root.after().

root.after(ms, callback) schedules a function to run on the main thread after ms milliseconds. It’s TKinter’s equivalent of “check your mailbox every 100ms.”

import tkinter as tk
import threading
import queue
import time

def background_counter(q, stop_event):
    i = 0
    while not stop_event.is_set():
        i += 1
        q.put(i)
        time.sleep(0.5)

def poll_queue():
    while not ui_queue.empty():
        try:
            value = ui_queue.get_nowait()
            label.config(text=f"Count: {value}")
        except queue.Empty:
            break
    root.after(100, poll_queue)

root = tk.Tk()
root.title("Threaded Counter")

label = tk.Label(root, text="Count: 0", font=("Consolas", 16))
label.pack(padx=20, pady=20)

ui_queue = queue.Queue()
stop_event = threading.Event()

worker = threading.Thread(target=background_counter, args=(ui_queue, stop_event), daemon=True)
worker.start()

root.after(100, poll_queue)
root.mainloop()

stop_event.set()

The background thread counts and puts values on the queue every 500ms. The main thread checks the queue every 100ms via poll_queue and updates the label. The UI stays responsive throughout.

The root.after() Pattern

This is the standard way to bridge threads and TKinter. You’ll see it in virtually every non-trivial TKinter application. The idea generalizes: any framework with an event loop (TKinter, Qt, GTK, even web frameworks) needs a mechanism to safely inject work into that loop from the outside. In Lecture 5, we’ll see the async equivalent.

This pattern is exactly what we need for our capstone. Let’s build it.

Capstone: Live π Estimation with TKinter

We’re going to combine everything from this lecture and Lecture 1 into a single application:

  • Worker processes (CPU-bound) generate random points and count how many fall inside the unit circle.
  • A coordinator thread spawns those processes, collects results via multiprocessing.Queue, and forwards running totals to the UI via queue.Queue.
  • The main thread runs TKinter’s event loop, polling the UI queue with root.after() to display the live π estimate.

Architecture

┌─────────────────────────────────────────────────────┐
│  Main Process                                       │
│                                                     │
│  ┌──────────────────┐    queue.Queue    ┌────────┐  │
│  │ Coordinator      │ ───────────────►  │ Main   │  │
│  │ Thread           │                   │ Thread │  │
│  │                  │                   │ (Tk)   │  │
│  └──────┬───────────┘                   └────────┘  │
│         │                                           │
│         │ multiprocessing.Queue                     │
│         │                                           │
│  ┌──────┴───────────────────────────────────┐       │
│  │  Worker Processes (separate PIDs)        │       │
│  │  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐    │       │
│  │  │ W1   │ │ W2   │ │ W3   │ │ W4   │    │       │
│  │  └──────┘ └──────┘ └──────┘ └──────┘    │       │
│  └──────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────┘

Why this three-layer design?

  • The worker processes bypass the GIL, giving us real parallelism for CPU-bound work.
  • The coordinator thread bridges the process world and the thread world. It blocks on multiprocessing.Queue.get()—an IO-bound wait that the GIL happily releases.
  • The main thread never blocks, keeping the UI responsive.

The Complete Application

Below is the full script. Save it as pi_estimator.py and run it from CMD with python pi_estimator.py. We’ll walk through each piece afterward.

import tkinter as tk
import threading
import queue
import random
import os
import time
from multiprocessing import Process, Queue as MPQueue

BATCH_SIZE = 100_000
NUM_WORKERS = 4


def pi_worker(result_queue, stop_event_flag):
    pid = os.getpid()
    while not stop_event_flag.is_set():
        inside = 0
        for _ in range(BATCH_SIZE):
            x, y = random.random(), random.random()
            if x * x + y * y <= 1.0:
                inside += 1
        result_queue.put((pid, inside, BATCH_SIZE))


def coordinator(mp_queue, ui_queue, stop_event, mp_stop_flag):
    workers = []
    for _ in range(NUM_WORKERS):
        p = Process(target=pi_worker, args=(mp_queue, mp_stop_flag))
        p.start()
        workers.append(p)

    total_inside = 0
    total_points = 0

    while not stop_event.is_set():
        try:
            pid, inside, count = mp_queue.get(timeout=0.2)
            total_inside += inside
            total_points += count
            pi_est = 4 * total_inside / total_points
            ui_queue.put((pi_est, total_points, len(workers)))
        except queue.Empty:
            pass  # no result within 200ms; loop back and check stop_event

    mp_stop_flag.set()
    for p in workers:
        p.join(timeout=3)
        if p.is_alive():
            p.terminate()


class PiEstimatorApp:
    def __init__(self, root):
        self.root = root
        self.root.title("Live π Estimator")
        self.root.resizable(False, False)

        self.pi_label = tk.Label(root, text="π ≈ ???", font=("Consolas", 28))
        self.pi_label.pack(padx=30, pady=(20, 5))

        self.info_label = tk.Label(root, text="Points: 0 | Workers: 0",
                                   font=("Consolas", 12))
        self.info_label.pack(padx=30, pady=(0, 10))

        self.status_label = tk.Label(root, text="Status: Idle",
                                     font=("Consolas", 10), fg="gray")
        self.status_label.pack(padx=30, pady=(0, 5))

        self.button = tk.Button(root, text="Start", font=("Consolas", 14),
                                command=self.toggle, width=12)
        self.button.pack(pady=(5, 20))

        self.ui_queue = queue.Queue()
        self.stop_event = threading.Event()
        self.mp_stop_flag = None
        self.coord_thread = None
        self.running = False

        self.root.protocol("WM_DELETE_WINDOW", self.on_close)

    def toggle(self):
        if self.running:
            self.stop()
        else:
            self.start()

    def start(self):
        self.running = True
        self.button.config(text="Stop")
        self.status_label.config(text="Status: Running", fg="green")
        self.stop_event.clear()

        mp_queue = MPQueue()
        self.mp_stop_flag = MPEvent()
        self.coord_thread = threading.Thread(
            target=coordinator,
            args=(mp_queue, self.ui_queue, self.stop_event, self.mp_stop_flag),
            daemon=True,
        )
        self.coord_thread.start()
        self.poll_queue()

    def stop(self):
        self.running = False
        self.stop_event.set()
        self.button.config(text="Start")
        self.status_label.config(text="Status: Idle", fg="gray")

    def poll_queue(self):
        while not self.ui_queue.empty():
            try:
                pi_est, total_points, num_workers = self.ui_queue.get_nowait()
                self.pi_label.config(text=f"π ≈ {pi_est:.8f}")
                self.info_label.config(
                    text=f"Points: {total_points:,} | Workers: {num_workers}")
            except queue.Empty:
                break
        if self.running:
            self.root.after(100, self.poll_queue)

    def on_close(self):
        self.stop_event.set()
        if self.mp_stop_flag:
            self.mp_stop_flag.set()
        self.root.destroy()


if __name__ == "__main__":
    from multiprocessing import Event as MPEvent

    root = tk.Tk()
    app = PiEstimatorApp(root)
    root.mainloop()

The if __name__ == "__main__": Guard

On Windows, multiprocessing uses spawn to create child processes—it starts a fresh Python interpreter and re-imports your script. Without the guard, the child would try to create a TKinter window, which would try to spawn more workers, and so on. The guard ensures only the original process builds the GUI.

The from multiprocessing import Event as MPEvent line also sits inside the guard. Because the guard runs at module level in the original process, MPEvent still becomes a global name before start() is ever called. Moving it to the top of the file alongside the other multiprocessing imports would work just as well.

Walking Through the Code

pi_worker is the function each child process runs. It loops continuously, generating BATCH_SIZE random points per iteration, counting how many land inside the unit circle, and putting the result tuple (pid, inside, count) onto the multiprocessing.Queue. It checks stop_event_flag (a multiprocessing.Event) each iteration to know when to stop.

coordinator runs in a background thread. It spawns NUM_WORKERS child processes, then enters its own loop: read a result from mp_queue, update running totals, compute the current π estimate, and push it onto ui_queue (a queue.Queue). The timeout=0.2 on mp_queue.get() ensures we don’t block forever—if no result arrives within 200ms, we loop back and check stop_event. On shutdown, it signals the workers via mp_stop_flag and joins them, with a terminate() fallback for stubborn processes.

PiEstimatorApp is the GUI. The constructor builds the window, creates the queues and events, and registers on_close to handle the window’s X button cleanly.

  • start() clears the stop event, creates fresh queues, launches the coordinator thread, and kicks off poll_queue.
  • stop() sets the stop event, which propagates through the coordinator to the workers.
  • poll_queue() drains the UI queue and updates labels. It reschedules itself with root.after(100, self.poll_queue) as long as the app is running—this is the heartbeat that keeps the display updating.
  • on_close() ensures everything shuts down when the user closes the window.

Running It

Save the complete script as pi_estimator.py and run from CMD:

python pi_estimator.py

You should see a window with a large “π ≈ ???” label. Click Start. The estimate begins updating live, converging toward 3.14159265… as millions of points are sampled. Click Stop to pause, Start again to resume (with fresh workers). Close the window to exit cleanly.

Experiment

Try changing NUM_WORKERS and BATCH_SIZE at the top of the script. More workers = faster convergence (up to your CPU core count). Larger batches = less queue overhead but less frequent UI updates. Finding the sweet spot is part of the fun.

Summary

We’ve covered a lot of ground. Here’s the cheat sheet:

Concept                 What It Does                               Python
Thread                  Worker inside a process (shared memory)    threading.Thread
Race condition          Two threads touching the same data         Fix with threading.Lock
GIL                     Only one thread runs Python at a time      Bypass with multiprocessing
queue.Queue             Thread-safe mailbox                        queue.Queue
multiprocessing.Queue   Process-safe mailbox (pickled)             multiprocessing.Queue
threading.Event         Simple stop/go flag for threads            threading.Event
root.after()            Schedule work on TKinter’s main thread     Bridge between threads and UI

The capstone demonstrated the architecture pattern that shows up everywhere in real software:

  • UI thread stays responsive (never blocks).
  • Coordinator thread manages workers and bridges communication.
  • Worker processes do the heavy lifting in parallel.

This same pattern—an event loop polling for results from background workers—is the foundation of client-server architectures. In Lecture 3, we’ll take communication to the next level: instead of queues within one machine, we’ll send messages over the network between separate processes. And in Lecture 5, we’ll see how async/await replaces threading for IO-bound server work.

Exercises & Project Ideas

Exercise 1: Thread vs Process π Benchmark

Take the three-way benchmark from this lecture (sequential, threaded, multiprocessed) and extend it:

  1. Vary the number of workers from 1 to os.cpu_count() * 2.
  2. Record the wall-clock time for each configuration.
  3. Plot the results (time vs number of workers) for both threading and multiprocessing.

What happens to multiprocessing performance beyond os.cpu_count()? Why?

Exercise 2: Resilient Coordinator

Modify the capstone’s coordinator function so that if a worker process crashes (e.g., simulate this by having it randomly raise an exception), the coordinator detects the dead process and spawns a replacement.

Hints:

  • Process.is_alive() tells you if a process is still running.
  • You can check periodically (e.g., every 10 queue reads).
  • Don’t forget to start the replacement and add it to the workers list.

Exercise 3: Download Manager

Build a TKinter application that downloads multiple files simultaneously using threads (not processes—downloading is IO-bound). Requirements:

  • A text field where the user can paste URLs (one per line).
  • A “Download All” button.
  • A progress display showing each URL and its status (waiting / downloading / done / error).
  • Downloads happen in background threads; the UI stays responsive.

Use urllib.request.urlretrieve or the requests library for the actual downloads.

Exercise 4: Producer-Consumer Pipeline

Build a three-stage pipeline using queue.Queue:

  1. Producer thread: reads lines from a large text file and puts them on queue A.
  2. Transformer thread: reads from queue A, converts each line to uppercase, and puts the result on queue B.
  3. Consumer thread: reads from queue B and writes the results to an output file.

Use None sentinels to signal shutdown through the pipeline. Measure the throughput (lines per second).


Next: Lecture 3 — Interprocess Communication and Sockets, where we move beyond queues and learn how separate processes communicate over the network.