---
title: "Python fundamentals"
subtitle: "How it's made: Python"
author: "Karsten Naert"
date: "November 15, 2025"
toc: true
execute:
echo: true
output: true
---
# Introduction
Python is a programming language specification. The most common implementation is **CPython**, which comes with the familiar `python.exe` executable on Windows.
When you run `python my_program.py`, CPython processes your code through four distinct stages:
1. **Tokenization** - Breaking source code into meaningful chunks
2. **AST** (Abstract Syntax Tree) - Organizing tokens into a logical structure
3. **Compilation** - Converting the AST into bytecode
4. **Execution** - Running the bytecode
What makes Python interesting is that all of these stages are accessible to you through built-in modules. We'll explore each stage hands-on.
::: {.callout-note icon=true}
## Version matters
The internals we discuss are specific to **CPython 3.13/3.14**. Other Python implementations (PyPy, GraalPy) work differently under the hood. Even between CPython versions, bytecode and internal representations can change significantly.
:::
# Tokenization
The first step is breaking your source code into **tokens** - sequences of characters that have meaning in Python. Think of it like breaking a sentence into words, except for code.
```{python}
import tokenize
from io import StringIO
text = "print('Hello World')"
s = StringIO(text)
for tok in tokenize.generate_tokens(s.readline):
print(tok.string)
```
Our simple `print` statement consists of 6 tokens (including the implicit trailing `NEWLINE` and `ENDMARKER`). Let's make this clearer:
```{python}
text = "print('Hello World')"
s = StringIO(text)
'|'.join(tok.string for tok in tokenize.generate_tokens(s.readline))
```
Notice that `'Hello World'` (including quotes) is treated as a single token. Each token also has a type:
```{python}
text = "abc + xyz"
s = StringIO(text)
for tok in tokenize.generate_tokens(s.readline):
print(f"{tok.type:2d} {tokenize.tok_name[tok.type]:10s} {tok.string!r}")
```
Here's a more complex example showing how Python handles function definitions:
```{python}
text = """
def f(x):
return 2 * x
"""
s = StringIO(text)
for tok in tokenize.generate_tokens(s.readline):
print(f"{tokenize.tok_name[tok.type]:10s} {tok.string!r}")
```
Notice the `INDENT` and `DEDENT` tokens - Python's whitespace sensitivity is baked in at the tokenization level.
::: {.callout-note icon=false collapse="true"}
## Exercise
Try tokenizing code with syntax errors:
```{python}
#| eval: false
text = "print('unclosed string"
s = StringIO(text)
list(tokenize.generate_tokens(s.readline))
```
What happens? The tokenizer catches some errors but not all - it doesn't understand code semantics yet.
:::
## Practical uses
Tokenization is useful beyond Python's internal workings:
- **Code analysis**: Find all variable names, detect naming conventions
- **Syntax highlighting**: Colorize code in editors
- **Code formatting tools**: Tools like Black use tokenization to understand code structure
::: {.callout-tip icon=false}
## Quick tool: Variable name finder
```{python}
def find_names(code):
"""Find all variable names in Python code"""
s = StringIO(code)
names = {tok.string for tok in tokenize.generate_tokens(s.readline)
             if tok.type == tokenize.NAME and tok.string not in ('def', 'return')}
return names
code = """
def calculate(x, y):
result = x + y
return result
"""
print(find_names(code))
```
:::
::: {.callout-note icon=true}
## Aside: Tokenization in LLMs
Large Language Models also use "tokens", but they're different. Python tokenization splits code based on syntax rules. LLM tokenization (like GPT's) splits text into subword units based on frequency. Similar name, completely different purpose.
:::
# The Abstract Syntax Tree (AST)
Tokens tell us what pieces we have. The **AST** tells us how they fit together.
```{python}
import ast
code = "3 + 5"
tree = ast.parse(code)
print(ast.dump(tree, indent=' '))
```
That's a lot of structure for `3 + 5`! Let's break it down:
- The root is a `Module` (every Python file is a module)
- Inside is an `Expr` (expression statement)
- The expression is a `BinOp` (binary operation)
- It has a left operand (3), an operator (+), and a right operand (5)
Here's a more interesting example:
```{python}
code = "x = 5 + 6 # test"
tree = ast.parse(code)
print(ast.dump(tree, indent=' '))
```
The comment disappeared - the AST only captures code structure, not formatting or comments.
You can go from AST back to source code (since Python 3.9):
```{python}
print(ast.unparse(tree))
```
But you won't get your original code back - just equivalent code. The comment is gone, and whitespace may differ.
## Unpacking assignments
Even simple-looking code can have complex ASTs:
```{python}
code = 'x, *stuff, y = L'
tree = ast.parse(code)
print(ast.dump(tree, indent=' '))
```
The AST reveals how Python interprets the unpacking: `x` and `y` are regular targets, `stuff` is a `Starred` target that collects the rest.
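The runtime behavior matches the structure the AST describes - `stuff` collects everything between the first and last targets:

```python
L = [1, 2, 3, 4, 5]
x, *stuff, y = L   # the Starred target soaks up the middle elements
print(x, stuff, y)
```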
::: {.callout-note icon=false collapse="true"}
## Exercise
Parse these statements and explore their AST structure:
```{python}
#| eval: false
statements = [
"x += 1",
"[i for i in range(10)]",
"lambda x: x + 1",
"def f(a, *args, **kwargs): pass"
]
for stmt in statements:
tree = ast.parse(stmt)
print(f"\n{stmt}")
print(ast.dump(tree, indent=' '))
```
Can you identify the key node types?
:::
## Transforming code
The AST can be modified programmatically. Python provides `ast.NodeTransformer` for this:
```{python}
class AddToMul(ast.NodeTransformer):
"""Convert all additions to multiplications"""
def visit_BinOp(self, node):
if isinstance(node.op, ast.Add):
node.op = ast.Mult()
return node
code = "x = 3 + 4 + 5"
tree = ast.parse(code)
transformed = AddToMul().visit(tree)
print("Original:", ast.unparse(tree))
print("Modified:", ast.unparse(transformed))
```
This is how tools like code formatters and refactoring tools work - they parse, transform, and regenerate code.
::: {.callout-note icon=true}
## Design Patterns: The Visitor Pattern
What we've just seen is an example of the **Visitor pattern** - a design pattern where different classes work together through a carefully orchestrated structure. The `NodeTransformer` is the visitor that "visits" each node in the AST tree, and the `visit_BinOp` method defines what happens when we encounter a binary operation node.
This pattern allows us to add new operations on the AST without modifying the AST node classes themselves. We'll explore design patterns in much greater detail later in the course, but it's important to recognize them when we encounter them in real-world code.
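For contrast, `ast.NodeVisitor` is the read-only sibling of `NodeTransformer`. A minimal sketch that counts binary operations without modifying the tree:

```python
import ast

class BinOpCounter(ast.NodeVisitor):
    """Count BinOp nodes in an AST without changing it."""
    def __init__(self):
        self.count = 0

    def visit_BinOp(self, node):
        self.count += 1
        self.generic_visit(node)  # keep descending into child nodes

counter = BinOpCounter()
counter.visit(ast.parse("x = 3 + 4 + 5"))
print(counter.count)
```

`3 + 4 + 5` parses as two nested `BinOp` nodes, so this prints 2.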
:::
::: {.callout-warning icon=true}
## CST: When formatting matters
The AST is "abstract" - it discards formatting details like comments, whitespace, and exact syntax choices. For tools that need to preserve these (like code formatters), there's an alternative: the **Concrete Syntax Tree**.
The external library [libCST](https://github.com/Instagram/LibCST) provides this. With libCST, you can parse a file and write it back byte-for-byte identical, then make targeted changes while preserving formatting.
:::
**Project idea**: Consider building a **code modernization tool** that automatically updates deprecated API calls across a large codebase. For example, you might want to:
- Replace all `os.path.join()` calls with `pathlib.Path` operations
- Update old-style string formatting (`%s`) to f-strings
- Migrate from `unittest` assertions to `pytest` style
Using CST would preserve all comments, docstrings, and code style choices while making these targeted transformations. This is particularly valuable when working on legacy codebases where maintaining existing formatting and documentation is crucial. The tool could scan a project, identify patterns to modernize, and apply transformations while keeping the code's original structure and style intact - something that would be impossible with AST alone since it discards all formatting information.
# Bytecode: What Python actually runs
The AST still isn't executable. Python compiles it into **bytecode** - a low-level instruction set for the Python virtual machine.
```{python}
code = "print(3 + 4)"
code_object = compile(code, '<example>', 'eval')
```
The `compile()` function takes three arguments:
- The source code (or AST)
- A filename used in error messages (or a placeholder like `<example>` for inline code)
- A mode: `'eval'` for expressions, `'exec'` for statements
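The mode matters in practice - a quick sketch of the difference:

```python
# 'eval' mode: a single expression; running it yields a value
expr_code = compile("3 + 4", "<example>", "eval")
result = eval(expr_code)

# 'exec' mode: any sequence of statements; running it populates a namespace
ns = {}
exec(compile("x = 3 + 4", "<example>", "exec"), ns)

# a statement in 'eval' mode is a SyntaxError
try:
    compile("x = 1", "<example>", "eval")
    mode_error = None
except SyntaxError as err:
    mode_error = err

print(result, ns["x"], type(mode_error).__name__)
```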
The result is a code object containing bytecode:
```{python}
print(code_object.co_code)
```
These bytes are what Python executes. Let's make them readable with the `dis` module:
```{python}
import dis
dis.dis(code_object)
```
Each line is an instruction:
- `RESUME` - Checkpoints for debugging/tracing
- `LOAD_NAME` - Load a variable (here, `print`)
- `LOAD_CONST` - Load a constant (here, `7`, the result of `3 + 4`)
- `CALL` - Call a function
- `RETURN_VALUE` - Return the result
Notice that `3 + 4` was pre-computed! Python's compiler can optimize constant expressions.
## Constants and names
Code objects store constants and names separately:
```{python}
code1 = compile('a = 10; print(3 + a)', '<ex1>', 'exec')
code2 = compile('a = 11; print(4 + a)', '<ex2>', 'exec')
print("Same bytecode?", code1.co_code == code2.co_code)
print()
print("Code 1 constants:", code1.co_consts)
print("Code 2 constants:", code2.co_consts)
print()
print("Code 1 names:", code1.co_names)
print("Code 2 names:", code2.co_names)
```
The bytecode is identical! Only the constants differ. This separation makes the bytecode more compact and flexible.
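You can see the indirection directly: a `LOAD_CONST` instruction stores only an index, and the value lives in `co_consts`:

```python
import dis

code1 = compile('a = 10; print(3 + a)', '<ex1>', 'exec')

# pair each LOAD_CONST's raw index (arg) with the value it resolves to (argval)
const_loads = [(instr.arg, instr.argval) for instr in dis.Bytecode(code1)
               if instr.opname == 'LOAD_CONST']
print(const_loads)
```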
## The .pyc files
When you import a module, Python saves the compiled bytecode to a `.pyc` file in the `__pycache__` directory. This speeds up subsequent imports - Python can skip tokenization, parsing, and compilation.
::: {.callout-tip icon=true}
## Performance implications
Creating `.pyc` files can take time for large codebases. The first import is slow, but subsequent imports are much faster. This is why:
- Your app might start slowly the first time after code changes
- `.pyc` files should generally be in `.gitignore` (they're machine-generated)
- Deployment systems sometimes pre-compile to speed up cold starts
:::
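Pre-compiling ahead of time is what the standard-library `compileall` module is for. A minimal sketch against a throwaway directory (the module name `mymod` is made up for illustration):

```python
import compileall
import pathlib
import tempfile

# create a tiny source tree and pre-compile everything in it
src_dir = pathlib.Path(tempfile.mkdtemp())
(src_dir / "mymod.py").write_text("x = 1\n")
compileall.compile_dir(str(src_dir), quiet=1)

pyc_files = sorted(p.name for p in (src_dir / "__pycache__").iterdir())
print(pyc_files)
```

Deployment pipelines often run `python -m compileall` over the whole application for the same effect.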
You can control `.pyc` generation with command-line flags:
```bash
python -B my_script.py
```
The `-B` flag prevents Python from writing `.pyc` files. Useful when you don't have write permissions or want to avoid clutter during development.
Or use environment variables:
```bash
set PYTHONDONTWRITEBYTECODE=1
python my_script.py
```
To customize where `.pyc` files go:
```bash
set PYTHONPYCACHEPREFIX=C:\temp\pycache
python my_script.py
```
::: {.callout-note icon=true}
## Bytecode versioning
Bytecode changes between Python minor versions (3.13 vs 3.14), but not between patch versions (3.13.1 vs 3.13.2). That's why `.pyc` files include the Python version in their name:
```
__pycache__/mymodule.cpython-313.pyc
```
If you run the same code with Python 3.14, you'll get a new file:
```
__pycache__/mymodule.cpython-314.pyc
```
:::
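The standard library exposes this naming scheme: `importlib.util.cache_from_source` computes the `.pyc` path for a source file, using the interpreter's cache tag:

```python
import importlib.util
import sys

# the version-specific tag, e.g. 'cpython-313' on CPython 3.13
print(sys.implementation.cache_tag)

pyc_path = importlib.util.cache_from_source("mymodule.py")
print(pyc_path)
```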
## Exploring bytecode
You can examine individual instructions:
```{python}
bytecode = dis.Bytecode('a = 11; print(4 + a)')
for instr in bytecode:
print(f"{instr.opname:20s} {instr.argval}")
```
Or compare different Python constructs:
```{python}
dis.dis("x = 5")
print()
dis.dis("x += 5")
```
`+=` is not simply syntactic sugar for `x = x + 5` - it compiles to a different instruction (`BINARY_OP 13 (+=)`, the augmented in-place variant).
::: {.callout-note icon=false}
## Exercise
Compare the bytecode of these equivalent operations:
```{python}
#| eval: false
# List comprehension
dis.dis("[x*2 for x in range(10)]")
# Generator expression
dis.dis("(x*2 for x in range(10))")
# Map function
dis.dis("list(map(lambda x: x*2, range(10)))")
```
Which is most complex? Can you guess which is fastest?
:::
# Visualizing with Godbolt
[Godbolt Compiler Explorer](https://godbolt.org/) lets you see bytecode interactively. Select "Python" as the language, write code on the left, and see the disassembly on the right.
You can:
- Compare different Python versions side by side
- Hover over code to highlight corresponding bytecode
- See how optimizations change bytecode
Try it with this example:
```python
def add(a, b):
return a + b
def add_constant(x):
return x + 42
```
You'll see that `add_constant` pre-computes less than you might expect - the addition still happens at runtime.
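You can check the same thing locally - `3 + 4` is folded because both operands are constants, while `x + 4` keeps a runtime binary operation:

```python
import dis

# both operands constant: folded at compile time into a single constant
folded = compile("3 + 4", "<example>", "eval")
print(folded.co_consts)

# one operand is a name: the addition survives to runtime
runtime = compile("x + 4", "<example>", "eval")
ops = [instr.opname for instr in dis.Bytecode(runtime)]
print(ops)
```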
::: {.callout-tip icon=true}
## Using Godbolt effectively
1. Start simple - complex code generates lots of bytecode
2. Compare Python versions to see optimizations
3. Look for patterns in how Python handles common constructs
4. Remember: fewer instructions ≠ faster code (but they're often correlated)
:::
# Performance and the future
Understanding Python's compilation pipeline helps explain performance characteristics:
- **Startup time**: Includes tokenization, parsing, and compilation
- **Import time**: Saved by `.pyc` files on subsequent runs
- **Runtime**: Dominated by bytecode execution
## The JIT revolution
Python 3.13 introduced experimental **JIT** (Just-In-Time) compilation support. See [PEP 744](https://peps.python.org/pep-0744/) and the [official Python 3.13 documentation](https://docs.python.org/3/whatsnew/3.13.html#whatsnew313-jit-compiler) for details.
Traditional Python:
```
Source → Tokens → AST → Bytecode → Interpreter
```
With JIT:
```
Source → Tokens → AST → Bytecode → [JIT Compiler] → Machine Code
```
The JIT compiler can:
- Detect hot code paths (frequently executed code)
- Compile bytecode to native machine code
- Optimize based on runtime behavior
This is still experimental in Python 3.13/3.14, but it represents a major shift in how Python executes code. Future versions may enable the JIT by default, potentially improving performance for CPU-bound code - though as of Python 3.14 the measured gains are very small, at least according to [this blogpost](https://blog.miguelgrinberg.com/post/python-3-14-is-here-how-fast-is-it).
::: {.callout-warning icon=true}
## JIT Availability
The JIT compiler is only available when Python is built with the `--enable-experimental-jit` configuration option. To use the JIT, you'll need to build Python from source with this flag enabled, or use a distribution that includes JIT support. Python 3.14 also [added](https://docs.python.org/3/library/sys.html#sys._jit) a `sys._jit` submodule for inspecting JIT status.
:::
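A defensive way to probe for JIT support from code - hedged with `getattr`, since `sys._jit` only exists on 3.14+:

```python
import sys

jit = getattr(sys, "_jit", None)  # None on interpreters older than 3.14
if jit is None:
    status = "no sys._jit on this interpreter"
else:
    status = f"available={jit.is_available()} enabled={jit.is_enabled()}"
print(status)
```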
::: {.callout-note icon=true}
## Enabling JIT (Python 3.13+)
If you have a JIT-enabled Python build:
```bash
set PYTHON_JIT=1
python my_script.py
```
:::
# Practical implications
Why should you care about Python's internals?
**Debugging**: Error messages reference these stages
```
File "<example>", line 1
print(3 + 4
^
SyntaxError: '(' was never closed
```
The tokenizer caught this before we even got to the AST.
**Performance**: Understanding bytecode helps optimize
- List comprehensions generate cleaner bytecode than loops
- Local variables are faster than globals (different bytecode instructions)
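The locals-vs-globals difference is visible with `dis`: globals are looked up by name, locals by index into the frame's locals array (exact opnames vary by version):

```python
import dis

x = 10

def use_global():
    return x       # name lookup in the global namespace

def use_local():
    y = 10
    return y       # indexed access into the frame's locals array

global_ops = [instr.opname for instr in dis.Bytecode(use_global)]
local_ops = [instr.opname for instr in dis.Bytecode(use_local)]
print(global_ops)
print(local_ops)
```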
**Tooling**: Modern Python tools work at these levels
- **Black** (formatter): Works with tokens and AST
- **MyPy** (type checker): Analyzes AST
- **Coverage.py**: Tracks bytecode execution
**Code generation**: You can write code that writes code
- Generate optimized functions at runtime
- Create DSLs (Domain-Specific Languages)
- Build advanced decorators and metaprogramming tools
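A minimal taste of code that writes code - compiling a generated source string into a callable (the `square` name is purely illustrative):

```python
# generate source as a string, compile it, and pull the function out
src = "def square(n):\n    return n * n\n"

namespace = {}
exec(compile(src, "<generated>", "exec"), namespace)
square = namespace["square"]
print(square(7))
```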
# Summary
Python's journey from source to execution:
1. **Tokenization**: Source code → Tokens
- Breaks code into meaningful pieces
- Catches basic syntax errors
2. **AST**: Tokens → Tree structure
- Represents code logic
- Enables code analysis and transformation
3. **Compilation**: AST → Bytecode
- Generates platform-independent instructions
- Cached in `.pyc` files
4. **Execution**: Bytecode → Results
- Interpreted by Python VM
- (Future: JIT compilation to machine code)
Each stage is accessible through Python's standard library. Experiment, explore, and demystify what happens when you hit "run".
::: {.callout-note icon=false collapse="true"}
## Final exercise: Full pipeline
Write a small Python script, then trace it through all stages:
```{python}
#| eval: false
code = """
def greet(name):
return f"Hello, {name}!"
print(greet("World"))
"""
# Tokenize
from io import StringIO
tokens = list(tokenize.generate_tokens(StringIO(code).readline))
print(f"Token count: {len(tokens)}")
# Parse to AST
tree = ast.parse(code)
print(f"AST nodes: {len(list(ast.walk(tree)))}")
# Compile to bytecode
bytecode = compile(tree, '<example>', 'exec')
print(f"Bytecode length: {len(bytecode.co_code)} bytes")
# Disassemble
dis.dis(bytecode)
# Execute
exec(bytecode)
```
:::
# Additional resources
- [Python AST Documentation](https://docs.python.org/3/library/ast.html)
- [dis module - Disassembler](https://docs.python.org/3/library/dis.html)
- [PEP 744 - JIT Compilation](https://peps.python.org/pep-0744/)
- [Godbolt Compiler Explorer](https://godbolt.org/)
- [libCST - Concrete Syntax Tree](https://libcst.readthedocs.io/)
- [Green Tree Snakes - AST Tutorial](https://greentreesnakes.readthedocs.io/)