Distributions and packaging
Shipping your creations
Why Packaging Matters
The dependency problem
You’ve built something brilliant. It runs perfectly on your machine. Then you share it with a colleague, and… nothing works.
This is the “it works on my machine” syndrome, and it’s caused by differences in:
- Python versions
- Installed packages and their versions
- Operating system libraries
- Environment variables
In data science and software engineering, reproducibility matters. Your analysis should produce the same results whether you run it today or six months from now, on your laptop or on a cloud server.
The Python packaging landscape
The Python packaging ecosystem has been… complicated. But there’s good news: the situation improved dramatically in 2023-2024.
Some key players:
- PyPI: The Python Package Index, where packages live
- uv: Modern, fast tool (we’ll focus on this)
- pip: The traditional package installer
- conda: Package and environment manager for scientific computing
- Poetry: All-in-one dependency management tool
Terminology clarification
Before we dive in, let’s get the vocabulary straight:
- Module: A Python object of type `module` (`types.ModuleType`), usually created from a file
- Package: A Python object of type `module` with a `__path__` attribute, usually created from a directory with an `__init__.py` file
- Distribution: A bundled version of a package, ready for installation (e.g., a `.whl` file). This is what this lecture is about
- Source distribution (sdist): A distribution containing source code (`.tar.gz` or `.zip`)
- Built distribution (wheel): A pre-built binary distribution (`.whl`)
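A quick way to see the module/package distinction is from the REPL; `math` and `email` here are just stdlib examples:

```python
import math   # a module, created from a single file
import email  # a package: a module with a __path__ attribute

print(type(math))                  # <class 'module'>
print(hasattr(math, "__path__"))   # False: plain module
print(hasattr(email, "__path__"))  # True: created from a directory
print(email.__path__)              # where Python searches for submodules
```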
Python Packaging Fundamentals
PyPI: The Python Package Index
PyPI (pypi.org) is the central repository for Python packages. Think of it as the app store for Python code.
When you browse a package page, you’ll find:
- Documentation and README
- Available versions
- Dependencies
- Download statistics
- Project links (homepage, repository, issue tracker)
Distributions contain arbitrary code you are going to run on your machine! Verifying that the project looks trustworthy is not a luxury but a must!
Distribution formats
Source distributions (sdist)
A source distribution is a .tar.gz or .zip file containing:
- Source code
- `pyproject.toml` (configuration)
- README, LICENSE, and other metadata
When you install from a source distribution, the installer must:
- Download the source
- Install build tools
- Compile any extensions
- Create the package
This is portable but slower, and requires build tools on the target machine.
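You can observe this yourself by forcing pip to ignore wheels and build from source (a demonstration, not a recommendation; the package name is a placeholder):

```
pip install --no-binary :all: --force-reinstall some-package
```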
Wheels: The modern binary format
Wheels (.whl files) are pre-built distributions. They’re just ZIP files with a specific structure.
Benefits:
- Fast installation: Just unzip and place files
- No compiler needed: Binaries are pre-compiled
- Consistent: Same build used everywhere
Wheels come in two flavors:
- Pure Python wheels: Work on any platform (`*-py3-none-any.whl`)
- Platform-specific wheels: Compiled for a specific OS/architecture
The wheel filename tells you everything:
```
numpy-1.24.0-cp311-cp311-win_amd64.whl
  │      │     │     │       └─ Platform (Windows, 64-bit)
  │      │     │     └─ ABI (CPython 3.11)
  │      │     └─ Python version (3.11)
  │      └─ Package version
  └─ Package name
```
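The `packaging` library (maintained by the PyPA) can parse these names programmatically; a small sketch:

```python
from packaging.utils import parse_wheel_filename

name, version, build, tags = parse_wheel_filename(
    "numpy-1.24.0-cp311-cp311-win_amd64.whl"
)
print(name, version)  # numpy 1.24.0
for tag in tags:
    print(tag)        # cp311-cp311-win_amd64
```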
The build system architecture
Modern Python packaging separates concerns:
- Build frontends (pip, uv, build): Orchestrate the process
- Build backends (setuptools, hatchling, flit): Actually create distributions
Think of it this way:
- Frontend = General contractor who manages the project
- Backend = Construction crew who does the actual work
This separation, defined by PEP 517/518, means you can mix and match tools. The frontend reads your pyproject.toml, sets up an isolated environment, installs the backend, and asks it to build your package.
The [build-system] section in pyproject.toml specifies which backend to use:
```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```

Virtual environments: Isolation is key
Virtual environments solve the conflict problem. Each project gets its own isolated Python + packages directory.
Structure:
```
.venv/
├── Scripts/              # Executables (Windows)
│   ├── python.exe
│   ├── pip.exe
│   └── activate.bat
├── Lib/
│   └── site-packages/    # Installed packages
└── pyvenv.cfg            # Configuration
```
When you “activate” a virtual environment, it modifies your PATH so that python points to the environment’s Python.
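For example, on Windows (assuming the environment lives in `.venv`):

```
.venv\Scripts\activate
where python
```

After activation, the first result of `where python` should be `.venv\Scripts\python.exe`.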
Modern tools like uv create and manage virtual environments automatically, so you rarely need to think about this, especially when they are well integrated with your editor.
Security considerations
Dependencies can have vulnerabilities or even be malicious. The fact that packages are open source means you can look at their source; it doesn’t mean someone has looked at the source!
Best practices:
- Use lockfiles: `uv.lock` includes hashes
- Vet dependencies: Check before adding to your project
- Keep updated: Security patches matter
- Use audit tools: `pip-audit` (works with uv too)
Example:
```
pip install pip-audit
pip-audit
```

Supply-chain security in the Python ecosystem is still rather limited. Be extra cautious about what you install, and double-check package names: there are many reported incidents of typosquatting, where attackers upload malicious Python packages under names that correspond to frequently made typos.
Modern Packaging with uv
What is uv?
uv is from Astral (the team behind Ruff), written in Rust. It’s an all-in-one tool:
- Package installer (replaces pip)
- Environment manager (replaces venv)
- Python version manager (replaces pyenv)
It’s 10-100x faster than pip while being fully compatible with the existing ecosystem.
Why we’re teaching it first: It’s the future of Python packaging.
Fun fact: The name “uv” doesn’t stand for anything! It’s just short, fast to type, and suggests speed (like UV light at the fast end of the spectrum). The logo features a lightning bolt ⚡
We are assuming you already have uv installed. There are different ways of using uv, but for our purposes we focus on the project workflow.
Understanding uv’s project workflow
Three key files:
- `pyproject.toml`: Your project configuration (dependencies, metadata)
- `uv.lock`: Exact versions of everything (commit this!)
- `.python-version`: Pinned Python version
Typical workflow:
- `uv init` to start a new project
- `uv add package_name` as you need dependencies
- `uv sync` to install everything
- Commit `pyproject.toml`, `uv.lock`, and `.python-version` to Git
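A minimal end-to-end sketch of this workflow (project and package names are just examples):

```
uv init myproject        # scaffold pyproject.toml, .python-version, ...
cd myproject
uv add requests          # records the dependency and updates uv.lock
uv sync                  # creates .venv and installs everything
uv run python main.py    # runs inside the project environment
```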
Development dependencies
Some packages are only needed during development (testing, linting):
```
uv add --dev pytest ruff mypy
```

In `pyproject.toml`, these appear in `[dependency-groups]`:
```toml
[dependency-groups]
dev = ["pytest>=7.0", "ruff>=0.1.0", "mypy>=1.0"]
```

Install everything, including dev dependencies:
```
uv sync --dev
```

Run dev tools:
```
uv run pytest
uv run ruff check
```

uv and requirements.txt
uv maintains backward compatibility with traditional workflows. For instance, rather than a `uv.lock` file, people will sometimes have a `requirements.txt` file, which tells pip what dependencies to install. It’s possible to generate such a file with uv as follows:
```
uv pip compile pyproject.toml -o requirements.txt
```

We can install the dependencies from the file as follows:
```
uv pip install -r requirements.txt
```

This makes migration easy: you can adopt uv incrementally.
Building and Distributing Packages
Package structure
The modern best practice is the src/ layout:
```
myproject/
├── src/
│   └── mypackage/
│       ├── __init__.py
│       └── module.py
├── tests/
│   └── test_module.py
├── pyproject.toml
├── README.md
├── LICENSE
└── .gitignore
```
Why `src/`? If you add `myproject/src/` to your Python path, Python can only find your package. If you add `myproject/` instead, it might also discover `tests/`, even though that’s not really a package.
Essential files:
- `pyproject.toml`: Configuration
- `README.md`: Documentation
- `LICENSE`: Legal requirements
- `.gitignore`: Version control hygiene
In many distributions you will also find a `setup.py` file. It dates from before `pyproject.toml` was standardized and contains instructions for setuptools, a specific distribution-building tool; an example is sketched below.
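For recognition purposes, a minimal legacy `setup.py` might have looked roughly like this (a sketch, not a recommendation for new projects):

```python
# setup.py -- legacy configuration, superseded by pyproject.toml
from setuptools import setup, find_packages

setup(
    name="mypackage",
    version="0.1.0",
    package_dir={"": "src"},
    packages=find_packages(where="src"),
    install_requires=["requests>=2.28.0"],
)
```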
pyproject.toml: The configuration file
This is the modern standard for Python projects (PEP 517/518/621):
```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "mypackage"
version = "0.1.0"
description = "A short description"
authors = [
    {name = "Albert Einstein", email = "albert.einstein@example.com"}
]
dependencies = [
    "requests>=2.28.0",
]
requires-python = ">=3.10"

[dependency-groups]
dev = [
    "pytest>=7.0",
    "ruff>=0.1.0",
]

[project.scripts]
mytool = "mypackage.cli:main"
```

Key sections:
- `[build-system]`: Specifies the build backend
- `[project]`: Name, version, dependencies, authors
- `[dependency-groups]`: Development dependencies (uv standard)
- `[project.scripts]`: Command-line entry points
- `[tool.*]`: Tool-specific configuration (tools like ruff, pyright, …)
Build backends: Under the hood
The build backend transforms your source code into distributions.
Some popular choices you will encounter in the wild:
hatchling (recommended for new projects):
- Modern and simple
- Sensible defaults
- Minimal configuration
setuptools (traditional):
- Most compatible
- Most packages still use it
- Very flexible but more complex
flit-core (minimal):
- For pure-Python packages
- Extremely simple
meson-python:
- For packages with C/C++ extensions
Building your package
With uv, building is simple:
```
uv build
```

This creates a `dist/` directory with:
- A source distribution (`*.tar.gz`)
- A wheel (`*.whl`)
Behind the scenes, uv:
- Reads your `[build-system]` configuration
- Creates an isolated environment
- Installs the build requirements
- Calls the backend’s build functions
- Generates the distributions
Inspect the wheel:
```
python -m zipfile -l dist\mypackage-0.1.0-py3-none-any.whl
```

Publishing to PyPI
Prerequisites:
- PyPI account at pypi.org
- API token (Settings → API tokens)
For practice, use TestPyPI first: test.pypi.org
Install twine (the upload tool):
```
uv pip install twine
```

Check your distributions:

```
twine check dist/*
```

Upload to TestPyPI:

```
twine upload --repository testpypi dist/*
```

Upload to PyPI:

```
twine upload dist/*
```

Use API tokens, not passwords! Set up a token in your PyPI account settings and use it when prompted for credentials.
Best practices:
- Follow semantic versioning (MAJOR.MINOR.PATCH)
- Tag releases in Git
- Write changelogs
- Test on TestPyPI first
Entry points and CLI scripts
Make your package executable from the command line:
```toml
[project.scripts]
mytool = "mypackage.cli:main"
```

This creates a `mytool` command that calls the `main()` function in `mypackage.cli`.
Example module:
```python
# src/mypackage/cli.py
def main():
    print("Hello from mytool!")

if __name__ == "__main__":
    main()
```

After installing the package, users can run:

```
mytool
```

Version management
Static version (hard-coded):
```toml
[project]
version = "0.1.0"
```

Dynamic version (from file):
```toml
[project]
dynamic = ["version"]

[tool.hatch.version]
path = "src/mypackage/__init__.py"
```

Then in `__init__.py`:
```python
__version__ = "0.1.0"
```

Best practice: Single source of truth for version numbers.
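At runtime, you can then read the installed version from the package metadata instead of hard-coding it a second time (using the `mypackage` example from above):

```python
from importlib.metadata import version

print(version("mypackage"))  # reads the version from the installed metadata
```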
Application Deployment Strategies
The deployment challenge
Distributing libraries (for other developers) is different from deploying applications (for end users).
Goals for application deployment:
- Reproducible execution
- Standalone (users don’t manage dependencies)
- Easy to run
Options range from simple (lockfiles) to complex (containers, compiled executables).
Simple: Lockfiles and virtual environments
Best for: Servers, cloud deployments, development teams
Ship:
- `uv.lock` or `requirements.txt`
- `pyproject.toml`
- Your code
Deploy on target machine:
```
uv sync
```

or

```
uv pip install -r requirements.txt
```

Fast, simple, reproducible.
Freezing applications with PyInstaller
Turn your Python app into a standalone executable.
Install:
```
uv pip install pyinstaller
```

Basic usage:

```
pyinstaller script.py
```

Single-file executable:

```
pyinstaller --onefile script.py
```

Output appears in the `dist/` directory.
Limitations:
- Large file sizes
- Antivirus false positives
- Platform-specific (build on Windows for Windows)
Containerization with Docker
Containers solve “works on my machine” completely. They package your app with its entire runtime environment.
Example Dockerfile with uv:
```dockerfile
FROM python:3.12-slim

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

# Set working directory
WORKDIR /app

# Copy dependency files
COPY pyproject.toml uv.lock ./

# Install dependencies
RUN uv sync --frozen --no-dev

# Copy application code
COPY . .

# Run application
CMD ["uv", "run", "python", "-m", "myapp"]
```

Multi-stage build for smaller images:
```dockerfile
# Build stage
FROM python:3.12-slim AS builder
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev

# Runtime stage
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /app/.venv /app/.venv
COPY . .
ENV PATH="/app/.venv/bin:$PATH"
CMD ["python", "-m", "myapp"]
```

Build and run:
```
docker build -t myapp .
docker run myapp
```

Platform-as-a-Service (PaaS)
Services like Heroku, Railway, Render, and Fly.io automatically detect Python projects.
Typically they:
- Detect `requirements.txt` or `pyproject.toml`
- Install dependencies
- Run your app
Configuration usually involves:
- `Procfile` (specifies how to run)
- Environment variables
- buildpack settings
Modern platforms support uv—just specify it in your buildpack configuration.
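A minimal `Procfile` sketch (assuming a web app served with gunicorn; adapt the entry point to your app):

```
web: uv run gunicorn myapp:app
```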
Serverless deployment
AWS Lambda, Google Cloud Functions, Azure Functions let you run code without managing servers.
Considerations:
- Cold start latency
- Execution time limits (typically 15 minutes max)
- Package size limits
Packaging for Lambda:
- Create environment with uv
- Zip the contents
- Upload to Lambda
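A rough sketch of the zip route on a Unix-like system (function name and paths are placeholders; the AWS CLI is assumed to be configured):

```
uv pip install --target package .    # install the project and its deps into ./package
cd package && zip -r ../lambda.zip . && cd ..
aws lambda update-function-code --function-name myfn --zip-file fileb://lambda.zip
```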
Or use container images:
```dockerfile
FROM public.ecr.aws/lambda/python:3.12
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev
COPY . .
CMD ["myapp.handler"]
```

Best for: Event-driven workloads, sporadic usage
Alternative Tools and Workflows
Some other tools and ecosystems are worth knowing about.
The pip ecosystem
pip: The standard package installer, built into Python.
```
pip install package
pip uninstall package
pip list
pip show package
pip freeze > requirements.txt
```

pip improved dramatically in 2020 with a new dependency resolver, but it is still slower than uv.
pip-tools: Adds reproducibility to pip.
```
pip install pip-tools
```

Workflow:
- Write `requirements.in` with loose constraints
- Run `pip-compile` to generate `requirements.txt` with exact versions
- Use `pip-sync` to install exactly what’s in `requirements.txt`
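A minimal sketch of that loop (the package name is just an example):

```
echo requests > requirements.in
pip-compile requirements.in    # writes requirements.txt with exact pins
pip-sync requirements.txt      # makes the environment match it exactly
```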
venv: Standard library for virtual environments.
```
python -m venv myenv
myenv\Scripts\activate
```

Works, but less convenient than uv.
When to use: Legacy systems, maximum compatibility, minimal dependencies.
Poetry: The all-in-one alternative
Poetry handles dependency management, building, and publishing.
Install:
```
pip install poetry
```

Workflow:
```
poetry new myproject
cd myproject
poetry add requests
poetry install
poetry build
poetry publish
```

Pros:
- Excellent user experience
- Mature ecosystem
- Popular in web development
Cons:
- Slower than uv
- Occasional dependency resolution issues
- Opinionated (can’t easily mix with other tools)
`poetry.lock` ensures reproducible installs, similar to `uv.lock`.
PDM: Standards-compliant alternative
Similar workflow to Poetry but more flexible:
```
pdm init
pdm add requests
pdm install
```

PDM supports PEP 582 `__pypackages__` (an experimental local package directory). However, PEP 582 has since been rejected, and PDM itself now recommends using virtual environments.
Pros:
- Fast
- Standards-compliant
- Flexible
Cons:
- Smaller community than Poetry
The Conda Ecosystem
conda is both a package manager AND an environment manager. Unlike pip, which only handles Python packages, conda can manage:
- Python packages
- System libraries (C/C++, CUDA)
- R packages
- Julia packages
It was born to solve the “NumPy/SciPy installation nightmare” of the pre-wheel era and gained massive popularity especially in the data science community.
Key terminology:
- Anaconda: Full distribution (3GB+, has commercial restrictions)
- Miniconda: Minimal installer (just conda + Python, also has commercial restrictions)
- Miniforge: Community-driven, uses conda-forge by default (recommended)
- conda-forge: Community-maintained package channel
For today’s purposes we are just going to recommend not using conda as a software developer. I have many concerns about misalignment between Anaconda, the company behind this ecosystem, and what developers need. Furthermore, there are many licensing issues, and Anaconda is currently suing companies like Intel and Dell for misusing their software. The current state of affairs seems to be like surfing to the New York Times website, reading a few articles, and later getting sued because you should have paid for a subscription. Exactly what is and isn’t allowed is complicated enough that consulting a lawyer is not a luxury here, in my opinion.
If you’re interested in using the conda ecosystem anyhow, perhaps also have a look at pixi. It comes from a company started by Wolf Vollprecht, the person who made conda fast(er) despite not even working at Anaconda!
My opinion
I have never used Poetry or PDM myself, so here I will just parrot what I hear from other people; I don’t personally see them being used much in the future now that uv is out there.
For conda it’s a bit different, since there is still a niche of packages that require conda for a fluent install experience, but that niche seems to get smaller with each passing year. Furthermore, Anaconda has been focused on things like getting “Python in Excel” (by running things in the cloud) while their whole raison d’être has been getting hollowed out.
Advanced Topics
Version constraints
The Python Packaging Authority maintains a page on valid version specifiers.
Specify dependencies with constraint operators:
- `==1.2.3`: Exact version (rarely use this)
- `>=1.2.0`: Minimum version
- `<2.0.0`: Upper bound
- `~=1.2`: Compatible release (≥1.2, <2.0)
- `!=1.2.5`: Exclude a specific version
- `>=1.2,<2.0`: Multiple constraints
Example:
```toml
dependencies = [
    "requests>=2.28.0,<3.0.0",
    "numpy~=1.24",
]
```

Libraries should be permissive: use `>=` constraints to maximize compatibility.
Applications should be strict: Use lock files to pin exact versions.
Debugging installation issues
It will sometimes happen that installing a package fails. One reason might be that the package is only available for Unix systems while you’re on Windows. Verbose output helps you diagnose:

```
uv pip install -v package_name
```

Common errors:
- “Package not found”: Check spelling, verify it exists on PyPI
- “No matching distribution”: Platform issue, may need build tools
- Version conflicts: Read the error message carefully
Working with private packages
Options:
- Private PyPI server (devpi, Artifactory)
- Cloud services (AWS CodeArtifact, Google Artifact Registry)
Configure uv:
```
uv pip install --index-url https://pypi.company.com/simple package
```

Or in `pyproject.toml`:
```toml
[[tool.uv.index]]
name = "private"
url = "https://pypi.company.com/simple"
```

Authentication works via API tokens in environment variables or config files.
Editable installs
Install a package in development mode:
```
uv pip install -e .
```

The `-e` flag creates a link to your source directory instead of copying files. Changes to your code are immediately reflected without reinstalling.
Uses:
- Active development
- Testing changes immediately
- Working on multiple related packages
The modern approach uses PEP 660 editable wheels (better than the legacy method).
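A quick way to check that an editable install took effect (assuming the `mypackage` example from earlier):

```python
import mypackage

# For an editable install, this should point into your src/ tree,
# not into site-packages.
print(mypackage.__file__)
```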
Cross-platform considerations
uv.lock works across Windows, macOS, and Linux.
Platform-specific dependencies use markers:
```toml
dependencies = [
    "pywin32>=300; sys_platform == 'win32'",
    "python-magic>=0.4.27; sys_platform != 'win32'",
]
```

Test on multiple platforms using CI/CD (GitHub Actions with matrix builds).
Path handling: Use pathlib, not string manipulation:
```python
from pathlib import Path

config = Path("config") / "settings.json"  # works everywhere
```

Performance tips
uv is already fast, but here are some tips:
- Caching: uv caches downloads globally (shared across projects)
- Cache location: `%LOCALAPPDATA%\uv\cache` (Windows)
- Clear cache: `uv cache clean` (rarely needed)
- Parallel operations: uv downloads and installs in parallel automatically
Network optimization: Use mirrors if PyPI is slow in your region.
Resources
Official Documentation
- uv documentation
- Python Packaging User Guide
- PyPI - The Python Package Index
- Important PEPs:
- PEP 517: Build System Interface
- PEP 518: Build System Requirements
- PEP 621: Project Metadata
- PEP 660: Editable Installs