Day 7 — Setting Up Your AI Engineering Environment

“A craftsman is only as good as their tools — and their discipline in maintaining them. Your development environment is the foundation of everything you build. Get it right once, and it pays dividends for the entire journey.”

Why This Day Matters

Most tutorials show you how to write AI code. Very few show you how to set up the environment that lets you write AI code safely, efficiently, and in a way that scales from prototype to production.

This is the day that prevents the common disasters: leaked API keys, broken dependency versions, environment conflicts, and “works on my machine” debugging sessions. Done right once, your environment serves you for years.

Part 1: Python Environment Management

Why Not Just Use System Python?

System Python (the Python that comes with your OS) is a trap:

Version is often outdated
No isolation between projects
pip install pollutes global packages
Breaking one project can break all projects

The correct approach: version-managed Python with isolated virtual environments per project.

pyenv: Python Version Management

pyenv lets you install and switch between multiple Python versions without touching your system Python.

# Install pyenv (macOS)
brew install pyenv

# Add to shell config (~/.zshrc or ~/.bashrc)
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
echo 'eval "$(pyenv init -)"' >> ~/.zshrc

# Reload shell
source ~/.zshrc

# Install Python versions
pyenv install 3.12.3
pyenv install 3.11.9

# Set global default
pyenv global 3.12.3

# Set project-specific version (creates .python-version file)
cd my-project
pyenv local 3.12.3

# Verify
python --version  # Python 3.12.3
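As a quick sanity check, a helper along these lines can confirm that the interpreter actually running matches the version pinned in .python-version. This is a minimal sketch: the function names matches_pin and interpreter_matches are hypothetical, and it assumes the pin file holds a plain numeric version such as 3.12.3 on its first line.

```python
import sys
from pathlib import Path


def matches_pin(pin: str, running: tuple[int, ...]) -> bool:
    """Return True if the running version starts with the pinned components.

    A pin of "3.12" matches any 3.12.x; a pin of "3.12.3" must match exactly.
    """
    pinned = tuple(int(part) for part in pin.strip().split("."))
    return running[: len(pinned)] == pinned


def interpreter_matches(pin_file: Path) -> bool:
    """Compare the current interpreter against the first line of pin_file."""
    pin = pin_file.read_text(encoding="utf-8").splitlines()[0]
    return matches_pin(pin, tuple(sys.version_info[:3]))
```

Calling interpreter_matches(Path(".python-version")) from the project root returns False when the shell has picked up a different interpreter than the one pyenv pinned, which is usually a sign that the shell config above was not reloaded.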

uv: The Modern Python Package Manager

uv is a Python package and project manager written in Rust by Astral (the team behind ruff). Its uv pip interface works as a drop-in replacement for pip, and Astral's benchmarks report it as 10-100× faster. In 2026, it’s rapidly becoming the standard.

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create a project with a specific Python version
uv init my-ai-project --python 3.12
cd my-ai-project

# Add dependencies (updates pyproject.toml and uv.lock, and creates .venv if needed)
uv add openai anthropic langchain langchain-openai
uv add python-dotenv pydantic pydantic-settings tenacity fastapi uvicorn
uv add --dev pytest ruff mypy pre-commit

# Activate the virtual environment (run uv venv first only if .venv is missing)
source .venv/bin/activate  # macOS/Linux
# or .venv\Scripts\activate  # Windows

# Run a script (runs inside the project environment, no activation needed)
uv run python script.py

# Sync all dependencies (from uv.lock, reproducible)
uv sync

Why uv over conda/pip for AI projects:

10-100× faster than pip (no more 5-minute install sessions)
Deterministic installs via lockfile
Works with existing pyproject.toml standard
Better dependency resolution (avoids conflicts)

conda: When You Still Need It

Use conda when you need CUDA toolkit management or packages not available on PyPI:

# Install Miniforge (conda without commercial restrictions)
brew install miniforge  # macOS
# or download from conda-forge.org

# Create environment for GPU-based training
conda create -n ai-training python=3.12 -y
conda activate ai-training

# Install CUDA-dependent packages via conda
conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers accelerate bitsandbytes peft trl

# Export environment for reproducibility
conda env export > environment.yml
conda env create -f environment.yml  # Recreate anywhere

Part 2: Project Structure for AI Applications

A well-structured AI project is readable, maintainable, and deployable. Here’s the structure I use for every production AI application:

my-ai-app/
├── .env                        # Local secrets (never commit)
├── .env.example                # Template with dummy values (commit this)
├── .gitignore                  # Include .env, __pycache__, .venv, etc.
├── .python-version             # pyenv Python version pin
├── pyproject.toml              # Dependencies and tool config
├── uv.lock                     # Locked dependency versions
├── README.md                   # Project overview and setup instructions
│
├── src/
│   └── my_ai_app/
│       ├── __init__.py
│       ├── config.py           # Configuration management (Pydantic Settings)
│       ├── models/             # Pydantic models for data structures
│       │   ├── __init__.py
│       │   └── schemas.py
│       ├── services/           # Business logic and AI integrations
│       │   ├── __init__.py
│       │   ├── llm.py          # LLM client wrapper
│       │   ├── embeddings.py   # Embedding service
│       │   └── rag.py          # RAG pipeline
│       ├── prompts/            # Prompt templates (versioned)
│       │   ├── __init__.py
│       │   ├── system_prompts.py
│       │   └── templates/
│       │       ├── qa.txt
│       │       └── summarize.txt
│       └── api/                # API layer (if building an API)
│           ├── __init__.py
│           ├── main.py
│           └── routes/
│
├── tests/
│   ├── conftest.py             # Test fixtures
│   ├── unit/
│   │   └── test_llm.py
│   └── integration/
│       └── test_rag_pipeline.py
│
├── notebooks/                  # Exploration and prototyping
│   ├── 01_data_exploration.ipynb
│   ├── 02_embedding_experiments.ipynb
│   └── 03_prompt_tuning.ipynb
│
├── scripts/                    # One-off scripts (data loading, indexing)
│   ├── ingest_documents.py
│   └── evaluate_model.py
│
├── data/
│   ├── raw/                    # Original, immutable data
│   ├── processed/              # Cleaned and processed data
│   └── eval/                   # Evaluation datasets
│
└── docker/
    ├── Dockerfile
    └── docker-compose.yml
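One payoff of keeping prompts in prompts/templates/ is that loading them becomes a small, testable function. The sketch below is one way to do it, not a prescribed loader: the render_prompt name is hypothetical, and it assumes the .txt templates use $name placeholders so the standard library's string.Template can fill them.

```python
from pathlib import Path
from string import Template


def render_prompt(template_dir: Path, name: str, **variables: str) -> str:
    """Load <name>.txt from template_dir and fill in its $placeholders."""
    text = (template_dir / f"{name}.txt").read_text(encoding="utf-8")
    # substitute() raises KeyError if a placeholder has no value,
    # so a missing variable fails loudly instead of reaching the model
    return Template(text).substitute(**variables)
```

With a qa.txt containing "Context: $context" and "Question: $question", a call such as render_prompt(templates, "qa", context=doc, question=q) returns the filled prompt, and the template file can change in git without touching service code.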

Part 3: Configuration and Secrets Management

The most common and most dangerous mistake in AI development: hardcoding API keys.

The Right Pattern: Pydantic Settings

# src/my_ai_app/config.py
from pydantic_settings import BaseSettings, SettingsConfigDict
from pydantic import SecretStr, field_validator
from functools import lru_cache

class Settings(BaseSettings):
    """
    Application configuration loaded from environment variables.

    pydantic-settings resolves each field from, in priority order:
    1. Environment variables
    2. The .env file named in model_config (no load_dotenv call needed)
    3. The field defaults declared below

    SecretStr types are masked in logs and repr output.
    """
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        case_sensitive=False
    )
    
    # API Keys - always use SecretStr
    openai_api_key: SecretStr
    anthropic_api_key: SecretStr
    google_api_key: SecretStr | None = None
    
    # Model defaults
    default_model: str = "gpt-4o"
    embedding_model: str = "text-embedding-3-small"
    
    # Infrastructure
    redis_url: str = "redis://localhost:6379"
    vector_db_url: str = "http://localhost:6333"
    
    # Application settings
    max_tokens: int = 2048
    temperature: float = 0.7
    debug: bool = False
    environment: str = "development"  # development, staging, production
    
    @field_validator("environment")
    @classmethod
    def validate_environment(cls, v):
        allowed = {"development", "staging", "production"}
        if v not in allowed:
            raise ValueError(f"environment must be one of {allowed}")
        return v
    
    @property
    def is_production(self) -> bool:
        return self.environment == "production"


@lru_cache
def get_settings() -> Settings:
    """
    Cache settings to avoid reading .env file on every request.
    Use lru_cache so settings are loaded once and reused.
    """
    return Settings()


# Usage pattern - import get_settings, not Settings directly
# This enables dependency injection in FastAPI
settings = get_settings()

# get_secret_value() returns the plain string, so call it only where the key
# is handed to an SDK client, and avoid logging or storing the result
api_key = settings.openai_api_key.get_secret_value()

The .env File

# .env (NEVER commit this file)
OPENAI_API_KEY=sk-proj-abc123...
ANTHROPIC_API_KEY=sk-ant-def456...
GOOGLE_API_KEY=AIza...

# Model settings
DEFAULT_MODEL=gpt-4o
EMBEDDING_MODEL=text-embedding-3-small
# Infrastructure
REDIS_URL=redis://localhost:6379
VECTOR_DB_URL=http://localhost:6333
# App settings
DEBUG=true
ENVIRONMENT=development
MAX_TOKENS=2048
TEMPERATURE=0.7

# .env.example (COMMIT this file — real values replaced with placeholders)
OPENAI_API_KEY=sk-proj-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here
GOOGLE_API_KEY=your-google-api-key

DEFAULT_MODEL=gpt-4o
EMBEDDING_MODEL=text-embedding-3-small
REDIS_URL=redis://localhost:6379
VECTOR_DB_URL=http://localhost:6333
DEBUG=false
ENVIRONMENT=development
MAX_TOKENS=2048
TEMPERATURE=0.7
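Because .env and .env.example are maintained by hand, they tend to drift apart: a new variable lands in one and not the other. A small check like the following can flag that. It is a rough sketch with hypothetical function names, and it only understands simple KEY=value lines, not the full dotenv syntax.

```python
def parse_env_keys(text: str) -> set[str]:
    """Collect variable names from KEY=value lines, skipping blanks and comments."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        keys.add(key)
    return keys


def env_drift(example_text: str, env_text: str) -> dict[str, set[str]]:
    """Report keys present in one file but not the other."""
    example = parse_env_keys(example_text)
    actual = parse_env_keys(env_text)
    return {
        "missing_from_env": example - actual,
        "missing_from_example": actual - example,
    }
```

Only key names are compared, so the check never needs to print or log a secret value.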

Multi-Environment Configuration

.env                # Local development (gitignored)
.env.example        # Template (committed)
.env.staging        # Staging overrides (gitignored)
.env.production     # Production overrides (use secrets manager instead)

For production: Never use .env files in production containers. Use:

AWS Secrets Manager / Parameter Store
GCP Secret Manager
Azure Key Vault
HashiCorp Vault
Kubernetes Secrets

Part 4: The .gitignore for AI Projects

This is one of the most important files in your project. A single missed API key pushed to GitHub can result in thousands of dollars in charges within hours.

# .gitignore for AI engineering projects

# Environment files (CRITICAL - API keys live here)
.env
.env.*
# Exception: commit the template (gitignore comments must sit on their own line)
!.env.example
# Python
__pycache__/
*.py[cod]
*$py.class
*.pyc
.Python
.venv/
venv/
env/
ENV/
# Jupyter
.ipynb_checkpoints/
*/.ipynb_checkpoints/*
# Data (often large - use DVC or cloud storage instead)
data/raw/
data/processed/
*.csv
*.parquet
*.jsonl
# Model weights (too large for git - use Hugging Face Hub)
*.bin
*.safetensors
*.gguf
*.pt
*.pth
models/
# Vector database data
chroma_db/
qdrant_storage/
pinecone_cache/
# IDE (.vscode/settings.json is committed as shared team settings; see Part 5)
.idea/
*.swp
# OS
.DS_Store
Thumbs.db
# Build
dist/
build/
*.egg-info/
# Testing
.pytest_cache/
.coverage
htmlcov/
# Logs (don't commit logs - use structured logging to a log service)
*.log
logs/

Pre-commit Hook: Your Last Line of Defense

# Install pre-commit
pip install pre-commit

# Create .pre-commit-config.yaml

# .pre-commit-config.yaml
repos:
  # Detect secrets before they're committed
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0
    hooks:
      - id: gitleaks
  
  # Detect hardcoded secrets (alternative/complement)
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']
  
  # Code quality
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.0
    hooks:
      - id: ruff           # Linting
      - id: ruff-format    # Formatting (replaces black)
  
  # Type checking
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0
    hooks:
      - id: mypy
        additional_dependencies: [types-requests]
  
  # Standard file hygiene
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-json
      - id: check-added-large-files
        args: ['--maxkb=500']  # Block files larger than 500KB
      - id: no-commit-to-branch
        args: ['--branch', 'main', '--branch', 'production']

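# Create the baseline file that the detect-secrets hook expects
pip install detect-secrets
detect-secrets scan > .secrets.baseline

# Initialize pre-commit
pre-commit install  # Installs hooks in .git/hooks/
pre-commit run --all-files  # Run on all files once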

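Now every git commit automatically runs these checks, so most secrets are caught before they ever reach GitHub. Hooks can be skipped with git commit --no-verify, so treat them as a safety net rather than a guarantee.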
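To give a feel for what a secret scanner does under the hood, here is a deliberately tiny version of the idea: match each line against patterns that look like credentials. The two regular expressions are illustrative assumptions for this sketch, not the rules gitleaks or detect-secrets actually ship, and find_secrets is a hypothetical name.

```python
import re

# Illustrative patterns only (assumptions for this sketch, not gitleaks' rule set)
SECRET_PATTERNS = {
    "openai-style key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "google-style key": re.compile(r"\bAIza[0-9A-Za-z_-]{35}"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_label) for every suspicious line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, label))
    return findings
```

Real scanners add entropy checks, allowlists, and hundreds of provider-specific rules, which is why the hooks above are the better choice for actual projects.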

Part 5: VS Code Configuration for AI Development

Essential Extensions

// Install via VS Code Extensions panel or CLI:
// code --install-extension <extension-id>

{
  "ai_engineering_extensions": [
    "ms-python.python",              // Python language support
    "ms-python.vscode-pylance",      // Fast Python type checking
    "charliermarsh.ruff",            // Fast Python linting (replaces pylint)
    "ms-toolsai.jupyter",            // Jupyter notebook support
    "ms-toolsai.jupyter-keymap",     // Jupyter keybindings
    "GitHub.copilot",                // AI pair programming
    "GitHub.copilot-chat",           // AI chat for code
    "donjayamanne.githistory",       // Git log visualization
    "eamodio.gitlens",               // Enhanced Git visualization
    "mikestead.dotenv",              // .env file highlighting
    "redhat.vscode-yaml",            // YAML support
    "ms-azuretools.vscode-docker",   // Docker support
    "humao.rest-client",             // HTTP request testing (like Postman)
    "mechatroner.rainbow-csv"        // CSV visualization
  ]
}

VS Code Settings for AI Development

// .vscode/settings.json (commit this — shared team settings)
{
  "editor.formatOnSave": true,
  "editor.codeActionsOnSave": {
    "source.fixAll.ruff": "explicit",
    "source.organizeImports.ruff": "explicit"
  },
  "[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff"
  },
  "python.defaultInterpreterPath": "${workspaceFolder}/.venv/bin/python",
  "python.terminal.activateEnvironment": true,
  
  // Jupyter settings
  "jupyter.defaultKernel": "Python 3 (.venv)",
  "jupyter.askForKernelRestart": false,
  
  // Hide clutter in explorer
  "files.exclude": {
    "**/__pycache__": true,
    "**/*.pyc": true,
    "**/.pytest_cache": true,
    "**/.ruff_cache": true,
    "**/.mypy_cache": true,
    ".venv": true
  },
  
  // Terminal
  "terminal.integrated.defaultProfile.osx": "zsh",
  "terminal.integrated.env.osx": {
    "PYTHONPATH": "${workspaceFolder}/src"
  }
}

Part 6: The Complete LLM Client Wrapper

Every AI project should have a centralized LLM client that handles retries, logging, cost tracking, and error handling. This is the component most tutorials skip, and the one where many production bugs originate.

# src/my_ai_app/services/llm.py
import time
import logging
from typing import Iterator
import anthropic
from openai import OpenAI, APIError, RateLimitError, APIConnectionError
from anthropic import Anthropic
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type
)
from my_ai_app.config import get_settings

logger = logging.getLogger(__name__)

# Transient errors worth retrying, for both providers
# (the OpenAI and Anthropic SDKs define separate exception classes)
RETRYABLE_ERRORS = (
    RateLimitError,
    APIConnectionError,
    anthropic.RateLimitError,
    anthropic.APIConnectionError,
)

# Cost per 1M tokens in USD (update as pricing changes; check each provider's pricing page)
MODEL_COSTS = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o-2024-11-20": {"input": 2.50, "output": 10.00},
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
    "claude-haiku-4-5-20251001": {"input": 1.00, "output": 5.00},
}


class LLMClient:
    """
    Production-grade LLM client with:
    - Automatic retries with exponential backoff
    - Cost tracking per call
    - Structured logging
    - Error handling and categorization
    - Streaming support
    """

    def __init__(self):
        settings = get_settings()
        self.openai = OpenAI(
            api_key=settings.openai_api_key.get_secret_value()
        )
        self.anthropic = Anthropic(
            api_key=settings.anthropic_api_key.get_secret_value()
        )
        self.settings = settings
        self._total_cost = 0.0
        self._total_calls = 0

    def _calculate_cost(
        self,
        model: str,
        input_tokens: int,
        output_tokens: int
    ) -> float:
        """Calculate cost in USD for a model call."""
        if model not in MODEL_COSTS:
            # Unknown models are tracked as zero cost; warn so the gap is visible
            logger.warning("No pricing entry for model %s; cost recorded as 0", model)
            return 0.0
        costs = MODEL_COSTS[model]
        return (
            (input_tokens / 1_000_000) * costs["input"] +
            (output_tokens / 1_000_000) * costs["output"]
        )

    @retry(
        retry=retry_if_exception_type(RETRYABLE_ERRORS),
        wait=wait_exponential(multiplier=1, min=4, max=60),
        stop=stop_after_attempt(3),
        reraise=True  # Surface the original exception instead of tenacity's RetryError
    )
    def chat(
        self,
        messages: list[dict],
        model: str | None = None,
        system: str | None = None,
        temperature: float | None = None,
        max_tokens: int | None = None,
        **kwargs
    ) -> dict:
        """
        Unified chat interface for OpenAI and Anthropic models.
        
        Returns dict with: content, model, input_tokens, output_tokens, cost_usd
        """
        model = model or self.settings.default_model
        temperature = temperature if temperature is not None else self.settings.temperature
        max_tokens = max_tokens or self.settings.max_tokens
        
        start_time = time.time()
        
        try:
            if "claude" in model:
                result = self._call_anthropic(
                    messages, model, system, temperature, max_tokens, **kwargs
                )
            else:
                result = self._call_openai(
                    messages, model, system, temperature, max_tokens, **kwargs
                )
            
            latency_ms = round((time.time() - start_time) * 1000)
            cost = self._calculate_cost(
                model,
                result["input_tokens"],
                result["output_tokens"]
            )
            
            self._total_cost += cost
            self._total_calls += 1
            
            logger.info(
                "LLM call",
                extra={
                    "model": model,
                    "input_tokens": result["input_tokens"],
                    "output_tokens": result["output_tokens"],
                    "latency_ms": latency_ms,
                    "cost_usd": round(cost, 6)
                }
            )
            
            return {**result, "cost_usd": round(cost, 6), "latency_ms": latency_ms}
        
        except RateLimitError as e:
            logger.warning(f"Rate limit hit for {model}: {e}")
            raise
        except APIError as e:
            logger.error(f"API error for {model}: {e}")
            raise
    
    def _call_openai(
        self, messages, model, system, temperature, max_tokens, **kwargs
    ) -> dict:
        if system:
            messages = [{"role": "system", "content": system}] + messages
        
        response = self.openai.chat.completions.create(
            model=model,
            messages=messages,
            temperature=temperature,
            max_tokens=max_tokens,
            **kwargs
        )
        
        return {
            "content": response.choices[0].message.content,
            "model": model,
            "input_tokens": response.usage.prompt_tokens,
            "output_tokens": response.usage.completion_tokens
        }
    
    def _call_anthropic(
        self, messages, model, system, temperature, max_tokens, **kwargs
    ) -> dict:
        create_kwargs = {
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
            **kwargs
        }
        if system:
            create_kwargs["system"] = system
        
        response = self.anthropic.messages.create(**create_kwargs)
        
        return {
            "content": response.content[0].text,
            "model": model,
            "input_tokens": response.usage.input_tokens,
            "output_tokens": response.usage.output_tokens
        }
    
    def stream(
        self,
        messages: list[dict],
        model: str | None = None,
        system: str | None = None,
        **kwargs
    ) -> Iterator[str]:
        """Stream tokens from OpenAI models."""
        model = model or self.settings.default_model
        
        if system:
            messages = [{"role": "system", "content": system}] + messages
        
        stream = self.openai.chat.completions.create(
            model=model,
            messages=messages,
            stream=True,
            **kwargs
        )
        
        for chunk in stream:
            # Some chunks (for example a final usage-only chunk) carry no choices
            if chunk.choices and chunk.choices[0].delta.content:
                yield chunk.choices[0].delta.content
    
    @property
    def session_cost(self) -> float:
        """Total cost for this session in USD."""
        return round(self._total_cost, 4)
    
    @property
    def session_calls(self) -> int:
        """Total API calls in this session."""
        return self._total_calls

Usage Pattern

# In your application
from my_ai_app.services.llm import LLMClient

client = LLMClient()
# Simple chat
result = client.chat(
    messages=[{"role": "user", "content": "Summarize quantum computing in 3 sentences."}],
    model="gpt-4o-mini",  # Use cheap model for simple tasks
    temperature=0.3
)
print(result["content"])
print(f"Cost: ${result['cost_usd']}")
# With system prompt
result = client.chat(
    messages=[{"role": "user", "content": "Review this code: def foo(): pass"}],
    system="You are an expert Python code reviewer. Be concise and specific.",
    model="gpt-4o",
    temperature=0.1
)
# Streaming
for token in client.stream(
    messages=[{"role": "user", "content": "Write a poem about Python."}]
):
    print(token, end="", flush=True)
print(f"\nSession cost: ${client.session_cost}")
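To make the cost arithmetic concrete, here is the same per-million-token formula as a standalone function, worked through for one small call. The token counts are made-up example values, and the rates are the gpt-4o-mini figures from the MODEL_COSTS table above, which may be out of date by the time you read this.

```python
def call_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate: float,
    output_rate: float,
) -> float:
    """Cost in USD for one call, given rates quoted per 1M tokens."""
    return (
        (input_tokens / 1_000_000) * input_rate
        + (output_tokens / 1_000_000) * output_rate
    )


# 1,200 prompt tokens and 300 completion tokens at $0.15 / $0.60 per 1M tokens:
#   input:  1,200 / 1,000,000 * 0.15 = $0.00018
#   output:   300 / 1,000,000 * 0.60 = $0.00018
cost = call_cost(1_200, 300, input_rate=0.15, output_rate=0.60)
print(f"${cost:.6f}")  # $0.000360
```

Individual calls look negligible at this scale, which is exactly why the session totals and per-call logs matter: the cost only becomes visible in aggregate.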

Part 7: Jupyter Best Practices for AI Development

Notebooks are indispensable for exploration and prototyping. But they have pitfalls. Here’s how to use them productively.

# Standard AI Jupyter notebook header
# Put this in your first cell

# %% [markdown]
# # Experiment: Prompt Strategy Comparison
# **Date:** 2026-05-16
# **Author:** Neeraj
# **Goal:** Compare chain-of-thought vs. direct prompting for math problems
# **Status:** In Progress
# %%
# Standard imports
import os
import json
import time
from pathlib import Path
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# Project imports (add src/ to the path so my_ai_app resolves the same way it does in the app)
import sys
sys.path.insert(0, str(Path("../src").resolve()))
from my_ai_app.services.llm import LLMClient
from my_ai_app.config import get_settings
# Initialize
client = LLMClient()
settings = get_settings()
print(f"Environment: {settings.environment}")
print(f"Default model: {settings.default_model}")

nbstripout: Never Commit Notebook Outputs

Notebook outputs can contain PII, API responses, and model outputs that shouldn’t be in git. Use nbstripout to automatically strip outputs before committing.

pip install nbstripout
nbstripout --install  # Installs git filter — outputs stripped automatically on commit
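Conceptually, stripping a notebook means clearing the outputs and execution counts of every code cell in the .ipynb JSON while leaving the source untouched. The function below illustrates that idea on an already-parsed notebook dict. It is a simplified sketch, not nbstripout's actual implementation, and strip_outputs is a hypothetical name.

```python
import copy


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a parsed .ipynb dict with code-cell outputs removed."""
    cleaned = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned
```

Markdown cells pass through unchanged, so the experiment header and notes survive while API responses and printed data do not.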

🔍 Common Mistakes to Avoid

Mistake 1: Checking .env into Git

The single most dangerous mistake. Add .env to .gitignore before your first commit. Use .env.example as the committed template. Set up the gitleaks pre-commit hook as a safety net.

Mistake 2: Using Mutable Default Arguments

A classic Python pitfall that’s common in AI code:

# ❌ Wrong: mutable default argument
def build_messages(user_msg: str, history: list = []) -> list:
    history.append({"role": "user", "content": user_msg})
    return history
# The same [] is shared across all calls!

# ✅ Correct
def build_messages(user_msg: str, history: list | None = None) -> list:
    if history is None:
        history = []
    history.append({"role": "user", "content": user_msg})
    return history

Mistake 3: No Retry Logic on LLM Calls

API calls fail. Rate limits hit. Networks time out. Always wrap LLM calls with retry logic using tenacity or similar. The LLMClient above handles this, but raw API calls without retries are fragile in production.
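For readers who want to see the mechanics without a library, the following is a minimal standard-library sketch of retry with exponential backoff. The retry_with_backoff name and its parameters are hypothetical; in a real project tenacity covers this and more, including jitter.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retry_on: tuple[type[Exception], ...],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # Delays grow as base, 2*base, 4*base, ... capped at max_delay
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")
```

Passing sleep as a parameter keeps the function testable: a test can supply a stub that records the delays instead of actually waiting.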

Mistake 4: Hardcoding Model Names in Business Logic

Model names change (provider deprecations are common). Centralize model names in configuration:

# ❌ Bad: model hardcoded throughout codebase
response = client.chat.completions.create(model="gpt-4o", ...)

# ✅ Good: model from configuration, easily changed
response = client.chat.completions.create(model=settings.default_model, ...)

Mistake 5: Not Logging Token Usage

Without logging token usage per request, you can’t debug cost spikes, attribute costs to features, or optimize expensive prompts. The LLMClient above logs this automatically.
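Attributing cost to features takes one extra piece of information per call: a tag saying which feature made it. The class below sketches the aggregation side under that assumption. UsageLedger and the feature tag are hypothetical additions for illustration, not part of the LLMClient shown earlier.

```python
from collections import defaultdict


class UsageLedger:
    """Accumulates token usage and cost per feature tag."""

    def __init__(self) -> None:
        self._totals = defaultdict(
            lambda: {"calls": 0, "input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0}
        )

    def record(
        self, feature: str, input_tokens: int, output_tokens: int, cost_usd: float
    ) -> None:
        """Add one call's usage to the running totals for a feature."""
        entry = self._totals[feature]
        entry["calls"] += 1
        entry["input_tokens"] += input_tokens
        entry["output_tokens"] += output_tokens
        entry["cost_usd"] += cost_usd

    def by_cost(self) -> list[tuple[str, float]]:
        """Features sorted from most to least expensive."""
        rows = [(name, round(t["cost_usd"], 6)) for name, t in self._totals.items()]
        return sorted(rows, key=lambda row: row[1], reverse=True)
```

In production the same aggregation would more likely run in a log or metrics backend over the structured log records, but the shape of the question is the same: group by feature, sum cost, sort.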

💼 Quick Questions

Q1: How do you manage secrets in an AI application across development, staging, and production?

Answer: Development: .env files loaded by python-dotenv, excluded from git via .gitignore, with .env.example committed as documentation. Pre-commit hooks (gitleaks) prevent accidental commits. Staging: Same pattern with environment-specific values, never in version control. Production: Cloud secrets managers (AWS Secrets Manager, GCP Secret Manager, Azure Key Vault) injected as environment variables at runtime. Never .env files in production containers.

Q2: Why should you wrap LLM API calls in a custom client class rather than calling them directly?

Answer: A custom client centralizes: (1) retry logic with exponential backoff — essential since API calls fail intermittently; (2) cost tracking — you need to know what each call costs; (3) structured logging — trace every request with model, tokens, latency, and cost; (4) error categorization — different exceptions (rate limit vs. API error vs. network error) need different handling; (5) model routing — swap models without changing calling code; (6) configuration management — one place to update model names and defaults.

Q3: What is the purpose of a uv.lock or requirements.txt file, and why does it matter for AI projects?

Answer: Lock files record the exact version of every installed package (including transitive dependencies). This lets uv sync, or pip install -r requirements.txt with a fully pinned file, rebuild the same set of package versions everywhere: your laptop, CI, staging, production. A hand-written requirements.txt with loose version ranges does not give that guarantee. For AI projects this is critical because library versions (transformers, langchain, openai SDK) often have breaking changes, and a model’s behavior can change subtly with library updates. Without lockfiles, “works on my machine” is a constant source of friction.

Q4: What is pyenv and why use it instead of system Python?

Answer: pyenv manages multiple Python versions on a single machine. System Python is often outdated, shared across all projects, and can’t be easily upgraded without risk to the OS. With pyenv: each project can pin its own Python version (via .python-version); upgrading Python for one project doesn’t affect others; you can test compatibility across versions. This is essential for AI projects because many frameworks (PyTorch, JAX, etc.) have specific Python version requirements.

Q5: How would you structure an AI project to make it testable?

Answer: Key patterns: (1) Separate configuration from code — use Pydantic Settings with environment variables; (2) Dependency injection — pass LLM clients as parameters rather than importing globals; (3) Abstract LLM calls behind interfaces — mock the interface in tests, not the actual API; (4) Keep prompts out of business logic — store in separate files for easier testing and iteration; (5) Write integration tests against real APIs but with small, cheap requests; (6) Use recorded API responses (VCR cassettes) for fast, cost-free unit testing.
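Patterns (2) and (3) can be sketched in a few lines: business logic depends on a small interface, and tests pass in a fake that records calls and returns a canned reply. The names ChatClient, FakeChatClient, and summarize are hypothetical, and the return shape simply mirrors the dict that the LLMClient.chat method above returns.

```python
from typing import Protocol


class ChatClient(Protocol):
    """The only thing business logic needs to know about an LLM client."""

    def chat(self, messages: list[dict], **kwargs) -> dict: ...


class FakeChatClient:
    """Test double: returns a canned reply and records what it was asked."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.calls: list[list[dict]] = []

    def chat(self, messages: list[dict], **kwargs) -> dict:
        self.calls.append(messages)
        return {"content": self.reply, "input_tokens": 0, "output_tokens": 0}


def summarize(text: str, client: ChatClient) -> str:
    """Business logic receives its client as a parameter (dependency injection)."""
    result = client.chat(
        messages=[{"role": "user", "content": f"Summarize in one sentence:\n{text}"}]
    )
    return result["content"].strip()
```

A unit test can then call summarize("long text", FakeChatClient("A short summary.")) and assert on both the return value and the recorded prompt, with no network access and no API cost.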

🏭 Production Considerations

Container-Friendly Configuration: In containerized production environments, configuration comes from environment variables injected at runtime — not from files. Your Pydantic Settings class handles this seamlessly: it reads from both .env files (development) and actual environment variables (production) with the same code.

Secret Rotation: API keys should be rotatable without downtime. Using a secrets manager (not environment variables baked into container images) allows you to rotate keys by updating the secret and restarting pods — without rebuilding containers.

PYTHONPATH Management: In Python projects, import paths can be a source of subtle bugs. Set PYTHONPATH explicitly in your VS Code settings, Makefile, and Docker configuration to ensure consistent behavior.

⚡ Performance & Scalability Insights

uv over pip in CI: CI pipelines that install Python dependencies are often bottlenecked by pip’s slow resolution and downloads. Switching from pip install -r requirements.txt to uv sync can cut install time substantially, often from minutes to seconds, though the exact gain depends on cache state and the size of the dependency tree. At scale (hundreds of CI runs per day), this compounds.

Lazy Imports for Fast Startup: AI libraries (torch, transformers) are slow to import. In production APIs, import heavy libraries lazily (inside functions) or in background threads to keep startup time fast and health checks responsive.
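The function-level variant of lazy importing can be sketched as follows. The standard library's statistics module stands in for a heavy dependency such as torch so the example runs anywhere; lazy_module and average_length are hypothetical names.

```python
import importlib
from functools import lru_cache
from types import ModuleType


@lru_cache(maxsize=None)
def lazy_module(name: str) -> ModuleType:
    """Import a module on first use and reuse it on later calls."""
    return importlib.import_module(name)


def average_length(texts: list[str]) -> float:
    """Example handler: the import cost is paid on the first request, not at startup."""
    stats = lazy_module("statistics")  # stand-in for a heavy library
    return stats.mean(len(text) for text in texts)
```

The trade-off is that the first request to touch the heavy library pays the import cost, so some services trigger the load in a background warm-up step right after the health check endpoint is up.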

🔑 Key Takeaways

Your environment is infrastructure. Treat it with the same care as production infrastructure — version everything, document it, automate it.
Secrets management is non-negotiable. .gitignore, .env.example, and pre-commit hooks form the minimum defense. For production, use a secrets manager.
Centralize LLM calls in a custom client. Retries, logging, cost tracking, and error handling belong in one place — not scattered throughout your application.
Use pyproject.toml + uv.lock for reproducibility. The combination of pyenv (Python version), uv (package management), and lock files produces environments with the same Python and package versions on every machine.
Structure projects for the long term, not just the prototype. The patterns here scale from a solo prototype to a 10-engineer team. Apply them from Day 1.

📚 Further Reading & Resources

uv Documentation — The modern Python package manager
Pydantic Settings Documentation — Configuration management
pre-commit Documentation — Git hook framework
The Twelve-Factor App — Methodology for building robust applications (especially Factor III: Config)
tenacity Documentation — Retry library for Python