Python Essentials for LLM Engineering

Master the Python features that are critical for AI development. Learn about Pydantic for validation, environment management, and the specific syntax needed to build robust agentic systems.

Python is the "lingua franca" of the AI world. While you may already know Python, LLM Engineering requires a specific set of skills that go beyond basic scripting. When you are dealing with probabilistic outputs from an LLM, your code must be exceptionally defensive.

In this lesson, we will focus on the Python tools that turn a "fragile script" into a "production-ready AI service."


1. Type Hinting and Rigorous Validation

Because LLMs return text (which could be anything), you cannot trust the input to your functions. You must use Type Hints and Pydantic.

Why Pydantic?

Pydantic is the most important library for an LLM Engineer. It allows you to define "Schemas" for your data. If the LLM returns a field as a string when it should be a number, Pydantic catches it immediately.

from pydantic import BaseModel, Field
from typing import List, Optional

class AgentAction(BaseModel):
    tool_name: str = Field(description="The name of the tool to call")
    arguments: dict = Field(default_factory=dict)
    rationale: str = Field(description="Why the agent chose this action")
    priority: int = Field(ge=1, le=5) # Must be between 1 and 5

# Validation Example
data = {"tool_name": "web_search", "arguments": {"q": "AI news"}, "rationale": "Need info", "priority": 3}
action = AgentAction(**data)
print(action.tool_name)
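To see that safety net in action, here is a sketch of what happens when the model returns an out-of-range value. The `bad_data` payload is invented for illustration (priority 9 violates the 1–5 constraint):

```python
from pydantic import BaseModel, Field, ValidationError

class AgentAction(BaseModel):
    tool_name: str = Field(description="The name of the tool to call")
    arguments: dict = Field(default_factory=dict)
    rationale: str = Field(description="Why the agent chose this action")
    priority: int = Field(ge=1, le=5)

# Hypothetical bad LLM output: priority is outside the allowed range
bad_data = {"tool_name": "web_search", "rationale": "Need info", "priority": 9}

try:
    AgentAction(**bad_data)
except ValidationError as e:
    # Each error names the offending field, so you can log it or re-prompt
    print(f"Rejected: {len(e.errors())} validation error(s)")
```

Because the error report names each bad field, a common pattern is to feed it back to the model and ask for a corrected response.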

2. Environment Management (Dependency Hell Avoidance)

AI libraries (like LangChain or PyTorch) are massive and often have conflicting dependencies. You cannot install these globally.

Professional Choices:

  • poetry: A widely used tool for dependency management and packaging.
  • venv: The built-in, lightweight option.
  • conda: Better when you need low-level C/C++ dependencies, common in local model training.

LLM Engineer Rule: Always use a .env file to store your API keys. Never hardcode them.

import os
from dotenv import load_dotenv

load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
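One subtlety: `os.getenv` silently returns `None` when a key is missing, and the failure then surfaces much later as a confusing API error. A minimal fail-fast sketch (the `require_env` helper is an invented name, not a library function):

```python
import os

def require_env(name: str) -> str:
    """Return the variable's value, or fail fast at startup if it is missing."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Simulate the .env file having been loaded already
os.environ["OPENAI_API_KEY"] = "sk-demo-not-a-real-key"
key = require_env("OPENAI_API_KEY")
print("Key loaded, starts with:", key[:7])
```

Crashing at startup with a clear message beats a cryptic 401 halfway through a pipeline run.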

3. Decorators for Logging and Observability

In LLM Engineering, you need to know exactly what was sent to the model and what came back. Writing print() statements everywhere is messy. Instead, use Decorators.

import functools
import time

def trace_llm_call(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"--- [START] LLM Call: {func.__name__} ---")
        start_time = time.time()
        result = func(*args, **kwargs)
        duration = time.time() - start_time
        print(f"--- [END] Duration: {duration:.2f}s ---")
        return result
    return wrapper

@trace_llm_call
def call_claude(prompt):
    # Imagine a real model call here
    return "Response from Claude"

call_claude("Analyze this data")
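The same pattern extends beyond timing a single call. As a sketch (using simple in-memory counters, not a real observability backend), a decorator can also accumulate aggregate metrics across calls:

```python
import functools
import time

def with_metrics(func):
    """Decorator sketch: accumulate call count and total latency in memory."""
    stats = {"calls": 0, "total_seconds": 0.0}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        stats["calls"] += 1
        stats["total_seconds"] += time.time() - start
        return result

    wrapper.stats = stats  # expose the counters for inspection
    return wrapper

@with_metrics
def call_model(prompt):
    return f"Echo: {prompt}"

call_model("hello")
call_model("world")
print(call_model.stats["calls"])  # -> 2
```

In production you would ship these numbers to a tracing tool rather than a dict, but the decorator boundary stays the same.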

4. Modern Python Features You Must Know

List and Dict Comprehensions

Essential for cleaning and formatting data for RAG pipelines.

# Cleaning raw strings from a PDF scraper
raw_lines = ["  \n", "Title", "Introduction  ", "  "]
clean_lines = [line.strip() for line in raw_lines if line.strip()]
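Comprehensions also shine when reshaping cleaned text into the role/content message list that most chat APIs expect (the exact shape varies by provider; check your API's docs):

```python
# Turn cleaned document lines into chat-style messages
raw_lines = ["  \n", "Title", "Introduction  ", "  "]
clean_lines = [line.strip() for line in raw_lines if line.strip()]
messages = [{"role": "user", "content": line} for line in clean_lines]
print(messages)
```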

Context Managers (with blocks)

Used for managing file handles (PDFs/TXTs) and database connections.

with open("knowledge_base.txt", "r") as f:
    text_content = f.read()
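You can also write your own context managers with `contextlib`. A minimal sketch, here timing an arbitrary block of work such as a chunking step (the `timed` helper is invented for illustration):

```python
from contextlib import contextmanager
import time

@contextmanager
def timed(label):
    """Minimal custom context manager: report how long a block took."""
    start = time.time()
    try:
        yield
    finally:
        print(f"{label} took {time.time() - start:.3f}s")

with timed("chunking"):
    chunks = ["chunk"] * 1000  # stand-in for real document chunking
```

The `finally` clause guarantees the timing line prints even if the block raises, which is exactly the cleanup guarantee `with` blocks exist to provide.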

The LLM Engineer's "Mental" Syntax

When writing Python for AI, you should think in Streams. LLMs don't just return data; they stream it token-by-token. Your Python code must be ready to handle Generators and Iterators.

def stream_model_response():
    tokens = ["This", " is", " a", " stream."]
    for token in tokens:
        yield token

# Consuming the stream
for t in stream_model_response():
    print(t, end="", flush=True)
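When you do need the whole response at once (for example, to validate it with Pydantic), you can drain the generator with `"".join(...)`, since it accepts any iterable of strings:

```python
def stream_model_response():
    tokens = ["This", " is", " a", " stream."]
    for token in tokens:
        yield token

# Collect the streamed tokens into one string once the stream is done
full_response = "".join(stream_model_response())
print(full_response)  # -> This is a stream.
```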

Summary

To be a professional LLM Engineer, your Python must be:

  1. Typed: Use hints so IDEs and type checkers can catch errors.
  2. Validated: Use Pydantic schemas for LLM outputs.
  3. Isolated: Use virtual environments for every project.
  4. Observable: Wrap your logic in logging and tracing.

In the next lesson, we will look at Async Programming, the secret to building high-performance AI applications that don't make the user wait forever.


Exercise: Schema Design

You are building an agent that extracts "Meeting Minutes" from a transcript.

  1. Define a Pydantic class MeetingMinutes.
  2. Include fields for date (string), attendees (list of strings), and action_items (list of dictionaries with task and owner).
  3. Add a validator or field description that ensures the date is in YYYY-MM-DD format.

This exercise prepares you for the "Structured Output" section in Module 7.
