
Scalable Python for AI: Architecting Production Systems
Move beyond scripts. Learn how to architect maintainable AI applications using Object-Oriented principles, Factory patterns, and clean separation of concerns for models, tools, and state.
Building an AI prototype is easy; maintaining it is hard. Most AI projects fail because they become a "spaghetti" mess of hardcoded prompts and disorganized API calls. As an LLM Engineer, your goal is to build systems that are Modular, Testable, and Scalable.
In this lesson, we will apply professional Software Engineering principles to the world of AI.
1. Separation of Concerns (The AI Layers)
You should never have a single file that contains your prompt, your API keys, and your business logic. You must separate your application into distinct layers.
graph TD
A[Controllers/API: FastAPI] --> B[Service Layer: Orchestration]
B --> C[Provider Layer: Models & APIs]
B --> D[Tool Layer: External Search/DBs]
B --> E[Template Layer: Prompt Management]
The Folders of a Professional AI Project:
- /prompts: Jinja2 or YAML files containing your prompt templates.
- /models: Implementations of model providers (Claude vs. GPT).
- /tools: Clean functions that can be called by agents.
- /services: The core "Agent" or "Chain" logic.
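For example, keeping prompts in /prompts as Jinja2 templates means you can tune wording without redeploying code. A minimal sketch, assuming a hypothetical summarize.j2 template file:
from jinja2 import Environment, FileSystemLoader

# Load every template from the /prompts folder
env = Environment(loader=FileSystemLoader("prompts"))

# summarize.j2 is a hypothetical template, e.g. "Summarize: {{ document }}"
template = env.get_template("summarize.j2")
prompt = template.render(document="Quarterly earnings report...")
print(prompt)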
2. Using the Factory Pattern for Models
Models change. Today you use GPT-4o, tomorrow you might switch to Claude 3.5. You shouldn't have to change 50 files to make that switch. Use a Factory Pattern.
from abc import ABC, abstractmethod

# Abstract Base Class forces consistency
class BaseLLM(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str:
        pass

class OpenAIProvider(BaseLLM):
    def generate(self, prompt: str) -> str:
        return "GPT-4o result"

class AnthropicProvider(BaseLLM):
    def generate(self, prompt: str) -> str:
        return "Claude result"

# The Factory decides which one to use based on config
def llm_factory(provider_name: str) -> BaseLLM:
    if provider_name == "openai":
        return OpenAIProvider()
    return AnthropicProvider()

# Usage: Switch provider in one line!
ai = llm_factory("openai")
print(ai.generate("Hello"))
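The abstract base class also buys you testability, one of the goals from the intro: any stand-in that implements BaseLLM can be injected during unit tests. A minimal sketch (FakeLLM is a name invented here for illustration, reusing BaseLLM from the block above):
class FakeLLM(BaseLLM):
    def generate(self, prompt: str) -> str:
        # No network call, no cost: deterministic output for assertions
        return f"fake answer for: {prompt}"

# In a unit test, pass FakeLLM() anywhere a BaseLLM is expected
assert FakeLLM().generate("ping") == "fake answer for: ping"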
3. Configuration Management
Hardcoding settings like max_tokens, temperature, or model_id inside your logic is a major anti-pattern. Use a configuration file (like config.yaml or a .env file).
Professional Settings Management with Pydantic:
from pydantic_settings import BaseSettings, SettingsConfigDict

class AppSettings(BaseSettings):
    # Values are loaded from environment variables or the .env file
    model_config = SettingsConfigDict(env_file=".env")

    openai_api_key: str
    default_model: str = "gpt-4o"
    max_retries: int = 3
    temperature: float = 0.7

# Initialize once, import everywhere
settings = AppSettings()
print(f"Using model: {settings.default_model}")
4. Retries and Fault Tolerance
Model APIs fail. They time out, hit rate limits, or return "Internal Server Error." Your code must be Resilient.
Use the tenacity library to add professional retry logic to your LLM calls.
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
def call_expensive_ai_api():
    print("Attempting AI call...")
    # Tenacity will automatically re-run this with increasing wait times
    raise Exception("Rate limit hit!")

try:
    call_expensive_ai_api()
except Exception:
    # After the 3rd attempt, tenacity raises a RetryError wrapping the failure
    print("Call failed gracefully after 3 attempts.")
5. Logging for AI (Traceability)
In AI, you don't just log "errors"; you log intent. You need to record the PromptID, the UserID, the Model, and the Cost. This data is gold for future fine-tuning.
What to log per request (a minimal sketch follows the list):
- request_id: A unique UUID to track the lifecycle.
- input_tokens and output_tokens: For cost accounting.
- latency: How long did the model take?
- raw_prompt: (Optional, be careful with PII) helps in debugging hallucinations.
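A minimal sketch using only the standard library; the field names mirror the list above, and the token counts and model name are placeholders you would pull from your provider's actual response:
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm.requests")

def log_llm_request(model: str, input_tokens: int, output_tokens: int, latency: float) -> None:
    # One structured record per request: the raw material for cost audits and fine-tuning
    record = {
        "request_id": str(uuid.uuid4()),
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency": latency,
    }
    logger.info(json.dumps(record))

start = time.perf_counter()
# ... call your model here ...
log_llm_request("gpt-4o", input_tokens=120, output_tokens=450, latency=time.perf_counter() - start)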
Summary
- Be Modular: Use the Factory pattern to swap models easily.
- Be Config-Driven: No hardcoded numbers or keys.
- Be Resilient: Use tenacity for retries.
- Be Observable: Log everything needed to recreate the AI's "thought process."
This concludes Module 3: Python for LLM Engineering. You now have the coding discipline needed to build enterprise AI.
In the next module, we move into the Art of Prompt Engineering, where you will learn to communicate your intent to the models you've learned to architect.
Exercise: The Architect
You want to build an app that defaults to a cheap model (Mistral) but automatically switches to an expensive model (GPT-4o) if the "Complexity Score" of the user's question is higher than 8.
Draft the Python structure for:
- A Router function.
- Two Provider classes (using the Factory pattern above).
- A settings object that holds the keys for both.
This logic is the foundation of "Cost-Optimized AI Routing."
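If you want to check your structure afterwards, one possible skeleton is sketched below; the names, the threshold field, and the word-count heuristic are placeholder suggestions, not a reference solution (it reuses BaseLLM from Section 2):
from pydantic_settings import BaseSettings, SettingsConfigDict

class RouterSettings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")
    mistral_api_key: str = ""
    openai_api_key: str = ""
    complexity_threshold: int = 8

class MistralProvider(BaseLLM):
    def generate(self, prompt: str) -> str:
        return "cheap Mistral result"

class GPT4oProvider(BaseLLM):
    def generate(self, prompt: str) -> str:
        return "expensive GPT-4o result"

def complexity_score(question: str) -> int:
    # Placeholder heuristic: swap in your own scoring logic
    return len(question.split())

def route(question: str, settings: RouterSettings) -> BaseLLM:
    # Cheap by default; escalate only when the question is complex
    if complexity_score(question) > settings.complexity_threshold:
        return GPT4oProvider()
    return MistralProvider()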