
Version Control for AI: Beyond Git
Learn how to manage the lifecycle of your AI components. Master versioning for prompt templates, model IDs, and vector indices to ensure reproducibility and rollback capability.
In standard software, Git is usually enough. But in AI, you are managing three moving targets that change at different speeds:
- The Code (changes rarely).
- The Prompts (change weekly).
- The Models (change monthly, as providers release new versions).
If you don't version these correctly, a change in a model version (e.g., gpt-4o-2024-05-13 → gpt-4o-2024-08-06) can quietly break your entire application.
1. Prompt Versioning: Git is the Answer
You should treat prompts exactly like code. Never use a dashboard to edit a prompt that is in production.
Best Practice:
- Store prompts in a /prompts folder as YAML or Jinja2 files.
- Each file should have a version number or a descriptive name (e.g., support_bot_v2.yaml).
- Your Python code should load the prompt by its file name.
```yaml
# prompts/support_v2.yaml
version: 2.1
persona: "A friendly banker"
instructions: "Answer according to the PDF context provided."
max_tokens: 500
```
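A minimal loader sketch, assuming PyYAML is installed and the folder layout above; `load_prompt` is a hypothetical helper, not part of any library:

```python
import yaml  # PyYAML

def load_prompt(name: str, prompts_dir: str = "prompts") -> dict:
    # `name` is the versioned file name, e.g. "support_v2".
    with open(f"{prompts_dir}/{name}.yaml", encoding="utf-8") as f:
        return yaml.safe_load(f)

prompt = load_prompt("support_v2")
print(prompt["version"], prompt["persona"])
```

Because the prompt lives in Git, a bad change can be reviewed in a pull request and undone with a normal `git revert`.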
2. Model Versioning: Pin Your IDs
Most providers offer a "latest" alias (e.g., gpt-4o-latest). NEVER use this in production.
Why pin your IDs?
Providers update these "latest" tags periodically. A model that was 95% accurate yesterday might become 90% accurate today because the provider changed the weights to save money on inference.
Correct Way:
```python
model_id = "claude-3-5-sonnet-20240620"
```
This ensures that even if Anthropic releases a better model tomorrow, your production code remains stable until you have time to test the upgrade.
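One way to enforce the pin is a small guard that rejects floating aliases before the client is ever called. This is a sketch under the assumption that your pinned IDs carry a date stamp; `assert_pinned` is a hypothetical helper, and the pattern should be adjusted to your providers' naming conventions:

```python
import re

PINNED_MODEL = "claude-3-5-sonnet-20240620"  # date-stamped snapshot, never a floating tag

def assert_pinned(model_id: str) -> str:
    # Reject "latest"-style aliases and IDs without a YYYYMMDD / YYYY-MM-DD stamp.
    if "latest" in model_id or not re.search(r"\d{4}-?\d{2}-?\d{2}", model_id):
        raise ValueError(f"Unpinned model id: {model_id!r}")
    return model_id

model_id = assert_pinned(PINNED_MODEL)
```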
3. Data and Index Versioning
In a RAG system, your Vector Index is also a versioned asset. If you decide to change your "Chunking Layer" from 500 characters to 1,000 characters, you must re-index your entire database.
You should maintain two indices during a migration:
- index_v1 (current production).
- index_v2 (new chunking strategy).

Only after you verify that v2 is better do you swap the service pointer to the new index.
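A sketch of that pointer swap, using an in-memory dict in place of your vector database's alias or collection API (the names `promote_index` and `support-docs` are illustrative):

```python
# The application always reads the alias "support-docs"; the alias resolves
# to whichever physical index is currently live.
INDEX_ALIASES = {"support-docs": "support-docs_v1"}

def promote_index(alias: str, new_index: str) -> None:
    # Swap the alias only after the new index has passed your evaluations.
    old_index = INDEX_ALIASES[alias]
    INDEX_ALIASES[alias] = new_index
    print(f"{alias}: {old_index} -> {new_index} (keep {old_index} for instant rollback)")

promote_index("support-docs", "support-docs_v2")
```

Keeping v1 alive until v2 has served real traffic gives the same instant-rollback path you have for models and prompts.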
4. The "Model Registry" Strategy
For enterprise systems, we use a central database (the Model Registry) that maps friendly application names to specific model and prompt versions.
| Application Name | Model Version | Prompt Version | Status |
|---|---|---|---|
| Support-Agent | gpt-4o-v1 | system_v5 | Production |
| Support-Agent | gpt-4o-v2 | system_v6 | Staging (Testing) |
By changing values in this registry, you can roll an AI system back to yesterday's version in seconds if something goes wrong.
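A minimal registry sketch, with a plain dictionary standing in for the database table above (the `RegistryEntry` shape and `rollback` helper are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class RegistryEntry:
    model_version: str
    prompt_version: str

# One row per (application, environment); in practice this lives in a database.
# Here production has just been promoted to the v2/v6 pair.
REGISTRY = {
    ("Support-Agent", "Production"): RegistryEntry("gpt-4o-v2", "system_v6"),
    ("Support-Agent", "Staging"):    RegistryEntry("gpt-4o-v2", "system_v6"),
}

def rollback(app: str, known_good: RegistryEntry) -> None:
    # Re-point production at yesterday's known-good pair; no redeploy needed.
    REGISTRY[(app, "Production")] = known_good

rollback("Support-Agent", RegistryEntry("gpt-4o-v1", "system_v5"))
```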
Summary of Module 9
- CI/CD: Automate the verification of prompts using Golden Datasets (9.1).
- Evaluation: Use LLM-as-a-Judge to measure qualitative drift (9.2).
- Monitoring: Trace agent paths with LangSmith to find service bottlenecks (9.3).
- Versioning: Pin your models, version your prompts in Git, and Blue/Green your vector indices (9.4).
You have completed the Engineering arc of the course. In the next module, we move into the vital topics of Security, Safety, and Responsibility, learning how to protect your AI from a hostile world.
Exercise: The Rollback Drill
A new model version was deployed. Users are reporting that the bot is now "rude."
- You check the logs and see that the Prompt-ID is system_v10 and the Model-ID is latest-gpt.
- Draft a 2-step plan to roll back the system safely.
- How would you prevent the "latest-gpt" incident from happening again?
Answer Logic:
- Rollback: Update the config/registry to point back to system_v9 and the previous pinned model timestamp (e.g., gpt-4o-2024-05-13).
- Prevention: Implement a rule that only pinned model IDs can be merged into the main branch.
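The prevention rule can be automated as a small pre-merge check, sketched here with assumed paths and tag names (adjust `BANNED` and the config directory to your repository):

```python
import pathlib
import sys

# Fail CI if any config file references a floating model tag.
BANNED = ("latest-gpt", "gpt-4o-latest", "-latest")

def check_configs(root: str = "config") -> int:
    bad = [
        str(path)
        for path in pathlib.Path(root).rglob("*.yaml")
        if any(tag in path.read_text(encoding="utf-8") for tag in BANNED)
    ]
    for path in bad:
        print(f"Unpinned model reference in {path}")
    return 1 if bad else 0

if __name__ == "__main__":
    sys.exit(check_configs())
```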