
Version Control for AI: Beyond Git
Learn how to manage the lifecycle of your AI components. Master versioning for prompt templates, model IDs, and vector indices to ensure reproducibility and rollback capability.
In standard software, Git is usually enough. But in AI, you are managing three moving targets that change at different speeds:
- The Code (changes rarely).
- The Prompts (change weekly).
- The Models (change monthly, as providers release new versions).
If you don't version these correctly, a change in a model version (e.g., gpt-4o-2024-05-13 → gpt-4o-2024-08-06) can quietly break your entire application.
1. Prompt Versioning: Git is the Answer
You should treat prompts exactly like code. Never use a dashboard to edit a prompt that is in production.
Best Practice:
- Store prompts in a /prompts folder as YAML or Jinja2 files.
- Each file should have a version number or a descriptive name (e.g., support_bot_v2.yaml).
- Your Python code should load the prompt by its file name.
```yaml
# prompts/support_v2.yaml
version: 2.1
persona: "A friendly banker"
instructions: "Answer according to the PDF context provided."
max_tokens: 500
```
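A minimal loader sketch, assuming PyYAML is installed and the folder layout above; `load_prompt` is a hypothetical helper, not part of any library:

```python
import yaml  # PyYAML

def load_prompt(name: str, prompts_dir: str = "prompts") -> dict:
    # `name` is the versioned file name, e.g. "support_v2".
    with open(f"{prompts_dir}/{name}.yaml", encoding="utf-8") as f:
        return yaml.safe_load(f)

prompt = load_prompt("support_v2")
print(prompt["version"], prompt["persona"])
```

Because the prompt lives in Git, a bad change can be reviewed in a pull request and undone with a normal `git revert`.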
2. Model Versioning: Pin Your IDs
Most providers offer a "latest" alias (e.g., gpt-4o-latest). NEVER use this in production.
Why pin your IDs?
Providers update these "latest" tags periodically. A model that was 95% accurate yesterday might become 90% accurate today because the provider changed the weights to save money on inference.
Correct Way:
```python
model_id = "claude-3-5-sonnet-20240620"
```
This ensures that even if Anthropic releases a better model tomorrow, your production code remains stable until you have time to test the upgrade.
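One way to enforce the pin is a small guard that rejects floating aliases before the client is ever called. This is a sketch under the assumption that your pinned IDs carry a date stamp; `assert_pinned` is a hypothetical helper, and the pattern should be adjusted to your providers' naming conventions:

```python
import re

PINNED_MODEL = "claude-3-5-sonnet-20240620"  # date-stamped snapshot, never a floating tag

def assert_pinned(model_id: str) -> str:
    # Reject "latest"-style aliases and IDs without a YYYYMMDD / YYYY-MM-DD stamp.
    if "latest" in model_id or not re.search(r"\d{4}-?\d{2}-?\d{2}", model_id):
        raise ValueError(f"Unpinned model id: {model_id!r}")
    return model_id

model_id = assert_pinned(PINNED_MODEL)
```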
3. Data and Index Versioning
In a RAG system, your Vector Index is also a versioned asset. If you decide to change your "Chunking Layer" from 500 characters to 1,000 characters, you must re-index your entire database.
You should maintain two indices during a migration:
- index_v1 (current production).
- index_v2 (new chunking strategy).

Only after you verify that v2 is better do you swap the service pointer to the new index.
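A sketch of that pointer swap, using an in-memory dict in place of your vector database's alias or collection API (the names `promote_index` and `support-docs` are illustrative):

```python
# The application always reads the alias "support-docs"; the alias resolves
# to whichever physical index is currently live.
INDEX_ALIASES = {"support-docs": "support-docs_v1"}

def promote_index(alias: str, new_index: str) -> None:
    # Swap the alias only after the new index has passed your evaluations.
    old_index = INDEX_ALIASES[alias]
    INDEX_ALIASES[alias] = new_index
    print(f"{alias}: {old_index} -> {new_index} (keep {old_index} for instant rollback)")

promote_index("support-docs", "support-docs_v2")
```

Keeping v1 alive until v2 has served real traffic gives the same instant-rollback path you have for models and prompts.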
4. The "Model Registry" Strategy
For enterprise systems, we use a central database (the Model Registry) that maps friendly application names to specific model and prompt versions.
| Application Name | Model Version | Prompt Version | Status |
|---|---|---|---|
| Support-Agent | gpt-4o-v1 | system_v5 | Production |
| Support-Agent | gpt-4o-v2 | system_v6 | Staging (Testing) |
By changing values in this registry, you can roll an AI system back to yesterday's version in seconds if something goes wrong.
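A minimal registry sketch, with a plain dictionary standing in for the database table above (the `RegistryEntry` shape and `rollback` helper are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class RegistryEntry:
    model_version: str
    prompt_version: str

# One row per (application, environment); in practice this lives in a database.
# Here production has just been promoted to the v2/v6 pair.
REGISTRY = {
    ("Support-Agent", "Production"): RegistryEntry("gpt-4o-v2", "system_v6"),
    ("Support-Agent", "Staging"):    RegistryEntry("gpt-4o-v2", "system_v6"),
}

def rollback(app: str, known_good: RegistryEntry) -> None:
    # Re-point production at yesterday's known-good pair; no redeploy needed.
    REGISTRY[(app, "Production")] = known_good

rollback("Support-Agent", RegistryEntry("gpt-4o-v1", "system_v5"))
```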
Summary of Module 9
- CI/CD: Automate the verification of prompts using Golden Datasets (9.1).
- Evaluation: Use LLM-as-a-Judge to measure qualitative drift (9.2).
- Monitoring: Trace agent paths with LangSmith to find service bottlenecks (9.3).
- Versioning: Pin your models, version your prompts in Git, and Blue/Green your vector indices (9.4).
You have completed the Engineering arc of the course. In the next module, we move into the vital topics of Security, Safety, and Responsibility, learning how to protect your AI from a hostile world.
Exercise: The Rollback Drill
A new model version was deployed. Users are reporting that the bot is now "rude."
- You check the logs and see that the Prompt-ID is system_v10 and the Model-ID is latest-gpt.
- Draft a 2-step plan to roll back the system safely.
- How would you prevent the "latest-gpt" incident from happening again?
Answer Logic:
- Rollback: Update the config/registry to point back to system_v9 and the previous pinned model timestamp (e.g., gpt-4o-2024-05-13).
- Prevention: Implement a rule that only pinned model IDs can be merged into the main branch.
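The prevention rule can be automated as a small pre-merge check, sketched here with assumed paths and tag names (adjust `BANNED` and the config directory to your repository):

```python
import pathlib
import sys

# Fail CI if any config file references a floating model tag.
BANNED = ("latest-gpt", "gpt-4o-latest", "-latest")

def check_configs(root: str = "config") -> int:
    bad = [
        str(path)
        for path in pathlib.Path(root).rglob("*.yaml")
        if any(tag in path.read_text(encoding="utf-8") for tag in BANNED)
    ]
    for path in bad:
        print(f"Unpinned model reference in {path}")
    return 1 if bad else 0

if __name__ == "__main__":
    sys.exit(check_configs())
```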