
Evaluating Fine-Tuned Models
Did it work? Learn how to interpret Loss Curves and perform human evaluation to verify your fine-tuned model is better than the base model.
The training finished. Is the model smart now?
1. The Loss Curve
AI Studio shows a graph called "Loss" over time.
- Loss: How "wrong" the model was on the training data.
- Ideal Shape: It should go down and flatten out.
- Did not go down: The model learned nothing (check your dataset quality or the Learning Rate).
- Went down, then up: Overfitting (too many epochs). A quick way to spot these patterns is sketched below.
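AI Studio draws this curve for you, but you can sanity-check the shape yourself. The following is a minimal sketch, assuming you have copied the per-epoch loss values out of the UI or your training logs; the example numbers and the 10% threshold are placeholders, not part of AI Studio.

```python
# Minimal sketch: plot a loss curve and flag the two failure patterns above.
# The `losses` list is a placeholder -- replace it with your real per-epoch
# (or per-step) loss values exported from AI Studio or your training logs.
import matplotlib.pyplot as plt

losses = [2.4, 1.6, 1.1, 0.8, 0.7, 0.65, 0.7, 0.8]  # example values only

plt.plot(range(1, len(losses) + 1), losses, marker="o")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training loss")
plt.show()

# Crude heuristics (thresholds are arbitrary assumptions):
if losses[-1] > min(losses) * 1.1:
    print("Loss is rising again after its minimum: possible overfitting.")
elif losses[-1] > losses[0] * 0.9:
    print("Loss barely moved: check dataset quality or the learning rate.")
else:
    print("Loss went down and flattened out: looks healthy.")
```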
2. Qualitative Evaluation (The Eye Test)
Numbers don't tell the whole story. You need to test it.
- Set A: Create a list of 10 prompts that were not in the training set.
- Side-by-Side: Run these prompts on both the Base Model (Flash) and your Tuned Model (see the comparison sketch after this list).
- Check: Does the Tuned Model follow the style constraints better?
- Regression Check: Did the Tuned Model become "dumb" at basic tasks? (e.g., Catastrophic Forgetting). Ask it basic math or logic questions to ensure it didn't lose its general intelligence.
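Here is a minimal side-by-side sketch using the google-generativeai Python SDK. The API key handling, the base model name, the tuned model ID, and the prompts are placeholders; substitute your own held-out prompts and the tunedModels/... ID that AI Studio shows for your model.

```python
# Side-by-side eye test: run the same held-out prompts against the base model
# and the tuned model, then read the outputs next to each other.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # or load it from an environment variable

BASE_MODEL = "gemini-1.5-flash"             # base model you tuned from (assumed)
TUNED_MODEL = "tunedModels/my-style-model"  # hypothetical tuned model ID

eval_prompts = [
    "Summarize this release note in our house style: ...",
    "Write a product description for a travel mug.",
    # ... the rest of your 10 held-out prompts
]

base = genai.GenerativeModel(BASE_MODEL)
tuned = genai.GenerativeModel(TUNED_MODEL)

for i, prompt in enumerate(eval_prompts, start=1):
    base_answer = base.generate_content(prompt).text
    tuned_answer = tuned.generate_content(prompt).text
    print(f"--- Prompt {i} ---")
    print(f"PROMPT: {prompt}")
    print(f"BASE  : {base_answer}")
    print(f"TUNED : {tuned_answer}")
```

Adding a couple of basic math or logic questions to eval_prompts doubles as the regression check for catastrophic forgetting.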
3. Deployment
If the eval passes:
- Deploy to your app.
- Monitor user feedback (a minimal logging sketch follows this list).
- If successful, delete the old tuned model to keep your registry clean.
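For the feedback step, the sketch below records a thumbs-up/thumbs-down per response in a local CSV so you can compare models in production. The function name, schema, and file path are hypothetical; in a real app you would write to your own analytics or feedback store.

```python
# Minimal sketch of "monitor user feedback": append one record per rated
# response to a local CSV file (placeholder storage for illustration only).
import csv
from datetime import datetime, timezone

def log_feedback(model_id: str, prompt: str, response: str, thumbs_up: bool,
                 path: str = "feedback_log.csv") -> None:
    """Append one feedback record (timestamp, model, prompt, response, rating)."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),
            model_id,
            prompt,
            response,
            int(thumbs_up),
        ])

# Example usage with placeholder values:
log_feedback("tunedModels/my-style-model",
             "Write a tagline for a coffee shop",
             "Brewed for the bold.",
             thumbs_up=True)
```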
Summary
- Loss curves show if it learned.
- Human review shows what it learned.
- Always test on unseen data.
Module 5 Complete! You now have custom models. In Module 6, we get technical with Integrating Gemini Models via SDKs.