Evaluating Fine-Tuned Models

Did it work? Learn how to interpret loss curves and run a human evaluation to verify that your fine-tuned model is better than the base model.

The training finished. Is the model smart now?

1. The Loss Curve

AI Studio plots a "Loss" curve over the course of training.

  • Loss: How "wrong" the model's predictions were on the training data. Lower is better.
  • Ideal Shape: The curve should drop steadily and then flatten out (a quick heuristic sketch follows this list).
    • Never went down: The model learned nothing. Check dataset quality or the Learning Rate.
    • Went down, then back up: Overfitting, usually from too many epochs.
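
If you would rather sanity-check the numbers than eyeball the chart, a rough heuristic over the per-epoch loss values is enough. This is a minimal sketch, assuming you copy the values off the AI Studio chart by hand; the numbers in the example are made up.

```python
def diagnose_loss_curve(loss_history: list[float]) -> str:
    """Classify a training loss curve into the three shapes described above."""
    if len(loss_history) < 3:
        return "Not enough data points to judge."

    start, end = loss_history[0], loss_history[-1]
    lowest = min(loss_history)

    if end >= start * 0.95:
        # Loss barely moved: the model learned little or nothing.
        return "Flat: check dataset quality or the learning rate."
    if end > lowest * 1.10:
        # Loss dropped, then climbed back up: too many epochs.
        return "Down, then up: likely overfitting; reduce epochs."
    return "Healthy: loss dropped and flattened out."


# Hypothetical per-epoch values read off the AI Studio loss chart.
print(diagnose_loss_curve([2.1, 1.4, 0.9, 0.7, 0.65, 0.64]))
```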

2. Qualitative Evaluation (The Eye Test)

Numbers don't tell the whole story. You need to test it.

  • Holdout Set: Create a list of about 10 prompts that were not in the training set.
  • Side-by-Side: Run these prompts on the Base Model (Flash) and on your Tuned Model (a minimal SDK sketch follows this list).
  • Check: Does the Tuned Model follow the style constraints better?
  • Regression Check: Did the Tuned Model become "dumb" at basic tasks (catastrophic forgetting)? Ask it basic math or logic questions to make sure it hasn't lost its general ability.
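
Here is a minimal sketch of that side-by-side check using the google-generativeai Python package (the SDKs are covered properly in Module 6). The API key, the tuned model ID tunedModels/support-style-v1, and the prompts are placeholders; swap in your own.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Placeholder model IDs: use your own tuned model name from AI Studio.
base_model = genai.GenerativeModel("gemini-1.5-flash")
tuned_model = genai.GenerativeModel("tunedModels/support-style-v1")

holdout_prompts = [
    "Summarize this refund policy in two sentences: ...",
    "What is 17 * 23?",  # regression check: basic reasoning should survive tuning
]

for prompt in holdout_prompts:
    base_answer = base_model.generate_content(prompt).text
    tuned_answer = tuned_model.generate_content(prompt).text
    print(f"PROMPT: {prompt}\nBASE : {base_answer}\nTUNED: {tuned_answer}\n")
```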

3. Deployment

If the eval passes:

  1. Deploy to your app.
  2. Monitor user feedback.
  3. If the rollout is successful, delete the old tuned model to keep your registry clean (a cleanup sketch follows this list).
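
Cleanup can also be done from code. A minimal sketch, again assuming the google-generativeai package; the model ID being deleted is hypothetical, so double-check the name before running it.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# See what is currently in your tuned-model registry.
for m in genai.list_tuned_models():
    print(m.name)

# Hypothetical ID: delete the superseded model once its replacement is live.
genai.delete_tuned_model("tunedModels/support-style-v0")
```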

Summary

  • Loss curves show if it learned.
  • Human review shows what it learned.
  • Always test on unseen data.

Module 5 Complete! You now have custom models. In Module 6, we get technical with Integrating Gemini Models via SDKs.
