Module 7 Lesson 7: Evaluating Models (Accuracy, Precision, Recall)
Is your AI actually good? Learn how to look beyond 'Accuracy' and understand 'Precision' and 'Recall' to ensure your model isn't missing critical patterns.
Imagine you build a model to detect a rare disease that only affects 1 in 1,000 people. If your model simply says "Everyone is Healthy," it will be 99.9% accurate, but it is also completely useless! To build real AI, we must look beyond accuracy and learn the three Musketeers of evaluation: Accuracy, Precision, and Recall.
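You can watch this trap happen in a few lines of code. Here is a minimal sketch (toy numbers, assuming scikit-learn is installed) of a lazy "model" that predicts "Healthy" for all 1,000 patients; we will meet recall_score properly later in this lesson:
from sklearn.metrics import accuracy_score, recall_score

# 1,000 patients: 999 healthy (0) and 1 sick (1)
y_true = [0] * 999 + [1]
# A lazy "model" that predicts "Healthy" for everyone
y_pred = [0] * 1000

print(accuracy_score(y_true, y_pred))  # 0.999 -- looks impressive
print(recall_score(y_true, y_pred))    # 0.0   -- it found none of the sick patients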
Lesson Overview
In this lesson, we will cover:
- The Confusion Matrix: The map of right and wrong.
- Accuracy: The broad overview.
- Precision: How many "Yes" guesses were actually "Yes"?
- Recall: How many "Yes" items did we actually find?
- F1-Score: The balance between the two.
1. The Confusion Matrix
In classification, there are four possible outcomes:
- True Positive (TP): You predicted "Cancer," and they have "Cancer." (Good!)
- True Negative (TN): You predicted "Healthy," and they are "Healthy." (Good!)
- False Positive (FP): You predicted "Cancer," but they are "Healthy." (Bad - False Alarm)
- False Negative (FN): You predicted "Healthy," but they have "Cancer." (CRITICAL - Missed it)
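Scikit-learn can tally these four outcomes for you. A minimal sketch, using 1 for "Cancer" and 0 for "Healthy" with made-up labels (we reuse the same labels in section 3 below):
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1, 1]  # what actually happened
y_pred = [0, 1, 0, 0, 1, 1]  # what the model predicted

# confusion_matrix returns rows = actual class, columns = predicted class:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 0 1 3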
2. Accuracy vs. The Rest
- Accuracy: (TP + TN) / (TP + TN + FP + FN), i.e. correct guesses out of all guesses.
- Precision: TP / (TP + FP). Use this when a false alarm is expensive (e.g., deciding whether someone is guilty of a crime).
- Recall: TP / (TP + FN). Use this when missing a positive is expensive (e.g., detecting a disease or a fire).
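These formulas are easy to compute by hand. A quick sketch with made-up counts for a disease test (100 patients, 10 of them actually sick):
# Made-up confusion-matrix counts
tp, fp = 8, 5    # the model flagged 13 patients; 8 really were sick
fn, tn = 2, 85   # it missed 2 sick patients; 85 healthy people were correctly cleared

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 93 / 100 = 0.93
precision = tp / (tp + fp)                   # 8 / 13 = 0.62 (rounded)
recall = tp / (tp + fn)                      # 8 / 10 = 0.80
print(f"Accuracy: {accuracy:.2f}, Precision: {precision:.2f}, Recall: {recall:.2f}")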
3. Implementation in Scikit-Learn
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Example: actual labels vs. the model's predictions (same data as section 1)
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

print(f"Accuracy: {accuracy_score(y_true, y_pred):.2f}")    # 0.83 (5 of 6 correct)
print(f"Precision: {precision_score(y_true, y_pred):.2f}")  # 1.00 (no false alarms)
print(f"Recall: {recall_score(y_true, y_pred):.2f}")        # 0.75 (found 3 of 4 positives)
4. The F1-Score (The Compromise)
The F1-Score is a single number that balances Precision and Recall: it is their harmonic mean, F1 = 2 × (Precision × Recall) / (Precision + Recall). A high F1-Score means your model is both precise (few false alarms) and sensitive (few misses).
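Continuing the snippet from section 3 (the f1_score import is already in place there):
print(f"F1-Score: {f1_score(y_true, y_pred):.2f}")  # 0.86, the harmonic mean of 1.00 and 0.75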
Practice Exercise: The Fire Alarm Evaluation
- Imagine a fire alarm system.
- Data: Out of 100 days, there were 5 fires.
- The alarm went off 10 times. On 4 of those occasions, there was actually a fire. One fire happened without the alarm going off.
- Calculate (then check your counts with the sketch after this list):
- TP: How many fires were detected?
- FP: How many false alarms?
- FN: How many fires were missed?
- Which is more important for a fire alarm: Precision or Recall? Why?
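Once you have worked out the counts by hand, here is a minimal sketch to check them. The day-by-day labels are reconstructed from the story above (in y_true, 1 = a fire that day; in y_pred, 1 = the alarm rang; the order of days is arbitrary):
from sklearn.metrics import confusion_matrix

y_true = [1] * 4 + [0] * 6 + [1] * 1 + [0] * 89   # 5 fires across 100 days
y_pred = [1] * 4 + [1] * 6 + [0] * 1 + [0] * 89   # the alarm rang 10 times

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")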
Quick Knowledge Check
- Why is "99% Accuracy" sometimes a lie?
- Which metric should you focus on if you are building an email spam filter? (Hint: you don't want to accidentally put a work email in the spam folder!)
- Which metric should you focus on for a life-saving medical test?
- What is a "False Negative"?
Key Takeaways
- Accuracy only works well if your classes are roughly balanced (close to 50/50).
- Precision measures the "quality" of your positive guesses.
- Recall measures how well you "find" the positive items.
- The choice of metric depends entirely on the real-world cost of a mistake.
What’s Next?
We have the models and we know how to measure them. In Lesson 8, we will build a complete Spam Filter Project using a real-world NLP (Natural Language Processing) technique!