Module 7 Lesson 7: Evaluating Models (Accuracy, Precision, Recall)
Is your AI actually good? Learn how to look beyond 'Accuracy' and understand 'Precision' and 'Recall' to ensure your model isn't missing critical patterns.
Imagine you build a model to detect a rare disease that only affects 1 in 1,000 people. If your model simply says "Everyone is Healthy," it will be 99.9% accurate, but it is also completely useless! To build real AI, we must look beyond accuracy and learn the three Musketeers of evaluation: Accuracy, Precision, and Recall.
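You can watch this trap happen in a few lines of code. Here is a minimal sketch (toy numbers, assuming scikit-learn is installed) of a lazy "model" that predicts "Healthy" for all 1,000 patients; we will meet recall_score properly later in this lesson:
from sklearn.metrics import accuracy_score, recall_score

# 1,000 patients: 999 healthy (0) and 1 sick (1)
y_true = [0] * 999 + [1]
# A lazy "model" that predicts "Healthy" for everyone
y_pred = [0] * 1000

print(accuracy_score(y_true, y_pred))  # 0.999 -- looks impressive
print(recall_score(y_true, y_pred))    # 0.0   -- it found none of the sick patients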
Lesson Overview
In this lesson, we will cover:
- The Confusion Matrix: The map of right and wrong.
- Accuracy: The broad overview.
- Precision: How many "Yes" guesses were actually "Yes"?
- Recall: How many "Yes" items did we actually find?
- F1-Score: The balance between the two.
1. The Confusion Matrix
In classification, there are four possible outcomes:
- True Positive (TP): You predicted "Cancer," and they have "Cancer." (Good!)
- True Negative (TN): You predicted "Healthy," and they are "Healthy." (Good!)
- False Positive (FP): You predicted "Cancer," but they are "Healthy." (Bad - False Alarm)
- False Negative (FN): You predicted "Healthy," but they have "Cancer." (CRITICAL - Missed it)
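Scikit-learn can tally these four outcomes for you. A minimal sketch, using 1 for "Cancer" and 0 for "Healthy" with made-up labels (we reuse the same labels in section 3 below):
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1, 1]  # what actually happened
y_pred = [0, 1, 0, 0, 1, 1]  # what the model predicted

# confusion_matrix returns rows = actual class, columns = predicted class:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 0 1 3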
2. Accuracy vs. The Rest
- Accuracy: (TP + TN) / (TP + TN + FP + FN), i.e. correct guesses out of all guesses.
- Precision: TP / (TP + FP). Use this when a false alarm is expensive (e.g., deciding whether someone is guilty of a crime).
- Recall: TP / (TP + FN). Use this when missing a positive is expensive (e.g., detecting a disease or a fire).
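These formulas are easy to compute by hand. A quick sketch with made-up counts for a disease test (100 patients, 10 of them actually sick):
# Made-up confusion-matrix counts
tp, fp = 8, 5    # the model flagged 13 patients; 8 really were sick
fn, tn = 2, 85   # it missed 2 sick patients; 85 healthy people were correctly cleared

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 93 / 100 = 0.93
precision = tp / (tp + fp)                   # 8 / 13 = 0.62 (rounded)
recall = tp / (tp + fn)                      # 8 / 10 = 0.80
print(f"Accuracy: {accuracy:.2f}, Precision: {precision:.2f}, Recall: {recall:.2f}")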
3. Implementation in Scikit-Learn
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Example: actual labels vs. the model's predictions (same data as section 1)
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

print(f"Accuracy: {accuracy_score(y_true, y_pred):.2f}")    # 0.83 (5 of 6 correct)
print(f"Precision: {precision_score(y_true, y_pred):.2f}")  # 1.00 (no false alarms)
print(f"Recall: {recall_score(y_true, y_pred):.2f}")        # 0.75 (found 3 of 4 positives)
4. The F1-Score (The Compromise)
The F1-Score is a single number that balances Precision and Recall: it is their harmonic mean, F1 = 2 × (Precision × Recall) / (Precision + Recall). A high F1-Score means your model is both precise (few false alarms) and sensitive (few misses).
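Continuing the snippet from section 3 (the f1_score import is already in place there):
print(f"F1-Score: {f1_score(y_true, y_pred):.2f}")  # 0.86, the harmonic mean of 1.00 and 0.75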
Practice Exercise: The Fire Alarm Evaluation
- Imagine a fire alarm system.
- Data: Out of 100 days, there were 5 fires.
- The alarm went off 10 times. On 4 of those occasions, there was actually a fire. One fire happened without the alarm going off.
- Calculate (then check your counts with the sketch after this list):
- TP: How many fires were detected?
- FP: How many false alarms?
- FN: How many fires were missed?
- Which is more important for a fire alarm: Precision or Recall? Why?
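Once you have worked out the counts by hand, here is a minimal sketch to check them. The day-by-day labels are reconstructed from the story above (in y_true, 1 = a fire that day; in y_pred, 1 = the alarm rang; the order of days is arbitrary):
from sklearn.metrics import confusion_matrix

y_true = [1] * 4 + [0] * 6 + [1] * 1 + [0] * 89   # 5 fires across 100 days
y_pred = [1] * 4 + [1] * 6 + [0] * 1 + [0] * 89   # the alarm rang 10 times

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")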
Quick Knowledge Check
- Why is "99% Accuracy" sometimes a lie?
- Which metric should you focus on if you are building an email spam filter? (Hint: you don't want to accidentally put a work email in the spam folder!)
- Which metric should you focus on for a life-saving medical test?
- What is a "False Negative"?
Key Takeaways
- Accuracy only works well if your classes are roughly balanced (close to 50/50).
- Precision measures the "quality" of your positive guesses.
- Recall measures how well you "find" the positive items.
- The choice of metric depends entirely on the real-world cost of a mistake.
What’s Next?
We have the models and we know how to measure them. In Lesson 8, we will build a complete Spam Filter Project using a real-world NLP (Natural Language Processing) technique!