Module 7 Lesson 7: Evaluating Models (Accuracy, Precision, Recall)


Is your AI actually good? Learn how to look beyond 'Accuracy' and understand 'Precision' and 'Recall' to ensure your model isn't missing critical patterns.


Imagine you build a model to detect a rare disease that only affects 1 in 1,000 people. If your model simply says "Everyone is Healthy," it will be 99.9% accurate, but it is also completely useless! To build real AI, we must look beyond accuracy and learn the three Musketeers of evaluation: Accuracy, Precision, and Recall.
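To see this "accuracy paradox" in numbers, here is a minimal sketch. The 1-in-1,000 dataset and the always-"Healthy" model are made up for illustration:

from sklearn.metrics import accuracy_score, recall_score

# 1,000 patients: 999 healthy (0) and 1 sick (1)
y_true = [0] * 999 + [1]

# A "model" that ignores its input and predicts "Healthy" for everyone
y_pred = [0] * 1000

print(f"Accuracy: {accuracy_score(y_true, y_pred):.3f}")  # 0.999 -- looks impressive
print(f"Recall:   {recall_score(y_true, y_pred):.3f}")    # 0.000 -- finds zero sick patients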

Lesson Overview

In this lesson, we will cover:

  • The Confusion Matrix: The map of right and wrong.
  • Accuracy: The broad overview.
  • Precision: How many "Yes" guesses were actually "Yes"?
  • Recall: How many "Yes" items did we actually find?
  • F1-Score: The balance between the two.

1. The Confusion Matrix

In classification, every prediction falls into one of four buckets, and the confusion matrix is simply a table that counts them (the sketch after this list shows how to get those counts with scikit-learn):

  1. True Positive (TP): You predicted "Cancer," and they have "Cancer." (Good!)
  2. True Negative (TN): You predicted "Healthy," and they are "Healthy." (Good!)
  3. False Positive (FP): You predicted "Cancer," but they are "Healthy." (Bad - False Alarm)
  4. False Negative (FN): You predicted "Healthy," but they have "Cancer." (CRITICAL - Missed it)
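As a minimal sketch, scikit-learn's confusion_matrix function counts these four outcomes for you. The labels below are made up for illustration (1 = "Cancer", 0 = "Healthy"):

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # what actually happened
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]   # what the model predicted

# With labels=[0, 1], the matrix is laid out as:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")   # TP=3, TN=3, FP=1, FN=1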

2. Accuracy vs. The Rest

  • Accuracy: (TP + TN) / (TP + TN + FP + FN), i.e. correct guesses divided by total guesses.
  • Precision: TP / (TP + FP). Use this when false alarms are expensive (e.g., deciding whether someone is guilty of a crime).
  • Recall: TP / (TP + FN). Use this when missing something is expensive (e.g., detecting a disease or a fire). These formulas are worked through by hand in the sketch below.
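A minimal by-hand sketch, using made-up counts (they happen to match the tiny example in the next section):

# Counts from a confusion matrix: 3 true positives, 2 true negatives,
# 0 false positives, 1 false negative
tp, tn, fp, fn = 3, 2, 0, 1

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # 5 / 6 = 0.83
precision = tp / (tp + fp)                    # 3 / 3 = 1.00
recall    = tp / (tp + fn)                    # 3 / 4 = 0.75

print(f"Accuracy: {accuracy:.2f}, Precision: {precision:.2f}, Recall: {recall:.2f}")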

3. Implementation in Scikit-Learn

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Example: actual labels vs. model predictions (1 = positive class, 0 = negative class)
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

print(f"Accuracy: {accuracy_score(y_true, y_pred):.2f}")
print(f"Precision: {precision_score(y_true, y_pred):.2f}")
print(f"Recall: {recall_score(y_true, y_pred):.2f}")
print(f"F1-Score: {f1_score(y_true, y_pred):.2f}")

4. The F1-Score (The Compromise)

The F1-Score is a single number that balances Precision and Recall: it is their harmonic mean, F1 = 2 × (Precision × Recall) / (Precision + Recall). If your F1-Score is high, your model is both precise and sensitive; because the harmonic mean punishes imbalance, a model cannot hide a terrible Recall behind a perfect Precision (or vice versa).
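A quick sketch with made-up Precision and Recall values, showing why the harmonic mean is stricter than a simple average:

def f1(precision, recall):
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

print(f"{f1(0.9, 0.8):.2f}")   # 0.85 -- balanced model, high F1
print(f"{f1(1.0, 0.1):.2f}")   # 0.18 -- perfect precision cannot rescue terrible recall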


Practice Exercise: The Fire Alarm Evaluation

  1. Imagine a fire alarm system.
  2. Data: Out of 100 days, there were 5 fires.
  3. The alarm went off 10 times. On 4 of those occasions, there was actually a fire. One fire happened without the alarm going off.
  4. Calculate:
    • TP: How many fires were detected?
    • FP: How many false alarms?
    • FN: How many fires were missed?
  5. Which is more important for a fire alarm: Precision or Recall? Why?

Quick Knowledge Check

  1. Why is "99% Accuracy" sometimes a lie?
  2. Which metric should you focus on if you are building an email spam filter? (Hint: You don't want to accidentally put a work email in spam!).
  3. Which metric should you focus on for a life-saving medical test?
  4. What is a "False Negative"?

Key Takeaways

  • Accuracy only works well if your classes are roughly balanced (close to 50/50).
  • Precision measures the "quality" of your positive guesses.
  • Recall measures how well you "find" the positive items.
  • The choice of metric depends entirely on the real-world cost of a mistake.

What’s Next?

We have the models and we know how to measure them. In Lesson 8, we will build a complete Spam Filter Project using a real-world NLP (Natural Language Processing) technique!
