Module 7 Lesson 5: Logistic Regression: Classification

Binary decisions made simple. Learn how to use Logistic Regression to sort data into two classes, such as Spam vs. Not Spam or Pass vs. Fail.

Despite the word "Regression" in its name, Logistic Regression is used for Classification. It doesn't predict a continuous number (such as 45.2); it predicts the Probability that an input belongs to a specific category (e.g., "There is a 95% chance this email is Spam").

Lesson Overview

In this lesson, we will cover:

  • What is Classification?: Categorizing data into classes.
  • The Sigmoid Function: Turning numbers into probabilities.
  • Implementation: Building a model with LogisticRegression().
  • Binary vs. Multi-class: Yes/No vs. Red/Blue/Green.

1. Regression vs. Classification

  • Linear Regression: Predicting How Much? (House Price, Temperature).
  • Logistic Regression: Predicting Which One? (Spam/No Spam, Pass/Fail, Malignant/Benign).
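To make the contrast concrete, here is a minimal sketch that fits both model types to the same small pass/fail dataset and asks each for an output at two extreme inputs. The data and the variable names (hours, passed, extreme_hours) are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical data: hours studied and the outcome (0 = Fail, 1 = Pass)
hours = np.array([[1], [2], [3], [5], [6], [7], [8]])
passed = np.array([0, 0, 0, 1, 1, 1, 1])

# Fit a straight line and a logistic model to the same labels
linear = LinearRegression().fit(hours, passed)
logistic = LogisticRegression().fit(hours, passed)

# Ask both models about a very low and a very high number of hours
extreme_hours = np.array([[0], [20]])
linear_outputs = linear.predict(extreme_hours)
logistic_outputs = logistic.predict_proba(extreme_hours)[:, 1]

for h, lin, log in zip(extreme_hours.ravel(), linear_outputs, logistic_outputs):
    print(f"{h:>2} hours -> linear output: {lin:.2f}, logistic P(Pass): {log:.4f}")
```

On this data the straight line gives a value below 0 at 0 hours and above 1 at 20 hours, while the logistic model's outputs stay between 0 and 1 and can be read as probabilities.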

2. Coding the Model

Let's build a model that predicts whether a student will Pass or Fail an exam based on the hours they studied.

import numpy as np
from sklearn.linear_model import LogisticRegression

# 1. Prepare Data (Hours Studied)
X = np.array([[1], [2], [3], [5], [6], [7], [8]]) 
# 0 = Fail, 1 = Pass
y = np.array([0, 0, 0, 1, 1, 1, 1]) 

# 2. Instantiate
model = LogisticRegression()

# 3. Fit
model.fit(X, y)

# 4. Predict for a student who studied 4 hours
new_student = np.array([[4]])
prediction = model.predict(new_student)
probability = model.predict_proba(new_student)

print(f"Prediction (0=Fail, 1=Pass): {prediction[0]}")
print(f"Probability of Passing: {probability[0][1] * 100:.2f}%")
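It can help to see where that probability comes from. The sketch below refits the same model with default settings, reads the learned weight and intercept from the coef_ and intercept_ attributes, and recomputes the probability by hand with the sigmoid formula. The names w, b, z, and manual_probability are introduced here for illustration, and the 0.5 cutoff is the usual convention for a binary model rather than something the lesson states.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same data as above: hours studied and 0 = Fail, 1 = Pass
X = np.array([[1], [2], [3], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 1, 1, 1, 1])
model = LogisticRegression().fit(X, y)

# Learned parameters of the linear score z = w * hours + b
w = model.coef_[0][0]
b = model.intercept_[0]

# Recompute the probability of passing for 4 hours by hand
hours = 4
z = w * hours + b
manual_probability = 1.0 / (1.0 + np.exp(-z))

# Compare with what scikit-learn reports for class 1 (Pass)
sklearn_probability = model.predict_proba(np.array([[hours]]))[0][1]

print(f"Manual probability:  {manual_probability:.4f}")
print(f"sklearn probability: {sklearn_probability:.4f}")

# Assumed convention: a probability of at least 0.5 is labeled Pass
manual_class = int(manual_probability >= 0.5)
print(f"Class from the 0.5 cutoff: {manual_class}")
```

The two probabilities should agree, which shows that predict_proba() is reporting the sigmoid of a linear score for this binary model.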

3. The Math Magic: The Sigmoid Curve

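Linear Regression draws a straight line. Logistic Regression passes that line's output through an "S" shaped curve (the Sigmoid), which squeezes any number into the range between 0 and 1.

  • Large positive inputs land at the top of the S, where the output approaches 1 (True).
  • Large negative inputs land at the bottom of the S, where the output approaches 0 (False).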
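The curve described above is the standard sigmoid, sigmoid(z) = 1 / (1 + e^(-z)). A minimal sketch with NumPy follows; the helper name sigmoid and the sample inputs in z_values are chosen here for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# A few sample inputs, from strongly negative to strongly positive
z_values = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
probabilities = sigmoid(z_values)

for z, p in zip(z_values, probabilities):
    print(f"z = {z:+.1f} -> sigmoid(z) = {p:.4f}")
```

An input of 0 maps to exactly 0.5, the midpoint of the S. The outputs rise toward 1 as z grows and fall toward 0 as z becomes more negative.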

Practice Exercise: The Email Filter

  1. Imagine a dataset where X is the number of times the word "Win" appears in an email.
  2. y is 1 for Spam and 0 for Not Spam.
  3. Design a small dataset (5-10 rows) that shows a clear trend (more "wins" = more likely spam).
  4. Fit a LogisticRegression model.
  5. Predict the status of an email that has "Win" appearing 15 times!

Quick Knowledge Check

  1. Is Logistic Regression used for predicting numbers or categories?
  2. What is the name of the "S" shaped curve used in this model?
  3. What does predict_proba() return?
  4. Why wouldn't you use Linear Regression (a straight line) for classification? (Hint: A straight line could predict a value of -5 or 2, which makes no sense as a probability or a category!).

Key Takeaways

  • Logistic Regression is a foundational algorithm for binary classification.
  • It predicts probabilities before assigning a final class.
  • It is used in medical diagnosis, credit scoring, and spam detection.
  • The Scikit-Learn pattern remains the same: Import -> Instantiate -> Fit -> Predict.

What’s Next?

Logistic Regression is great for straightforward decisions. But what if the rules are more complex (e.g., "If it's sunny AND the temperature is > 20 AND it's a weekend...")? In Lesson 6, we’ll learn about Decision Trees!
