Module 7 Lesson 4: Linear Regression: Predicting Numbers
Your first predictive model. Learn how to use Linear Regression to find the 'Line of Best Fit' and predict numerical values like prices and temperatures.
In the world of ML, when we want to predict a Number (like height, temperature, or price), we use a technique called Regression. The simplest and most popular version is Linear Regression. It works by finding the "Line of Best Fit" through your data points.
Lesson Overview
In this lesson, we will cover:
- What is Regression?: Continuous vs. Discrete predictions.
- The Line of Best Fit: How the "learning" works conceptually.
- Implementation: Building a model with LinearRegression().
- Making Predictions: Passing new data to your model.
1. What is Regression?
If you're trying to figure out whether an email is "Spam" or "Not Spam," you're predicting a category (a Discrete Value). If you're trying to figure out exactly how many dollars a house will sell for, you're predicting a Continuous Value. Any time you're predicting a quantity on a numeric scale, where the values in between are meaningful, you're doing Regression.
2. Coding the Model (The 4-Step Pattern)
We'll build a model that predicts house prices based on square footage.
import numpy as np
from sklearn.linear_model import LinearRegression
# 1. Prepare Data (X must be 2D, y must be 1D)
X = np.array([[1000], [1500], [2000], [2500], [3000]]) # Sq Footage
y = np.array([200000, 305000, 400000, 510000, 600000]) # Price
# 2. Instantiate (Create the object)
model = LinearRegression()
# 3. Fit (The training phase)
model.fit(X, y)
# 4. Predict
new_house = np.array([[1800]])
predicted_price = model.predict(new_house)
print(f"Predicted price for an 1800 sq ft house: ${predicted_price[0]:,.2f}")
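The comment in step 1 notes that X must be 2D. Data often starts out as a flat, 1D list of numbers, so the sketch below shows one common way to reshape it with NumPy's reshape(-1, 1) and then predict several houses in one call. The names sq_footage, prices, and new_houses are illustrative choices, not part of the lesson's code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same data as the lesson, but starting from flat 1D arrays
sq_footage = np.array([1000, 1500, 2000, 2500, 3000])         # shape (5,)
prices = np.array([200000, 305000, 400000, 510000, 600000])   # shape (5,)

# reshape(-1, 1) turns 5 values into 5 rows with 1 column each,
# which is the (samples, features) layout scikit-learn expects for X
X = sq_footage.reshape(-1, 1)                                  # shape (5, 1)

model = LinearRegression()
model.fit(X, prices)

# predict() accepts many rows at once and returns one value per row
new_houses = np.array([1200, 1800, 2750]).reshape(-1, 1)
predictions = model.predict(new_houses)

for size, price in zip(new_houses.ravel(), predictions):
    print(f"{size} sq ft -> ${price:,.2f}")
```

Each row of X is one house and each column is one feature. With a single feature, that still means one column, which is why the double brackets are needed.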
3. How Does It Work?
Internally, the model represents the relationship as a simple formula: Price = (Weight * SquareFootage) + Intercept.
- Weight: How much each extra square foot adds to the price.
- Intercept: The predicted price at 0 sq ft, i.e. where the line crosses the price axis. It anchors the line and is not necessarily a realistic house price.
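After fitting, scikit-learn exposes these two numbers as the coef_ and intercept_ attributes. The sketch below reads them from a model trained on the lesson's house data and checks that plugging them into the formula by hand gives the same answer as predict(). The variable names weight, intercept, manual, and from_model are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1000], [1500], [2000], [2500], [3000]])   # Sq Footage
y = np.array([200000, 305000, 400000, 510000, 600000])   # Price

model = LinearRegression()
model.fit(X, y)

# coef_ holds one weight per feature; we have a single feature
weight = model.coef_[0]
intercept = model.intercept_
print(f"Weight: {weight:.2f} dollars per sq ft")
print(f"Intercept: {intercept:.2f} dollars")

# Apply the formula by hand and compare with the model's own prediction
manual = weight * 1800 + intercept
from_model = model.predict(np.array([[1800]]))[0]
print(f"By hand: {manual:,.2f}  From predict(): {from_model:,.2f}")
```

For this data the weight comes out to about 201 dollars per square foot and the intercept to about 1,000 dollars, so an 1800 sq ft house is predicted at roughly 201 * 1800 + 1000 = 362,800 dollars.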
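The "Fit" process is the computer choosing the Weight and Intercept that put the line as close as possible to all your data points. "Close" has a precise meaning here: the sum of the squared vertical distances between each actual price and the line's prediction is as small as possible. This approach is called ordinary least squares.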
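To make "as close as possible" concrete, here is a minimal sketch that uses only NumPy. It measures closeness as the sum of squared errors and computes the best line for a single feature with the standard textbook least-squares formulas. This is an illustration of the idea, not a description of scikit-learn's internal solver, and the helper sum_squared_error and the hand-picked guess line are assumptions made for the example.

```python
import numpy as np

x = np.array([1000, 1500, 2000, 2500, 3000], dtype=float)           # Sq Footage
y = np.array([200000, 305000, 400000, 510000, 600000], dtype=float)  # Price

def sum_squared_error(weight, intercept):
    """Total squared vertical distance between the line and the data."""
    predicted = weight * x + intercept
    return np.sum((y - predicted) ** 2)

# Textbook least-squares solution for one feature:
#   weight    = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - weight * mean_x
x_mean, y_mean = x.mean(), y.mean()
best_weight = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
best_intercept = y_mean - best_weight * x_mean

best_error = sum_squared_error(best_weight, best_intercept)
guess_error = sum_squared_error(200.0, 0.0)   # a plausible hand-picked line

print(f"Best line: Price = {best_weight:.2f} * SquareFootage + {best_intercept:.2f}")
print(f"Error of best line:   {best_error:,.0f}")
print(f"Error of guessed line: {guess_error:,.0f}")
```

The guessed line of 200 dollars per square foot looks reasonable, but its total squared error is larger than that of the least-squares line. No other straight line has a smaller error on this data.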
Practice Exercise: The Temperature Predictor
- Create an X array of 10 days (1 to 10).
- Create a y array of corresponding temperatures (e.g., matching a summer trend).
- Build and Fit a LinearRegression model.
- Predict the temperature for Day 11 and Day 15.
- Does the prediction keep going up, or does it stay the same?
Quick Knowledge Check
- What is the goal of Linear Regression?
- What is the "Line of Best Fit"?
- In the code example, why is X defined as [[1000], [1500]...] (with double brackets)?
- Can Linear Regression be used to predict if a user will click an ad? (Hint: No, that's a category!)
Key Takeaways
- Regression is for predicting continuous numerical values.
- The LinearRegression model finds the best straight line through your data.
- The Scikit-Learn pattern makes implementation fast and simple.
- Once trained, the model can predict outcomes for data it has never seen before.
What’s Next?
Not everything is a number. Sometimes we want to predict a Category, like "Will it rain?" or "Is this transaction fraudulent?" In Lesson 5, we’ll learn about Logistic Regression!