Module 6 Lesson 3: NumPy Math and Stats
Harness the statistical power of NumPy. Learn how to calculate means, medians, and standard deviations with one line of code across millions of data points.
In the last lesson, we learned how to create arrays and perform simple math like addition. In this lesson, we’ll explore NumPy's built-in statistical functions and how to "filter" your data using conditional logic.
Lesson Overview
In this lesson, we will cover:
- Statistical Functions: Mean, Median, Sum, and Std.
- Axes and Dimensions: Calculating stats across rows or columns.
- Boolean Masking: Filtering data without using if-statements.
- Slicing Arrays: Getting exactly the data you need.
1. Statistical Powerhouse
NumPy can calculate statistics across an entire array with a single function call, so you never need to write a loop.
import numpy as np
scores = np.array([85, 90, 78, 92, 88])
print(f"Average: {np.mean(scores)}")
print(f"Highest: {np.max(scores)}")
print(f"Spread (Std Dev): {np.std(scores):.2f}")
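The overview also lists Median and Sum, which the example above does not show. Here is a minimal sketch using the same scores array; the variable names are only for illustration. Note that np.std() returns the population standard deviation by default, and passing ddof=1 gives the sample standard deviation instead.

```python
import numpy as np

scores = np.array([85, 90, 78, 92, 88])

total = np.sum(scores)               # add up every element
median = np.median(scores)           # middle value once the data is sorted
pop_std = np.std(scores)             # population std dev (default, ddof=0)
sample_std = np.std(scores, ddof=1)  # sample std dev (divides by n - 1)

print(f"Total: {total}")                 # Total: 433
print(f"Median: {median}")               # Median: 88.0
print(f"Population std: {pop_std:.2f}")  # Population std: 4.88
print(f"Sample std: {sample_std:.2f}")   # Sample std: 5.46
```

Which version of the standard deviation you want depends on whether the array holds the whole population or only a sample of it.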
2. Working with Axes
When you have a 2D array (a table), you can choose to calculate stats for the whole thing, for each row, or for each column.
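- axis=0: Columns (vertical). The calculation runs down the rows and returns one result per column.
- axis=1: Rows (horizontal). The calculation runs across the columns and returns one result per row.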
grades = np.array([
    [80, 85],  # Student 1
    [90, 95],  # Student 2
])
print(np.mean(grades, axis=1))  # Output: [82.5 92.5] (average for each student)
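To see all three options side by side, here is a small sketch that reuses the grades table. It assumes each column represents one test, which the lesson does not state, so treat the variable names as illustrative.

```python
import numpy as np

grades = np.array([
    [80, 85],  # Student 1
    [90, 95],  # Student 2
])

per_test = np.mean(grades, axis=0)     # collapse the rows: one average per column
per_student = np.mean(grades, axis=1)  # collapse the columns: one average per row
overall = np.mean(grades)              # no axis: one average over every value

print(per_test)     # [85. 90.]
print(per_student)  # [82.5 92.5]
print(overall)      # 87.5
```

A handy way to remember it: the axis you name is the one that disappears from the result.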
3. Filtering with Boolean Masks
This is the most "magical" part of NumPy. You can filter data using a logical condition.
ages = np.array([12, 25, 18, 40, 30])
# Create a "mask" (an array of True/False values, one per element)
adult_mask = ages >= 18
# Use the mask to get just the data you want
adults = ages[adult_mask]
print(adults) # Output: [25 18 40 30]
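Masks become more useful when you combine conditions or feed the filtered data into a statistic. The sketch below reuses the ages array; the 18-to-29 range and the variable names are made up for illustration.

```python
import numpy as np

ages = np.array([12, 25, 18, 40, 30])

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
young_adult_mask = (ages >= 18) & (ages < 30)
young_adults = ages[young_adult_mask]

# True counts as 1, so summing a mask counts how many elements match
adult_count = np.sum(ages >= 18)

# A filtered array can go straight into a statistical function
mean_adult_age = np.mean(ages[ages >= 18])

print(young_adults)    # [25 18]
print(adult_count)     # 4
print(mean_adult_age)  # 28.25
```

Use & and | rather than Python's and / or keywords here, because the keywords do not work element by element on arrays.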
4. Slicing and Dicing
Slicing uses the same start:stop syntax as Python lists (the stop index is not included), and it extends to multiple dimensions: separate the slice for each axis with a comma.
arr = np.array([10, 20, 30, 40, 50])
print(arr[1:4]) # Output: [20 30 40] (items at index 1, 2, 3)
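The example above is one-dimensional. As a rough sketch of slicing in two dimensions, here is a small 3x3 array invented for illustration; the part before the comma selects rows and the part after it selects columns.

```python
import numpy as np

table = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

first_column = table[:, 0]   # every row, column 0
middle_row = table[1, :]     # row 1, every column
top_right = table[0:2, 1:3]  # rows 0 and 1, columns 1 and 2

print(first_column)  # [1 4 7]
print(middle_row)    # [4 5 6]
print(top_right)
# [[2 3]
#  [5 6]]
```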
Practice Exercise: The Rainfall Analyzer
- Create a 2D NumPy array representing rainfall for 2 cities over 7 days (a 2x7 matrix).
- Calculate the Total Rainfall for each city (hint: use axis=1).
- Calculate the Max Rainfall recorded across the entire region.
- Use a Boolean Mask to find and print only the days where rainfall was greater than 5.0.
Quick Knowledge Check
- What does np.std() calculate?
- What is the difference between axis=0 and axis=1 in a 2D array?
- What is a Boolean Mask?
- How would you find the median value of an array named data?
Key Takeaways
- NumPy handles complex stats with built-in, optimized functions.
- Axis parameters allow you to control the direction of your calculations.
- Boolean masking is a fast, expressive way to filter data.
- Mastering these tools is essential for modern data analysis.
What’s Next?
NumPy is for numbers. But what if your data has words, dates, and missing values? In Lesson 4, we’ll meet Pandas—the library that brings spreadsheet-style power to Python!