Module 6 Lesson 4: Pandas Basics: DataFrames and Series
Welcome to the DataFrame. Learn about the two core building blocks of Pandas and how to navigate tables like a data professional.
If NumPy is the "engine" of data science, Pandas is the "user interface." It allows you to work with data in a way that feels like a powerful, programmable version of Excel. In this lesson, we’ll meet the two most important objects in Pandas: Series and DataFrames.
Lesson Overview
In this lesson, we will cover:
- What is Pandas?: Why it’s the go-to library for data analysis in Python.
- Series: One-dimensional labeled arrays.
- DataFrames: Two-dimensional tables.
- Accessing Data: Head, Tail, and Columns.
1. What is a Series?
A Series is like a column in a spreadsheet. It’s a list of values, but every value has a label called an "index."
import pandas as pd # Standard way to import pandas
grades = pd.Series([85, 90, 78], index=["Alex", "Sara", "Tom"])
print(grades)
print(grades["Sara"]) # Output: 90
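To make the idea of labels more concrete, here is a minimal sketch that reuses the same grades. It shows that a Series stores its labels and its values separately, and that arithmetic is applied to every value at once while the labels are kept. The 5-point curve is a made-up example, not part of the lesson's data.

```python
import pandas as pd

# Same grades as in the lesson's example
grades = pd.Series([85, 90, 78], index=["Alex", "Sara", "Tom"])

# The labels (index) and the values can be read separately
labels = list(grades.index)   # ['Alex', 'Sara', 'Tom']
values = grades.tolist()      # [85, 90, 78]

# Arithmetic applies to every value at once and keeps the labels.
# The 5-point curve is a hypothetical adjustment for illustration.
curved = grades + 5
print(curved["Tom"])  # 83

# Use .iloc when you want to select by position instead of by label
first = grades.iloc[0]
print(first)  # 85
```

Label-based access (`grades["Sara"]`) and position-based access (`grades.iloc[0]`) are both available, so you can pick whichever is clearer for the task.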
2. What is a DataFrame?
A DataFrame is a full table. It’s essentially a collection of Series objects that share the same index.
data = {
    "Name": ["Alex", "Sara", "Tom"],
    "Age": [15, 16, 15],
    "Score": [85, 90, 78]
}
df = pd.DataFrame(data)
print(df)
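The claim that a DataFrame is a collection of Series sharing one index can be checked directly. The sketch below rebuilds the same table and inspects its row labels, column names, and a single column. Because no index is passed in, pandas assigns the default labels 0, 1, 2.

```python
import pandas as pd

data = {
    "Name": ["Alex", "Sara", "Tom"],
    "Age": [15, 16, 15],
    "Score": [85, 90, 78],
}
df = pd.DataFrame(data)

# No index was given, so pandas uses the default labels 0, 1, 2
row_labels = list(df.index)
column_names = list(df.columns)
print(row_labels)     # [0, 1, 2]
print(column_names)   # ['Name', 'Age', 'Score']

# Each column is a Series, and it carries the same index as the DataFrame
age = df["Age"]
same_index = age.index.equals(df.index)
print(type(age).__name__)  # Series
print(same_index)          # True

# shape is (number of rows, number of columns)
print(df.shape)  # (3, 3)
```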
3. Navigating Large Data
When you have 10,000 rows, you don't want to print them all. Pandas provides tools to "peek" at your data.
# See the first 5 rows
print(df.head())
# See the last 5 rows
print(df.tail())
# Get summary statistics for all numerical columns
print(df.describe())
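The three-row table is too small to show why peeking matters, so here is a rough sketch with a hypothetical 10,000-row table (the column names `id` and `value` are invented for illustration). It also shows that `head()` and `tail()` accept a row count, and that `describe()` returns a DataFrame you can read values from.

```python
import pandas as pd

# Hypothetical table with 10,000 rows, built only for illustration
big = pd.DataFrame({
    "id": range(10_000),
    "value": [i % 7 for i in range(10_000)],
})

# shape tells you the size without printing any rows
print(big.shape)  # (10000, 2)

# head() and tail() default to 5 rows but accept a different count
first_three = big.head(3)
last_two = big.tail(2)
print(first_three)
print(last_two)

# describe() returns a DataFrame of statistics (count, mean, std, min, ...)
summary = big.describe()
print(summary.loc["count", "id"])
```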
4. Selecting Data
You can select a single column from a DataFrame just like looking up a key in a dictionary.
# Get just the 'Name' column (this returns a Series!)
names = df["Name"]
# Get multiple columns
subset = df[["Name", "Score"]]
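The difference between single and double brackets is easy to miss, so this short sketch checks the type of each result. It assumes the same three-student table used earlier in the lesson.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alex", "Sara", "Tom"],
    "Age": [15, 16, 15],
    "Score": [85, 90, 78],
})

names = df["Name"]              # a single column name gives a Series
subset = df[["Name", "Score"]]  # a list of names gives a DataFrame
one_col_table = df[["Name"]]    # a one-item list still gives a DataFrame

print(type(names).__name__)          # Series
print(type(subset).__name__)         # DataFrame
print(type(one_col_table).__name__)  # DataFrame
```

The inner brackets are an ordinary Python list of column names, which is why `df[["Name"]]` keeps the table shape even with one column.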
Practice Exercise: The Employee Table
- Create a dictionary with three keys: Employee, Department, and Salary.
- Add at least 5 rows of data.
- Convert the dictionary into a Pandas DataFrame.
- Print the first 3 rows using head().
- Print the average salary using df["Salary"].mean().
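If you want something to compare your attempt against, here is one possible solution. The employee names, departments, and salaries are made up purely for illustration; any five or more rows will do.

```python
import pandas as pd

# Made-up employee data, purely for illustration
employees = {
    "Employee": ["Ana", "Ben", "Chen", "Dina", "Eli"],
    "Department": ["Sales", "IT", "IT", "HR", "Sales"],
    "Salary": [50000, 62000, 58000, 48000, 52000],
}
df = pd.DataFrame(employees)

# First 3 rows
print(df.head(3))

# Average of the Salary column
average_salary = df["Salary"].mean()
print(average_salary)  # 54000.0
```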
Quick Knowledge Check
- What is the difference between a Series and a DataFrame?
- How do you import pandas according to industry convention?
- What does the describe() method do?
- How do you select only the "Price" column from a DataFrame named df?
Key Takeaways
- Pandas is built for tabular data (rows and columns).
- A Series is a labeled column; a DataFrame is a table.
- DataFrames allow for complex selection and quick statistical summaries.
- Pandas is built on top of NumPy, so operations that work on whole columns at once are fast.
What’s Next?
Manually creating a dictionary for every table is exhausting. In Lesson 5, we’ll learn how to Load Data directly from CSV and Excel files into your Python code with just one line!