Module 6 Lesson 4: Pandas Basics: DataFrames and Series

Welcome to the DataFrame. Learn about the two core building blocks of Pandas and how to navigate tables like a data professional.

If NumPy is the "engine" of data science, Pandas is the "user interface." It allows you to work with data in a way that feels like a powerful, programmable version of Excel. In this lesson, we’ll meet the two most important objects in Pandas: Series and DataFrames.

Lesson Overview

In this lesson, we will cover:

  • What is Pandas? Why it’s the go-to library for tabular data analysis in Python.
  • Series: One-dimensional labeled arrays.
  • DataFrames: Two-dimensional tables.
  • Accessing Data: head(), tail(), describe(), and column selection.

1. What is a Series?

A Series is like a single column in a spreadsheet. It’s a one-dimensional sequence of values in which every value has a label; together, those labels form the "index" of the Series.

import pandas as pd # Standard way to import pandas

grades = pd.Series([85, 90, 78], index=["Alex", "Sara", "Tom"])
print(grades)
print(grades["Sara"]) # Output: 90
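
Beyond label lookups, a Series keeps its labels and values separately and supports arithmetic on every value at once. The following is a minimal sketch that reuses the grades data from above; the 5-point bonus and the 85-point cutoff are made up for illustration.

```python
import pandas as pd

grades = pd.Series([85, 90, 78], index=["Alex", "Sara", "Tom"])

# The labels and the underlying values are stored separately
labels = list(grades.index)      # ['Alex', 'Sara', 'Tom']
values = grades.to_numpy()       # NumPy array holding 85, 90, 78

# Arithmetic applies to every value at once and keeps the labels
curved = grades + 5              # hypothetical 5-point bonus
print(curved["Tom"])             # 83

# Boolean filtering keeps only the entries that match a condition
high = grades[grades >= 85]      # hypothetical cutoff
print(list(high.index))          # ['Alex', 'Sara']
```

Note that the labels travel with the values through both operations, which is the main difference between a Series and a plain Python list.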

2. What is a DataFrame?

A DataFrame is a full table. It’s essentially a collection of Series objects that share the same index.

data = {
    "Name": ["Alex", "Sara", "Tom"],
    "Age": [15, 16, 15],
    "Score": [85, 90, 78]
}

df = pd.DataFrame(data)
print(df)
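
To make the "collection of Series" idea concrete, here is a small sketch that inspects the same table. It only uses the data defined above; because no index was passed, Pandas assigns the default row labels 0, 1, 2.

```python
import pandas as pd

data = {
    "Name": ["Alex", "Sara", "Tom"],
    "Age": [15, 16, 15],
    "Score": [85, 90, 78],
}
df = pd.DataFrame(data)

print(df.shape)            # (3, 3): 3 rows, 3 columns
print(list(df.columns))    # ['Name', 'Age', 'Score']
print(list(df.index))      # [0, 1, 2], the default row labels

# Each column is a Series, and all columns share the DataFrame's index
score = df["Score"]
print(isinstance(score, pd.Series))     # True
print(score.index.equals(df.index))     # True
```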

3. Navigating Large Data

When you have 10,000 rows, you don't want to print them all. Pandas provides tools to "peek" at your data.

# See the first 5 rows (pass a number, e.g. df.head(10), to change the count)
print(df.head())

# See the last 5 rows
print(df.tail())

# Get summary statistics (count, mean, std, min, quartiles, max) for all numeric columns
print(df.describe())
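
These tools are easier to appreciate on a table with more than three rows. The sketch below builds a hypothetical 100-row DataFrame (the column names `id` and `value` and their contents are invented for this example) and peeks at it.

```python
import pandas as pd

# Hypothetical table: 100 rows of made-up values
big = pd.DataFrame({"id": range(100), "value": [i * 2 for i in range(100)]})

first_three = big.head(3)   # rows 0, 1, 2
last_two = big.tail(2)      # rows 98, 99
print(first_three)
print(last_two)

# describe() returns a DataFrame of statistics, one column per numeric column
stats = big.describe()
print(stats.loc["mean", "value"])   # 99.0
print(stats.loc["max", "id"])       # 99.0
```

Because describe() returns a DataFrame itself, individual statistics can be looked up by row and column label, as in the last two lines.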

4. Selecting Data

You can select a single column from a DataFrame by name, just as you would look up a key in a dictionary.

# Get just the 'Name' column (this returns a Series!)
names = df["Name"]

# Get multiple columns
subset = df[["Name", "Score"]]
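
The type of the result depends on the brackets you use, which is a common source of confusion. This short sketch, using the same table as before, checks what each form returns; the `Grade` column is deliberately one that does not exist.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alex", "Sara", "Tom"],
    "Age": [15, 16, 15],
    "Score": [85, 90, 78],
})

names = df["Name"]               # single brackets -> Series
subset = df[["Name", "Score"]]   # list of names -> DataFrame
one_col = df[["Name"]]           # one-item list -> still a DataFrame

print(type(names).__name__)      # Series
print(type(subset).__name__)     # DataFrame
print(one_col.shape)             # (3, 1)

# A missing column name raises KeyError, as it would with a dictionary
try:
    df["Grade"]
    missing = False
except KeyError:
    missing = True
    print("no such column")
```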

Practice Exercise: The Employee Table

  1. Create a dictionary with three keys: Employee, Department, and Salary.
  2. Add at least 5 rows of data.
  3. Convert the dictionary into a Pandas DataFrame.
  4. Print the first 3 rows using head().
  5. Print the average salary using df["Salary"].mean().
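
Try the exercise on your own first. If you get stuck, the steps above can be sketched as follows; the employee names, departments, and salaries are made up, and your own data will differ.

```python
import pandas as pd

# Steps 1 and 2: a dictionary with three keys and 5 rows of made-up data
employees = {
    "Employee": ["Ana", "Ben", "Chen", "Dana", "Eli"],
    "Department": ["Sales", "IT", "IT", "HR", "Sales"],
    "Salary": [50000, 60000, 65000, 55000, 45000],
}

# Step 3: convert the dictionary into a DataFrame
staff = pd.DataFrame(employees)

# Step 4: print the first 3 rows
print(staff.head(3))

# Step 5: print the average salary
average_salary = staff["Salary"].mean()
print(average_salary)   # 55000.0
```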

Quick Knowledge Check

  1. What is the difference between a Series and a DataFrame?
  2. How do you import pandas according to industry convention?
  3. What does the describe() method do?
  4. How do you select only the "Price" column from a DataFrame named df?

Key Takeaways

  • Pandas is built for tabular data (rows and columns).
  • A Series is a labeled column; a DataFrame is a table.
  • DataFrames allow for complex selection and quick statistical summaries.
  • Pandas is built on top of NumPy, so operations on whole columns are typically much faster than equivalent Python loops.

What’s Next?

Manually creating a dictionary for every table is tedious. In Lesson 5, we’ll learn how to load data directly from CSV and Excel files into your Python code with just one line!
