Module 6 Lesson 5: Loading Data with Pandas

Unlock your data. Learn how to import real-world information from CSV, Excel, and JSON files into your Python scripts for analysis.

In the last lesson, we hand-typed our data into dictionaries. In the real world, you'll be dealing with thousands of rows stored in external files. Pandas loads ("ingests") this data with its family of read_* functions, such as pd.read_csv(), pd.read_excel(), and pd.read_json().

Lesson Overview

In this lesson, we will cover:

  • The CSV Loader: pd.read_csv().
  • Excel and JSON Support: Handling other popular formats.
  • Data Exploration: info(), shape, and columns.
  • Handling Large Files: Memory considerations.

1. The CSV Specialist

CSV is the bread-and-butter of data science. pd.read_csv() parses it faster than our Module 5 code and works out column names and data types for us.

import pandas as pd

# Load a local file
df = pd.read_csv("world_population.csv")

# You can even load data directly from a URL!
url = "https://raw.githubusercontent.com/datasets/gdp/master/data/gdp.csv"
gdp_data = pd.read_csv(url)
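The same function also accepts any file-like object, which makes it easy to experiment without a file on disk. The sketch below uses made-up sample data (the country names and numbers are hypothetical) and shows two options that can help with the memory considerations mentioned in the overview: usecols to load only some columns, and chunksize to read a file in pieces. Treat it as one possible approach rather than the only way to handle large files.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = """country,year,population
Aland,2020,100
Aland,2021,110
Borduria,2020,200
Borduria,2021,190
"""

# read_csv accepts a path, a URL, or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

# Load only the columns you need to reduce memory use.
small = pd.read_csv(io.StringIO(csv_text), usecols=["country", "population"])

# With chunksize, read_csv returns an iterator of DataFrames
# (here 2 rows at a time) instead of loading everything at once.
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += chunk["population"].sum()

print(df.shape)
print(list(small.columns))
print(total)
```

With a real file, you would pass the file path in place of io.StringIO(csv_text).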

2. Reading Excel and JSON

Pandas can also read Microsoft Excel files (.xlsx) and JSON files. Reading .xlsx files relies on the separate openpyxl package, so install it first.

# Reading .xlsx files needs the openpyxl package: pip install openpyxl
# df_excel = pd.read_excel("sales_report.xlsx", sheet_name="2024")

df_json = pd.read_json("user_activity.json")
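To see what read_json produces without needing a file, here is a minimal sketch that parses a JSON string held in memory. The user names and click counts are invented for illustration, and the data is assumed to be a list of records (one object per row), which is only one of several JSON layouts you may encounter.

```python
import io
import pandas as pd

# Hypothetical JSON content: a list of records, one object per row.
json_text = '[{"user": "ana", "clicks": 3}, {"user": "ben", "clicks": 5}]'

# Wrap the string in StringIO so read_json treats it like a file.
df_json = pd.read_json(io.StringIO(json_text))

print(df_json)
print(df_json.shape)
```

Each key in the records becomes a column, and each object becomes a row.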

3. Investigating Your New DataFrame

Once you've loaded a file, the first thing you should always do is "inspect" it to see what you're working with.

# 1. See how many rows and columns (Rows, Cols)
print(df.shape)

# 2. See the column names
print(df.columns)

# 3. See the data types and non-null counts for each column
# info() prints its report itself and returns None, so print() is not needed
df.info()
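The inspection steps above can be tried on a tiny hand-built DataFrame. This sketch uses hypothetical product data with one deliberately missing price, and adds two related tools not shown above, dtypes and isna().sum(), as an alternative way to check types and count missing values.

```python
import pandas as pd

# Hypothetical data with one missing price.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "price": [9.5, None, 12.0],
})

print(df.shape)    # (rows, columns)
print(df.dtypes)   # data type of each column

# Count missing values per column.
missing = df.isna().sum()
print(missing)

# Preview the first two rows.
print(df.head(2))
```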

4. Why Use Pandas Over Standard Files?

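  1. Automatic Headers: By default, Pandas treats the first row as the column names. If your file has no header row, you need to tell it so.
  2. Type Detection: It infers a type for each column, so "Price" usually loads as a number and "Name" as a string. The guess can be wrong (for example, ID codes with leading zeros), so check it after loading.
  3. Speed: Its default CSV parser is written in C, so small files typically load in milliseconds and large files load far faster than a hand-written Python loop.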
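Header and type detection are defaults, not guarantees, and both can be overridden. The sketch below uses a made-up file with no header row, product codes with leading zeros, and the word "unknown" standing in for a missing price. The column names and the sentinel value are assumptions chosen for this example.

```python
import io
import pandas as pd

# Hypothetical file content with NO header row.
raw = "00123,Widget,4.50\n00456,Gadget,unknown\n"

# Default behaviour: the first data row is consumed as column names.
wrong = pd.read_csv(io.StringIO(raw))

# Override the defaults to match the file.
right = pd.read_csv(
    io.StringIO(raw),
    header=None,                      # the file has no header row
    names=["sku", "name", "price"],   # supply column names ourselves
    dtype={"sku": str},               # keep leading zeros in the codes
    na_values=["unknown"],            # treat this word as a missing value
)

print(list(wrong.columns))
print(right)
print(right.dtypes)
```

In the first call, one row of data is lost to the header. In the second, both rows are kept, the codes stay as text, and the price column stays numeric with one missing value.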

Practice Exercise: The Dataset Explorer

  1. Find a small CSV file online (e.g., from Kaggle or any public data source).
  2. Load it into a Pandas DataFrame.
  3. Print the shape of the data.
  4. Print the info() summary.
  5. Use head(10) to see the first 10 rows.
  6. Identify one column that has missing values (look at the "Non-Null Count" in info()).

Quick Knowledge Check

  1. What function is used to load a CSV file in Pandas?
  2. How do you see the total number of rows and columns in a DataFrame?
  3. Which method gives you the data types of every column?
  4. True or False: Pandas can load data from a web URL.

Key Takeaways

  • pd.read_csv() is the most common way to get data into Pandas.
  • Calling info() is a good first step in any data analysis: it shows column types and non-null counts at a glance.
  • Pandas handles the "heavy lifting" of file reading and type conversion for you.
  • You can load multiple files and formats into the same script.

What’s Next?

Real-world data is messy. It has missing numbers, misspelled words, and weird symbols. In Lesson 6, we’ll learn the art of Data Cleaning—fixing those mistakes before they ruin our analysis!
