Capstone Project: The Automated Data Scientist

Welcome to the Capstone Project. This is not a lesson; it is a challenge. You will build a system that acts like a junior data scientist: it takes a raw, messy dataset and runs the entire pipeline automatically.

1. Project Requirements

Your "Automated Data Scientist" must perform the following steps:

Ingestion: Load any CSV file passed to it.
Cleaning: Automatically identify and fill missing values.
Analysis: Print basic summary statistics and plot a correlation heatmap with Seaborn.
Modeling: Train both a Linear Regression and a Random Forest model.
Comparison: Print which model performed better using the metrics from Module 7.
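A minimal sketch of the Analysis step, assuming pandas, Matplotlib, and Seaborn are installed. The function name `analyze` and the default filename `correlation_heatmap.png` are hypothetical choices, not part of the project spec. Correlation is only defined for numeric columns, so text columns are dropped before calling `corr()`:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: save charts without needing a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def analyze(df: pd.DataFrame, png_path: str = "correlation_heatmap.png") -> pd.DataFrame:
    """Print summary statistics and save a correlation heatmap of the numeric columns."""
    print("Shape:", df.shape)
    # count/mean/std/min/max for numbers, plus unique/top/freq for text columns
    print(df.describe(include="all"))

    numeric = df.select_dtypes(include="number")  # correlation needs numeric data
    corr = numeric.corr()  # pairwise Pearson correlation

    fig, ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    ax.set_title("Correlation Heatmap")
    fig.tight_layout()
    fig.savefig(png_path)  # write the chart to disk as a PNG
    plt.close(fig)  # release the figure's memory
    return corr
```

Saving the figure to a file, rather than only displaying it, means the chart can go straight into your Sample Output.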

2. Architecture: The Modular Approach

To build this professional-level project, you should use the Object-Oriented skills from Module 4 and the Error Management from Module 5.

Recommended Structure:

data_handler.py: A class for loading and cleaning.
analysis_engine.py: Functions for math and plotting.
model_builder.py: A class to handle the Scikit-Learn patterns.
main.py: The entry point that ties everything together.

3. High-Level Logic Walkthrough

Step A: The Cleaner

Build a function that checks every column. If the column is numeric, fill its NaN values with the column mean. If it contains text, fill its NaN values with "Unknown".
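One way to sketch this cleaner with pandas, assuming the dataset is already loaded into a DataFrame. The name `fill_missing` is hypothetical. It works on a copy so the raw data stays available for comparison:

```python
import pandas as pd


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df: numeric NaNs -> column mean, other NaNs -> "Unknown"."""
    cleaned = df.copy()
    for col in cleaned.columns:
        if pd.api.types.is_numeric_dtype(cleaned[col]):
            # Note: an all-NaN numeric column stays NaN, because its mean is NaN
            cleaned[col] = cleaned[col].fillna(cleaned[col].mean())
        else:
            cleaned[col] = cleaned[col].fillna("Unknown")
    return cleaned
```

For example, a column `age` with values 25, NaN, 35 becomes 25.0, 30.0, 35.0. A numeric column that is entirely empty cannot be filled this way, so you may want to drop such columns first.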

Step B: The Multi-Trainer

Create a loop that takes a list of models:

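from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Assumes X_train, X_test, y_train, y_test already exist (e.g. from train_test_split)
models = [LinearRegression(), RandomForestRegressor(random_state=42)]
for m in models:
    m.fit(X_train, y_train)
    score = m.score(X_test, y_test)  # for regressors, score() returns R^2 on the test set
    print(f"{type(m).__name__} R^2 score: {score:.3f}")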
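A fuller sketch of the Comparison step might evaluate both models on the same train/test split with several metrics and then name a winner. The metrics used here (MAE, RMSE, R²) are assumed to match the regression metrics from Module 7; swap in whichever ones your course covered. The function `compare_models` is hypothetical, and the demo uses synthetic data from scikit-learn's `make_regression` instead of a real CSV. With a real dataset, `X` must be numeric, so text columns would first need encoding (for example with `pd.get_dummies`):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def compare_models(X, y, random_state=42):
    """Fit each model on one shared split; return (per-model metrics, name of best model by R^2)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    models = [LinearRegression(), RandomForestRegressor(random_state=random_state)]
    results = {}
    for m in models:
        m.fit(X_train, y_train)
        pred = m.predict(X_test)
        results[type(m).__name__] = {
            "MAE": mean_absolute_error(y_test, pred),  # lower is better
            "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),  # lower is better
            "R2": r2_score(y_test, pred),  # higher is better
        }
    for name, scores in results.items():
        print(f"{name}: " + ", ".join(f"{k}={v:.3f}" for k, v in scores.items()))
    best = max(results, key=lambda name: results[name]["R2"])
    print(f"Best model by R^2: {best}")
    return results, best


# Demo on synthetic data; in the project, X and y would come from the cleaned CSV
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
results, best = compare_models(X, y)
```

Using one shared split keeps the comparison fair: both models are scored on exactly the same unseen rows.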

4. Final Submission

A complete project should include:

The Source Code: Well-commented Python files.
A README: Explains how to run the system and which libraries to install (for example, with pip install).
Sample Output: A screenshot or text file showing the analysis and model comparison results.

Advice for Success

Fail Gracefully: Use try-except blocks. If the CSV is broken, don't crash; tell the user what went wrong.
Visualize: A picture is worth a thousand stats. Make sure your "Automated Scientist" saves at least one PNG chart.
Keep it Modular: Don't write everything in one file. Break it down into components.
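A sketch of the "Fail Gracefully" advice applied to ingestion, assuming pandas is the CSV reader. `load_csv` is a hypothetical helper that catches common failure modes of `pd.read_csv` and reports them instead of raising:

```python
import sys
from typing import Optional

import pandas as pd


def load_csv(path: str) -> Optional[pd.DataFrame]:
    """Load a CSV; on failure, print a readable reason to stderr and return None."""
    try:
        df = pd.read_csv(path)
    except FileNotFoundError:
        print(f"Error: file not found: {path}", file=sys.stderr)
    except pd.errors.EmptyDataError:
        print(f"Error: {path} is empty.", file=sys.stderr)
    except pd.errors.ParserError as exc:
        print(f"Error: could not parse {path}: {exc}", file=sys.stderr)
    except UnicodeDecodeError:
        print(f"Error: {path} is not valid UTF-8; try another encoding.", file=sys.stderr)
    else:
        if df.empty:  # header row present but no data rows
            print(f"Error: {path} has no data rows.", file=sys.stderr)
            return None
        return df
    return None
```

In `main.py`, the caller can check for `None` and exit with a clear message instead of a traceback.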

Course Wrap-up

You have completed the entire "Python from Basics to AI" curriculum. You now have a portfolio-ready project that proves you can:

Write clean, professional Python code.
Handle and clean messy real-world data.
Build and evaluate predictive AI models.

The future is yours. What will you build next?