Writing Tests for Agent-Generated Code

The biggest fear in AI-assisted coding is "Silent Regressions"—the agent fixes a bug in the Login page but accidentally breaks the Sign-up page. The only solution is Automated Testing. A production-grade coding agent shouldn't just write code; it should be a Test-Driven Agent.

In this lesson, we will learn how to build an agent that validates its own output using Pytest, Jest, or Playwright.

1. The TDD Agent Workflow (Test-Driven Development)

Instead of "Code -> Test," we teach the agent to follow "Test -> Code -> Test."

The Loop:

Goal: "Add a function that calculates the Fibonacci sequence."
Step 1: Agent writes test_fibonacci.py with 3 test cases (0, 1, 5).
Step 2: Agent runs pytest. Expected Failure.
Step 3: Agent writes the actual implementation in logic.py.
Step 4: Agent runs pytest. Success.
Step 5: Only after the tests pass does the agent "Complete" the task.

2. Generating "Mocks" for External APIs

If an agent is writing code that talks to a database or a cloud API (like Stripe), it can't run a "Real" test easily.

Master Pattern: Teach the agent to write Mocks.
The agent writes a mock for the database connection so it can test its business logic in isolation without needing a live database.

3. The "Critique" Node for Tests

Writing tests is hard. Sometimes an agent writes a "Fake" test that passes even if the code is wrong (e.g., assert True == True).

Implementation:

Use a Second Agent (The QA Auditor) to review the code generated by the first agent:

"Does this test actually exercise the logic?"
"Are there any edge cases (null values, long strings) missing from the tests?"

4. End-to-End (E2E) Testing with Vision

As we saw in Module 14.1, agents can use Vision. This is perfect for UI Testing.

Task: "Make sure the 'Buy' button works."
Agent Action:
1. Click button.
2. Wait 2 seconds.
3. Take screenshot.
4. Vision Check: "Does the screen show a 'Success' message?"

5. Security: The Test-Induced Denial of Service (DoS)

Be careful! An agent might write a test that:

Deletes all rows in the production database.
Sends 1,000 "Test" emails to real customers.
Rule: Agents must ONLY run tests in an Empty Sandbox Environment with fake data.

6. Implementation Strategy: The Test Runner Tool

@tool
def run_tests(test_folder="./tests"):
    """
    Runs all tests in the specified folder and returns a detailed report 
    including coverage and stack traces for any failures.
    """
    result = subprocess.run(["pytest", test_folder], capture_output=True, text=True)
    return {
        "passed": result.returncode == 0,
        "stdout": result.stdout,
        "stderr": result.stderr
    }

Summary and Mental Model

Think of Testing like A Safety Harness.

Without it, the agent is free-climbing a mountain.
With it, the agent can fall (write a bug), but the harness catches them (Test fails), allowing them to climb back up (Fix the code) without dying (Breaking production).

A coding agent without tests is just a liability.

Exercise: Test Architecture

Edge Cases: You are building an agent to handle Refund Calculations.
- Write a list of 3 "Edge cases" the agent must write tests for (e.g., Negative refund? Refund > Original Price?).
Mocking: Why is it important to "Mock" the clock (Time) when testing a trial period feature?
- (Hint: How do you test what happens in 30 days now?)
Refinement: If the run_tests tool returns a 50-line error trace, how should the agent use that trace to find the bug?
- (Hint: Look for the File and Line Number in the trace). Ready for the final modules? Next: Scaling to Millions of Users.

The Self-Verifying Agent: Writing Tests for Code