The Self-Verifying Agent: Writing Tests for Code

Ensure quality through automation. Learn how to build agents that follow Test-Driven Development (TDD) principles, writing their own unit tests before submitting features.

Writing Tests for Agent-Generated Code

The biggest fear in AI-assisted coding is the "silent regression": the agent fixes a bug in the Login page but accidentally breaks the Sign-up page. The most reliable defense is automated testing. A production-grade coding agent shouldn't just write code; it should be a Test-Driven Agent.

In this lesson, we will learn how to build an agent that validates its own output using Pytest, Jest, or Playwright.


1. The TDD Agent Workflow (Test-Driven Development)

Instead of "Code -> Test," we teach the agent to follow "Test -> Code -> Test."

The Loop:

  Goal: "Add a function that calculates the Fibonacci sequence."

  1. The agent writes test_fibonacci.py with 3 test cases (inputs 0, 1, and 5).
  2. The agent runs pytest. The tests are expected to fail because the implementation does not exist yet.
  3. The agent writes the actual implementation in logic.py.
  4. The agent runs pytest again. The tests pass.
  5. Only after the tests pass does the agent mark the task as complete.
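As a rough illustration of what this loop might produce, here is a minimal sketch. It assumes a hypothetical function name, fibonacci(n), that returns the n-th Fibonacci number with fibonacci(0) == 0; the lesson does not specify the exact signature. Both files are shown in one block for brevity.

```python
# logic.py (written only after the tests below have failed once)
def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number, with fibonacci(0) == 0 (assumed convention)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# test_fibonacci.py (written first, before logic.py exists)
# In a real project this file would begin with: from logic import fibonacci
def test_fibonacci_zero():
    assert fibonacci(0) == 0


def test_fibonacci_one():
    assert fibonacci(1) == 1


def test_fibonacci_five():
    assert fibonacci(5) == 5
```

On the first pytest run the import fails because logic.py does not exist yet, which is the expected failure. Once the implementation is in place, the same three tests pass.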

2. Generating "Mocks" for External APIs

If an agent is writing code that talks to a database or a cloud API (like Stripe), it can't run a "Real" test easily.

  • Master Pattern: Teach the agent to write Mocks.
  • The agent writes a mock for the database connection so it can test its business logic in isolation without needing a live database.
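A minimal sketch of this pattern using the standard library's unittest.mock. The function charge_customer and the method insert_charge are hypothetical names invented for illustration; a real project would have its own database interface.

```python
from unittest.mock import Mock


def charge_customer(db, customer_id: int, amount_cents: int) -> str:
    """Hypothetical business logic: reject bad amounts, then record the charge."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    db.insert_charge(customer_id, amount_cents)
    return "charged"


def test_charge_is_recorded():
    fake_db = Mock()  # stands in for the real database connection
    assert charge_customer(fake_db, 42, 1999) == "charged"
    # The mock records every call, so we can check how the logic used it.
    fake_db.insert_charge.assert_called_once_with(42, 1999)


def test_invalid_amount_never_touches_db():
    fake_db = Mock()
    try:
        charge_customer(fake_db, 42, 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    fake_db.insert_charge.assert_not_called()
```

Because the database is passed in as an argument, the test can swap in a Mock without patching anything. No live database is contacted at any point.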

3. The "Critique" Node for Tests

Writing tests is hard. Sometimes an agent writes a "Fake" test that passes even if the code is wrong (e.g., assert True == True).

Implementation:

Use a Second Agent (The QA Auditor) to review the tests written by the first agent:

  • "Does this test actually exercise the logic?"
  • "Are there any edge cases (null values, long strings) missing from the tests?"

4. End-to-End (E2E) Testing with Vision

As we saw in Module 14.1, agents can use Vision. This is perfect for UI Testing.

  • Task: "Make sure the 'Buy' button works."
  • Agent Action:
    1. Click button.
    2. Wait 2 seconds.
    3. Take screenshot.
    4. Vision Check: "Does the screen show a 'Success' message?"
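The four steps above can be sketched as a single function. This assumes that page behaves like a Playwright sync-API Page object and that ask_vision is a hypothetical callable wrapping whatever vision model the agent uses; neither the selector nor the helper name comes from the lesson.

```python
from typing import Callable


def check_buy_button(page, ask_vision: Callable[[bytes, str], bool]) -> bool:
    """Click 'Buy', wait, take a screenshot, and ask a vision model about it.

    `page` is assumed to behave like a Playwright sync-API Page.
    `ask_vision` is a hypothetical wrapper: (image_bytes, question) -> bool.
    """
    page.click("text=Buy")          # 1. Click button (assumed selector)
    page.wait_for_timeout(2000)     # 2. Wait 2 seconds (value is in milliseconds)
    screenshot = page.screenshot()  # 3. Take screenshot (returns PNG bytes)
    # 4. Vision check
    return ask_vision(screenshot, "Does the screen show a 'Success' message?")
```

A fixed two-second wait mirrors the steps as written, but it can be flaky on slow pages. Waiting for a specific element to appear is usually more reliable when the page structure is known.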

5. Security: The Test-Induced Denial of Service (DoS)

Be careful! An agent might write a test that:

  • Deletes all rows in the production database.
  • Sends 1,000 "Test" emails to real customers.

Rule: Agents must ONLY run tests in an empty sandbox environment with fake data.
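One way to back this rule with code is a guard that runs before any test command. The sketch below is an assumption-heavy example: the environment variable names APP_ENV and DATABASE_URL and the "sandbox" marker value are conventions invented here, not a standard.

```python
import os


class UnsafeEnvironmentError(RuntimeError):
    """Raised when tests are about to run outside the sandbox."""


def assert_sandbox(env=None) -> None:
    """Refuse to continue unless the environment is explicitly marked as a sandbox.

    APP_ENV and DATABASE_URL are assumed variable names, not a standard.
    """
    env = os.environ if env is None else env
    # Fail closed: a missing marker is treated the same as a wrong one.
    if env.get("APP_ENV") != "sandbox":
        raise UnsafeEnvironmentError("APP_ENV must be 'sandbox' to run agent tests")
    if "prod" in env.get("DATABASE_URL", "").lower():
        raise UnsafeEnvironmentError("DATABASE_URL looks like a production database")
```

A check like this is only a last line of defense. The stronger protection is structural: run the agent in a container or VM that has no production credentials or network route to production in the first place.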

6. Implementation Strategy: The Test Runner Tool

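import subprocess

# `tool` is assumed to come from the agent framework in use
# (for example, LangChain's `from langchain_core.tools import tool`).
@tool
def run_tests(test_folder: str = "./tests") -> dict:
    """
    Runs all tests in the specified folder and returns the pass/fail status
    plus pytest's raw output, including stack traces for any failures.
    """
    # Arguments are passed as a list (no shell), so the folder name is not
    # interpreted as a shell command.
    result = subprocess.run(["pytest", test_folder], capture_output=True, text=True)
    return {
        # pytest exits with 0 only when every collected test passed.
        "passed": result.returncode == 0,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }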
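A tool like this becomes useful once its report is fed back to the agent. The sketch below shows one possible retry loop. The names fix_until_green and propose_fix are hypothetical, and the test runner is passed in as a plain callable so the example does not depend on any particular agent framework.

```python
from typing import Callable


def fix_until_green(
    run_tests: Callable[[], dict],
    propose_fix: Callable[[str], None],
    max_attempts: int = 3,
) -> bool:
    """Run tests, hand failures back to the agent, allow at most max_attempts fixes."""
    for _ in range(max_attempts):
        report = run_tests()
        if report["passed"]:
            return True
        # Give the failure output to the agent so it can edit the code.
        propose_fix(report["stdout"] + report["stderr"])
    # One final run to see whether the last fix worked.
    return run_tests()["passed"]
```

Capping the number of attempts keeps a confused agent from looping forever. When the loop returns False, the task should be escalated to a human rather than marked complete.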

Summary and Mental Model

Think of testing as a safety harness.

  • Without it, the agent is free-climbing a mountain.
  • With it, the agent can fall (write a bug), but the harness catches it (the test fails), allowing it to climb back up (fix the code) without dying (breaking production).

A coding agent without tests is just a liability.


Exercise: Test Architecture

  1. Edge Cases: You are building an agent to handle Refund Calculations.
    • Write a list of 3 "Edge cases" the agent must write tests for (e.g., Negative refund? Refund > Original Price?).
  2. Mocking: Why is it important to "Mock" the clock (Time) when testing a trial period feature?
    • (Hint: How do you test what happens in 30 days now?)
  3. Refinement: If the run_tests tool returns a 50-line error trace, how should the agent use that trace to find the bug?
    • (Hint: Look for the file and line number in the trace.)

Ready for the final modules? Next: Scaling to Millions of Users.
