
The Self-Verifying Agent: Writing Tests for Code
Ensure quality through automation. Learn how to build agents that follow Test-Driven Development (TDD) principles, writing their own unit tests before submitting features.
Writing Tests for Agent-Generated Code
The biggest fear in AI-assisted coding is "Silent Regressions"—the agent fixes a bug in the Login page but accidentally breaks the Sign-up page. The only solution is Automated Testing. A production-grade coding agent shouldn't just write code; it should be a Test-Driven Agent.
In this lesson, we will learn how to build an agent that validates its own output using Pytest, Jest, or Playwright.
1. The TDD Agent Workflow (Test-Driven Development)
Instead of "Code -> Test," we teach the agent to follow "Test -> Code -> Test."
The Loop:
- Goal: "Add a function that calculates the Fibonacci sequence."
- Step 1: Agent writes
test_fibonacci.pywith 3 test cases (0, 1, 5). - Step 2: Agent runs
pytest. Expected Failure. - Step 3: Agent writes the actual implementation in
logic.py. - Step 4: Agent runs
pytest. Success. - Step 5: Only after the tests pass does the agent "Complete" the task.
2. Generating "Mocks" for External APIs
If an agent is writing code that talks to a database or a cloud API (like Stripe), it can't run a "Real" test easily.
- Master Pattern: Teach the agent to write Mocks.
- The agent writes a mock for the database connection so it can test its business logic in isolation without needing a live database.
3. The "Critique" Node for Tests
Writing tests is hard. Sometimes an agent writes a "Fake" test that passes even if the code is wrong (e.g., assert True == True).
Implementation:
Use a Second Agent (The QA Auditor) to review the code generated by the first agent:
- "Does this test actually exercise the logic?"
- "Are there any edge cases (null values, long strings) missing from the tests?"
4. End-to-End (E2E) Testing with Vision
As we saw in Module 14.1, agents can use Vision. This is perfect for UI Testing.
- Task: "Make sure the 'Buy' button works."
- Agent Action:
- Click button.
- Wait 2 seconds.
- Take screenshot.
- Vision Check: "Does the screen show a 'Success' message?"
5. Security: The Test-Induced Denial of Service (DoS)
Be careful! An agent might write a test that:
- Deletes all rows in the production database.
- Sends 1,000 "Test" emails to real customers.
- Rule: Agents must ONLY run tests in an Empty Sandbox Environment with fake data.
6. Implementation Strategy: The Test Runner Tool
@tool
def run_tests(test_folder="./tests"):
"""
Runs all tests in the specified folder and returns a detailed report
including coverage and stack traces for any failures.
"""
result = subprocess.run(["pytest", test_folder], capture_output=True, text=True)
return {
"passed": result.returncode == 0,
"stdout": result.stdout,
"stderr": result.stderr
}
Summary and Mental Model
Think of Testing like A Safety Harness.
- Without it, the agent is free-climbing a mountain.
- With it, the agent can fall (write a bug), but the harness catches them (Test fails), allowing them to climb back up (Fix the code) without dying (Breaking production).
A coding agent without tests is just a liability.
Exercise: Test Architecture
- Edge Cases: You are building an agent to handle Refund Calculations.
- Write a list of 3 "Edge cases" the agent must write tests for (e.g., Negative refund? Refund > Original Price?).
- Mocking: Why is it important to "Mock" the clock (Time) when testing a trial period feature?
- (Hint: How do you test what happens in 30 days now?)
- Refinement: If the
run_teststool returns a 50-line error trace, how should the agent use that trace to find the bug?- (Hint: Look for the
FileandLine Numberin the trace). Ready for the final modules? Next: Scaling to Millions of Users.
- (Hint: Look for the