Capstone: The Final Efficiency Pass

Fine-tune your researcher for maximum ROI. Master the final tweaks that shave off the remaining pennies to hit your $0.10 goal.

We have our Router (Module 20.2) and our Memory (Module 20.3). Our researcher is functional. But in AI engineering, "Functional" is the floor. "Optimized" is the ceiling.

In this final technical step, we will apply the "Squeeze" to hit our $0.10 per project target.

We will implement Instruction Minification, Strict Pydantic Output, and the "Single-Turn Synthesis" pattern (called "Batch" Synthesis below).


1. Instruction Minification (Module 4.1)

Go to your System Prompt. Remove all "Polite" fillers.

  • Before: "You are an assistant. Please be as concise as possible when you are writing the report for the user." (roughly 20 tokens)
  • After: ROLE: Researcher. RULE: Max-Brevity. (roughly 10 tokens; exact counts depend on the model's tokenizer)
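To check how much a rewrite actually saves, measure it rather than guess. The sketch below uses a deliberately crude estimator (words and punctuation marks counted separately); `rough_token_count` is a hypothetical helper, not part of any library, and a real tokenizer for your model will report different numbers.

```python
import re

def rough_token_count(text: str) -> int:
    """Crude estimate: every word and every punctuation mark counts as one token.
    Real tokenizers split text differently, so treat this as a relative measure."""
    return len(re.findall(r"\w+|[^\w\s]", text))

before = ("You are an assistant. Please be as concise as possible "
          "when you are writing the report for the user.")
after = "ROLE: Researcher. RULE: Max-Brevity."

saved = rough_token_count(before) - rough_token_count(after)
print(f"before={rough_token_count(before)} "
      f"after={rough_token_count(after)} saved={saved}")
```

The saving applies to every call that sends the system prompt, so even a small per-call difference adds up over a multi-step research run.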

2. Using the 'Batch' Synthesis Pattern (Module 15.4)

Instead of letting the expert model (Tier 3) take part in the back-and-forth of research, we call it only once.

  1. Step 1-5 (Tier 1): Search, Extract, and Save facts to Memory.
  2. Step 6 (Tier 1): Retrieve all relevant facts into one "Context Block."
  3. Step 7 (Tier 3): One single call to write the report using the Block.

Savings: You pay "Expert Prices" for a single call only: the Context Block going in and the report coming out. All the "Thinking" that led up to it was done at "Discount" prices.
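The arithmetic behind this pattern can be sketched as follows. The per-token prices and the 2,000-token step size are made-up placeholders chosen only to show the shape of the calculation; substitute your providers' real rates and your own measured token counts.

```python
# HYPOTHETICAL prices in USD per 1,000 tokens (not real provider rates).
PRICE_PER_1K = {1: 0.0005, 3: 0.03}

def call_cost(tier: int, tokens: int) -> float:
    """Cost of one call at the given tier."""
    return PRICE_PER_1K[tier] * tokens / 1000

def pipeline_cost(calls) -> float:
    """calls: list of (tier, tokens) pairs, one per step."""
    return sum(call_cost(tier, tokens) for tier, tokens in calls)

# Seven steps of about 2,000 tokens each (illustrative numbers only).
all_expert = [(3, 2000)] * 7                     # Tier 3 does everything
batch_pattern = [(1, 2000)] * 6 + [(3, 2000)]    # Tier 3 only for step 7

print(f"all Tier 3:    ${pipeline_cost(all_expert):.4f}")
print(f"batch pattern: ${pipeline_cost(batch_pattern):.4f}")
```

Under these assumed numbers, the single expert call dominates the batch pipeline's cost, which is why trimming the Context Block matters more than trimming the Tier 1 steps.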


3. Implementation: The Final Synthesis (Python)

Python Code: Consolidation logic

def final_synthesis(topic, memory_obj, router_obj):
    """Write the final report using exactly one Tier 3 call."""
    # 1. Retrieve the facts saved during research.
    #    Note: this returns the 10 most relevant matches, not every stored fact.
    top_facts = memory_obj.search(topic, n_results=10)

    # 2. Build a high-density 'Data Block': one bullet per fact
    context_block = "\n".join(f"- {fact}" for fact in top_facts)

    # 3. CALL TIER 3 (THE EXPERT) ONLY ONCE
    final_report = router_obj.dispatch(
        tier=3,
        system_prompt="ROLE: Technical Writer. TASK: Multi-source Synthesis.",
        prompt=f"FACTS:\n{context_block}\n\nTOPIC: {topic}\n\nREPORT:"
    )

    return final_report

4. The "Stop Sequence" Safety (Module 15.2)

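In your synthesis call, set a stop sequence such as "REFERENCES:". Why? Once the report is finished, models often append closing commentary ("Here is why I wrote it..."). A stop sequence halts generation the moment the model emits that string, so it only helps if the marker reliably appears right after the main content; instructing the model to end every report with it is one way to make that happen. The trailing 50-100 tokens are then never generated, and you never pay for them.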
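To see what a stop sequence does to the output, the sketch below simulates the cut locally. `apply_stop` is a hypothetical helper that only mimics the behavior; the actual savings come from passing the stop sequence to your provider's API so that generation halts server-side. Whether `router_obj.dispatch` accepts such a parameter is an assumption you would need to check against your own Router from Module 20.2.

```python
from typing import Sequence

def apply_stop(text: str, stop_sequences: Sequence[str]) -> str:
    """Cut text at the earliest stop sequence, mimicking what a provider
    does server-side when a stop sequence is configured."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()

# Illustrative raw model output: report, marker, then unwanted commentary.
raw_output = (
    "Report body goes here.\n"
    "REFERENCES:\n"
    "Here is why I wrote it this way..."
)

report = apply_stop(raw_output, ["REFERENCES:"])
print(report)
```

If the marker never appears in the output, nothing is cut, so pairing the stop sequence with a sensible maximum output length is a reasonable safety net.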


5. Summary and Key Takeaways

  1. Hierarchy of Costs: Do the heavy lifting in Tier 1; save Tier 3 for the "Polish."
  2. Instruction Lean: Prune your system prompts to the symbolic core.
  3. Single-Expert Call: Never involve an expensive model in a task that requires multiple turns to discover facts.
  4. Data Density: Use bullet points and headers in your "Data Block" to help the Tier 3 model reason efficiently.
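For the Data Density point, one possible way to build a block with headers and bullets is sketched below. The `(heading, fact)` pair shape and the `build_context_block` name are assumptions made for illustration; your Memory from Module 20.3 may return facts in a different form.

```python
from collections import defaultdict

def build_context_block(facts) -> str:
    """facts: iterable of (heading, fact_text) pairs (hypothetical shape).
    Groups facts under headings and drops case-insensitive duplicates."""
    grouped = defaultdict(list)   # keeps headings in first-seen order
    seen = set()
    for heading, text in facts:
        key = text.strip().lower()
        if key in seen:           # duplicate fact: skip it, save the tokens
            continue
        seen.add(key)
        grouped[heading].append(text.strip())

    lines = []
    for heading, items in grouped.items():
        lines.append(f"## {heading}")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)

facts = [
    ("Cost", "Tier 1 handles search."),
    ("Cost", "tier 1 handles search."),      # duplicate, different casing
    ("Quality", "Tier 3 writes the report."),
]
block = build_context_block(facts)
print(block)
```

Removing duplicates before the Tier 3 call is cheap and directly reduces the number of expert-priced input tokens.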

Exercise: The Final Budget Audit

  1. Run your completed agent.
  2. Check your router.stats["cost"].
  3. Did you hit your $0.10 goal?
    • If YES: You are a Token Efficiency Master.
    • If NO: Audit your "Lineage" (Module 17.2). Which step was the most expensive?
  4. The Final Squeeze: If you are at $0.12, can you replace one Tier 3 call with a Tier 1 call?
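The audit in this exercise can be sketched as a few lines of Python. The `lineage` records below are invented sample data in an assumed format (one dict per step with `step`, `tier`, and `cost` keys); your own lineage log from Module 17.2 may be structured differently.

```python
TARGET = 0.10  # USD per project

# HYPOTHETICAL lineage log: invented numbers, assumed record format.
lineage = [
    {"step": "search", "tier": 1, "cost": 0.004},
    {"step": "extract", "tier": 1, "cost": 0.006},
    {"step": "outline", "tier": 3, "cost": 0.05},
    {"step": "synthesis", "tier": 3, "cost": 0.06},
]

def audit(records, target=TARGET):
    """Total the run, compare it to the target, and name the costliest step."""
    total = sum(r["cost"] for r in records)
    worst = max(records, key=lambda r: r["cost"])
    tier3_steps = [r["step"] for r in records if r["tier"] == 3]
    return {
        "total": round(total, 4),
        "over_budget": total > target,
        "most_expensive": worst["step"],
        "tier3_steps": tier3_steps,   # candidates for a Tier 1 downgrade
    }

result = audit(lineage)
print(result)
```

In this made-up run, the project is over budget with two Tier 3 steps; moving the non-final one to Tier 1 is the kind of "Final Squeeze" the exercise asks about.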

Congratulations! Your technical build is complete. Now, let’s wrap up the course.
