Capstone Project: Building a Production-Grade Corporate Intelligence Agent (CODA)

Welcome to the finish line. You have journeyed through 17 modules, covering the architecture, tools, memory, multimodality, and deployment of Gemini-powered agents. Now, it is time to synthesize this knowledge into a single, production-grade application.

In this Capstone Project, you will build CODA (Corporate Discovery Agent).

1. The Project Vision

CODA is a multi-agent system designed for financial analysts and corporate strategists. Instead of manually reading 10-K filings, searching LinkedIn for employee trends, and watching 2-hour earnings calls, the user simply gives CODA a company name. CODA then orchestrates a team of agents to return a comprehensive 360-degree intelligence briefing.

Core Features:

Autonomous Web Research: Searching for recent news and acquisitions.
Visual Financial Analysis: "Looking" at charts and tables in annual reports.
Cross-Document Reasoning: Connecting info from a PDF to a recent news article.
Persistent Session Memory: Remembering the user's specific areas of interest (e.g., "Focus on ESG metrics").
Multi-Agent Coordination: A Supervisor managing a Researcher and a Strategist.

2. The CODA Architecture

CODA follows the Hierarchical Multi-Agent Pattern.

graph TD
    U[User Interface] --> S[Supervisor Agent]
    
    subgraph "The Intelligence Team"
    S <--> R[Researcher Worker]
    S <--> ST[Strategist Worker]
    end
    
    subgraph "The Infrastructure"
    R --> T1[Google Search Tool]
    R --> T2[Vision Analysis Tool]
    S <--> M[(Redis Memory)]
    S --> L[Audit Log / Safety Gate]
    end
    
    style S fill:#4285F4,color:#fff
    style R fill:#34A853,color:#fff
    style ST fill:#F4B400,color:#fff

3. Project Requirements (Technical Specs)

To pass this capstone, your implementation must include:

[ ] Multi-Agent Loop: At least two distinct agent roles (one managing, one executing).
[ ] Tool Integration: Use of at least 2 custom tools (e.g., Search, Calculation, DB lookup).
[ ] Multimodal Input: The ability to analyze an image (chart/diagram) as part of the research.
[ ] Stateful History: Persistence of the conversation using a database or structured file.
[ ] Safety Check: Implementation of a basic PII redaction or a behavioral "Watcher" gate.

4. Phase 1: The Tools

First, you must build the "Hands" of your agent.

Tool 1: The Web Searcher

Use a mock or a real API (like Google Custom Search) to fetch the top 3 news results for a company.

Tool 2: The Visual Chart Analyst

A function that accepts an image path and asks Gemini Pro: "Extract the key revenue figures from this chart and return them as a JSON object."

5. Phase 2: The Core Logic (The Supervisor)

Identify the Supervisor's System Instruction. It should be the "Conductor" that decides when to research and when to strategize.

Draft System Prompt:

"You are the CODA Supervisor. Your goal is to provide a corporate briefing.

Use 'Researcher' to get initial facts.

If an image is provided, use 'VisualAnalyst'.

Once all data is collected, use 'Strategist' to write the briefing.

Wait for user feedback before concluding."

6. Implementation Boilerplate (The Scaffold)

Use this as your starting point for the final Python application.

import os
import google.generativeai as genai
from typing import List

# 1. API SETUP
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))

# 2. DEFINE TOOLS
def web_search(query: str):
    """Searches for news. Returns a string of findings."""
    # (Implementation here)
    return f"Search results for {query}..."

def vision_analyst(image_path: str):
    """Analyzes a chart image. Returns numerical data."""
    # (Implementation here)
    img = genai.upload_file(image_path)
    return "Visual Findings..."

# 3. DEFINE THE AGENT TEAM
class CODA:
    def __init__(self):
        self.supervisor = genai.GenerativeModel(
            model_name='gemini-1.5-pro',
            tools=[web_search, vision_analyst],
            system_instruction="You are the CODA Supervisor. Manage the research flow."
        )
        self.session = self.supervisor.start_chat(enable_automatic_function_calling=True)

    def run_briefing(self, company_name: str, chart_path: str = None):
        prompt = f"Perform a briefing on {company_name}."
        if chart_path:
            prompt += f" ALSO: Analyze the chart at {chart_path}."
        
        response = self.session.send_message(prompt)
        return response.text

# 4. RUN CAPSTONE
# coda = CODA()
# print(coda.run_briefing("Google", "revenue_chart.png"))

7. Phase 3: The Safety and Governance Audit

Before "Shipping" CODA, add the following checks:

Boundary: Ensure CODA refuses to research illegal or harmful activities.
Traceability: Print the chat.history to a log file after every session.
Budget: Add a token counter to notify you if the briefing exceeds 50,000 tokens.

8. Final Evaluation Rubric

Criteria	1 (Developing)	5 (Mastery)
Agency	Agent follows human-provided steps.	Agent autonomously decides which tool to use and when.
Multimodality	Agent ignores the image input.	Agent connects visual data to its final textual strategy.
Logic	Direct Q&A without planning.	Uses a clear Thought-Action-Observation loop.
Production Ready	API keys are hard-coded.	Uses environment variables, logging, and error handling.

9. Submission and Reflection

Once you have completed your CODA script:

Record a 2-minute demo of the agent researching a company and reading a chart.
Write a 500-word reflection on the most difficult bug you encountered and how the Gemini ADK helped you solve it.

10. Conclusion: You are an Agent Architect

The road to building CODA is the final step in your transformation. You are no longer just a "Prompt Engineer"—you are a System Designer. You understand the deep mechanics of memory, the nuances of multimodality, and the critical importance of safety.

Go forth and build agents that change the world. We can't wait to see what you create.

Mission Accomplished.