Building Robust API Connectors: Auth and Error Handling

A tool that only calculates math is safe and self-contained. However, most real-world agents need to talk to External APIs: Slack, Salesforce, GitHub, or a private company database. When you move from local functions to network-based tools, you encounter a different set of engineering challenges: Authentication, Latency, Rate Limiting, and Partial Failures.

In this lesson, we will learn how to build "Enterprise-Grade" API connectors. We will explore how to manage secrets safely, how to implement exponential backoff for network calls, and how to protect your agent (and your wallet) from 3rd-party API failures.

1. Secret Management: Never Hard-Code Auth

If your Gemini agent needs to post to Slack, it needs a Bearer Token.

The Security Anti-Pattern

Do not hard-code the token in your tool function or include it in the System Prompt. Anything in the model's context can be leaked if the model is tricked, for example through prompt injection.

The Security Pattern

Use Environment Variables or a Secret Manager to fetch the token at runtime. The Gemini model should never see the actual token; it should only see the "Intent" to use the tool.

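import os
import requests

def post_to_slack(message: str) -> dict:
    """Posts a status update to the engineering Slack channel."""
    # The token is read at runtime; the model never sees its value.
    token = os.getenv("SLACK_BOT_TOKEN")
    if not token:
        # Fail with a readable message instead of sending "Bearer None".
        return {"ok": False, "error": "SLACK_BOT_TOKEN is not configured"}

    url = "https://slack.com/api/chat.postMessage"
    headers = {"Authorization": f"Bearer {token}"}
    payload = {"channel": "#eng-status", "text": message}

    # A timeout keeps a stalled connection from hanging the agent loop.
    response = requests.post(url, headers=headers, json=payload, timeout=10)
    # Slack reports many failures as HTTP 200 with {"ok": false, "error": "..."},
    # so the caller should check the "ok" field of the returned JSON.
    return response.json()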

2. Handling Network Failures (The Retry Strategy)

The internet is unreliable. A tool might fail because of a 500ms network blip. You don't want your entire agentic loop to crash because of a single failed GET request.

The "Instructional Retry" Pattern

Code-Level Retry: Use a library like tenacity to automatically attempt the API call up to 3 times before giving up.
Agent-Level Feedback: If every attempt fails, return a message that tells Gemini why the call failed so it can try a different approach.

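import requests
from tenacity import retry, stop_after_attempt, wait_exponential

# reraise=True surfaces the original exception instead of tenacity's RetryError.
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10), reraise=True)
def call_weather_api(lat, lon):
    # Up to 3 attempts in total, with exponentially increasing waits (2 to 10 seconds)
    response = requests.get(f"https://api.weather.com/v1/{lat}/{lon}", timeout=10)
    # Raises on 4xx/5xx so tenacity retries; a production connector would
    # usually retry only transient errors (timeouts, 429, 5xx).
    response.raise_for_status()
    return response.json()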
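
The agent-level feedback half of the pattern could look like the sketch below. It uses only the standard library, so it runs without tenacity. The name `call_with_feedback`, its parameters, and the shape of the returned dictionary are illustrative assumptions, not part of any Gemini or tenacity API.

```python
import time
from typing import Any, Callable, Optional


def call_with_feedback(
    fn: Callable[[], Any],
    attempts: int = 3,
    base_delay: float = 2.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Run fn with exponential backoff; never raise into the agent loop."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return {"status": "ok", "data": fn()}
        except Exception as exc:  # a real connector would catch narrower types
            last_error = exc
            if attempt < attempts:
                # Delay doubles each time: base_delay * 2^(attempt - 1), capped.
                sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
    # Every attempt failed: describe the failure in plain language for the model.
    return {
        "status": "error",
        "message": (
            f"The call failed after {attempts} attempts "
            f"({type(last_error).__name__}: {last_error}). "
            "Do not retry immediately; try a different approach or tell the user."
        ),
    }
```

A tool would wrap its network call, for example `call_with_feedback(lambda: fetch(lat, lon))` with a hypothetical `fetch`, and return the resulting dictionary to the model as the tool result.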

3. Managing Rate Limits (External)

Many APIs (like Twitter or LinkedIn) have strict Rate Limits. If your agent is in a recursive loop and calls post_tweet() 50 times in a minute, the API will start rejecting requests (typically with HTTP 429), and repeated violations can get your account banned.

Implementation: The "Token Bucket" Tool

Add a local "Guard" inside your tool that tracks usage.

Goal: Protect the external API from the "Over-enthusiastic" agent.
Logic: If usage > limit, return: "ERROR: The API rate limit has been reached. Please wait 60 seconds before trying this action again."
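
A minimal sketch of such a guard, assuming a classic token bucket: the bucket holds a fixed number of tokens, each call spends one, and tokens refill at a steady rate. The `TokenBucket` class, the budget of 5 posts per minute, and the stubbed `post_tweet` body are assumptions made for illustration; the real API call is omitted.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilled at `refill_per_sec`."""

    def __init__(
        self,
        capacity: int,
        refill_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)
        self.last_refill = clock()

    def try_acquire(self) -> bool:
        """Spend one token if available; return False when the bucket is empty."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical budget: bursts of 5 posts, refilled at 5 tokens per minute.
tweet_bucket = TokenBucket(capacity=5, refill_per_sec=5 / 60)


def post_tweet(text: str) -> str:
    """Tool stub: checks the local guard before touching the external API."""
    if not tweet_bucket.try_acquire():
        return (
            "ERROR: The API rate limit has been reached. "
            "Please wait 60 seconds before trying this action again."
        )
    # The real API call would go here; it is omitted in this sketch.
    return f"Posted: {text}"
```

Because the guard returns a plain-language error instead of raising, the model can read the message and decide to wait rather than call the tool again immediately.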

4. Architectural Diagram: The Robust Connector

graph TD
    A[Gemini Tool Call] --> B{Secret Manager}
    B -->|Fetch Token| C[Connector Logic]
    C --> D{Circuit Breaker}
    D -->|Healthy| E[External API]
    D -->|Failing| F[Return 'System Down' Error]
    E -->|Success| G[Return Data to Gemini]
    E -->|429/500 Error| H[Retry Loop]
    H --> E
    H -->|Exhausted| F
    
    style E fill:#34A853,color:#fff
    style H fill:#F4B400,color:#fff
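
The Circuit Breaker node in the diagram can be sketched as follows. This is one common formulation, not a specific library's API: after a number of consecutive failures the breaker "opens" and rejects calls without touching the external API, then allows a trial call once a cool-down has passed. The class name, thresholds, and return shape are assumptions.

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Stop calling a failing API until a cool-down period has elapsed."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], Any]) -> dict:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open circuit: fail fast without touching the external API.
                return {
                    "status": "error",
                    "message": "System Down: the external API is failing. Try again later.",
                }
            # Cool-down elapsed: let one trial call through (half-open state).
            self.opened_at = None
        try:
            data = fn()
        except Exception as exc:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return {"status": "error", "message": f"Call failed: {exc}"}
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        return {"status": "ok", "data": data}
```

In the half-open state the failure count is still at the threshold, so a single failed trial call reopens the circuit immediately, while a success resets it.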

5. PII Filtering in Tool Data

Sometimes an external API returns more info than you want to share with the LLM (e.g., a Database query returns a "Password Hash" along with the "Username").

The Rule: ALWAYS strip sensitive fields in your Python connector before sending the result back to Gemini. Gemini should only see what is strictly necessary for the reasoning task.

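def get_user_profile(user_id: str):
    # `database` is a placeholder client; the "?" parameter style depends on your driver.
    # Bind user_id as a parameter instead of formatting it into the SQL string:
    # the model controls this value, so string formatting invites SQL injection.
    raw_data = database.query("SELECT * FROM users WHERE id = ?", (user_id,))

    # FILTER DATA FOR AGENT SAFETY: copy only the fields the agent needs
    safe_data = {
        "name": raw_data['name'],
        "location": raw_data['location'],
        # "credit_card": raw_data['cc'] <--- REMOVED FOR SAFETY
    }
    return safe_data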

6. Long-Poll and Webhook Tools

Some APIs take minutes to respond. You can't keep an LLM connection open for 5 minutes.

The Async Tool Pattern:

Agent: "Run the report."
Tool: "Report started. Request ID: 123. Use 'check_status' to see when it is done."
Agent: "Okay. I'll check back in a moment." (The agent continues other tasks or waits).

7. Monitoring Tool Latency and Performance

If a tool becomes slow, it increases the total "Turn Time" for the agent. Professional ADK setups use Telemetry.

Log every tool call's latency.
If a tool's average latency exceeds 5 seconds, flag it in your dashboard as a bottleneck.

8. Summary and Exercises

Building tools is Software Engineering, not prompt engineering.

Secret Management is non-negotiable for security.
Retry logic (for example, with tenacity) improves network resilience.
Filtering ensures data privacy.
Rate limiting prevents budget overruns and 3rd-party bans.

Exercises

Connector Design: Choose a public API (e.g., GitHub, OpenWeather, NASA). Write a "Robust Connector" function that includes: 1. API Key from Env, 2. A try/except block, 3. A custom error message for the model.
Security Audit: Write a list of 5 fields you should never return from a database tool to a Gemini agent.
Logic Mapping: Draw a flowchart for an agent that needs to call an API that is currently returning a 503 Service Unavailable. What should the agent say to the user?

In the next lesson, we will look at Dynamic Tool Discovery, learning how agents can navigate hundreds of possible tools without overwhelming their context window.