Tool Invocation Flow: Arguments, Execution, and Feedback

Follow the precise path of a tool call in Gemini ADK. Understand how the model generates arguments, how the runtime executes the logic, and how the resulting observation is fed back into the reasoning loop.

When you call chat.send_message("What is the weather in Paris?"), a complex orchestration begins. In a single model turn, Gemini must decide if it needs a tool, select the correct one, generate valid arguments, and pause its own generation. The Gemini ADK then wakes up, identifies the model's intent, runs your Python code, and feeds the result back to the model.

In this lesson, we will trace the Tool Invocation Flow step by step. We will explore the structure of FunctionCall and FunctionResponse objects, the mechanics of parallel execution, and how to debug the "silent" data flow between the AI and your infrastructure.


1. The 4-Stage Invocation Sequence

A single tool interaction follows a rigid, four-stage lifecycle.

Stage 1: Determination (Gemini's Mind)

Based on the System Instructions and the Tool Docstrings, Gemini evaluates the user's request. If it cannot answer from its own internal knowledge, it produces a specialized token sequence that indicates a FunctionCall rather than a text response.

Stage 2: The Tool Call (The Hand-off)

Gemini outputs a structured object containing:

  • name: The name of the tool as defined in your Python function.
  • args: A JSON block of arguments (e.g., {"location": "Paris"}).

Stage 3: The Execution (The Real World)

The ADK Runtime intercepts this output. It maps the name to the actual Python function in your code and calls it with the provided args.

  • Critical Detail: The execution happens in your environment, not on Google's servers. This is why you can access local files or private databases.

Stage 4: The Feedback (The Loop Closure)

The result of your Python function is wrapped in a FunctionResponse object and sent back to Gemini. This result is treated as an Observation. Gemini then reads this observation and uses it to generate the final human-readable response. The sequence diagram below (in Mermaid notation) summarizes the full loop:

sequenceDiagram
    participant G as Gemini (Brain)
    participant R as ADK Runtime (Nerves)
    participant P as Python Code (Muscle)
    
    Note over G: Determining Need...
    G->>R: FunctionCall(name='search', args={'q': 'Paris'})
    Note over R: Executing Logic...
    R->>P: search('Paris')
    P-->>R: Returns "Cloudy, 15°C"
    R->>G: FunctionResponse(name='search', content="Cloudy, 15°C")
    Note over G: Reasoning with Result...
    G-->>R: "It is currently 15°C and cloudy in Paris."
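
To see all four stages end to end, here is a minimal sketch using the google.generativeai Python SDK; get_weather is a hypothetical tool, and with automatic function calling enabled the SDK handles Stages 2 through 4 inside a single send_message call.

import google.generativeai as genai

def get_weather(city: str) -> dict:
    """Returns the current weather for a city."""
    # Hypothetical stand-in for a real weather API lookup.
    return {"condition": "Cloudy", "temperature_c": 15}

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-1.5-pro', tools=[get_weather])
chat = model.start_chat(enable_automatic_function_calling=True)

# Stage 1 happens inside the model; Stages 2-4 (call, execution, feedback)
# are handled by the SDK before the final text comes back.
response = chat.send_message("What is the weather in Paris?")
print(response.text)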

2. Deep Dive: The FunctionCall Object

In the Gemini ADK, you can actually capture and inspect these raw objects. This is vital for debugging.

A typical FunctionCall looks like this in the raw API response:

{
  "function_call": {
    "name": "get_weather",
    "args": {
      "city": "Paris",
      "units": "celsius"
    }
  }
}

Why this is better than String Parsing:

  • Type Safety: The arguments are already parsed into a JSON dictionary. You don't need to use regex to find the city name inside a sentence.
  • Determinism: The model is guided to follow your schema, so if your function requires a units parameter, it will almost always be provided.
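
For reference, the FunctionCall above is what you would typically see for a tool declared like the sketch below (get_weather, its parameters, and docstring are illustrative). The SDK derives the JSON schema from the type hints and docstring, which is why the arguments arrive already typed and keyed.

def get_weather(city: str, units: str = "celsius") -> dict:
    """Gets the current weather for a city.

    Args:
        city: Name of the city, e.g. "Paris".
        units: Temperature units, either "celsius" or "fahrenheit".
    """
    # Placeholder implementation; a real tool would call a weather API.
    return {"temperature": 15, "condition": "Cloudy", "units": units}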

3. The FunctionResponse Object: Feedback Loop

Once your function finishes, the output must be sent back to the model. However, you can't just send back a bare string. You must send a structured FunctionResponse whose name matches the original call, with the result wrapped in a JSON-style object.

# Conceptual representation of the feedback step
response_payload = {
    "function_response": {
        "name": "get_weather",
        "response": {
            "temperature": 15,
            "condition": "Cloudy"
        }
    }
}

Gemini sees this and "realizes" that its previous request was successful. It then incorporates this new data into its next reasoning step.
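
When you manage the loop yourself (as in the manual example later in this lesson), that payload maps onto the SDK's proto types. A minimal sketch, assuming chat is an active chat session and the model just asked for the hypothetical get_weather tool:

import google.generativeai as genai

feedback = genai.protos.Content(
    parts=[genai.protos.Part(
        function_response=genai.protos.FunctionResponse(
            name="get_weather",  # must match the name in the FunctionCall
            response={"temperature": 15, "condition": "Cloudy"},
        )
    )]
)
response = chat.send_message(feedback)  # Gemini now reasons over this observation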


4. Parallel Invocation Flow

If Gemini decides to call multiple tools at once, the flow changes to a "Batch" pattern.

  1. Multiple Calls: Gemini emits a list: [Call(id=1, name='A'), Call(id=2, name='B')].
  2. Parallel Execution: The ADK (if configured for parallelism) can fire off both Python functions simultaneously. This is a massive performance win.
  3. Unified Feedback: You send back a list of responses in a single message: [Response(id=1), Response(id=2)] (see the sketch below).
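
A manual sketch of this batch pattern follows. It assumes response is the model turn that contained the calls, my_tools is a dict mapping tool names to your Python functions, and chat is an active session; the ThreadPoolExecutor is just one way to run the calls concurrently, not something the SDK requires.

import concurrent.futures
import google.generativeai as genai

# 1. Collect every FunctionCall part the model emitted in this turn.
calls = [p.function_call for p in response.candidates[0].content.parts
         if p.function_call]

# 2. Execute the matching Python functions in parallel.
with concurrent.futures.ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda c: my_tools[c.name](**c.args), calls))

# 3. Send one message containing a FunctionResponse part per call.
feedback_parts = [
    genai.protos.Part(function_response=genai.protos.FunctionResponse(
        name=c.name, response={"result": r}))
    for c, r in zip(calls, results)
]
response = chat.send_message(genai.protos.Content(parts=feedback_parts))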

5. Passing Files and Multimodal Data to Tools

One of the most advanced flows in the Gemini ADK is passing Non-Text Data to a tool.

Example: You have a tool called edit_image(image_file, filter_name).

  1. User: Sends an image of a dog.
  2. Gemini: Sees the image. Decides it needs to be brighter.
  3. Flow: Gemini emits the tool call. The ADK passes the image_file (either as bytes or a URI) to the Python function.
  4. Result: The tool processes the image and returns a path to the new, edited file.
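
On the Python side, such a tool might look like the sketch below. Exactly how the ADK hands you the image (raw bytes vs. a URI) depends on how the tool is declared, so treat edit_image and its signature as illustrative; the image work here uses Pillow.

import io
from PIL import Image, ImageEnhance

def edit_image(image_bytes: bytes, filter_name: str) -> dict:
    """Applies a named filter to an image and saves the result."""
    img = Image.open(io.BytesIO(image_bytes))

    if filter_name == "brighten":
        # Increase brightness by 50%.
        img = ImageEnhance.Brightness(img).enhance(1.5)

    output_path = "edited_image.png"
    img.save(output_path)
    # Return a small, JSON-serializable observation for the model to read.
    return {"status": "ok", "output_path": output_path}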

6. Debugging the Invocation Flow

When an agent "fails," it's often in this flow. Common issues include:

  • Argument Mismatch: Gemini passed the string "10" for an argument you annotated as an int (a defensive coercion sketch follows this list).
  • Missing Context: Gemini called the tool but forgot to include the user_id.
  • Malformed Feedback: The tool returned a list when Gemini expected a simple string.
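
One practical mitigation for argument mismatches is to coerce and validate inside the tool itself rather than trusting the model's types. A small sketch, using the hypothetical calc_tip tool from the exercises below:

def calc_tip(total: float, percent: int = 20) -> dict:
    """Calculates a tip for a bill total."""
    # Defensive coercion: the model occasionally sends numbers as strings.
    total = float(total)
    percent = int(percent)
    return {"tip": round(total * percent / 100, 2)}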

How to Debug:

Inspect the raw Part objects in the response from the Python SDK:

response = chat.send_message("...")

# A single candidate can contain several parts (text and/or function calls),
# so iterate over all of them rather than assuming parts[0].
for part in response.candidates[0].content.parts:
    if part.function_call:
        print(f"DEBUG: Call detected for {part.function_call.name}")
        print(f"DEBUG: Args: {dict(part.function_call.args)}")

7. Implementation: The Manual "Bypass" Flow

While enable_automatic_function_calling=True (passed to start_chat) handles the flow for you, sometimes you want to take over. This is useful for human-in-the-loop confirmations.

import google.generativeai as genai

# Assumes genai.configure(api_key=...) has already been called and that
# my_tool is a plain Python function with type hints and a docstring.
model = genai.GenerativeModel('gemini-1.5-pro', tools=[my_tool])
chat = model.start_chat()  # No automatic function calling: we drive the loop

# 1. Send Message
response = chat.send_message("Set up a meeting for 2 PM.")

# 2. Manual Interception
part = response.candidates[0].content.parts[0]
if part.function_call:
    call = part.function_call

    # ASK USER FOR PERMISSION
    approval = input(f"Agent wants to call {call.name} with {dict(call.args)}. Approve? (y/n) ")

    if approval.lower() == 'y':
        # 3. Execute and manually provide feedback
        result = my_tool(**call.args)
        response = chat.send_message(
            genai.protos.Content(
                parts=[genai.protos.Part(
                    function_response=genai.protos.FunctionResponse(
                        name=call.name,
                        response={'result': result},
                    )
                )]
            )
        )
        print(response.text)  # The final, human-readable answer

8. Summary and Exercises

The Tool Invocation Flow is the Nervous System of your agent.

  • Gemini determines the intent.
  • ADK Runtime executes the muscle (your code).
  • Feedback closes the reasoning loop.
  • Parallelism and Multimodality scale the performance and capability.

Exercises

  1. Trace Flow: Draw a flow map for an agent that needs to calculate a tip: the user says "Total is 100," the agent calls calc_tip, and the agent replies "Tip is 20." Trace every step of the ADK lifecycle.
  2. Argument Debugging: Write a Python function that expects a specific JSON structure. Call the agent with a prompt that is slightly ambiguous. See if Gemini produces the correct JSON arguments.
  3. Manual Control: Try your hand at the "Manual Bypass" code above. Can you build an agent that asks you for approval before it "deletes" a dummy file?

In the next module, we move from the "Hands" to the "Memory," exploring how our agents maintain context and learn over time.
