
Multi-Agent Orchestration: Controlling the Fleet
Learn how to manage multiple agents without context explosion. Master the synchronization patterns that keep token costs in check.
In Module 9.2, we introduced the concept of specialization. Now, we look at the Infrastructure of Orchestration. When you have 10 agents working on the same project (a "Fleet"), how do they talk to each other without repeating the entire project history in every message?
If Agent A sends its full context to Agent B, and Agent B appends its own context before forwarding to Agent C, the payload grows at every hop. You have a Token Avalanche.
In this lesson, we learn the advanced orchestration patterns to prevent this avalanche. We’ll explore Hierarchical Orchestration, Sequential Chains, and Broadcast Hubs.
1. The Hierarchical Pattern (Control at the Top)
In this pattern, only the Supervisor has the full project context. The specialists only receive a "Snapshot" of their specific task.
- Token Efficiency: High. Only one agent (The Supervisor) deals with a large prompt. The workers use "Micro-Prompts" (Module 4.1).
- Control: The Supervisor acts as a "Token Governor," deciding what information each worker actually needs.
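The "Token Governor" step can be sketched as a supervisor that holds the full context and hands each worker only a snapshot. This is a minimal illustration; the `Supervisor` and `Snapshot` names are hypothetical, not from a specific framework.

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    """The minimal task-specific context a worker receives."""
    task: str
    facts: dict

class Supervisor:
    def __init__(self, full_context: dict):
        # Only the supervisor holds the full project context.
        self.full_context = full_context

    def make_snapshot(self, task: str, needed_keys: list) -> Snapshot:
        # The "Token Governor" step: pass only what this worker needs.
        facts = {k: self.full_context[k] for k in needed_keys if k in self.full_context}
        return Snapshot(task=task, facts=facts)

supervisor = Supervisor({"brand": "Acme", "budget": 500, "history": "...long log..."})
snap = supervisor.make_snapshot("Write the tagline", ["brand"])
print(snap.facts)  # only {'brand': 'Acme'} reaches the worker
```

The worker never sees `budget` or `history`; its prompt stays a Micro-Prompt regardless of how large the supervisor's context grows.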
2. The Sequential Chain Pattern (The Relay Race)
Agents work in a line. Agent 1 finishes and hands off a Result Object (not a prompt string) to Agent 2.
```mermaid
graph LR
    A[Agent 1: Researcher] -->|Result| B[Agent 2: Writer]
    B -->|Result| C[Agent 3: Editor]
    subgraph "Token Transfer"
        A_T[Small Signal]
        B_T[Small Signal]
    end
```
Crucial Rule: Agent 3 should never see the raw research from Agent 1. It only sees the "Draft" from Agent 2. This creates Context Separation, which prevents the tokens from accumulating at the end of the chain.
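The relay-race handoff can be sketched with a small `Result` object passed between plain functions. The names here (`Result`, `researcher`, `writer`, `editor`) are illustrative, not a specific library's API.

```python
from dataclasses import dataclass

@dataclass
class Result:
    """A compact handoff object; never the full conversation history."""
    stage: str
    payload: str

def researcher(topic: str) -> Result:
    return Result("research", f"3 key facts about {topic}")

def writer(research: Result) -> Result:
    # The writer receives only the researcher's Result, not its transcript.
    return Result("draft", f"Draft based on: {research.payload}")

def editor(draft: Result) -> Result:
    # Context Separation: the editor never sees the raw research.
    return Result("final", f"Polished: {draft.payload}")

final = editor(writer(researcher("solar panels")))
print(final.stage)  # "final"
```

Because each stage accepts only the previous stage's `Result`, tokens cannot accumulate at the end of the chain.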
3. The Broadcast Hub (The Shared State)
Instead of agents talking to each other, they publish their results to a Shared Hub (like a LangGraph State or a Redis Channel).
- Benefit: If Agent D needs a fact discovered by Agent A, it queries the Hub for that specific fact. It doesn't have to read Agent A's entire conversation history.
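The pull model can be sketched as a tiny in-memory hub. A real system might use a LangGraph state object or a Redis channel, as noted above; `SharedHub` here is a hypothetical stand-in.

```python
class SharedHub:
    """Minimal shared-state hub: agents publish facts and pull them by key."""
    def __init__(self):
        self._facts = {}

    def publish(self, key: str, value: str) -> None:
        self._facts[key] = value

    def query(self, key: str):
        # Agents pull one fact, not another agent's whole history.
        return self._facts.get(key)

hub = SharedHub()
hub.publish("pricing_model", "tiered, 3 plans")  # Agent A publishes
fact = hub.query("pricing_model")                # Agent D pulls just this fact
```

Agent D pays tokens only for the fact it queried, no matter how long Agent A's conversation was.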
4. Implementation: The Orchestrator Node (Python)
Python Code: The Selective Handoff
```python
def orchestrator(task, fleet, global_state):
    # 1. Supervisor decides the team
    specialist_id = dispatch_router(task)

    # 2. SELECTION (The Efficiency Step)
    # Extract only the 'Essential State' this specific specialist needs
    narrow_context = extract_task_specific_state(task, global_state)

    # 3. Execution
    result = fleet[specialist_id].run(narrow_context)

    # 4. Global Update
    global_state.update(result)
    return result
```
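The orchestrator assumes a router, a pruning helper, and a fleet of specialists. Here is one minimal, runnable way to stub them; all names and the key-based pruning scheme are hypothetical, assumed for illustration.

```python
class EchoSpecialist:
    """Stub worker: returns a result built only from its narrow context."""
    def run(self, context: dict) -> dict:
        return {"draft": f"Output built from {sorted(context)}"}

def dispatch_router(task: dict) -> str:
    # Trivial router: the task declares which specialist it needs.
    return task["specialist"]

def extract_task_specific_state(task: dict, global_state: dict) -> dict:
    # The pruning step: pass only the keys the task declares it needs.
    return {k: global_state[k] for k in task.get("needs", []) if k in global_state}

fleet = {"writer": EchoSpecialist()}
global_state = {"brand": "Acme", "history": "...long transcript..."}
task = {"specialist": "writer", "needs": ["brand"]}

narrow = extract_task_specific_state(task, global_state)
result = fleet[dispatch_router(task)].run(narrow)
global_state.update(result)
```

Note that `history`, the largest item in the global state, never reaches the specialist.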
5. Avoiding "Double-Identity" Tokens
When Agent A talks to Agent B, they don't need to be polite.
- Waste: "Hello Agent B, I have finished the research. Could you please write a summary of this data: [DATA]?"
- Efficient:
TASK: Summarize. DATA: [DATA]
By stripping the "Agent Personality" from internal communications, you save dozens of tokens per handoff.
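A quick way to see the savings is to compare the two messages with a rough token proxy (whitespace-split word count; real tokenizers will differ, but the ratio holds):

```python
polite = ("Hello Agent B, I have finished the research. "
          "Could you please write a summary of this data: [DATA]?")
terse = "TASK: Summarize. DATA: [DATA]"

# Rough proxy for token count: number of whitespace-separated words.
print(len(polite.split()), len(terse.split()))  # the terse form is ~4x smaller
```

Multiplied across every handoff in a long-running fleet, this overhead is pure waste: no agent acts on the pleasantries.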
6. Summary and Key Takeaways
- Snapshots over Streams: Never pass full conversation histories between agents. Pass "Signal Snapshots."
- Specialization = Thin Context: Workers should be blind to the "Global" goals they aren't working on.
- Supervisor as Filter: The orchestrator is responsible for pruning the data before a handoff.
- Broadcast Architecture: Use a shared state hub to allow agents to "Pull" only the facts they need.
In the next lesson, The Supervisor Pattern (Advanced), we look at how to build a "Smart Router" that optimizes for model cost first.
Exercise: The Fleet Design
- You are building an agent fleet to "Create a Website."
- Agent List: Designer, Copywriter, Coder, QA.
- Design the Information Flow:
- Which agent needs the "Full Vision" from the user?
- What does the Coder need from the Designer? (Does it need the Designer's reasoning? Or just the CSS values?)
- Calculate the savings if the Coder only receives the CSS values instead of the Designer's full conversation history.