Module 9 Lesson 4: Streaming and Async Behavior
Near-zero perceived latency. Understanding how to build agents that speak while they think using async Python and SSE.
Streaming and Async: Building for Speed
One of the most frustrating parts of using LLMs is the wait. A large model might take 15 seconds to generate a full report. If your agent "blocks" (waits for the full response before showing anything), your user will think the app is broken.
Streaming and async behavior let the agent start interacting immediately.
1. Tokens vs. Messages
- Blocking: The LLM generates 500 tokens. You wait 10 seconds. You receive the full message.
- Streaming: The LLM generates token 1. You receive it in 100ms. Token 2... Token 3...
- The Result: The user sees text appearing on the screen instantly, creating an "AI presence" that feels much faster than it actually is (see the sketch below).
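As an illustration, here is a minimal streaming loop using the OpenAI Python SDK. Treat it as a sketch rather than the only way to stream: it assumes the openai package is installed, an OPENAI_API_KEY is set in the environment, and the model name is illustrative. With stream=True, the API yields small chunks as they are generated instead of one final message.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Write a short weather report."}],
    stream=True,  # receive tokens as they are generated
)

for chunk in stream:
    # Each chunk carries a small delta of the full message.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Printing with flush=True makes each token appear the moment it arrives, which is exactly the "instant text" effect described above.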
2. Asynchronous Agents (Concurrency)
If you have a multi-agent system, don't run agents sequentially when they don't depend on each other.
Sequential (Slow):
- Agent 1 translates (10s)
- Agent 2 summarizes (10s)
- Total: 20s

Asynchronous (Fast):
- Agent 1 and Agent 2 start at the same time.
- Total: 10s (the time of the longest task).
3. Implementation Logic (Python asyncio)
Using async and await is essential for production agents: most of an agent's time is spent waiting on network I/O, and async lets that waiting overlap.
import asyncio

async def call_agent_1():
    # Simulate API call
    await asyncio.sleep(2)
    return "Result 1"

async def call_agent_2():
    # Simulate API call
    await asyncio.sleep(2)
    return "Result 2"

async def main():
    # Run BOTH at the same time
    results = await asyncio.gather(call_agent_1(), call_agent_2())
    print(results)

asyncio.run(main())
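asyncio.gather waits for every task before returning. If you would rather surface each agent's result the moment it finishes, asyncio.as_completed is a natural fit; here is a minimal sketch reusing the two agents defined above.

async def main_streamed():
    tasks = [call_agent_1(), call_agent_2()]
    # as_completed yields each result as soon as it is ready,
    # so the UI can update incrementally instead of all at once.
    for finished in asyncio.as_completed(tasks):
        result = await finished
        print(f"Done: {result}")

asyncio.run(main_streamed())

This pairs concurrency with the streaming mindset from section 1: the user sees progress per agent rather than one final dump.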
4. Streaming Intermediate "Thoughts"
In an agentic loop, it's helpful to stream the agent's intermediate thoughts to a hidden debug log or a sidebar:
- "Searching for weather..."
- "Found result: 22°C."
- "Calculating clothes to wear..."

This keeps the user informed while the "main" answer is being prepared (see the sketch below).
5. Visualizing the Concurrency
graph TD
    User[Start Project] --> Split{Async Gather}
    Split --> A1[Agent: Researcher]
    Split --> A2[Agent: Image Gen]
    Split --> A3[Agent: Security Check]
    A1 --> Merge[Aggregation]
    A2 --> Merge
    A3 --> Merge
    Merge --> Final[Result]
Key Takeaways
- Streaming reduces perceived latency by providing instant feedback.
- Async/Await lets independent agents run concurrently, cutting total time to roughly that of the longest task.
- Always use Concurrent Execution for tasks that are independent (e.g., getting info from 3 different sources).
- Streaming Intermediate Steps is a major UX improvement for long-running agents.