Module 9 Lesson 4: Streaming and Async Behavior
Near-zero perceived latency. Understanding how to build agents that speak while they think using async Python and SSE.
Streaming and Async: Building for Speed
One of the most frustrating parts of using LLMs is the wait. A large model might take 15 seconds to generate a full report. If your agent "blocks" (waits for the full response before showing anything), your user will think the app is broken.
Streaming and async behavior let the agent start interacting immediately.
1. Tokens vs. Messages
- Blocking: The LLM generates 500 tokens. You wait 10 seconds. You receive the full message.
- Streaming: The LLM generates token 1. You receive it in 100ms. Token 2... Token 3...
- The Result: The user sees text appearing on the screen instantly, creating an "AI presence" that feels much faster than it actually is (see the sketch below).
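As an illustration, here is a minimal streaming loop using the OpenAI Python SDK. Treat it as a sketch rather than the only way to stream: it assumes the openai package is installed, an OPENAI_API_KEY is set in the environment, and the model name is illustrative. With stream=True, the API yields small chunks as they are generated instead of one final message.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Write a short weather report."}],
    stream=True,  # receive tokens as they are generated
)

for chunk in stream:
    # Each chunk carries a small delta of the full message.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Printing with flush=True makes each token appear the moment it arrives, which is exactly the "instant text" effect described above.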
2. Asynchronous Agents (Concurrency)
If you have a multi-agent system, don't run agents sequentially when they don't depend on each other.
Sequential (Slow):
- Agent 1 translates (10s)
- Agent 2 summarizes (10s)
- Total: 20s

Asynchronous (Fast):
- Agent 1 and Agent 2 start at the same time.
- Total: 10s (the time of the longest task).
3. Implementation Logic (Python asyncio)
Using async and await is essential for production agents: most of an agent's time is spent waiting on network I/O, and async lets that waiting overlap.
import asyncio

async def call_agent_1():
    # Simulate API call
    await asyncio.sleep(2)
    return "Result 1"

async def call_agent_2():
    # Simulate API call
    await asyncio.sleep(2)
    return "Result 2"

async def main():
    # Run BOTH at the same time
    results = await asyncio.gather(call_agent_1(), call_agent_2())
    print(results)

asyncio.run(main())
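asyncio.gather waits for every task before returning. If you would rather surface each agent's result the moment it finishes, asyncio.as_completed is a natural fit; here is a minimal sketch reusing the two agents defined above.

async def main_streamed():
    tasks = [call_agent_1(), call_agent_2()]
    # as_completed yields each result as soon as it is ready,
    # so the UI can update incrementally instead of all at once.
    for finished in asyncio.as_completed(tasks):
        result = await finished
        print(f"Done: {result}")

asyncio.run(main_streamed())

This pairs concurrency with the streaming mindset from section 1: the user sees progress per agent rather than one final dump.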
4. Streaming Intermediate "Thoughts"
In an agentic loop, it's helpful to stream the agent's intermediate thoughts to a hidden debug log or a sidebar:
- "Searching for weather..."
- "Found result: 22°C."
- "Calculating clothes to wear..."

This keeps the user informed while the "main" answer is being prepared (see the sketch below).
5. Visualizing the Concurrency
graph TD
    User[Start Project] --> Split{Async Gather}
    Split --> A1[Agent: Researcher]
    Split --> A2[Agent: Image Gen]
    Split --> A3[Agent: Security Check]
    A1 --> Merge[Aggregation]
    A2 --> Merge
    A3 --> Merge
    Merge --> Final[Result]
Key Takeaways
- Streaming reduces perceived latency by providing instant feedback.
- Async/Await lets independent agents run concurrently, cutting total time to roughly that of the longest task.
- Always use Concurrent Execution for tasks that are independent (e.g., getting info from 3 different sources).
- Streaming Intermediate Steps is a major UX improvement for long-running agents.