Module 2 Lesson 3: Streaming Responses

Zero-latency UX: how to use LangChain's .stream() method to display text as it is being generated.

Streaming: Eliminating the Wait

In Module 1, we used .invoke(). This method waits until the model has finished its entire answer before returning it. For long answers, the user might wait 10-20 seconds in silence. Streaming fixes this by delivering the answer "token by token."

1. The UX Advantage

Without streaming, the user sees a spinner. With streaming, the user sees text appearing immediately, making the application feel responsive and alive.


2. Using the .stream() Generator

Instead of one complete message object, .stream() returns a Python generator that yields chunks of the message as they arrive.

# Requires the langchain-openai package and an OPENAI_API_KEY in the environment
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")  # any chat model with streaming support works

# Instead of response = model.invoke(...)
chunks = []
for chunk in model.stream("Write a long poem about the ocean."):
    # Print each piece immediately without a newline
    print(chunk.content, end="", flush=True)
    chunks.append(chunk)

3. Chunks vs. Messages

  • A chunk (AIMessageChunk) is just a fragment of a message.
  • Chunks support the + operator, so they can be added together to reconstruct the final message.
  • final_message = chunks[0] + chunks[1] + ...
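To make the aggregation concrete, here is a minimal, self-contained sketch of the idea. MiniChunk is a made-up toy class standing in for LangChain's AIMessageChunk, which overloads + in the same spirit:

```python
from dataclasses import dataclass
from functools import reduce
from operator import add

@dataclass
class MiniChunk:
    """Toy stand-in for a message chunk: just a piece of content."""
    content: str

    def __add__(self, other: "MiniChunk") -> "MiniChunk":
        # Adding two chunks concatenates their content
        return MiniChunk(self.content + other.content)

# Simulated chunks collected during streaming
chunks = [MiniChunk("The "), MiniChunk("ocean "), MiniChunk("roars.")]

# Equivalent to chunks[0] + chunks[1] + chunks[2]
final_message = reduce(add, chunks)
print(final_message.content)  # The ocean roars.
```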

4. Visualizing the Token Stream

sequenceDiagram
    participant U as User (UI)
    participant L as LangChain
    participant A as OpenAI API
    
    U->>L: Invoke prompt
    L->>A: Start generation
    A-->>L: Token: 'The'
    L-->>U: Show 'The'
    A-->>L: Token: 'Capital'
    L-->>U: Show 'Capital'
    A-->>L: [Stream Finished]
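The lazy, piece-by-piece delivery in the diagram is exactly what a Python generator models. A toy simulation with no API involved (fake_stream is a made-up helper, not a LangChain function):

```python
import time

def fake_stream(tokens):
    """Yield tokens one at a time, simulating generation latency."""
    for token in tokens:
        time.sleep(0.01)  # pretend each token takes time to generate
        yield token

received = []
for token in fake_stream(["The ", "capital ", "of ", "France ", "is ", "Paris."]):
    print(token, end="", flush=True)  # the UI shows each token immediately
    received.append(token)
print()
```

Nothing is computed until the loop asks for the next token, which is why generators keep memory usage flat even for very long answers.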

5. Why Not Always Stream?

Streaming is great for chat UIs, but it can be counterproductive for:

  • Backend logs: You don't want 500 lines of log for one sentence.
  • Structured Data (JSON): You can't parse half a JSON object. You usually wait for the full block before converting it to a Python dictionary.
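The JSON caveat is easy to demonstrate: parsing a half-received payload raises an error, so the usual pattern is to buffer the fragments and parse once the stream ends. (The fragment list below is a made-up example payload.)

```python
import json

# Simulated stream of JSON fragments
fragments = ['{"city": ', '"Paris", ', '"population": ', '2100000}']

# Parsing a partial payload fails
try:
    json.loads(fragments[0])
except json.JSONDecodeError:
    print("Half a JSON object is not parseable.")

# Buffer the whole stream, then parse once
buffer = "".join(fragments)
data = json.loads(buffer)
print(data["city"])  # Paris
```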

Key Takeaways

  • .stream() reduces the "Time to First Token" (TTFT).
  • It uses Python generators for efficient memory handling.
  • Streaming is primarily a UX/Frontend improvement.
  • Chunks must be aggregated if you need the full message after the stream ends.
