
Handling Long Contexts
Master the operational side of multi-thousand-token prompts, including batching and context management.
Claude 3.5 Sonnet supports up to 200,000 tokens. To put that in perspective, that's over 400 pages of text. While this opens up massive RAG possibilities, it also presents new challenges.
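To make the 200,000-token figure concrete, here is a minimal sketch for estimating whether a text fits. The 4-characters-per-token ratio and the reserved output budget are assumptions for illustration only; real counts depend on the tokenizer and should be measured with the provider's token-counting tools.

```python
import math

CONTEXT_WINDOW_TOKENS = 200_000  # figure quoted in the text above
CHARS_PER_TOKEN = 4.0            # assumption: rough average for English prose


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_window(text: str, reserved_output_tokens: int = 4_000) -> bool:
    """Check whether the text plus a reserved output budget fits the window."""
    return estimate_tokens(text) + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS
```

The reserved budget matters because the response shares the window with the prompt.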
When to Use Long Context RAG
- Legal Discovery: Loading 50 contracts into a single prompt for cross-comparison.
- Deep Code Analysis: Ingesting an entire repository to find a bug.
- Historical Summarization: Analyzing a decade of quarterly reports in one go.
Operational Challenges
1. Latency
A 100k token prompt can take 20-30 seconds to process (Time to First Token). This makes it unsuitable for "Interactive Chat" but perfect for "Analytical Reports."
2. Context Caching (Critical)
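Anthropic's Prompt Caching lets you cache a long, unchanging prefix of a prompt (for example, the first 100k tokens of reference documents). Later requests that begin with the same prefix read it from the cache at a reduced rate, so most of the cost and wait time goes to the new additions. Cache entries expire after a short period of inactivity, so the benefit applies to queries that arrive close together.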
- Setup: Mark the context block with a cache_control flag.
- Benefit: Anthropic cites up to 90% lower input cost and up to 85% lower latency for repeated queries against a long cached prefix.
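The cost effect of caching can be sketched with simple arithmetic. The price and multipliers below are assumptions for illustration (a base input price of $3 per million tokens, cache writes at 1.25x, cache reads at 0.1x); check current pricing before relying on them. The sketch also assumes every query arrives while the cache entry is still live.

```python
def input_cost_usd(
    cached_tokens: int,
    new_tokens: int,
    num_queries: int,
    base_price_per_mtok: float = 3.00,  # assumption: illustrative input price
    write_multiplier: float = 1.25,     # assumption: surcharge for writing the cache
    read_multiplier: float = 0.10,      # assumption: discount for reading the cache
) -> tuple[float, float]:
    """Return (cost_without_caching, cost_with_caching) for input tokens only."""
    if num_queries <= 0:
        return 0.0, 0.0
    per_token = base_price_per_mtok / 1_000_000

    # Without caching, every query pays full price for the whole prompt.
    without = num_queries * (cached_tokens + new_tokens) * per_token

    # With caching, the first query writes the prefix; later queries read it.
    first = cached_tokens * per_token * write_multiplier + new_tokens * per_token
    later = (num_queries - 1) * (
        cached_tokens * per_token * read_multiplier + new_tokens * per_token
    )
    return without, first + later


without, with_cache = input_cost_usd(cached_tokens=100_000, new_tokens=200, num_queries=10)
print(f"without caching: ${without:.3f}, with caching: ${with_cache:.3f}")
```

Note that a single query is more expensive with caching than without, because of the write surcharge; the savings only appear once the prefix is reused.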
3. Context Window Saturation
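Accuracy can degrade as the window fills: when a prompt approaches about 90% of capacity, the model's reasoning accuracy may dip. As a rule of thumb, target 50-70% occupancy for mission-critical tasks, which also leaves room for the response.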
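A small guard can enforce an occupancy target before a request is sent. This is a sketch only: the function names are hypothetical, and the 70% ceiling is the upper end of the rule of thumb above, not a documented limit.

```python
def occupancy(prompt_tokens: int, window_tokens: int = 200_000) -> float:
    """Fraction of the context window used by the prompt."""
    return prompt_tokens / window_tokens


def tokens_to_trim(
    prompt_tokens: int,
    window_tokens: int = 200_000,
    max_occupancy_pct: int = 70,  # assumption: upper end of the 50-70% target
) -> int:
    """How many tokens must be removed to get under the occupancy ceiling."""
    budget = window_tokens * max_occupancy_pct // 100  # integer math avoids float drift
    return max(0, prompt_tokens - budget)
```

A caller might drop the lowest-ranked documents until tokens_to_trim returns zero.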
Best Practices for Long Prompts
- Structure Clearly: Use Headers, Page Numbers, and ID tags.
- Summarize Chunks: If you have 500 documents, don't send all of them. Send the top 50 in full and summaries of the rest.
- Chunked Generation: If you need to summarize 200k tokens, don't ask for one summary. Use a "sliding window" or "hierarchical" summarization approach.
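The hierarchical approach in the last bullet can be sketched as follows. The summarize callable is a placeholder for whatever model call you use; splitting by character count is a simplification (a real pipeline would more likely split on document or section boundaries and measure tokens).

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_chars: int) -> List[str]:
    """Split text into consecutive, non-overlapping chunks of at most chunk_chars."""
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]


def hierarchical_summary(
    text: str,
    summarize: Callable[[str], str],  # placeholder: e.g. a wrapper around a model call
    chunk_chars: int = 40_000,        # assumption: chosen to keep each call small
) -> str:
    """Summarize chunks, then summarize the summaries, until one call suffices."""
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    current = text
    while len(current) > chunk_chars:
        partials = [summarize(c) for c in split_into_chunks(current, chunk_chars)]
        combined = "\n".join(partials)
        if len(combined) >= len(current):
            # Guard against a summarizer that does not shrink its input.
            raise ValueError("summarize() must shorten its input")
        current = combined
    return summarize(current)
```

Each level reduces the text until it fits in a single call, so no individual request has to carry the full 200k tokens.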
Implementation Example: Mapping large files
{
  "model": "claude-3-5-sonnet-20241022",
  "max_tokens": 1024,
  "system": "Analyze the following code repository.",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "File: main.py\n(10,000 lines of code...)",
          "cache_control": {"type": "ephemeral"}
        },
        {"type": "text", "text": "Where is the auth logic?"}
      ]
    }
  ]
}
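The same request body can be assembled in code. The sketch below only builds the dictionary and makes no network call; the helper name, the default model ID, and the max_tokens value are assumptions for illustration.

```python
from typing import Dict


def build_cached_request(
    files: Dict[str, str],
    question: str,
    system: str = "Analyze the following code repository.",
    model: str = "claude-3-5-sonnet-20241022",  # assumption: swap in your model ID
    max_tokens: int = 1024,                     # assumption: illustrative output cap
) -> dict:
    """Build a Messages API body with the file corpus marked for caching."""
    # Label each file so the model can cite where an answer came from.
    corpus = "\n\n".join(f"File: {name}\n{body}" for name, body in files.items())
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Large, stable block first: this is the cacheable prefix.
                    {"type": "text", "text": corpus,
                     "cache_control": {"type": "ephemeral"}},
                    # Small, changing block last: the actual question.
                    {"type": "text", "text": question},
                ],
            }
        ],
    }
```

Keeping the question in its own block after the cached one means follow-up questions reuse the same prefix. With the official Python SDK installed, the result could be sent as anthropic.Anthropic().messages.create(**payload).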
Exercises
- Calculate the token count of your favorite book. Would it fit in Claude's window?
- How does "Prompt Caching" change the ROI (Return on Investment) of a RAG system?
- What is a "Map-Reduce" strategy for summarization?