Handling Token Limits in Graph Expansion: The Budgeting Act

Don't choke the LLM. Learn how to manage the 'Explosion' of context that happens during graph expansion and how to use 'Soft' and 'Hard' token budgets to keep your costs under control.

Graphs are recursive. Expand 2 hops from a node and you might find 20 nodes; expand 3 hops and you might find 500. This is the "Context Explosion." If you blindly feed all of these nodes to an LLM, your costs skyrocket and your model's performance plummets due to the "Lost-in-the-Middle" effect. You need a budget.
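To make the explosion concrete, here is a minimal sketch of how the reachable node count compounds with hop depth. The uniform branching factor is an assumption for illustration; real graphs vary per node:

```python
def nodes_within_depth(branching_factor: int, depth: int) -> int:
    """Upper bound on nodes reachable within `depth` hops,
    assuming every node has `branching_factor` neighbors."""
    return sum(branching_factor ** d for d in range(1, depth + 1))

print(nodes_within_depth(5, 2))  # 30 nodes
print(nodes_within_depth(5, 3))  # 155 nodes -- the explosion
```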

In this lesson, we will look at Token-Aware Expansion. We will learn how to implement "Stopping Rules" that halt the graph traversal once a certain token count is reached. We will explore Summarization-during-Expansion and how to use the "Importance Score" (Module 11) to decide which nodes get the axe when the budget is tight.


1. The Conflict: Search Depth vs. Prompt Space

  • Search Depth: You want to go deep (3+ hops) to find hidden links.
  • Prompt Space: Most LLMs perform best with < 10,000 tokens of context.

The Solution: You should never perform a "Global Expansion." You should perform a Ranked Expansion.


2. Implementing "Hard" and "Soft" Budgets

The Soft Budget (The Warning)

Once you reach 5,000 tokens, the system stops retrieving "Low-Priority" nodes (e.g., meeting logs) and only retrieves "High-Priority" nodes (e.g., core facts).

The Hard Budget (The Wall)

Once you reach 8,000 tokens, the traversal stops completely. The system returns whatever it has found so far.
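Both thresholds can be captured in a single admission check. The sketch below is a minimal illustration of the scheme above; the "high"/"low" priority labels are assumptions, not a fixed API:

```python
SOFT_BUDGET = 5_000   # past this, skip low-priority nodes (the Warning)
HARD_BUDGET = 8_000   # past this, stop the traversal entirely (the Wall)

def admit_node(total_tokens: int, node_tokens: int, priority: str) -> bool:
    """Decide whether one more node fits under the soft/hard budget scheme.

    `priority` is a hypothetical label: "high" (core facts) or
    "low" (e.g. meeting logs).
    """
    if total_tokens + node_tokens > HARD_BUDGET:
        return False  # hard budget: nothing more gets in
    if total_tokens > SOFT_BUDGET and priority == "low":
        return False  # soft budget: only high-priority nodes still qualify
    return True

print(admit_node(5_500, 100, "low"))   # False: soft budget reached
print(admit_node(5_500, 100, "high"))  # True: core facts still admitted
```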


3. Summarization-on-the-Fly

If a node has a very long description (e.g., a 2,000-word project summary), do not include the whole thing.

  • The Optimization: Use a small model (like GPT-4o-mini) to generate a "30-word snippet" of every secondary neighbor.
  • Benefit: You can include 10 times as many nodes in the same prompt space.

graph TD
    S((Seed)) --> N1[Long Node: 500 tokens]
    S --> N2[Long Node: 400 tokens]
    
    subgraph "Compression Layer"
    N1 --> C1[Summary: 20 tokens]
    N2 --> C2[Summary: 20 tokens]
    end
    
    C1 & C2 --> LLM[LLM Prompt]
    
    style S fill:#4285F4,color:#fff
    style LLM fill:#34A853,color:#fff
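Here is a hedged sketch of that compression layer. `summarize_fn` is a hypothetical stand-in for a call to a small model; the word-truncation fallback is just a placeholder so the code runs without one:

```python
def compress_neighbors(neighbors, summarize_fn=None, max_words=30):
    """Shrink each secondary neighbor's text to a short snippet.

    `summarize_fn` is a placeholder for a call to a small LLM
    (e.g. GPT-4o-mini); without one, we fall back to naive truncation.
    """
    snippets = []
    for text in neighbors:
        if summarize_fn is not None:
            snippets.append(summarize_fn(text))
        else:
            snippets.append(" ".join(text.split()[:max_words]))
    return snippets

long_node = "word " * 500  # stand-in for a ~500-token node description
print(len(compress_neighbors([long_node])[0].split()))  # 30
```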

4. Implementation: A Token-Aware Python Loop

def retrieve_with_budget(seed_node, max_tokens=2000):
    """Expand neighbors best-first, halting once the token budget is hit."""
    total_tokens = 0
    context_chunks = []

    # Neighbors arrive ranked by importance, so the best nodes are admitted first
    for neighbor in get_neighbors_ranked(seed_node):
        text = neighbor.serialize()
        # Whitespace split is only a rough proxy; real tokenizers usually count more
        token_count = len(text.split())

        if total_tokens + token_count > max_tokens:
            print("BUDGET EXCEEDED: Stopping expansion.")
            break

        context_chunks.append(text)
        total_tokens += token_count

    return "\n".join(context_chunks)
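To watch the stopping rule fire without a real graph store, here is a self-contained demo that repeats the loop above against mock neighbors. `MockNeighbor` and the static ranking are invented for illustration:

```python
class MockNeighbor:
    """Hypothetical stand-in for a graph node with serializable text."""
    def __init__(self, text):
        self.text = text
    def serialize(self):
        return self.text

def get_neighbors_ranked(seed_node):
    # Best-first: high-importance neighbors come before low-priority ones.
    return [MockNeighbor("core fact " * 5),    # 10 words
            MockNeighbor("meeting log " * 5),  # 10 words
            MockNeighbor("minor note " * 5)]   # 10 words

def retrieve_with_budget(seed_node, max_tokens=2000):
    # Same loop as above, copied here so the demo runs on its own.
    total_tokens = 0
    context_chunks = []
    for neighbor in get_neighbors_ranked(seed_node):
        text = neighbor.serialize()
        token_count = len(text.split())  # rough whitespace estimate
        if total_tokens + token_count > max_tokens:
            break  # budget hit: stop expanding
        context_chunks.append(text)
        total_tokens += token_count
    return "\n".join(context_chunks)

# A 25-token budget admits only the first two 10-word neighbors:
result = retrieve_with_budget("seed", max_tokens=25)
print(len(result.split("\n")))  # 2
```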

5. Summary and Exercises

Token management is the Financial Governance of Graph RAG.

  • Ranked expansion ensures the best nodes enter the prompt first.
  • Hard thresholds prevent runaway costs.
  • Summarization allows for higher "Literal Density" in the context window.
  • Information Value (ROI): Ask: "Is this 5th neighbor worth the $0.05 I'm about to pay for its tokens?"

Exercises

  1. Budget Math: Your context window is 4096 tokens. Each node summary is 50 tokens. How many nodes can you include before you hit the limit?
  2. The "Priority" Choice: If you have 5 tokens left, and you find a node about a "Deadline" and a node about a "Thank you email," which one do you include?
  3. Visualization: Draw a graph representing "Depth" on the X-axis and "Token Count" on the Y-axis. Show how the line curves upward (exponentially) as you increase depth.

In the next lesson, we will look at how to teach the AI by example: Few-Shot Prompting with Graph Samples.
