Graph Traversal Concepts: The AI's Search Path

In the early modules, we talked about "Walking the Graph." Now, we are going to look at the Gait of that walk. How exactly does an AI agent move from node to node? Does it look at all neighbors first? Or does it follow one single path to the very end?

In this lesson, we will explore the two fundamental traversal strategies: BFS (Breadth-First Search) and DFS (Depth-First Search). We will learn how these strategies impact the "Context" your RAG system builds, the danger of "Exploding Context," and why a "Bounded Breadth" approach is the standard for high-performance agentic systems.

1. Breadth-First Search (BFS): The "Zoom Out"

Definition: Starting at the query node, you visit all "1-hop" neighbors first, then all "2-hop" neighbors, and so on.

Analogy: A stone thrown into a pond. The ripples go out in all directions equally.
RAG Case: "Tell me everything about Project Titan."
- The system finds the Titan node.
- It retrieves the 10 closest facts (1-hop).
- It then retrieves the 50 related facts (2-hops).

Pros: Provides a broad, balanced summary. You won't miss a high-level fact. Cons: Can quickly consume your entire token budget. (The "Exponential Explosion").

2. Depth-First Search (DFS): The "Rabbit Hole"

Definition: Starting at the query node, you follow one edge to the end, then backtrack and try the next path.

Analogy: An explorer in a cave. You follow one tunnel as far as it goes before coming back to the fork in the road.
RAG Case: "Trace the lineage of this specific security bug."
- The system follows: Bug -> Derived From -> Commit -> Author -> Reviewer.

Pros: Excellent for finding specific "Chains of Causality." Cons: You might spend your entire context window on one irrelevant detail while missing a crucial fact just one hop away in a different direction.

graph TD
    subgraph "BFS (Level by Level)"
    B_Start((Start)) --> B1((1))
    B_Start --> B2((1))
    B1 --> B3((2))
    B2 --> B4((2))
    end
    
    subgraph "DFS (Path by Path)"
    D_Start((Start)) --> D1((1))
    D1 --> D2((2))
    D2 --> D3((3))
    end
    
    style B_Start fill:#4285F4,color:#fff
    style D_Start fill:#34A853,color:#fff

3. The "k-Hop" Constraint: Avoiding Context Explosion

In a Knowledge Graph, every node is potentially connected to every other node eventually (The "6 Degrees of Separation").

If you don't set a Traversal Limit, your Graph RAG system will try to retrieve the entire library for a single question.

k=1: The "Direct Facts" (High precision, low context).
k=2: The "Extended Context" (The sweet spot for most RAG).
k=3+: The "Knowledge Deep Dive" (Requires heavy filtering to stay relevant).

4. Implementation: BFS vs. DFS in Python

Let's see how the search order changes the data we get.

import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("Query", "A"), ("Query", "B"),
    ("A", "C"), ("B", "D")
])

# 1. Breadth-First Search
print("BFS Order:")
for edge in nx.bfs_edges(G, source="Query"):
    print(edge) 
# Results: (Query, A), (Query, B) -> Level 1 first

# 2. Depth-First Search
print("\nDFS Order:")
for edge in nx.dfs_edges(G, source="Query"):
    print(edge)
# Results: (Query, A), (A, C) -> Follows A to the end first

5. Pruned Traversal: The "Agent" Way

In production Graph RAG, we don't just use raw BFS. we use Pruned Traversal.

An LLM acts as the "Navigator." It looks at the 10 neighbors and says: "Only follow the edges labeled 'DEPENDS_ON' and ignore 'MENTIONED_IN'."

This combination of Graph Algorithms and LLM Intuition is how we achieve "Smart Retrieval" without hitting token limits.

6. Summary and Exercises

Traversal is the "Gait" of your AI's reasoning.

BFS provides the big picture (Summary).
DFS proves the deep path (Lineage).
k-hop limits are mandatory to prevent system crashes.
Pruning is the "Smart Filter" that makes Graph RAG efficient.

Exercises

Traversal Choice: You are building a bot to answer "Who is related to the suspect?" Should you use BFS or DFS? Why?
Explosion Test: If every node in your graph has 10 edges, how many nodes do you retrieve at k=1, k=2, and k=3? (10, 100, 1,000). How many tokens would that be?
The "Pruner" Prompt: Write a short system prompt that tells an LLM how to choose which graph edge to follow when looking for "Root Cause of a Server Outage."

In the next lesson, we will look at the specific mathematical goals of these traversals: Pathfinding and Neighborhood Expansion.