Pathfinding and Neighborhood Expansion: Targeted Retrieval

Pathfinding and Neighborhood Expansion: Targeted Retrieval

Learn the two primary modes of graph-based retrieval. Master 'Neighborhood Expansion' for contextual summaries and 'Pathfinding' for logical reasoning between two disparate points.

Pathfinding and Neighborhood Expansion: Targeted Retrieval

We now know how to walk (Traversal). Now we need to know where we are going. In Graph RAG, there are two primary retrieval patterns that account for 90% of all real-world use cases: Neighborhood Expansion and Pathfinding.

In this lesson, we will learn the difference between "Surrounding an entity with context" (Neighborhood) and "Connecting two dots with a bridge" (Pathfinding). We will see how these patterns enable different types of AI answers—from "General Overviews" to "Specific Evidence Chains"—and how to implement them efficiently using Graph algorithms.


1. Neighborhood Expansion: The "360-Degree" View

Goal: Understand everything related to a single "Seed" entity.

When a user asks: "What do we know about Sudeep?", we perform Neighborhood Expansion. We start at the Sudeep node and pull in all nodes reachable within $k$ hops.

  • Outcome: A "Cloud" of facts.
  • Application: Executive summaries, profile generation, entity dossiers.
  • Challenge: The "Hub Node" problem. If Sudeep is connected to 5,000 files, the neighborhood is too big. You must use Weighted Expansion to pick the top 50 relationships.

2. Pathfinding: The "Evidence Chain"

Goal: Connect two specific entities that the user mentioned.

When a user asks: "How is Sudeep related to the project in Tokyo?", we perform Pathfinding. We find the Sudeep node and the Tokyo_Project node, and we search for the Shortest Path (or the Most Reliable Path) between them.

  • Outcome: A linear chain of facts.
  • Application: Troubleshooting, fraud detection, supply chain analysis, investigative journalism.
  • The "Eureka" Moment: This is often where the user learns something they didn't know. The path might go through a person or a server they hadn't considered.
graph LR
    subgraph "Neighborhood (The Cloud)"
    A((Sudeep)) --- B1[Project A]
    A --- B2[Office]
    A --- B3[Colleague]
    end
    
    subgraph "Pathfinding (The Bridge)"
    X((Sudeep)) --> Y1[VPN]
    Y1 --> Y2[Proxy]
    Y2 --> Z((Tokyo))
    end
    
    style A fill:#4285F4,color:#fff
    style X fill:#4285F4,color:#fff
    style Z fill:#34A853,color:#fff

3. When to Use Which?

RequirementNeighborhood ExpansionPathfinding
Input1 Entity2+ Entities
LogicAggregation / SummaryLineage / Causality
Output TypeA list of traitsA sequence of events
RiskInformation OverloadNo path found

4. Implementation: Finding the Bridge in Python

Let's use the nx.shortest_path algorithm to connect two distant concepts.

import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("Sudeep", "VPN"),
    ("VPN", "Gateway"),
    ("Gateway", "Tokyo"),
    ("Sudeep", "Lunch"),
    ("Lunch", "Sandwich")
])

# 1. Neighborhood of Sudeep (1-hop)
neighbors = list(G.neighbors("Sudeep"))
print(f"Sudeep's Neighbors: {neighbors}")

# 2. Path from Sudeep to Tokyo
try:
    bridge = nx.shortest_path(G, "Sudeep", "Tokyo")
    print(f"Path to Tokyo: {' -> '.join(bridge)}")
except nx.NetworkXNoPath:
    print("No relationship found.")

# RESULT:
# Sudeep's Neighbors: ['VPN', 'Lunch']
# Path to Tokyo: Sudeep -> VPN -> Gateway -> Tokyo

5. The "Path Score" and Multi-Path Retrieval

In Graph RAG, we don't always want just one path. There might be 3 different ways Sudeep is related to Tokyo.

Pro Pattern: Retrieve the Top-K Shortest Paths. This gives the LLM multiple perspectives on the relationship, which it can then summarize for the user. "Sudeep is related to Tokyo via the VPN infrastructure, but also via a shared meeting with the regional VP last month."


6. Summary and Exercises

Targeted retrieval is the difference between "Searching" and "Finding."

  • Neighborhood Expansion gathers context around a single point.
  • Pathfinding connects two points with a chain of logic.
  • Weighted paths prioritize reliability over distance.

Exercises

  1. Pattern Match: You are building a "Product Catalog Bot." The user asks: "Is this charger compatible with my phone?" Is this a Neighborhood Expansion or a Pathfinding task?
  2. Breadth Constraint: If a Neighborhood Expansion returns 200 nodes, how do you decide which ones to keep for a 2,000-token LLM prompt? (Hint: Use PageRank or Node Degree).
  3. Path Drill: Pick two people in your life who have never met. Can you find a path of 3 relationships that connects them? (e.g., You -> Coworker -> Their Spouse -> Their Friend).

In the next lesson, we will finalize Module 4 by comparing the two major types of graphs you'll encounter: Knowledge Graphs vs Property Graphs.

Subscribe to our newsletter

Get the latest posts delivered right to your inbox.

Subscribe on LinkedIn