
Top-K Neighborhood Retrieval: The Context Cloud
Master the most common Graph RAG retrieval pattern. Learn how to pull a concentrated 'cloud' of facts around a central entity to provide a 360-degree view of any topic.
In module 4, we learned that a Neighborhood is the set of facts surrounding a node. In module 8, we learned how to write the Cypher for it. Now, we arrive at the Strategy: How do we use this for a production AI assistant?
Top-K Neighborhood Retrieval is the bread and butter of Graph RAG. It is the strategy you use whenever a user asks a question about a specific "Thing" (e.g., "What is Project Titan?"). In this lesson, we will look at how to refine the "Context Cloud," how to handle the "Large Neighborhood" problem, and how to format these disparate facts into a narrative that an LLM can actually use.
1. The Strategy: Breadth over Depth
When a user asks about a specific entity, they aren't looking for a deep logical chain across 10 hops. They are looking for a Portrait.
- Level 1 (Direct): Facts owned by the entity (e.g., Name, Date, Owner).
- Level 2 (Inferred): Facts about the entity's connections (e.g., The owner's department, The project's dependencies).
The Strategy: Pull the "Most Important" $K$ nodes within a fixed number of hops (usually 1 or 2).
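As a minimal sketch of this strategy, assuming a Neo4j-style handle db whose run method accepts a Cypher string plus parameters (all names here are illustrative, not a fixed API):

CLOUD_QUERY = """
MATCH (e {name: $name})-[*1..2]-(neighbor)
RETURN DISTINCT neighbor.name AS fact_node
LIMIT $k
"""

def fetch_context_cloud(db, entity_name, k=25):
    # Breadth over depth: cap the walk at 2 hops and the result at K rows.
    return [record["fact_node"]
            for record in db.run(CLOUD_QUERY, name=entity_name, k=k)]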
2. Ranking the "Importance" Within the Cloud
If a node has 500 neighbors, you cannot include all of them in the prompt. You must Sub-Select.
Selection Criteria (a combined scoring sketch follows this list):
- Semantic Match: Using the user's query vector to find the most relevant neighbors.
- Topological Match: Using Node Degree or PageRank to find the most "Notable" neighbors.
- Recency: Prioritizing facts with the newest timestamp.
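In practice you can blend all three signals into one score. Below is a hedged sketch; the field names ('embedding', 'degree', 'updated_at') and the 0.6/0.2/0.2 weights are assumptions to be tuned per graph, not a standard:

import math
from datetime import datetime, timezone

def cosine(a, b):
    # Plain cosine similarity; swap in numpy for real workloads.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def score_neighbor(neighbor, query_vec, now=None):
    # `neighbor` is a dict with assumed fields: 'embedding' (list of floats),
    # 'degree' (int), and 'updated_at' (timezone-aware datetime).
    now = now or datetime.now(timezone.utc)
    semantic = cosine(query_vec, neighbor["embedding"])   # semantic match
    notability = math.log1p(neighbor["degree"])           # topological match
    age_days = max((now - neighbor["updated_at"]).days, 0)
    recency = 1.0 / (1.0 + age_days)                      # recency
    return 0.6 * semantic + 0.2 * notability + 0.2 * recency

def top_k(neighbors, query_vec, k=10):
    return sorted(neighbors, key=lambda n: score_neighbor(n, query_vec),
                  reverse=True)[:k]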
3. Serialization: From Subgraph to Story
Once you have your neighbor nodes (say, 10 nodes for a Person), you have two ways to present them to the LLM:
A. The "List of Facts" (Simple):
- "Sudeep works in London."
- "Sudeep manages the AI Team."
- "The AI Team uses Python."
B. The "JSON Graph" (Detailed):
{"node": "Sudeep", "relationships": [{"target": "AI Team", "type": "LEADS"}]}
RAG Tip: The "List of Facts" is usually better because LLMs are trained on natural language. They find it easier to weave these sentences into a coherent answer.
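A tiny, illustrative sentencizer (the (subject, REL_TYPE, object) triple format is an assumption about how your facts arrive):

def sentencize(triples):
    # Turn (subject, REL_TYPE, object) triples into plain sentences:
    # SNAKE_CASE relationship types become lowercase verbs.
    return "\n".join(
        f"{subj} {rel.replace('_', ' ').lower()} {obj}."
        for subj, rel, obj in triples
    )

print(sentencize([("Sudeep", "LEADS", "AI Team"),
                  ("Sudeep", "LIVES_IN", "London")]))
# Sudeep leads AI Team.
# Sudeep lives in London.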
graph TD
C((Central Entity)) --> N1[Fact 1]
C --> N2[Fact 2]
C --> N3[Fact 3]
C --> N4[Fact 4]
subgraph "Top-3 Ranking"
N1
N2
N4
end
N3 -.-x LLM[LLM Prompt]
N1 & N2 & N4 --> LLM
style C fill:#4285F4,color:#fff
style LLM fill:#34A853,color:#fff
4. Implementation: A "Portrait" Retrieval Logic in Python
from neo4j import GraphDatabase

# Assumes a local Neo4j instance; adjust the URI and credentials to your setup.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def get_entity_portrait(entity_name, k=10):
    # 1. Fetch the entity's 1-hop neighbors
    # 2. Rank by relationship weight (coalesce sorts missing weights last)
    # 3. Serialize each row as a "subject RELATION object" fact string
    query = """
    MATCH (e {name: $name})-[r]-(neighbor)
    RETURN e.name + ' ' + type(r) + ' ' + neighbor.name AS fact,
           coalesce(r.weight, 0) AS importance
    ORDER BY importance DESC
    LIMIT $k
    """
    with driver.session() as session:
        results = session.run(query, name=entity_name, k=k)
        return "\n".join(record["fact"] for record in results)
# OUTPUT:
# Sudeep LEADS AI Team
# Sudeep LIVES_IN London
# Sudeep USES Python
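Hypothetical usage, stitching the portrait into an LLM prompt:

portrait = get_entity_portrait("Sudeep", k=10)
prompt = (
    "Answer using only the facts below.\n\n"
    f"{portrait}\n\n"
    "Question: Who leads the AI Team?"
)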
5. Summary and Exercises
Neighborhood retrieval is about building a "Digital Dossier" on the fly.
- Breadth (width) is more important than Depth (hops) for summary questions.
- Ranking within the neighborhood is mandatory to fit in the context window.
- Sentencizing the graph facts is the most reliable way to feed the LLM.
Exercises
- Context Design: A user asks: "What is the history of our AWS usage?". Should you prioritize "1-hop direct facts" or "2-hop historical logs"?
- The "Noise" Filter: If a neighborhood includes a link to a "City" node that has 1 million other links, should you include that in the portrait? (Hint: General "Super-nodes" like
Londonor2024are often noise). - Visualization: Draw a 1-hop neighborhood of "Your Favorite Fruit." How many facts did you come up with?
In the next lesson, we will look at the opposite pattern: Path-Based Retrieval Patterns.