
Neighbor Ranking: Selecting the Best Context
More isn't always better. Learn how to rank the neighbors of a retrieved node to ensure that only the most relevant, high-quality facts make it into your AI's limited context window.
In a dense graph, a single project might be connected to 500 different artifacts: emails, commits, meeting notes, people, and tools. If you send all 500 neighbors to the LLM, you will hit your Token Limit and confuse the AI with "Information Overload." You must act as a Curator, choosing the "High-Signal" neighbors and discarding the "Noise."
In this lesson, we will look at Neighbor Ranking. We will learn two primary methods for selection: Static Property Ranking (using the scores we built in Module 11) and Semantic Re-ranking (using a cross-encoder to compare the neighbor to the user's specific question). We will see how this "Pre-Prompt Pruning" increases accuracy and reduces costs.
1. The Ranking Hierarchy
- Topological Ranking: Prioritizing nodes with high PageRank or Centrality (i.e., "Trustworthy" facts).
- Semantic Ranking: Prioritizing nodes whose text description matches the meaning of the user's query, not just its keywords.
- Temporal Ranking: Prioritizing the "Newest" facts.
The Golden Mix: The best RAG systems use a weighted score. Score = (0.4 * Semantic) + (0.4 * Topological) + (0.2 * Temporal).
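A minimal sketch of this blend in Python; the weights, the Neighbor fields, and the example values are illustrative assumptions, not fixed constants:

from dataclasses import dataclass

@dataclass
class Neighbor:
    name: str
    semantic: float     # similarity to the query, normalized to [0, 1]
    topological: float  # PageRank / centrality, normalized to [0, 1]
    temporal: float     # freshness, normalized to [0, 1]

def golden_mix(n: Neighbor) -> float:
    # Weighted blend of the three ranking signals.
    return 0.4 * n.semantic + 0.4 * n.topological + 0.2 * n.temporal

neighbors = [
    Neighbor("design_doc", semantic=0.9, topological=0.3, temporal=0.5),
    Neighbor("kickoff_email", semantic=0.4, topological=0.9, temporal=0.2),
]
ranked = sorted(neighbors, key=golden_mix, reverse=True)

Normalizing each signal to [0, 1] before blending matters: raw PageRank values are often tiny and would otherwise be drowned out by the other terms.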
2. Re-ranking with Cross-Encoders
Instead of relying on graph structure alone, we can use a small transformer model (a Re-ranker) to "Double Check" the neighbors.
- Start with the candidate set from retrieval: e.g., 50 neighbor nodes.
- Pair the user's question with each neighbor's text.
- The Re-ranker outputs a relevance score between 0 and 1 for each pair.
- Only the Top 5 are kept for the final prompt.
This ensures that even if a node is "Near" the seed in the graph, it is only included if it is Relevant to the user's specific question.
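A sketch of this pipeline using the sentence-transformers CrossEncoder class; the model checkpoint, the dict shape of the neighbors, and the Top-5 cutoff are assumptions:

from sentence_transformers import CrossEncoder

# Assumed checkpoint; any cross-encoder trained for (query, passage) relevance works.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(question: str, neighbors: list[dict], top_k: int = 5) -> list[dict]:
    # Score every (question, neighbor text) pair in one batch.
    pairs = [(question, n["description"]) for n in neighbors]
    scores = reranker.predict(pairs)  # relevance scores (sigmoid, for single-label models)
    # Keep only the top_k most relevant neighbors for the prompt.
    ranked = sorted(zip(neighbors, scores), key=lambda x: x[1], reverse=True)
    return [n for n, _ in ranked[:top_k]]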
3. Handling "Fan-Out" with Pruning
In multi-hop paths, the number of neighbors "Explodes."
- Hop 1: 5 neighbors.
- Hop 2: 5 * 10 = 50 (each neighbor has ~10 neighbors of its own).
- Hop 3: 50 * 10 = 500.
The Strategy: At every hop, you must Prune, keeping only the top N neighbors according to your ranking algorithm. This is called Beam Search Retrieval (see the sketch after the diagram below).
graph TD
S((Seed)) --> N1[Neighbor 1: Rank 0.9]
S --> N2[Neighbor 2: Rank 0.4]
S --> N3[Neighbor 3: Rank 0.1]
N1 -->|Keep| LLM[LLM Prompt]
N2 -->|Keep| LLM
N3 -->|Discard| Trash[Token Savings]
style N1 fill:#34A853,color:#fff
style N3 fill:#f44336,color:#fff
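Here is a minimal beam-search sketch over a plain adjacency map; the graph structure, the score function, and the beam width of 5 are illustrative assumptions:

def beam_search_retrieve(graph: dict[str, list[str]], seed: str, score,
                         beam_width: int = 5, max_hops: int = 3) -> set[str]:
    # Expand hop by hop, pruning to the top beam_width nodes each time.
    frontier = [seed]
    kept = {seed}
    for _ in range(max_hops):
        candidates = {nbr for node in frontier
                      for nbr in graph.get(node, []) if nbr not in kept}
        # Prune: only the highest-ranked candidates survive to the next hop.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
        kept.update(frontier)
        if not frontier:
            break
    return kept

With a beam width of 5, three hops keep at most 16 nodes (the seed plus 5 per hop) instead of 500.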
4. Implementation: Ranking Neighbors in Cypher
MATCH (target:Project {name: $name})--(neighbor)
RETURN neighbor.name,
       neighbor.description,
       // Combine importance and freshness
       (neighbor.pagerank * 0.5) + (neighbor.freshness * 0.5) AS context_score
ORDER BY context_score DESC
LIMIT 5;
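To call this from application code, here is a sketch with the official neo4j Python driver; the URI, credentials, and the project name "Apollo" are placeholder assumptions:

from neo4j import GraphDatabase

QUERY = """
MATCH (target:Project {name: $name})--(neighbor)
RETURN neighbor.name AS name,
       neighbor.description AS description,
       (neighbor.pagerank * 0.5) + (neighbor.freshness * 0.5) AS context_score
ORDER BY context_score DESC
LIMIT 5
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    # Rows come back already ranked and capped at 5.
    result = session.run(QUERY, name="Apollo")
    context = [record.data() for record in result]
driver.close()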
5. Summary and Exercises
Ranking is the "Editor" of the Knowledge Graph.
- Context Windows are limited and expensive.
- Static scores (PageRank) identify authoritative facts.
- Semantic scores (Re-ranking) identify relevant facts.
- Aggressive Pruning prevents the "Lost in the Middle" problem.
Exercises
- Ranking Strategy: If a user asks "What happened yesterday?", should you prioritize PageRank or Timestamp?
- The "Re-ranker" Cost: A re-ranker adds 100ms to your query. Is it worth it if it reduces your prompt size by 50%?
- Visualization: Draw a "Fan-out" tree where you keep only the 2 best branches at each level.
In the next lesson, we will look at descriptive patterns: Building Narrative Context via Path Descriptors.