
Neighbor Ranking: Selecting the Best Context
More isn't always better. Learn how to rank the neighbors of a retrieved node to ensure that only the most relevant, high-quality facts make it into your AI's limited context window.
In a dense graph, a single project might be connected to 500 different artifacts: emails, commits, meeting notes, people, and tools. If you send all 500 neighbors to the LLM, you will hit your Token Limit and confuse the AI with "Information Overload." You must act as a Curator, choosing the "High-Signal" neighbors and discarding the "Noise."
In this lesson, we will look at Neighbor Ranking. We will learn two primary methods for selection: Static Property Ranking (using the scores we built in Module 11) and Semantic Re-ranking (using a cross-encoder to compare the neighbor to the user's specific question). We will see how this "Pre-Prompt Pruning" increases accuracy and reduces costs.
1. The Ranking Hierarchy
- Topological Ranking: Prioritizing nodes with high PageRank or Centrality (i.e., "Trustworthy" facts).
- Semantic Ranking: Prioritizing nodes whose text description matches the meaning of the user's query, not just its keywords.
- Temporal Ranking: Prioritizing the "Newest" facts.
The Golden Mix: The best RAG systems use a weighted score. Score = (0.4 * Semantic) + (0.4 * Topological) + (0.2 * Temporal).
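A minimal sketch of this blend in Python; the weights, the Neighbor fields, and the example values are illustrative assumptions, not fixed constants:

from dataclasses import dataclass

@dataclass
class Neighbor:
    name: str
    semantic: float     # similarity to the query, normalized to [0, 1]
    topological: float  # PageRank / centrality, normalized to [0, 1]
    temporal: float     # freshness, normalized to [0, 1]

def golden_mix(n: Neighbor) -> float:
    # Weighted blend of the three ranking signals.
    return 0.4 * n.semantic + 0.4 * n.topological + 0.2 * n.temporal

neighbors = [
    Neighbor("design_doc", semantic=0.9, topological=0.3, temporal=0.5),
    Neighbor("kickoff_email", semantic=0.4, topological=0.9, temporal=0.2),
]
ranked = sorted(neighbors, key=golden_mix, reverse=True)

Normalizing each signal to [0, 1] before blending matters: raw PageRank values are often tiny and would otherwise be drowned out by the other terms.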
2. Re-ranking with Cross-Encoders
Instead of relying on graph structure alone, we can use a small transformer model (a Re-ranker) to "Double Check" the neighbors.
- Start with the candidate set from retrieval: e.g., 50 neighbor nodes.
- Pair the user's question with each neighbor's text.
- The Re-ranker outputs a relevance score between 0 and 1 for each pair.
- Only the Top 5 are kept for the final prompt.
This ensures that even if a node is "Near" the seed in the graph, it is only included if it is Relevant to the user's specific question.
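A sketch of this pipeline using the sentence-transformers CrossEncoder class; the model checkpoint, the dict shape of the neighbors, and the Top-5 cutoff are assumptions:

from sentence_transformers import CrossEncoder

# Assumed checkpoint; any cross-encoder trained for (query, passage) relevance works.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(question: str, neighbors: list[dict], top_k: int = 5) -> list[dict]:
    # Score every (question, neighbor text) pair in one batch.
    pairs = [(question, n["description"]) for n in neighbors]
    scores = reranker.predict(pairs)  # relevance scores (sigmoid, for single-label models)
    # Keep only the top_k most relevant neighbors for the prompt.
    ranked = sorted(zip(neighbors, scores), key=lambda x: x[1], reverse=True)
    return [n for n, _ in ranked[:top_k]]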
3. Handling "Fan-Out" with Pruning
In multi-hop paths, the number of neighbors "Explodes."
- Hop 1: 5 neighbors.
- Hop 2: 5 * 10 = 50 (each neighbor has ~10 neighbors of its own).
- Hop 3: 50 * 10 = 500.
The Strategy: At every hop, you must Prune, keeping only the top N neighbors according to your ranking algorithm. This is called Beam Search Retrieval (see the sketch after the diagram below).
graph TD
S((Seed)) --> N1[Neighbor 1: Rank 0.9]
S --> N2[Neighbor 2: Rank 0.4]
S --> N3[Neighbor 3: Rank 0.1]
N1 -->|Keep| LLM[LLM Prompt]
N2 -->|Keep| LLM
N3 -->|Discard| Trash[Token Savings]
style N1 fill:#34A853,color:#fff
style N3 fill:#f44336,color:#fff
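Here is a minimal beam-search sketch over a plain adjacency map; the graph structure, the score function, and the beam width of 5 are illustrative assumptions:

def beam_search_retrieve(graph: dict[str, list[str]], seed: str, score,
                         beam_width: int = 5, max_hops: int = 3) -> set[str]:
    # Expand hop by hop, pruning to the top beam_width nodes each time.
    frontier = [seed]
    kept = {seed}
    for _ in range(max_hops):
        candidates = {nbr for node in frontier
                      for nbr in graph.get(node, []) if nbr not in kept}
        # Prune: only the highest-ranked candidates survive to the next hop.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
        kept.update(frontier)
        if not frontier:
            break
    return kept

With a beam width of 5, three hops keep at most 16 nodes (the seed plus 5 per hop) instead of 500.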
4. Implementation: Ranking Neighbors in Cypher
MATCH (target:Project {name: $name})--(neighbor)
RETURN neighbor.name,
       neighbor.description,
       // Combine importance and freshness
       (neighbor.pagerank * 0.5) + (neighbor.freshness * 0.5) AS context_score
ORDER BY context_score DESC
LIMIT 5;
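To call this from application code, here is a sketch with the official neo4j Python driver; the URI, credentials, and the project name "Apollo" are placeholder assumptions:

from neo4j import GraphDatabase

QUERY = """
MATCH (target:Project {name: $name})--(neighbor)
RETURN neighbor.name AS name,
       neighbor.description AS description,
       (neighbor.pagerank * 0.5) + (neighbor.freshness * 0.5) AS context_score
ORDER BY context_score DESC
LIMIT 5
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    # Rows come back already ranked and capped at 5.
    result = session.run(QUERY, name="Apollo")
    context = [record.data() for record in result]
driver.close()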
5. Summary and Exercises
Ranking is the "Editor" of the Knowledge Graph.
- Context Windows are limited and expensive.
- Static scores (PageRank) identify authoritative facts.
- Semantic scores (Re-ranking) identify relevant facts.
- Aggressive Pruning prevents the "Lost in the Middle" problem.
Exercises
- Ranking Strategy: If a user asks "What happened yesterday?", should you prioritize PageRank or Timestamp?
- The "Re-ranker" Cost: A re-ranker adds 100ms to your query. Is it worth it if it reduces your prompt size by 50%?
- Visualization: Draw a "Fan-out" tree where you keep only the 2 best branches at each level.
In the next lesson, we will look at descriptive patterns: Building Narrative Context via Path Descriptors.