Ranking and Relevance in Graph Retrieval: Scoring the Web

When you perform a "Neighborhood Expansion" (Module 4), you might get 100 related facts. But your LLM's context window is limited. You cannot feed it every single connection. You need a way to Rank your results. In traditional RAG, we rank by Cosine Similarity. In Graph RAG, we have something much more powerful: Topological Importance.

In this lesson, we will explore how to score the relevance of a graph node. We will look at Node Degree (popularity), PageRank (authority), and Path Weighting (confidence). We will learn how to build a "Scoring Pipeline" that ensures the most critical knowledge is always the first thing the LLM reads.

1. Node Degree: The "Popularity" Filter

The simplest way to rank a node is by its Degree (how many edges it has).

In-Degree: A node that many other nodes talk about.
Out-Degree: A node that provides a lot of "Outward links" (like a Summary or a Hub).

RAG Tip: If you retrieve 50 projects but can only show 3, prioritizing the ones with the highest Degree often gives the LLM the "Most Centrally Important" information first.

2. PageRank: Ranking by Authority

Based on the original Google algorithm, PageRank calculates a node's importance based on the importance of the nodes pointing to it.

A project mentioned by the CEO node is more important than a project mentioned by a Temporary_Note node.

Many graph databases (like Neo4j GDS) allow you to pre-calculate PageRank during ingestion. You can then use this score during retrieval: MATCH (n:Project) ... RETURN n.title ORDER BY n.pagerank DESC LIMIT 3.

3. Betweenness Centrality: The "Bridge" Score

This algorithm identifies nodes that act as "Bridges" between different parts of the graph.

RAG Case: If a node like Security_Protocol sits between the Engineering department and the Legal department, it will have high Betweenness.

When an agent is looking for "Common Ground" or "Dependencies," nodes with high betweenness are the most relevant.

graph TD
    A1---A2
    A2---A3
    A3---A1
    
    A3 --- B((Bridge Node))
    
    B --- C1
    C1---C2
    C2---C3
    C3---C1
    
    style B fill:#34A853,color:#fff
    note[Bridge Node has high Betweenness]

4. Path-Based Ranking (Decay Functions)

Relevance in a graph usually "Decays" as you move further away from the seed node.

1-Hop away: Weight 1.0 (Highly relevant).
2-Hops away: Weight 0.5 (Somewhat relevant).
3-Hops away: Weight 0.1 (Likely noise).

Implementation: When building your context, multiply the fact's importance by a Decay Factor based on its hop-count. This ensures the "Distant Relatives" of the topic don't push out the "Immediate Families."

5. Summary and Exercises

Ranking is the "Editor" of your retrieval process.

Node Degree is a quick proxy for importance.
PageRank identifies the "Authority" in the network.
Betweenness finds the critical "Chokepoints" and bridges.
Decay Functions keep the context focused on the immediate neighborhood.

Exercises

Ranking Strategy: You are building a "Tech Support Bot." Should you prioritize nodes with the highest In-Degree (Issues that many people have) or the highest PageRank (Official manuals)?
Decay Math: If you use a decay factor of $0.5$ per hop, what is the weight of a fact that is 4 hops away from the user's query? (Hint: $0.5^4$). Is it worth including in a 500-token prompt?
Visualization: In your mind, visualize a graph of "The Internet." Which node has the highest degree? Which has the highest PageRank?

In the next lesson, we will look at technical barriers: Handling Multi-Hop Queries at Scale.