Centrality Algorithms: Finding the Key Players

Discover the mathematical heart of your graph. Learn how to use PageRank, Degree Centrality, and Betweenness to automatically identify the most influential entities in your RAG system.

Which node in your company graph is the most "Critical"? It’s not necessarily the CEO. It might be the Senior Architect who is the only bridge between the Legacy Team and the Cloud Team. It might be the "Shared Infrastructure" document that every project depends on. Finding these "Key Players" is the job of Centrality Algorithms.

In this lesson, we will explore the three most important centrality metrics for Graph RAG: Degree, PageRank, and Betweenness. We will learn how to interpret these scores and how to use them to create "Attention Filters" for your AI agents, ensuring they don't get distracted by irrelevant fringe data.


1. Degree Centrality: The "Popularity" Vote

Definition: The number of relationships connected to a node.

  • In-Degree (The Receiver): How many arrows point at the node. (Represents Authority/Focus).
  • Out-Degree (The Giver): How many arrows point away from the node. (Represents Hubs/Summary sources).

RAG Usage: If your agent is looking for a "Main Topic," it should look for the node with the highest In-Degree.
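As a rough illustration of the definition above, in-degree and out-degree can be counted directly from a list of directed edges. The node names and edges below are invented for the example; in a real system they would come from your graph database.

```python
from collections import Counter

# Hypothetical directed edges (source, target), e.g. "document MENTIONS topic".
edges = [
    ("doc_a", "billing"),
    ("doc_b", "billing"),
    ("doc_c", "billing"),
    ("doc_c", "auth"),
    ("summary", "doc_a"),
    ("summary", "doc_b"),
    ("summary", "doc_c"),
]


def degree_centrality(edges):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    in_degree = Counter(target for _, target in edges)
    out_degree = Counter(source for source, _ in edges)
    return in_degree, out_degree


in_degree, out_degree = degree_centrality(edges)
main_topic = max(in_degree, key=in_degree.get)  # node with the most incoming arrows
main_hub = max(out_degree, key=out_degree.get)  # node with the most outgoing arrows
print(main_topic, main_hub)
```

Here the "Main Topic" candidate is the node that the most documents point at, while the hub is the summary node that points at the most others.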


2. PageRank: The "Transitive" Authority

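Definition: A node is important if other important nodes link to it.

This is a recursive definition: each node's score depends on the scores of the nodes pointing at it, and each node splits its own score among its outgoing links. If Sudeep's only inbound link comes from the CEO, his PageRank can still be higher than that of someone linked to by many interns, provided the CEO's score is high and is not diluted across many outgoing links.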

RAG Usage: When the agent has to decide which of several candidate nodes to rely on, you can sort the candidates by their PageRank. Treat the score as a "Mathematical Seal of Approval" on a node's structural authority within the graph, not as proof that its content is correct.
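To make the recursion concrete, here is a minimal power-iteration sketch of PageRank in plain Python. It is a teaching aid, not the GDS implementation: the damping factor of 0.85, the fixed iteration count, and the tiny org graph (names such as `popular_intern` and `board_1`) are all assumptions made for the example.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is shared equally by all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for source in nodes:
            for target in out_links[source]:
                # Each node splits its rank evenly among its outgoing links.
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Hypothetical org graph: three interns point at one popular intern,
# six board members point at the CEO, and the CEO points only at Sudeep.
edges = [(f"intern_{i}", "popular_intern") for i in range(3)]
edges += [(f"board_{i}", "ceo") for i in range(6)]
edges.append(("ceo", "sudeep"))

ranks = pagerank(edges)
print(ranks["sudeep"] > ranks["popular_intern"])
```

In this toy graph Sudeep has a single inbound link and the popular intern has three, yet Sudeep ranks higher because his one link carries the CEO's accumulated score. With enough interns the result would flip, which is why the comparison depends on the actual graph.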


3. Betweenness Centrality: The "Gatekeeper"

Definition: How many "Shortest Paths" between other nodes pass through this node.

A node with high Betweenness is often a Chokepoint. If it is the only link between two clusters, deleting it breaks the graph into isolated parts.

RAG Usage: If the user asks about "System Risks" or "Structural Problems," the AI agent should prioritize nodes with high Betweenness, as these are the likeliest single points of failure.

graph TD
    A1---A2
    A2---A3
    A3---A1
    
    A2 --- B((High Betweenness))
    
    B --- C1
    C1---C2
    C2---C3
    C3---C1
    
    style B fill:#34A853,color:#fff
    note1["Degree and PageRank: highest at A2 and C1"]
    note2["Betweenness: highest at B"]
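The diagram above can be checked by hand or with a short script. The sketch below uses Brandes' algorithm for unweighted, undirected graphs, written in plain Python for illustration. It is not the GDS procedure, and the function name `betweenness` is simply a label chosen for this example.

```python
from collections import deque


def betweenness(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        visit_order = []
        predecessors = {v: [] for v in adjacency}
        path_count = {v: 0 for v in adjacency}
        path_count[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        dependency = {v: 0.0 for v in adjacency}
        while visit_order:
            w = visit_order.pop()
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


# The two triangles joined through B, as drawn in the diagram.
edges = [
    ("A1", "A2"), ("A2", "A3"), ("A3", "A1"),
    ("A2", "B"), ("B", "C1"),
    ("C1", "C2"), ("C2", "C3"), ("C3", "C1"),
]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, []).append(b)
    adjacency.setdefault(b, []).append(a)

scores = betweenness(adjacency)
gatekeeper = max(scores, key=scores.get)
print(gatekeeper, scores[gatekeeper])
```

B sits on every shortest path between the A triangle and the C triangle (3 x 3 = 9 pairs), so it scores highest. A2 and C1 follow closely with 8 each, which shows that the nodes next to a bridge tend to score high as well.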

4. Implementation: Running PageRank in Cypher

Let's look at how we calculate and store the importance of our "Project" nodes.

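// 1. Project the Graph (Module 11, Lesson 1)
CALL gds.graph.project('myGraph', 'Project', 'DEPENDS_ON');

// 2. Run PageRank and write each score back to its node
CALL gds.pageRank.write('myGraph', {
  writeProperty: 'pagerank_score'
});

// 3. Use it in RAG!
MATCH (p:Project)
WHERE p.status = 'Delayed'
RETURN p.title, p.pagerank_score
ORDER BY p.pagerank_score DESC
LIMIT 1;
// This returns the delayed project with the highest PageRank.
// Assuming DEPENDS_ON points from the dependent project to its dependency,
// that is the delayed project that other important projects rely on most.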
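Once the scores are stored on the nodes, the "Attention Filter" idea from the introduction can be sketched on the application side. The function below is a hypothetical helper, not part of any library: it assumes each retrieved candidate is a dict carrying the `pagerank_score` property written above, and the `min_score` and `top_k` values are arbitrary choices for the example.

```python
def attention_filter(candidates, min_score=0.0, top_k=3):
    """Keep the top_k candidates by PageRank, dropping low-scoring fringe nodes."""

    def score(candidate):
        # Nodes that were never scored are treated as having no importance.
        return candidate.get("pagerank_score", 0.0)

    kept = [c for c in candidates if score(c) >= min_score]
    kept.sort(key=score, reverse=True)
    return kept[:top_k]


# Hypothetical rows, shaped like the result of the query above.
candidates = [
    {"title": "Billing Migration", "pagerank_score": 2.4},
    {"title": "Logo Refresh", "pagerank_score": 0.2},
    {"title": "Auth Rewrite", "pagerank_score": 1.1},
    {"title": "Untitled Draft"},  # no score was written for this node
]

context = attention_filter(candidates, min_score=0.5, top_k=2)
titles = [c["title"] for c in context]
print(titles)
```

Only the surviving titles would be passed to the LLM as context. A sensible threshold depends on the score distribution in your own graph, so treat these numbers as placeholders.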

5. Summary and Exercises

Centrality is the "GPS of Importance."

  • Degree finds the popular kids.
  • PageRank finds the authoritative experts.
  • Betweenness finds the structural bridges.
  • GDS Write-Back allows the AI to "Sort" its context by structural importance. A high score signals prominence in the graph, not that the content is true.

Exercises

  1. Algorithm Selection: You are building a "Fake News Detector." Which centrality algorithm would you use to find the "Main Source" from which a rumor spread?
  2. The "Intern" PageRank: If an intern is connected to 100 other interns, and the CEO is connected to 10 board members, who has the higher Degree? Who has the higher PageRank?
  3. Visualization: In a graph of "Roads," which intersection has the highest Betweenness? (Hint: Think of a bridge or a tunnel).

In the next lesson, we will look at grouping our data: Community Detection for Contextual Clustering.
