Semi-Structured Retrieval: Blending SQL and Graph

Graphs are excellent at "Relationships," but they are often mediocre at "Aggregates." If you ask a graph "Who is the manager of Project X?", it's fast. If you ask it "What was the average budget of the last 1,000 projects?", it's slow compared to SQL. A professional AI system doesn't choose between SQL and Graph; it Blends them.

In this lesson, we will look at Semi-Structured Retrieval Architectures. We will learn how to build an AI agent that generates Both Cypher and SQL in a single execution loop. We will see how to "Join" a Graph node in Neo4j with a Data Row in Snowflake or BigQuery to provide answers that are both logically deep and mathematically precise.

1. The "SQL-Graph" Division of Labor

The Graph: Handles the "Who/Where/How" (Topology).
The SQL: Handles the "How Much/When/Which" (Quantities).

The Workflow:

User: "What is the total salary of everyone reporting to Sudeep?"
Step 1 (Graph): AI finds the list of employee IDs reporting to Sudeep: [101, 102, 103].
Step 2 (SQL): AI runs: SELECT SUM(salary) FROM employees WHERE id IN (101, 102, 103).
Step 3 (Synth): AI answers: "Total is $300k."

2. The "Unified ID" Pattern

To make this work, the Node ID in your Graph must be the Primary Key in your SQL table.

Node: :Person {id: 'emp_2291', name: 'Jane'}
Table Row: id: 'emp_2291', salary: 120000, start_date: '2022-01-01'

This creates a "Digital Twin" where the Structural Essence of Jane is in the Graph and the Quantitative Details are in the relational database.

3. The "Hybrid Reasoner" Prompt

You must teach your AI agent how to "Split" its tasks.

INSTRUCTION:
If the question involves 'Management Structure' or 'Hierarchy', use GRAPH.
If the question involves 'Sums', 'Averages', or 'Sorting by Price', use SQL.
Always use the 'Employee_ID' to bridge the two results.

graph TD
    User --> Q[Question: Avg Budget of Sudeep's Team]
    Q --> G[Graph Agent]
    G -->|Find Team IDs| SQL[SQL Agent]
    SQL -->|AVG(Budget)| Synth[LLM Synthesis]
    Synth --> Ans[Answer: $450k]
    
    style G fill:#4285F4,color:#fff
    style SQL fill:#f4b400,color:#fff

4. Implementation: A Hybrid Retrieval Chain

def hybrid_query(user_query):
    # 1. Get the path from the Graph
    team_members = neo4j.run("MATCH (p:Person {name:'Sudeep'})-[:MANAGES]->(m) RETURN m.id")
    ids = [m['m.id'] for m in team_members]
    
    # 2. Get the numbers from SQL
    query = f"SELECT AVG(salary) FROM payroll WHERE emp_id IN {tuple(ids)}"
    avg_salary = postgres.run(query)
    
    return f"The average salary of Sudeep's team is ${avg_salary}."

5. Summary and Exercises

Hybrid retrieval is the "Industrial Strength" version of RAG.

SQL handles the heavy aggregation and math.
Graph handles the complex, multi-hop discovery.
Shared Identifiers are the "Gluing" agents for these two systems.
Intelligent Routing: The AI determines which engine to use based on the user's verbs (Find vs Sum).

Exercises

System Design: You are building an app for "Amazon." Which data belongs in a Graph (e.g., 'People who bought X also bought Y') and which belongs in SQL (e.g., 'Total Revenue for Prime Day')?
The "Bridge" Task: If you lose the "ID Mapping" between the two databases, what happens to your AI's reasoning ability?
Visualization: Draw a circle (Graph) and a square (SQL). Draw a line between them and label it "ID:123".

In the next lesson, we will look at autonomous graphs: Emergent Graph Reasoning: Self-Maintained agents.