Combining Vector Search with Graph Queries: The Hybrid Path

In the first module, we talked about "Vector RAG" and "Graph RAG" as two different things. In this lesson, we stop the war and start the wedding. In a production Graph RAG system, you don't choose one; you use both in a single query.

We will learn how to use a Vector Search to find the "Entrance Point" to the graph and then immediately follow it with a Graph Traversal to gather context. This is the "Vector-Entry, Graph-Walk" pattern, and it is the backbone of the hybrid retrieval we build in this lesson.

1. The "Semantic Entrance" Pattern

Why do we need vectors in a graph database?

User Query: "Tell me about the guy who leads the project involving new energy batteries."

The Knowledge Graph doesn't have a node named "GUY WHO LEADS...". It has a node named Sudeep and a node named Lithium-Ion Initiative. The graph engine can't find these with a standard string match.

The Hybrid Step:

Perform a Vector Search for "new energy batteries" inside the graph database.
The search returns the Lithium-Ion Initiative node.
Graph Traversal starts from that node to find the relationship [:LEADS] and the connected Person node.
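
The three steps above can be sketched in plain Python, with no database at all. This is only a toy model of the pattern: the 3-dimensional embeddings, the second project, the second person, and the helper names (semantic_entrance, leaders_of) are invented for illustration, and a real system would get its vectors from an embedding model.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
NODE_EMBEDDINGS = {
    "Lithium-Ion Initiative": [0.9, 0.1, 0.0],
    "Office Relocation": [0.0, 0.2, 0.9],
}

# Directed edges stored as (source, relationship, target).
EDGES = [
    ("Sudeep", "LEADS", "Lithium-Ion Initiative"),
    ("Another Person", "LEADS", "Office Relocation"),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def semantic_entrance(query_vector):
    """Vector step: return the node whose embedding is closest to the query."""
    return max(NODE_EMBEDDINGS, key=lambda name: cosine(NODE_EMBEDDINGS[name], query_vector))


def leaders_of(project):
    """Graph step: follow incoming LEADS edges from the entry node."""
    return [src for src, rel, dst in EDGES if rel == "LEADS" and dst == project]


# Stand-in for the embedding of "new energy batteries".
query_vector = [0.8, 0.2, 0.1]
entry = semantic_entrance(query_vector)
leaders = leaders_of(entry)
print(entry, leaders)
```

Notice that the query text never has to match a node name. The vector step picks the entry node, and the graph step answers the "who leads it" part.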

2. Integrated Hybrid Queries in Cypher

Modern graph databases allow you to do both in one query. In Neo4j, the vector index procedure used below is available from version 5.11 onward.

// 1. SEMANTIC SEARCH (Vector)
CALL db.index.vector.queryNodes('project_embeddings', 1, $queryVector) 
YIELD node as start_node, score

// 2. RELATIONSHIP EXPANSION (Graph)
MATCH (start_node)-[:CONTRIBUTED_BY]->(contributor:Person)
OPTIONAL MATCH (start_node)-[:DEPENDS_ON]->(dependency)

// 3. RESULT
RETURN start_node.name AS project, score, contributor.name AS contributor, collect(dependency.name) AS dependencies

The Beauty: Everything runs as one query in one transaction. Whichever node the vector search ranks highest becomes the starting point of the graph walk, so you don't have to write complex application code to link two different databases.
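
The query above assumes that a vector index named project_embeddings already exists. Here is a minimal sketch of how such an index might be declared. Several things are assumptions, not part of the lesson: the Project label, the embedding property, the helper name vector_index_statement, and the 1536 dimensions (the output size of text-embedding-3-small). The CREATE VECTOR INDEX syntax shown is the one used by recent Neo4j 5.x releases, so check the documentation for your version.

```python
def vector_index_statement(index_name, label, prop, dimensions, similarity="cosine"):
    """Build a CREATE VECTOR INDEX statement as a string (hypothetical helper).

    Index names, labels, and property names cannot be sent as Cypher
    parameters, so they are interpolated here. Only pass trusted,
    hard-coded names.
    """
    return (
        f"CREATE VECTOR INDEX {index_name} IF NOT EXISTS\n"
        f"FOR (n:{label}) ON (n.{prop})\n"
        "OPTIONS {indexConfig: {\n"
        f"  `vector.dimensions`: {dimensions},\n"
        f"  `vector.similarity_function`: '{similarity}'\n"
        "}}"
    )


# Assumed label and property; only the index name comes from the lesson.
statement = vector_index_statement("project_embeddings", "Project", "embedding", 1536)
print(statement)
```

The dimension must match the embedding model you use at query time, otherwise the similarity search cannot compare the vectors.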

3. Filtering Vectors with Graph Topology

You can also do the reverse: use the graph to narrow down the vector search.

"Find me the most relevant document (Vector), but ONLY if it was written by someone in the Engineering department (Graph)."

MATCH (p:Person)-[:WORKS_IN]->(d:Department {name: 'Engineering'})
MATCH (p)-[:AUTHORED]->(doc:Document)
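// Score ONLY those documents. This is an exact comparison over the filtered set;
// it does not use the vector index. vector.similarity.cosine() needs Neo4j 5.18+.
WITH doc, vector.similarity.cosine(doc.embedding, $queryVector) AS score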
ORDER BY score DESC
LIMIT 5
RETURN doc.content
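
To see why the order of operations matters, here is a small in-memory sketch of the same "graph first, vector second" idea. The people, documents, 2-dimensional embeddings, and the function name search_in_department are all made up for illustration.

```python
import math

# Hypothetical data standing in for WORKS_IN and AUTHORED relationships.
WORKS_IN = {"Asha": "Engineering", "Ben": "Marketing"}
DOCUMENTS = [
    {"content": "Battery cell design notes", "author": "Asha", "embedding": [0.9, 0.1]},
    {"content": "Cafeteria menu", "author": "Asha", "embedding": [0.1, 0.9]},
    {"content": "Battery ad campaign", "author": "Ben", "embedding": [0.95, 0.05]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search_in_department(query_vector, department, limit=5):
    # Graph step: keep only documents whose author works in the department.
    candidates = [d for d in DOCUMENTS if WORKS_IN.get(d["author"]) == department]
    # Vector step: rank just those candidates by similarity to the query.
    ranked = sorted(candidates, key=lambda d: cosine(d["embedding"], query_vector), reverse=True)
    return [d["content"] for d in ranked[:limit]]


results = search_in_department([1.0, 0.0], "Engineering")
print(results)
```

The Marketing document is the closest match by similarity alone, yet it never appears in the results because the graph filter removed it before any scoring happened.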

graph TD
    User[Query Vector] -->|Similarity| V[(Vector Index)]
    V -->|Top Node| Start[Project Node]
    Start -->|Traversal| N1[Team]
    Start -->|Traversal| N2[Budget]
    Start -->|Traversal| N3[Risks]
    N1 & N2 & N3 -->|Context| LLM[LLM Synthesizer]
    
    style V fill:#f4b400,color:#fff
    style LLM fill:#34A853,color:#fff

4. Why this is the "Gold Standard" for RAG

High Recall: You don't need exact keyword matches to find the right part of the graph.
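Low Latency: Handing the vector hits straight to a graph traversal inside the database avoids the extra network round trip you pay when application code (for example, Python) shuttles node IDs between a separate vector store and a graph database.
Logical Grounding: The answer isn't just "based on similarity"; it's backed by an explicit relationship stored in the graph.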

5. Implementation: Building a Hybrid Query in Python

Let's look at how we prepare the vector and run the query.

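from neo4j import GraphDatabase
from openai import OpenAI

# 1. Convert user question to a vector
# (Using OpenAI text-embedding-3-small; the client reads OPENAI_API_KEY from the environment)
client = OpenAI()
response = client.embeddings.create(input="new energy batteries", model="text-embedding-3-small")
q_vector = response.data[0].embedding

# 2. Run the Hybrid Cypher (Simplified)
# Note: $q_vector is passed as a parameter, never pasted into the query string
cypher = """
CALL db.index.vector.queryNodes('embeddings', 3, $q_vector)
YIELD node AS start_node
MATCH (start_node)-[:RELATED_TO]->(other)
RETURN start_node.name, collect(other.name) AS connections
"""

# 3. Execute with the official neo4j Python driver (recent 5.x versions)
# (The URI and credentials below are placeholders; use your own)
with GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password")) as driver:
    records, _, _ = driver.execute_query(cypher, q_vector=q_vector)
    for record in records:
        print(record["start_node.name"], record["connections"])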
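
Once the rows come back, they still need to be turned into text the LLM can read. Below is one possible sketch of that last step. The function name build_context and the sample rows are hypothetical; the only thing taken from the query above is the pair of column names, start_node.name and connections.

```python
def build_context(rows):
    """Turn hybrid query rows into plain-text lines for an LLM prompt."""
    lines = []
    for row in rows:
        # Fall back to "none" so an empty list still produces a readable line.
        connections = ", ".join(row["connections"]) or "none"
        lines.append(f"{row['start_node.name']} is related to: {connections}")
    return "\n".join(lines)


# Invented sample rows shaped like the query result.
sample_rows = [
    {"start_node.name": "Lithium-Ion Initiative", "connections": ["Sudeep", "Cell Supplier"]},
    {"start_node.name": "Solar Pilot", "connections": []},
]
context = build_context(sample_rows)
print(context)
```

Any object that supports row["column"] lookups works here, so the same function could be fed plain dictionaries in a unit test or driver records in production.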

6. Summary and Exercises

Hybrid retrieval is where the "Fuzziness" of AI meets the "Rigidity" of Data.

Vector Indices are for finding the Entry Point.
Graph Traversals are for gathering the Evidence.
Integrated Queries reduce network latency and simplify code.
Context is richer because it includes both the "Similar" and the "Connected."

Exercises

Hybrid Flow: Draw the flow for the question "What is the budget of the project that mentioned 'Hydrogen' in its abstract?" Which part is Vector? Which part is Graph?
Constraint Test: How would you add a constraint to the hybrid query to only show results that were updated in 2024?
The "Entrance" Problem: If your vector search for "Project Apple" returns 5 nodes, do you start a graph walk from all 5, or just the top 1? What are the tradeoffs?

In the next lesson, we will look at how to pick the "Best" results from these complex queries: Ranking and Relevance in Graph Retrieval.