
Indexing Strategies: HNSW, IVF, and the Art of Information Geometry
Master the core algorithms that power vector databases. Deep dive into Hierarchical Navigable Small Worlds (HNSW), Inverted File Indexes (IVF), and Product Quantization (PQ).
Indexing Strategies: HNSW, IVF, and PQ
We have established that Approximate Nearest Neighbor (ANN) search is the only way to scale vector search to millions of items. But how do these "approximate" searches actually work under the hood?
In the vector database world, an Index is a data structure designed to navigate high-dimensional space. Unlike a B-Tree, which sorts data linearly, a vector index organizes data based on Proximity.
In this lesson, we explore the "Big Three" of vector indexing: HNSW, IVF, and PQ. Understanding these will help you choose the right configuration for Pinecone, Chroma, or OpenSearch.
1. HNSW: Hierarchical Navigable Small Worlds
HNSW is the industry standard for production vector databases. It is a graph-based index that is incredibly fast and offers high recall.
The Intuition: The Multi-Layer Map
Imagine you are in a city and want to reach a specific house.
- Top Layer (Broad): You look at a map of the world and find the right country.
- Middle Layer (Specific): You look at a map of the city and find the right neighborhood.
- Bottom Layer (Exact): You walk the street looking for the house number.
HNSW builds this hierarchy of graphs. The top layers are sparse (fewer points), allowing you to make "huge jumps" across the vector space. As you move down the layers, the graph becomes denser, allowing you to fine-tune your location.
```mermaid
graph TD
    subgraph Layer_2_Coarse
        L2_A --- L2_B
    end
    subgraph Layer_1_Medium
        L1_A --- L1_B --- L1_C --- L1_D
    end
    subgraph Layer_0_Dense
        L0_A --- L0_B --- L0_C --- L0_D --- L0_E --- L0_F
    end
    L2_A -.-> L1_A
    L1_A -.-> L0_A
```
Pros:
- Incredible Search Speed: Sub-linear search time, roughly O(log n).
- High Recall: Very good at finding the "True" nearest neighbor.
Cons:
- High Memory Usage: Each vector needs to store many graph connections (pointers).
- Slow Index Building: Creating the graph is computationally expensive.
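The multi-layer routing described above is what makes HNSW fast, but the core mechanic is simply a greedy walk over a proximity graph. Here is a single-layer toy sketch in NumPy — not a real HNSW implementation (real HNSW builds the graph incrementally across layers and keeps a candidate list controlled by efSearch), just the greedy routing idea:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 32, 500
data = rng.random((n, d)).astype("float32")

# Build a crude proximity graph: connect each point to its M nearest neighbors.
# (Real HNSW builds these links incrementally, on multiple layers.)
M = 8
dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)
neighbors = np.argsort(dists, axis=1)[:, 1 : M + 1]  # skip self at column 0

def greedy_search(query, entry=0):
    """Walk the graph, always moving to the neighbor closest to the query."""
    current = entry
    current_dist = np.linalg.norm(data[current] - query)
    while True:
        cand = neighbors[current]
        cand_dists = np.linalg.norm(data[cand] - query, axis=1)
        best = cand_dists.argmin()
        if cand_dists[best] >= current_dist:
            return current  # local minimum: no neighbor is closer
        current, current_dist = cand[best], cand_dists[best]

query = rng.random(d).astype("float32")
found = greedy_search(query)
```

The hierarchy in real HNSW exists to pick a good entry point for this walk: the sparse top layers get you near the right region in a few hops before the dense bottom layer takes over.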
2. IVF: Inverted File Index
IVF is based on Clustering. It works by partitioning the entire vector space into regions (Voronoi cells).
How it works:
- Training Phase: The algorithm identifies "centroids" (center points) in the data using K-Means clustering.
- Indexing Phase: Every vector is assigned to the nearest centroid.
- Search Phase: The query vector first finds the nearest centroids (controlled by the nprobe parameter), then searches only the vectors within those specific clusters.
```mermaid
graph TD
    Q[Query Vector] --> C{Nearest Centroid?}
    C -- Cluster 1 --> S1[Search 500 vectors]
    C -- Cluster 23 --> S2[Search 200 vectors]
    style Q fill:#f96
```
Pros:
- Low Memory Footprint: Much lighter than HNSW.
- Fast Build Times: Great for frequent re-indexing.
Cons:
- Lower Recall: If your query is on the "border" between clusters, IVF might miss the perfect neighbor in the adjacent cluster.
- Requires Training: You need a representative sample of data to find the right centroids.
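The three phases above (train, index, search) fit in a short NumPy sketch. This is a toy illustration rather than the Faiss implementation; the nlist and nprobe names simply mirror the Faiss parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, nlist, nprobe = 32, 2000, 16, 4
data = rng.random((n, d)).astype("float32")

# Training phase: a few k-means iterations to place the centroids.
centroids = data[rng.choice(n, nlist, replace=False)].copy()
for _ in range(5):
    assign = np.linalg.norm(data[:, None] - centroids[None], axis=-1).argmin(axis=1)
    for c in range(nlist):
        if (assign == c).any():
            centroids[c] = data[assign == c].mean(axis=0)

# Indexing phase: one inverted list of vector IDs per centroid.
assign = np.linalg.norm(data[:, None] - centroids[None], axis=-1).argmin(axis=1)
inverted = {c: np.where(assign == c)[0] for c in range(nlist)}

def ivf_search(query, k=5):
    # Search phase: visit only the nprobe closest clusters, not all n vectors.
    nearest = np.linalg.norm(centroids - query, axis=1).argsort()[:nprobe]
    cand = np.concatenate([inverted[c] for c in nearest])
    cand_dist = np.linalg.norm(data[cand] - query, axis=1)
    return cand[cand_dist.argsort()[:k]]

query = rng.random(d).astype("float32")
top5 = ivf_search(query)
```

With nprobe=4 out of 16 clusters, the search touches roughly a quarter of the dataset — which is exactly where the "border vector" recall problem comes from: a true neighbor sitting in a fifth cluster is never examined.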
3. PQ: Product Quantization (Compression)
PQ is not a search algorithm like HNSW or IVF; it is a compression technique. It is almost always used in combination with IVF (forming the IVF_PQ index used in Faiss and OpenSearch).
The Problem: RAM is Expensive
A 1536D vector takes ~6KB. 100 Million vectors take 600GB of RAM.
The Solution: The "Look-alike" Table
PQ breaks a large vector into smaller sub-vectors. For each sub-vector, it finds a "representative" ID from a pre-trained codebook. This turns a 1536-float vector into a tiny list of 8-bit integers.
Analogy: Instead of describing an outfit with 1000 words (size, fabric, color, zip type), you say: "It's Outfit #45 in the catalog."
Pros:
- Massive Storage Savings: Can compress vectors by 10x to 100x.
- Faster Math: Modern CPUs can compare 8-bit integers much faster than 32-bit floats.
Cons:
- Information Loss: Similarity scores are no longer perfect; they are "approximations of approximations."
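A minimal sketch of the encode/decode cycle, assuming 8 sub-vectors and 256 codes per sub-space (hence 8-bit IDs). Real PQ trains each codebook with k-means; here we just sample code words from the data to keep the sketch short:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m, ksub = 64, 1000, 8, 256  # 8 sub-vectors, 256 codes each -> 8 bytes/vector
data = rng.random((n, d)).astype("float32")
dsub = d // m

# "Train" one tiny codebook per sub-space (sampled here; real PQ uses k-means).
codebooks = []
for j in range(m):
    sub = data[:, j * dsub : (j + 1) * dsub]
    codebooks.append(sub[rng.choice(n, ksub, replace=False)])

# Encode: each sub-vector becomes the ID of its nearest code word.
codes = np.empty((n, m), dtype=np.uint8)
for j in range(m):
    sub = data[:, j * dsub : (j + 1) * dsub]
    codes[:, j] = np.linalg.norm(sub[:, None] - codebooks[j][None], axis=-1).argmin(axis=1)

# Decode (lossy): look each ID back up in its codebook.
recon = np.hstack([codebooks[j][codes[:, j]] for j in range(m)])

orig_bytes = data.nbytes  # 64 floats * 4 bytes = 256 bytes per vector
pq_bytes = codes.nbytes   # 8 one-byte IDs     =   8 bytes per vector
```

Here the compression is 32x (256 bytes down to 8 per vector, ignoring the small codebooks), and the gap between data and recon is exactly the "information loss" listed above.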
4. The Comparison: HNSW vs. IVF_PQ
| Feature | HNSW (Graph) | IVF_PQ (Cluster + Compress) |
|---|---|---|
| Memory Usage | Very High | Very Low |
| Search Speed | Blazing Fast | Fast |
| Recall (Accuracy) | Excellent (95%+) | Good (85-90%) |
| Data Size | Small to Medium (< 10M) | Massive Scale (Billions) |
| Dynamic Updates | Difficult (Graph breaks) | Easier |
5. Python Example: Indexing with Faiss
Faiss (Facebook AI Similarity Search) is the library that most vector databases use under the hood. Let's see how easy it is to switch between Flat (Exact) and IVF (Approximate) indexing.
```python
import faiss
import numpy as np

# 1. Dataset setup
d = 128      # dimension
nb = 100000  # database size
nq = 10      # number of queries
xb = np.random.random((nb, d)).astype('float32')
xq = np.random.random((nq, d)).astype('float32')

# --- Strategy 1: FLAT (EXACT) ---
index_flat = faiss.IndexFlatL2(d)
index_flat.add(xb)
D, I = index_flat.search(xq, 5)  # Search top 5
print(f"Flat Index: Found neighbors for query 1: {I[0]}")

# --- Strategy 2: IVF (APPROXIMATE) ---
nlist = 100  # Number of clusters
quantizer = faiss.IndexFlatL2(d)  # How to find the clusters
index_ivf = faiss.IndexIVFFlat(quantizer, d, nlist)

# IVF MUST be trained to find centroids
index_ivf.train(xb)
index_ivf.add(xb)

# nprobe = How many clusters to check during search
index_ivf.nprobe = 10
D, I = index_ivf.search(xq, 5)
print(f"IVF Index : Found neighbors for query 1: {I[0]}")
```
6. How to Choose Your Indexing Strategy
- Developing a Prototype? Use Flat or HNSW (default in Chroma/Pinecone). You want accuracy and ease of use.
- Building a Production RAG App (< 5M docs)? Use HNSW. The memory cost is high, but the latency and recall are worth it for professional AI results.
- Building a Global Search Engine (> 100M items)? Use IVF_PQ. You simply cannot afford the RAM for HNSW at this scale. You accept slightly lower accuracy for significantly lower infrastructure costs.
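A rough back-of-envelope calculation for these scenarios. The per-link and per-code byte counts below are illustrative assumptions, not the exact cost model of any particular engine:

```python
# Rough RAM estimates. Assumes 1536-D float32 vectors; the HNSW link
# overhead (M=16 neighbors, 8 bytes each) is an illustrative figure.
def hnsw_gb(n, d=1536, m_links=16):
    return n * (d * 4 + m_links * 8) / 1e9

def ivf_pq_gb(n, m_codes=96):
    # PQ stores only m_codes one-byte IDs per vector (plus small codebooks).
    return n * m_codes / 1e9

print(f"HNSW,   5M docs:   {hnsw_gb(5_000_000):.1f} GB")
print(f"IVF_PQ, 100M docs: {ivf_pq_gb(100_000_000):.1f} GB")
```

Even at 20x the document count, the compressed index needs a fraction of the RAM — which is why the billion-scale recommendation flips from HNSW to IVF_PQ.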
Summary and Key Takeaways
Indexing is the "Black Magic" that makes vector databases possible.
- HNSW uses a hierarchy of graphs to navigate. It is memory-heavy but extremely fast and accurate.
- IVF uses clusters to partition space. It is memory-efficient and easy to train.
- PQ compresses vectors into catalogs of IDs, enabling billion-scale search in RAM.
In the next lesson, we will look at the Recall vs Latency trade-offs in depth, learning how to "Tune" parameters like efConstruction and nprobe to optimize your production performance.
Exercise: The Architect's Dilemma
Your company is launching a new "AI Shopping Assistant."
- Total Products: 50,000,000
- Query volume: 1,000 queries per second.
- Budget: Limited.
- Would you choose HNSW or IVF_PQ? Why?
- If you choose IVF_PQ, and users complain that search results are "vaguely wrong," which parameter would you increase? (nlist, nprobe, or the PQ code size?)
- If you move your database to a cloud provider like Pinecone, and your monthly bill is $5,000, which indexing strategy would help you lower that bill?
This is the type of decision-making that separates an "AI Tinkerer" from a "Production AI Engineer."