
Index Size Optimization
Techniques for shrinking your vector database and reducing RAM usage without sacrificing retrieval quality.
Large vector databases are expensive. Holding 10 million full-precision vectors in RAM can cost thousands of dollars a month. Index optimization is about being "frugal" with your vectors: storing fewer bytes per vector, and fewer vectors overall.
Technique 1: Product Quantization (PQ)
PQ splits each vector into subvectors and replaces each subvector with the ID of its nearest centroid from a small learned codebook, so a vector is stored as a handful of bytes instead of thousands.
- Pro: Massive reduction in RAM usage (up to 95%).
- Con: Small loss in retrieval accuracy.
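The codebook idea above can be sketched in a few lines of numpy. This is a toy illustration, not a production implementation (libraries like faiss do this with optimized k-means); the parameters here (64 dims, 8 subvectors, 16 centroids) are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_codebooks(data, m, k, iters=10):
    """Train one naive k-means codebook per subvector slice."""
    n, d = data.shape
    sub = d // m
    books = []
    for j in range(m):
        x = data[:, j * sub:(j + 1) * sub]
        cent = x[rng.choice(n, k, replace=False)]
        for _ in range(iters):
            # assign each point to its nearest centroid, then recompute means
            dist = ((x[:, None, :] - cent[None]) ** 2).sum(-1)
            lab = dist.argmin(1)
            for c in range(k):
                pts = x[lab == c]
                if len(pts):
                    cent[c] = pts.mean(0)
        books.append(cent)
    return books

def encode(v, books):
    """Replace each subvector with the index of its nearest centroid."""
    m = len(books)
    sub = v.shape[0] // m
    codes = np.empty(m, dtype=np.uint8)
    for j, cent in enumerate(books):
        chunk = v[j * sub:(j + 1) * sub]
        codes[j] = ((cent - chunk) ** 2).sum(1).argmin()
    return codes

def decode(codes, books):
    """Approximate reconstruction by concatenating the chosen centroids."""
    return np.concatenate([books[j][c] for j, c in enumerate(codes)])

data = rng.standard_normal((2000, 64)).astype(np.float32)
books = train_codebooks(data, m=8, k=16)
codes = encode(data[0], books)
recon = decode(codes, books)
# 64 float32 values (256 bytes) shrink to 8 uint8 codes (8 bytes)
```

The reconstruction is lossy, which is exactly the accuracy trade-off listed above: distances computed against `recon` are approximations of the true distances.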
Technique 2: Metadata Stripping
Don't store huge JSON objects in every vector's metadata.
- Solution: Store only the doc_id in the vector DB and keep the heavy metadata (full text, dates, authors) in a standard relational DB (e.g., PostgreSQL).
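A minimal sketch of that split, with a plain dict standing in for the vector index and SQLite standing in for PostgreSQL. The table and column names (`docs`, `doc_id`, `author`, `body`) and the `hydrate` helper are illustrative, not any particular vector DB's API.

```python
import sqlite3

import numpy as np

# "Vector DB" side: only doc_id -> embedding, no heavy metadata
vector_index = {
    "doc-1": np.random.rand(8).astype(np.float32),
    "doc-2": np.random.rand(8).astype(np.float32),
}

# Relational side: full text and attributes (use PostgreSQL in production)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (doc_id TEXT PRIMARY KEY, author TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [("doc-1", "Ada", "full text of doc 1..."),
     ("doc-2", "Grace", "full text of doc 2...")],
)

def hydrate(doc_ids):
    """After ANN search returns ids, fetch the heavy metadata in one query."""
    marks = ",".join("?" * len(doc_ids))
    rows = conn.execute(
        f"SELECT doc_id, author, body FROM docs WHERE doc_id IN ({marks})",
        doc_ids,
    ).fetchall()
    return {r[0]: {"author": r[1], "body": r[2]} for r in rows}

hits = ["doc-2"]        # pretend these ids came back from a vector search
meta = hydrate(hits)    # join the ids back to their full metadata
```

The vector index now carries only ids and embeddings, so its RAM footprint is independent of how large your documents and metadata grow.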
Technique 3: Scalar Quantization (SQ)
Converts 32-bit floating point numbers (floats) into 8-bit integers (int8).
- Pro: 4x reduction in memory footprint.
- Con: Slight precision loss, though it is negligible for most modern embedding models.
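A sketch of symmetric int8 scalar quantization with one scale per vector (one common scheme; real engines vary in how they pick scales). The 4x ratio falls straight out of the byte widths, and cosine similarity between original and dequantized vectors stays close to 1.

```python
import numpy as np

rng = np.random.default_rng(42)
vecs = rng.standard_normal((1000, 256)).astype(np.float32)  # toy embeddings

# Per-vector scale so the largest component maps to +/-127
scales = np.abs(vecs).max(axis=1, keepdims=True) / 127.0
q = np.round(vecs / scales).astype(np.int8)   # 1 byte per dimension
recon = q.astype(np.float32) * scales         # dequantize for scoring

ratio = vecs.nbytes / q.nbytes                # float32 -> int8 is 4x
cos = (vecs * recon).sum(1) / (
    np.linalg.norm(vecs, axis=1) * np.linalg.norm(recon, axis=1))
```

On this synthetic data the minimum cosine similarity across all 1000 vectors stays above 0.99, which is why the accuracy cost is usually acceptable.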
Technique 4: Smart Deduplication
If two chunks have 90% overlapping text, only index the larger one. Using MinHash or SimHash during ingestion can identify these redundant chunks before they ever hit the database.
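The MinHash idea can be sketched with only the standard library: hash every shingle under many seeded hash functions, keep the minimum per seed, and compare signatures. The shingle size (4), signature length (64), and use of `blake2b` with a per-seed salt are all arbitrary demo choices.

```python
import hashlib

def shingles(text, k=4):
    """Character k-shingles of a whitespace-normalized, lowercased string."""
    t = " ".join(text.lower().split())
    return {t[i:i + k] for i in range(len(t) - k + 1)}

def minhash(sh, num_hashes=64):
    """Per seeded hash function, keep the smallest hash over all shingles."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8,
                                salt=seed.to_bytes(8, "big")).digest(),
                "big")
            for s in sh))
    return sig

def est_jaccard(a, b):
    """Fraction of matching signature slots estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

doc_a = "Product quantization compresses vectors into compact codes."
doc_b = "Product quantization compresses vectors into compact codes!"
doc_c = "Scalar quantization converts floats to int8."

sig_a, sig_b, sig_c = (minhash(shingles(d)) for d in (doc_a, doc_b, doc_c))
# near-duplicates (doc_a vs doc_b) score high; unrelated chunks score low
```

At ingestion time you would compute a signature per chunk and skip any chunk whose estimated similarity to an already-indexed chunk exceeds your threshold (e.g., 0.9).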
Storage vs. RAM (Disk-based HNSW)
Some databases (like Milvus or Weaviate) allow you to store the index on disk (SSD) rather than RAM.
- Result: Much cheaper, but 5-10x higher latency. Best for "Cold Storage" RAG (e.g., archival searches).
Comparison Matrix
| Method | Size Reduction | Performance Impact |
|---|---|---|
| int8 Quantization | 4x | Very Low |
| Product Quantization | 20x | Medium |
| Selective Indexing | Variable | None |
Exercises
- Calculate the RAM needed for 1 Million vectors (1024 dims) using float32.
- Now calculate it for int8. How much money did you save?
- Why should you avoid indexing "Stop Words" (like 'and', 'the', 'is') in your metadata?
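A small helper to check your arithmetic for the first two exercises. Note it counts raw vector storage only; real indexes (e.g., HNSW graph links) add overhead on top.

```python
def index_ram_bytes(num_vectors: int, dims: int, bytes_per_component: int) -> int:
    """RAM for raw vector storage: count * dimensions * bytes per component."""
    return num_vectors * dims * bytes_per_component

# 1 million vectors, 1024 dims, float32 (4 bytes per component)
float32_ram = index_ram_bytes(1_000_000, 1024, 4)  # 4,096,000,000 bytes (~4.1 GB)

# Same vectors scalar-quantized to int8 (1 byte per component)
int8_ram = index_ram_bytes(1_000_000, 1024, 1)     # ~1.0 GB, the 4x saving
```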