Index Size Optimization

Techniques for shrinking your vector database and reducing RAM usage without sacrificing retrieval quality.

Large vector databases are expensive. At 1,024 float32 dimensions, 10 million vectors amount to roughly 40 GB of raw vector data before any index overhead, and keeping that in RAM can cost thousands of dollars a month. Index optimization is about being "frugal" with your vectors.

Technique 1: Product Quantization (PQ)

PQ splits each vector into sub-vectors and encodes each sub-vector as a short code into a learned codebook, so only the compact codes are stored (a sketch follows below).

  • Pro: Massive reduction in RAM usage (up to 95%).
  • Con: A moderate loss in retrieval accuracy, the largest trade-off of the techniques here.
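
A minimal sketch of PQ using the open-source FAISS library; the sub-quantizer count m and the random training data are illustrative assumptions, not tuned values:

    import faiss
    import numpy as np

    d = 1024      # original dimensionality: 4,096 bytes per float32 vector
    m = 64        # number of sub-quantizers; d must be divisible by m
    nbits = 8     # bits per sub-quantizer code

    index = faiss.IndexPQ(d, m, nbits)

    xb = np.random.rand(10_000, d).astype("float32")  # stand-in for real embeddings
    index.train(xb)  # learn the codebooks from (a sample of) your data
    index.add(xb)    # each vector is now stored as m * nbits bits = 64 bytes

    distances, ids = index.search(xb[:5], 10)  # query the compressed index

With these particular parameters each vector shrinks from 4,096 bytes to 64 bytes; raising or lowering m is the main lever for trading accuracy against compression.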

Technique 2: Metadata Stripping

Don't store huge JSON objects in every vector's metadata.

  • Solution: Store only the doc_id in the vector DB and keep the heavy metadata (full text, dates, authors) in a standard relational DB such as PostgreSQL, joining on doc_id at retrieval time. A sketch of the pattern follows.
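
In the sketch below, the vector_index object and its upsert/search methods are hypothetical stand-ins for whatever SDK your vector DB provides; the PostgreSQL side uses the standard psycopg2 driver:

    import psycopg2

    def index_document(vector_index, pg_conn, doc_id, embedding,
                       full_text, author, created_at):
        # The vector DB stores the embedding keyed by doc_id -- nothing else.
        # (upsert() is a hypothetical method; use your SDK's equivalent.)
        vector_index.upsert(id=doc_id, vector=embedding)

        # Heavy metadata lives in PostgreSQL under the same doc_id.
        with pg_conn.cursor() as cur:
            cur.execute(
                "INSERT INTO documents (doc_id, full_text, author, created_at) "
                "VALUES (%s, %s, %s, %s) ON CONFLICT (doc_id) DO NOTHING",
                (doc_id, full_text, author, created_at),
            )
        pg_conn.commit()

    def retrieve(vector_index, pg_conn, query_embedding, k=5):
        # Vector search returns only IDs; hydrate them from PostgreSQL.
        hits = vector_index.search(vector=query_embedding, top_k=k)
        hit_ids = [hit.id for hit in hits]
        with pg_conn.cursor() as cur:
            cur.execute(
                "SELECT doc_id, full_text, author FROM documents WHERE doc_id = ANY(%s)",
                (hit_ids,),
            )
            return cur.fetchall()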

Technique 3: Scalar Quantization (SQ)

Converts 32-bit floating-point values (float32) into 8-bit integers (int8) by rescaling each value into the int8 range (see the sketch below).

  • Pro: 4x reduction in memory footprint.
  • Con: A slight loss of precision, though in practice there is almost no accuracy loss for most modern embedding models.
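
A minimal NumPy sketch of the idea; production databases implement this internally, but it makes the 4x arithmetic visible:

    import numpy as np

    vecs = np.random.randn(10_000, 1024).astype(np.float32)  # stand-in embeddings

    # Max-abs scaling maps each vector's values into the int8 range [-127, 127].
    scale = np.abs(vecs).max(axis=1, keepdims=True) / 127.0
    quantized = np.round(vecs / scale).astype(np.int8)

    # Prints 4.0 -- ignoring the single float32 scale stored per vector.
    print(vecs.nbytes / quantized.nbytes)

    # Distances are computed against an approximate reconstruction:
    reconstructed = quantized.astype(np.float32) * scale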

Technique 4: Smart Deduplication

If two chunks have 90% overlapping text, only index the larger one. Using MinHash or SimHash during ingestion can identify these redundant chunks before they ever hit the database.
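
One way to implement this at ingestion time, using the datasketch library (pip install datasketch). The 0.9 threshold mirrors the 90% overlap figure above; keeping the first-seen chunk rather than the larger one is a simplification:

    from datasketch import MinHash, MinHashLSH

    def chunk_minhash(text, num_perm=128):
        m = MinHash(num_perm=num_perm)
        # Hash overlapping word 3-grams so near-duplicate chunks collide.
        words = text.lower().split()
        for i in range(max(1, len(words) - 2)):
            m.update(" ".join(words[i:i + 3]).encode("utf8"))
        return m

    lsh = MinHashLSH(threshold=0.9, num_perm=128)  # ~90% Jaccard similarity

    def should_index(chunk_id, text):
        m = chunk_minhash(text)
        if lsh.query(m):  # a near-duplicate chunk is already indexed
            return False
        lsh.insert(chunk_id, m)
        return True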

Storage vs. RAM (Disk-based HNSW)

Some databases (like Milvus or Weaviate) allow you to store the index on disk (SSD) rather than RAM.

  • Result: Much cheaper, but 5-10x higher latency. Best for "Cold Storage" RAG (e.g., archival searches).
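
In Milvus, for instance, the disk-resident option is the DiskANN index rather than HNSW. A sketch with pymilvus, assuming an already-populated collection named docs with an embedding field and a Milvus version with DiskANN enabled (check the docs for your release):

    from pymilvus import Collection, connections

    connections.connect(host="localhost", port="19530")

    collection = Collection("docs")  # hypothetical existing collection
    collection.create_index(
        field_name="embedding",
        index_params={
            "index_type": "DISKANN",  # graph index served from SSD, not RAM
            "metric_type": "IP",      # or "L2", matching your embedding model
        },
    )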

Comparison Matrix

Method                  Size Reduction   Accuracy Impact
int8 Quantization       4x               Very Low
Product Quantization    20x              Medium
Selective Indexing      Variable         None

Exercises

  1. Calculate the RAM needed for 1 Million vectors (1024 dims) using float32.
  2. Now calculate it for int8. How much money did you save?
  3. Why should you avoid indexing "Stop Words" (like 'and', 'the', 'is') in your metadata?
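
To sanity-check exercises 1 and 2, the raw-storage arithmetic (ignoring graph and metadata overhead) is:

    n_vectors = 1_000_000
    dims = 1024

    float32_bytes = n_vectors * dims * 4  # 4,096,000,000 bytes ~= 4.1 GB
    int8_bytes = n_vectors * dims * 1     # 1,024,000,000 bytes ~= 1.0 GB

    print(f"float32: {float32_bytes / 1e9:.2f} GB")
    print(f"int8:    {int8_bytes / 1e9:.2f} GB")

The dollar savings then depend on your cloud provider's per-GB RAM pricing.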
