
Index Size Optimization
Techniques for shrinking your vector database and reducing RAM usage without sacrificing retrieval quality.
Large vector databases are expensive. Holding 10 million full-precision vectors in RAM can cost thousands of dollars a month. Index optimization is about being "frugal" with your vectors: storing fewer bytes per vector, and fewer vectors overall.
Technique 1: Product Quantization (PQ)
PQ splits each vector into subvectors and replaces each subvector with the ID of its nearest centroid from a small learned codebook, so a vector is stored as a handful of bytes instead of thousands.
- Pro: Massive reduction in RAM usage (up to 95%).
- Con: Small loss in retrieval accuracy.
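The codebook idea above can be sketched in a few lines of numpy. This is a toy illustration, not a production implementation (libraries like faiss do this with optimized k-means); the parameters here (64 dims, 8 subvectors, 16 centroids) are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_codebooks(data, m, k, iters=10):
    """Train one naive k-means codebook per subvector slice."""
    n, d = data.shape
    sub = d // m
    books = []
    for j in range(m):
        x = data[:, j * sub:(j + 1) * sub]
        cent = x[rng.choice(n, k, replace=False)]
        for _ in range(iters):
            # assign each point to its nearest centroid, then recompute means
            dist = ((x[:, None, :] - cent[None]) ** 2).sum(-1)
            lab = dist.argmin(1)
            for c in range(k):
                pts = x[lab == c]
                if len(pts):
                    cent[c] = pts.mean(0)
        books.append(cent)
    return books

def encode(v, books):
    """Replace each subvector with the index of its nearest centroid."""
    m = len(books)
    sub = v.shape[0] // m
    codes = np.empty(m, dtype=np.uint8)
    for j, cent in enumerate(books):
        chunk = v[j * sub:(j + 1) * sub]
        codes[j] = ((cent - chunk) ** 2).sum(1).argmin()
    return codes

def decode(codes, books):
    """Approximate reconstruction by concatenating the chosen centroids."""
    return np.concatenate([books[j][c] for j, c in enumerate(codes)])

data = rng.standard_normal((2000, 64)).astype(np.float32)
books = train_codebooks(data, m=8, k=16)
codes = encode(data[0], books)
recon = decode(codes, books)
# 64 float32 values (256 bytes) shrink to 8 uint8 codes (8 bytes)
```

The reconstruction is lossy, which is exactly the accuracy trade-off listed above: distances computed against `recon` are approximations of the true distances.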
Technique 2: Metadata Stripping
Don't store huge JSON objects in every vector's metadata.
- Solution: Store only the doc_id in the vector DB and keep the heavy metadata (full text, dates, authors) in a standard relational DB (e.g., PostgreSQL).
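A minimal sketch of that split, with a plain dict standing in for the vector index and SQLite standing in for PostgreSQL. The table and column names (`docs`, `doc_id`, `author`, `body`) and the `hydrate` helper are illustrative, not any particular vector DB's API.

```python
import sqlite3

import numpy as np

# "Vector DB" side: only doc_id -> embedding, no heavy metadata
vector_index = {
    "doc-1": np.random.rand(8).astype(np.float32),
    "doc-2": np.random.rand(8).astype(np.float32),
}

# Relational side: full text and attributes (use PostgreSQL in production)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (doc_id TEXT PRIMARY KEY, author TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [("doc-1", "Ada", "full text of doc 1..."),
     ("doc-2", "Grace", "full text of doc 2...")],
)

def hydrate(doc_ids):
    """After ANN search returns ids, fetch the heavy metadata in one query."""
    marks = ",".join("?" * len(doc_ids))
    rows = conn.execute(
        f"SELECT doc_id, author, body FROM docs WHERE doc_id IN ({marks})",
        doc_ids,
    ).fetchall()
    return {r[0]: {"author": r[1], "body": r[2]} for r in rows}

hits = ["doc-2"]        # pretend these ids came back from a vector search
meta = hydrate(hits)    # join the ids back to their full metadata
```

The vector index now carries only ids and embeddings, so its RAM footprint is independent of how large your documents and metadata grow.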
Technique 3: Scalar Quantization (SQ)
Converts 32-bit floating point numbers (floats) into 8-bit integers (int8).
- Pro: 4x reduction in memory footprint.
- Con: Slight precision loss, though it is negligible for most modern embedding models.
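A sketch of symmetric int8 scalar quantization with one scale per vector (one common scheme; real engines vary in how they pick scales). The 4x ratio falls straight out of the byte widths, and cosine similarity between original and dequantized vectors stays close to 1.

```python
import numpy as np

rng = np.random.default_rng(42)
vecs = rng.standard_normal((1000, 256)).astype(np.float32)  # toy embeddings

# Per-vector scale so the largest component maps to +/-127
scales = np.abs(vecs).max(axis=1, keepdims=True) / 127.0
q = np.round(vecs / scales).astype(np.int8)   # 1 byte per dimension
recon = q.astype(np.float32) * scales         # dequantize for scoring

ratio = vecs.nbytes / q.nbytes                # float32 -> int8 is 4x
cos = (vecs * recon).sum(1) / (
    np.linalg.norm(vecs, axis=1) * np.linalg.norm(recon, axis=1))
```

On this synthetic data the minimum cosine similarity across all 1000 vectors stays above 0.99, which is why the accuracy cost is usually acceptable.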
Technique 4: Smart Deduplication
If two chunks have 90% overlapping text, only index the larger one. Using MinHash or SimHash during ingestion can identify these redundant chunks before they ever hit the database.
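The MinHash idea can be sketched with only the standard library: hash every shingle under many seeded hash functions, keep the minimum per seed, and compare signatures. The shingle size (4), signature length (64), and use of `blake2b` with a per-seed salt are all arbitrary demo choices.

```python
import hashlib

def shingles(text, k=4):
    """Character k-shingles of a whitespace-normalized, lowercased string."""
    t = " ".join(text.lower().split())
    return {t[i:i + k] for i in range(len(t) - k + 1)}

def minhash(sh, num_hashes=64):
    """Per seeded hash function, keep the smallest hash over all shingles."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8,
                                salt=seed.to_bytes(8, "big")).digest(),
                "big")
            for s in sh))
    return sig

def est_jaccard(a, b):
    """Fraction of matching signature slots estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

doc_a = "Product quantization compresses vectors into compact codes."
doc_b = "Product quantization compresses vectors into compact codes!"
doc_c = "Scalar quantization converts floats to int8."

sig_a, sig_b, sig_c = (minhash(shingles(d)) for d in (doc_a, doc_b, doc_c))
# near-duplicates (doc_a vs doc_b) score high; unrelated chunks score low
```

At ingestion time you would compute a signature per chunk and skip any chunk whose estimated similarity to an already-indexed chunk exceeds your threshold (e.g., 0.9).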
Storage vs. RAM (Disk-based HNSW)
Some databases (like Milvus or Weaviate) allow you to store the index on disk (SSD) rather than RAM.
- Result: Much cheaper, but 5-10x higher latency. Best for "Cold Storage" RAG (e.g., archival searches).
Comparison Matrix
| Method | Size Reduction | Performance Impact |
|---|---|---|
| int8 Quantization | 4x | Very Low |
| Product Quantization | 20x | Medium |
| Selective Indexing | Variable | None |
Exercises
- Calculate the RAM needed for 1 Million vectors (1024 dims) using float32.
- Now calculate it for int8. How much money did you save?
- Why should you avoid indexing "Stop Words" (like 'and', 'the', 'is') in your metadata?
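A small helper to check your arithmetic for the first two exercises. Note it counts raw vector storage only; real indexes (e.g., HNSW graph links) add overhead on top.

```python
def index_ram_bytes(num_vectors: int, dims: int, bytes_per_component: int) -> int:
    """RAM for raw vector storage: count * dimensions * bytes per component."""
    return num_vectors * dims * bytes_per_component

# 1 million vectors, 1024 dims, float32 (4 bytes per component)
float32_ram = index_ram_bytes(1_000_000, 1024, 4)  # 4,096,000,000 bytes (~4.1 GB)

# Same vectors scalar-quantized to int8 (1 byte per component)
int8_ram = index_ram_bytes(1_000_000, 1024, 1)     # ~1.0 GB, the 4x saving
```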