
Persistence and Scaling Considerations
Preparing your vector database for production by understanding storage backends and scaling limits.
Moving from a prototype (which runs in your app's RAM) to a production system (which survives restarts and handles heavy load) requires a deeper understanding of how Chroma stores data.
Persistent Storage in Chroma
Chroma's default client keeps everything in memory and loses it on restart; to persist data to your local filesystem, use a PersistentClient instead.
import chromadb
client = chromadb.PersistentClient(path="./my_chroma_db")
When you call add() or upsert(), Chroma writes the data to the specified path. If your server crashes, the data remains safely on disk.
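As a minimal sketch of that behavior (the collection name "docs" and the sample document are arbitrary placeholders), data added through a PersistentClient can be read back by a fresh process pointing at the same path:

import chromadb

# First run: create a collection and add a document; it is written under ./my_chroma_db.
client = chromadb.PersistentClient(path="./my_chroma_db")
collection = client.get_or_create_collection("docs")  # "docs" is an example name
collection.add(ids=["doc-1"], documents=["Chroma persists this text and its embedding."])

# Later run (e.g. after a restart): the same data is still on disk.
client = chromadb.PersistentClient(path="./my_chroma_db")
collection = client.get_or_create_collection("docs")
print(collection.count())  # -> 1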
Scaling to Production
Vertical Scaling
The simplest way to scale Chroma is to give it more RAM and CPU. Since HNSW (Chroma's approximate nearest-neighbor index) is kept in memory for speed, the number of vectors you can serve is effectively bounded by your server's RAM.
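As a rough back-of-the-envelope sketch (the corpus size and embedding dimension below are illustrative assumptions, and the HNSW graph adds per-vector overhead on top of the raw vectors), you can estimate how much memory the vectors alone will need:

# Rough estimate of raw vector memory; HNSW adds extra per-vector graph overhead.
num_vectors = 1_000_000   # assumed corpus size (example value)
dimensions = 384          # assumed embedding size (example value)
bytes_per_float = 4       # float32

raw_bytes = num_vectors * dimensions * bytes_per_float
print(f"Raw vectors: {raw_bytes / 1024**3:.2f} GiB")  # ~1.43 GiB for this example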
Compute vs. Storage
- Indexing (adding data) is CPU-intensive.
- Searching is RAM-intensive.
- Persistence is disk-I/O-intensive.
Client/Server Deployment
For production, don't embed Chroma inside your Python app as a library. Run the standalone Chroma server instead (available as a Docker image).
docker run -p 8000:8000 chromadb/chroma
This keeps your application stateless: multiple app servers can connect to a single central Chroma server.
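With the server from the docker command above listening on port 8000, each application instance connects over HTTP instead of opening a local database file. A minimal sketch (the host, port, and collection name assume a local, default-configured server):

import chromadb

# Connect to the central Chroma server instead of a local on-disk database.
client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.get_or_create_collection("docs")  # same API as the embedded client
results = collection.query(query_texts=["example query"], n_results=3)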
Backup and Recovery
Since Chroma persists everything to a single directory, backing up your database is straightforward (see the sketch after this list):
- Stopping the Chroma service (to ensure data consistency).
- Creating a zip/snapshot of the data directory.
- Uploading the snapshot to S3 or similar long-term storage.
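A minimal sketch of those steps in Python (the bucket name, paths, and the use of boto3 for S3 are assumptions; any archiving and object-storage tooling works the same way):

import shutil
import boto3  # assumed backup target: AWS S3

# 1. Stop the Chroma service first so no writes happen mid-snapshot.
# 2. Archive the data directory into a single zip file.
archive_path = shutil.make_archive("chroma-backup", "zip", root_dir="./my_chroma_db")

# 3. Upload the snapshot to long-term storage (bucket and key are placeholders).
s3 = boto3.client("s3")
s3.upload_file(archive_path, "my-backup-bucket", "chroma/chroma-backup.zip")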
Limitations to Watch For
- Memory Limits: If your index exceeds available RAM, search performance drops drastically as the operating system starts swapping pages to disk.
- Single-Node Writes: Standard Chroma is a single-writer system. If you sustain hundreds of writes per second, you may hit contention; batching writes helps (see the sketch below).
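One common mitigation for write contention is to group many small writes into fewer, larger upsert calls. A minimal sketch (the helper name and batch size are illustrative, not part of Chroma's API):

def upsert_in_batches(collection, ids, documents, batch_size=500):
    # Group many small writes into fewer, larger upserts; batch_size is an example value.
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        collection.upsert(ids=ids[start:end], documents=documents[start:end])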
Exercises
- Set up a Chroma server using Docker.
- Estimate the memory footprint of 100,000 vectors with 1536 dimensions (assume 4-byte float32 values).
- How would you "Horizontally Scale" a RAG system if your vector database becomes the bottleneck?