
Persistence and Scaling Considerations
Preparing your vector database for production by understanding storage backends and scaling limits.
Moving from a prototype (which runs in your app's RAM) to a production system (which survives restarts and handles heavy load) requires a deeper understanding of how Chroma stores data.
Persistent Storage in Chroma
Chroma's default client keeps everything in memory and loses it on restart; to persist data to your local filesystem, use a PersistentClient instead.
import chromadb
client = chromadb.PersistentClient(path="./my_chroma_db")
When you call add() or upsert(), Chroma writes the data to the specified path. If your server crashes, the data remains safely on disk.
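As a minimal sketch of that behavior (the collection name "docs" and the sample document are arbitrary placeholders), data added through a PersistentClient can be read back by a fresh process pointing at the same path:

import chromadb

# First run: create a collection and add a document; it is written under ./my_chroma_db.
client = chromadb.PersistentClient(path="./my_chroma_db")
collection = client.get_or_create_collection("docs")  # "docs" is an example name
collection.add(ids=["doc-1"], documents=["Chroma persists this text and its embedding."])

# Later run (e.g. after a restart): the same data is still on disk.
client = chromadb.PersistentClient(path="./my_chroma_db")
collection = client.get_or_create_collection("docs")
print(collection.count())  # -> 1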
Scaling to Production
Vertical Scaling
The simplest way to scale Chroma is to give it more RAM and CPU. Since HNSW (Chroma's approximate nearest-neighbor index) is kept in memory for speed, the number of vectors you can serve is effectively bounded by your server's RAM.
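As a rough back-of-the-envelope sketch (the corpus size and embedding dimension below are illustrative assumptions, and the HNSW graph adds per-vector overhead on top of the raw vectors), you can estimate how much memory the vectors alone will need:

# Rough estimate of raw vector memory; HNSW adds extra per-vector graph overhead.
num_vectors = 1_000_000   # assumed corpus size (example value)
dimensions = 384          # assumed embedding size (example value)
bytes_per_float = 4       # float32

raw_bytes = num_vectors * dimensions * bytes_per_float
print(f"Raw vectors: {raw_bytes / 1024**3:.2f} GiB")  # ~1.43 GiB for this example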
Compute vs. Storage
- Indexing (adding data) is CPU-intensive.
- Searching is RAM-intensive.
- Persistence is disk-I/O-intensive.
Client/Server Deployment
For production, don't embed Chroma inside your Python app as a library. Run the standalone Chroma server instead (available as a Docker image).
docker run -p 8000:8000 chromadb/chroma
This keeps your application stateless: multiple app servers can connect to a single central Chroma server.
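With the server from the docker command above listening on port 8000, each application instance connects over HTTP instead of opening a local database file. A minimal sketch (the host, port, and collection name assume a local, default-configured server):

import chromadb

# Connect to the central Chroma server instead of a local on-disk database.
client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.get_or_create_collection("docs")  # same API as the embedded client
results = collection.query(query_texts=["example query"], n_results=3)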
Backup and Recovery
Since Chroma persists everything to a single directory, backing up your database is straightforward (see the sketch after this list):
- Stopping the Chroma service (to ensure data consistency).
- Creating a zip/snapshot of the data directory.
- Uploading the snapshot to S3 or similar long-term storage.
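A minimal sketch of those steps in Python (the bucket name, paths, and the use of boto3 for S3 are assumptions; any archiving and object-storage tooling works the same way):

import shutil
import boto3  # assumed backup target: AWS S3

# 1. Stop the Chroma service first so no writes happen mid-snapshot.
# 2. Archive the data directory into a single zip file.
archive_path = shutil.make_archive("chroma-backup", "zip", root_dir="./my_chroma_db")

# 3. Upload the snapshot to long-term storage (bucket and key are placeholders).
s3 = boto3.client("s3")
s3.upload_file(archive_path, "my-backup-bucket", "chroma/chroma-backup.zip")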
Limitations to Watch For
- Memory Limits: If your index exceeds available RAM, search performance drops drastically as the operating system starts swapping pages to disk.
- Single-Node Writes: Standard Chroma is a single-writer system. If you sustain hundreds of writes per second, you may hit contention; batching writes helps (see the sketch below).
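One common mitigation for write contention is to group many small writes into fewer, larger upsert calls. A minimal sketch (the helper name and batch size are illustrative, not part of Chroma's API):

def upsert_in_batches(collection, ids, documents, batch_size=500):
    # Group many small writes into fewer, larger upserts; batch_size is an example value.
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        collection.upsert(ids=ids[start:end], documents=documents[start:end])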
Exercises
- Set up a Chroma server using Docker.
- Estimate the memory footprint of 100,000 vectors with 1536 dimensions (assume 4-byte float32 values).
- How would you "Horizontally Scale" a RAG system if your vector database becomes the bottleneck?