
Chroma Architecture Overview
Understand the internals of Chroma, from storage engines to embedding functions.
Chroma is designed to be the database layer for the AI stack. Unlike general-purpose databases that added "vector support", Chroma was built from the ground up for vectors.
The Three Core Components
1. The Storage Engine
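Early releases of Chroma (before 0.4) used DuckDB for in-process storage and ClickHouse for server deployments. Since 0.4, a single-node Chroma instance stores metadata and documents in SQLite and keeps the vectors themselves in an HNSW index on local disk. This lets metadata filters run alongside vector search in the same query.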
2. The Embedding Function
A distinctive feature of Chroma is that it can "own" the embedding logic. Attach an EmbeddingFunction to a collection and Chroma computes vectors for documents when you add them and for query text when you search, so you don't have to calculate vectors yourself.
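import chromadb
import chromadb.utils.embedding_functions as ef
# In-memory client; data is discarded when the process exits
client = chromadb.Client()
# Requires the openai package and a valid API key
openai_ef = ef.OpenAIEmbeddingFunction(api_key="your_key")
collection = client.create_collection("docs", embedding_function=openai_ef)
# Now you can just add text directly; Chroma embeds it with openai_ef
collection.add(documents=["hello world"], ids=["1"])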
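To make the embedding step concrete, here is a minimal sketch of a custom embedding function. It assumes Chroma's convention that an embedding function is a callable that receives a list of documents through a parameter named `input` and returns one vector per document. `ToyHashEmbedding` and its hashing scheme are hypothetical and carry no semantic meaning; recent Chroma releases may also expect you to subclass `chromadb.EmbeddingFunction`.

```python
import hashlib
import math


class ToyHashEmbedding:
    """Hypothetical embedding function: hashes tokens into a fixed-size vector."""

    def __init__(self, dim: int = 16):
        self.dim = dim

    def __call__(self, input):
        # Assumption: Chroma passes a list of document strings as `input`.
        vectors = []
        for text in input:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                # Use a stable hash so the same token always hits the same bucket.
                digest = hashlib.md5(token.encode("utf-8")).digest()
                vec[digest[0] % self.dim] += 1.0
            # Normalize to unit length; leave all-zero vectors untouched.
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            vectors.append([v / norm for v in vec])
        return vectors


toy_ef = ToyHashEmbedding(dim=16)
vectors = toy_ef(["hello world", "hello world", "vector database"])
```

An instance of such a class could then be passed as `embedding_function=` when creating a collection, in place of the OpenAI function.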
3. The Retrieval API
The query API retrieves the n nearest neighbors of one or more query texts or embeddings. A single unified response carries the IDs, documents, metadata, and distances of the matches; the stored vectors are included only if you ask for them.
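The response is organized as parallel lists, one inner list per query. The helper below is a sketch of how that layout can be reshaped into one record per hit. It assumes the response keys `ids`, `documents`, `metadatas`, `distances`, and `embeddings`; `flatten_hits` and the `sample` data are hypothetical, written here for illustration rather than taken from Chroma.

```python
def flatten_hits(result, query_index=0):
    """Turn one query's slice of a Chroma-style response into a list of hit dicts."""
    hits = []
    for pos, doc_id in enumerate(result["ids"][query_index]):
        hit = {"id": doc_id}
        for key in ("documents", "metadatas", "distances", "embeddings"):
            column = result.get(key)
            # Fields that were not requested are absent or None; skip them.
            if column is not None:
                hit[key[:-1]] = column[query_index][pos]
        hits.append(hit)
    return hits


# Hand-written sample in the assumed response layout (one query, two matches).
sample = {
    "ids": [["1", "7"]],
    "documents": [["hello world", "hello there"]],
    "metadatas": [[{"source": "a.txt"}, {"source": "b.txt"}]],
    "distances": [[0.0, 0.42]],
    "embeddings": None,
}
hits = flatten_hits(sample)
```

With a real collection, the input would come from a call such as `collection.query(query_texts=["hello"], n_results=2)`.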
Deployment Modes
- Ephemeral (In-Memory): Great for testing and scripts. Data is lost when the script ends.
- Persistent (On-Disk): Saves data to a local directory.
- Client/Server: Run Chroma in a Docker container and connect to it over HTTP.
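The three modes map onto three client constructors in Chroma 0.4 and later. The sketch below wraps them in a hypothetical `make_client` helper; the default path, host, and port are assumptions for illustration, and the HTTP mode expects a Chroma server to be running already.

```python
def make_client(mode, path="./chroma_data", host="localhost", port=8000):
    """Return a Chroma client for one of the three deployment modes."""
    if mode not in ("ephemeral", "persistent", "http"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Imported lazily so the mode check above works without the package installed.
    import chromadb

    if mode == "ephemeral":
        # In-memory only; nothing survives the process.
        return chromadb.EphemeralClient()
    if mode == "persistent":
        # Data is written under `path` and reloaded on the next run.
        return chromadb.PersistentClient(path=path)
    # Talks to a separately running Chroma server over HTTP.
    return chromadb.HttpClient(host=host, port=port)
```

Calling `make_client("persistent")` twice in separate runs should give access to the same collections, since both runs read the same directory.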
HNSW: The Search Heart
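Chroma indexes vectors with HNSW (Hierarchical Navigable Small World), an approximate nearest-neighbor algorithm. Imagine a multi-layered social network where you can jump across the "world" in a few hops: sparse upper layers provide long-range links, and denser lower layers refine the result. Because a search visits only a small fraction of the stored vectors, it can cover millions of records in milliseconds, at the cost of occasionally missing a true nearest neighbor.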
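The "few hops" idea can be sketched with a greedy walk over a single proximity graph, which is the step HNSW repeats on each layer. This is a deliberately simplified illustration, not the code Chroma runs: the points, the neighbor lists, and the long-range link between nodes 0 and 4 are all made up for the example.

```python
import math


def greedy_search(points, neighbors, entry, query):
    """Walk the graph, always moving to the neighbor closest to the query."""
    current = entry
    path = [current]
    while True:
        best = min(neighbors[current], key=lambda i: math.dist(points[i], query))
        # Stop when no neighbor improves on the current node.
        if math.dist(points[best], query) >= math.dist(points[current], query):
            return current, path
        current = best
        path.append(current)


# Six points on a line; node 0 also has a long-range shortcut to node 4.
points = [(float(i), 0.0) for i in range(6)]
neighbors = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5, 0], 5: [4]}

nearest, path = greedy_search(points, neighbors, entry=0, query=(4.2, 0.0))
# nearest is 4 and path is [0, 4]: the shortcut skips nodes 1 to 3 entirely.
```

Without the shortcut, the same walk would need four hops instead of one, which is why the layered long-range links matter as the collection grows.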
Why "AI-Native"?
Chroma simplifies the RAG developer experience by:
- Handling ID generation automatically (optional).
- Managing document-to-vector mapping.
- Providing easy-to-use Python and JavaScript clients.
Exercises
- Initialize a Chroma client in "Persistent" mode.
- Add 5 documents and let the script exit.
- Run the script again and verify that the documents are still there.
- How does Chroma handle a collection that already exists?