
Replication and Data Consistency
Learn how to replicate your vector data for reliability. Master the trade-offs between 'Strong Consistency' and 'Eventual Consistency'.
If the server holding the only copy of a vector shard crashes, that data is gone. To prevent this, we use Replication: every shard is copied to additional machines, commonly two others, for three copies in total.
However, replication introduces a new problem: Consistency. If you update a vector on Machine 1, how long until Machines 2 and 3 see that update?
1. Primary-Replica Architecture
Many vector databases use a "Leader/Follower" model:
- Primary (Leader): Handles all Writes (Ingestion, Deletion).
- Replica (Follower): Serves Reads (Queries). The Primary pushes every update to the Replicas, either synchronously or asynchronously depending on the consistency mode described in the next section.
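The routing rule of this model can be sketched in a few lines of Python. This is a minimal in-memory sketch, assuming each copy of the shard is a plain dictionary and that the primary copies every write to the replicas instantly; `ReplicatedShard`, `upsert`, `delete`, and `query` are hypothetical names, not any vector database's API.

```python
import itertools
from typing import Dict, List, Optional

Vector = List[float]


class ReplicatedShard:
    """Toy leader/follower shard (hypothetical, in-memory only).

    Writes (upserts, deletes) go to the primary, which copies them to every
    replica; reads rotate round-robin across the replicas.
    """

    def __init__(self, num_replicas: int) -> None:
        if num_replicas < 1:
            raise ValueError("need at least one replica")
        self.primary: Dict[str, Vector] = {}
        self.replicas: List[Dict[str, Vector]] = [{} for _ in range(num_replicas)]
        # Endless cycle of replica indices used to spread read load.
        self._next_replica = itertools.cycle(range(num_replicas))

    def upsert(self, vector_id: str, vector: Vector) -> None:
        self.primary[vector_id] = vector   # the primary accepts the write...
        for replica in self.replicas:      # ...and pushes a copy to each replica
            replica[vector_id] = list(vector)

    def delete(self, vector_id: str) -> None:
        self.primary.pop(vector_id, None)
        for replica in self.replicas:
            replica.pop(vector_id, None)

    def query(self, vector_id: str) -> Optional[Vector]:
        # Reads never touch the primary in this sketch.
        replica = self.replicas[next(self._next_replica)]
        return replica.get(vector_id)


shard = ReplicatedShard(num_replicas=2)
shard.upsert("doc-1", [0.1, 0.2, 0.3])
print(shard.query("doc-1"))  # served by replica 0
print(shard.query("doc-1"))  # served by replica 1
```

Real systems ship updates over the network instead of writing into every replica inside one function call, and that delay is exactly where the consistency question comes from.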
2. Strong vs. Eventual Consistency
- Strong Consistency: The database doesn't confirm the "Write" is finished until every replica has confirmed.
- Pros: You never get an "Old" result.
- Cons: High latency. If one replica is slow, your whole app waits.
- Eventual Consistency: The database confirms the "Write" immediately after the Primary finishes. Replicas catch up afterwards, usually within milliseconds, though the lag can grow under heavy load.
- Pros: Extremely fast ingestion.
- Cons: A user might update their document and search for it 5ms later, but get the "Old" version from a replica that hasn't synced yet.
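To make the difference concrete, here is a single-process Python sketch (an illustration under simplifying assumptions, not how any particular database is implemented). "Sending" an update means appending it to a replica's `inbox`; in `strong` mode the primary makes every replica apply its inbox before confirming, while in `eventual` mode it confirms immediately and each replica catches up whenever `apply_pending` runs. `Primary`, `Replica`, `write`, and `apply_pending` are hypothetical names.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class Replica:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        # Updates sent by the primary but not yet applied on this replica.
        self.inbox: Deque[Tuple[str, str]] = deque()

    def apply_pending(self) -> None:
        """Catch up: apply every queued update in the order it was sent."""
        while self.inbox:
            key, value = self.inbox.popleft()
            self.data[key] = value


class Primary:
    def __init__(self, replicas: List[Replica]) -> None:
        self.data: Dict[str, str] = {}
        self.replicas = replicas

    def write(self, key: str, value: str, consistency: str) -> str:
        if consistency not in ("strong", "eventual"):
            raise ValueError(f"unknown consistency mode: {consistency}")
        self.data[key] = value
        for replica in self.replicas:
            replica.inbox.append((key, value))
        if consistency == "strong":
            # Do not confirm until every replica has applied the update.
            for replica in self.replicas:
                replica.apply_pending()
        # In "eventual" mode we confirm now; replicas apply their inbox later.
        return "confirmed"


replicas = [Replica(), Replica()]
primary = Primary(replicas)
primary.write("doc-1", "v1", consistency="strong")
primary.write("doc-1", "v2", consistency="eventual")
print(replicas[0].data["doc-1"])  # v1 (stale: the replica has not caught up)
replicas[0].apply_pending()
print(replicas[0].data["doc-1"])  # v2
```

In a real cluster, "waiting for every replica" also means handling slow or unreachable replicas with timeouts, which is where the latency cost of Strong Consistency shows up.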
3. High Availability (HA) through Replication
Replication isn't just for surviving crashes; it also increases Search Throughput.
If a single copy of a shard can handle 50 QPS and you add 2 replicas, the three copies together can serve roughly 150 QPS, assuming the primary also answers queries and traffic is spread evenly across them. Distributed vector databases such as Pinecone scale "Read Performance" by adding replicas as query traffic grows.
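The estimate is simple linear scaling, which real clusters only approach: load-balancing overhead and the write traffic every replica must also absorb both eat into it. The hypothetical helper below makes the arithmetic explicit; its `primary_serves_reads` flag captures the assumption that decides whether 2 replicas give 150 or 100 QPS.

```python
def estimated_read_qps(per_copy_qps: float, num_replicas: int,
                       primary_serves_reads: bool = True) -> float:
    """Upper-bound read throughput for one shard, assuming perfectly even load."""
    copies_serving_reads = num_replicas + (1 if primary_serves_reads else 0)
    return per_copy_qps * copies_serving_reads


print(estimated_read_qps(50, 2))                              # 150
print(estimated_read_qps(50, 2, primary_serves_reads=False))  # 100
```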
4. Visualizing the Sync Loop
sequenceDiagram
participant U as User
participant P as Primary Shard
participant R as Replica Shard
U->>P: Upsert Vector_A
P-->>U: Confirmed (Quick!)
note right of P: Replication Lag (ms)
U->>R: Query Vector_A
R-->>U: Return old Vector_A (stale)
P->>R: Async Sync Vector_A
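The same timeline can be simulated with a fixed replication lag. This sketch assumes a constant 10 ms lag (a made-up number; real lag varies with load and network conditions) and uses a hypothetical `LaggedReplica` class. A query that arrives inside the lag window returns the old value; one that arrives after it returns the new value.

```python
from typing import Dict, List, Optional, Tuple


class LaggedReplica:
    """Replica that applies each update only after a fixed replication lag."""

    def __init__(self, lag_ms: float) -> None:
        self.lag_ms = lag_ms
        self.data: Dict[str, str] = {}
        # Updates in flight from the primary: (apply_at_ms, key, value).
        self.in_flight: List[Tuple[float, str, str]] = []

    def receive(self, sent_at_ms: float, key: str, value: str) -> None:
        self.in_flight.append((sent_at_ms + self.lag_ms, key, value))

    def query(self, now_ms: float, key: str) -> Optional[str]:
        # Apply, in arrival order, every update whose lag has elapsed by now_ms.
        arrived = sorted((u for u in self.in_flight if u[0] <= now_ms),
                         key=lambda u: u[0])
        for _, k, v in arrived:
            self.data[k] = v
        self.in_flight = [u for u in self.in_flight if u[0] > now_ms]
        return self.data.get(key)


replica = LaggedReplica(lag_ms=10)
replica.data["Vector_A"] = "old"
# t = 0 ms: the primary confirms the upsert and ships it to the replica.
replica.receive(sent_at_ms=0, key="Vector_A", value="new")
print(replica.query(now_ms=5, key="Vector_A"))   # old (inside the lag window)
print(replica.query(now_ms=12, key="Vector_A"))  # new
```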
5. Summary and Key Takeaways
- Replicate for Reliability: Never run a production app on a single-copy shard.
- Replicate for Speed: More replicas = more searches per second.
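- Consistency Trade-off: In many AI applications, "Eventual Consistency" is the better default because fast ingestion matters more than every replica reflecting a write instantly; choose Strong Consistency when users must immediately see their own updates.
- Quorum: Many distributed databases use a "Quorum" (Majority): a write succeeds once a majority of the copies (e.g., 2 of 3) acknowledge it, a middle ground between waiting for every replica and waiting for none.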
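A minimal sketch of the majority rule, assuming N copies per shard. The `reads_overlap_writes` check is the common quorum condition R + W > N used in Dynamo-style systems: when it holds, any R copies you read from include at least one copy that acknowledged the latest successful write. All function names here are hypothetical.

```python
def majority(n: int) -> int:
    """Smallest number of copies that forms a majority of n."""
    return n // 2 + 1


def quorum_write_succeeds(acks: int, n: int) -> bool:
    """A write counts as successful once a majority of the n copies acknowledge it."""
    return acks >= majority(n)


def reads_overlap_writes(n: int, w: int, r: int) -> bool:
    """True if every read set of size r shares a copy with every write set of size w."""
    return r + w > n


n = 3
print(majority(n))                        # 2
print(quorum_write_succeeds(2, n))        # True
print(quorum_write_succeeds(1, n))        # False
print(reads_overlap_writes(n, w=2, r=2))  # True
print(reads_overlap_writes(n, w=1, r=1))  # False
```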
In the next lesson, we’ll see how to package this into a High Availability architecture.