Replication and Data Consistency

If a server holding your only copy of a vector shard crashes, your data is gone. To prevent this, we use Replication. Every shard is copied to at least two other machines.

However, replication introduces a new problem: Consistency. If you update a vector on Machine 1, how long until Machines 2 and 3 see that update?

1. Primary-Replica Architecture

Most vector databases use a "Leader/Follower" model:

Primary (Leader): Handles all Writes (Ingestion, Deletion).
Replica (Follower): Handles only Reads (Queries). The Primary asynchronously pushes updates to the Replicas.
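To make the two roles concrete, here is a minimal in-memory sketch. The class names (`Primary`, `Replica`), the `push_updates` method, and the round-robin read routing are all hypothetical and are not taken from any particular vector database. The asynchronous push is simulated by calling `push_updates` by hand.

```python
from collections import deque
from itertools import cycle


class Replica:
    """Follower: serves reads from its own copy of the shard."""

    def __init__(self, name):
        self.name = name
        self.vectors = {}

    def apply(self, op, vec_id, vector):
        if op == "upsert":
            self.vectors[vec_id] = vector
        elif op == "delete":
            self.vectors.pop(vec_id, None)

    def get(self, vec_id):
        return self.vectors.get(vec_id)


class Primary:
    """Leader: accepts every write and queues it for the followers."""

    def __init__(self, replicas):
        self.vectors = {}
        self.replicas = replicas
        self.pending = deque()  # writes not yet pushed to the replicas

    def upsert(self, vec_id, vector):
        self.vectors[vec_id] = vector
        self.pending.append(("upsert", vec_id, vector))

    def delete(self, vec_id):
        self.vectors.pop(vec_id, None)
        self.pending.append(("delete", vec_id, None))

    def push_updates(self):
        """Stand-in for the asynchronous push (assumed to run in the background in practice)."""
        while self.pending:
            op, vec_id, vector = self.pending.popleft()
            for replica in self.replicas:
                replica.apply(op, vec_id, vector)


replicas = [Replica("replica-1"), Replica("replica-2")]
primary = Primary(replicas)
read_targets = cycle(replicas)  # round-robin reads across the followers

primary.upsert("doc-1", [0.1, 0.2, 0.3])
before = next(read_targets).get("doc-1")  # None: the write has not been pushed yet
primary.push_updates()
after = next(read_targets).get("doc-1")   # [0.1, 0.2, 0.3]
```

Writes touch only the Primary, and reads touch only the Replicas. The gap between `before` and `after` is the window in which a follower can serve an outdated answer.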

2. Strong vs. Eventual Consistency

Strong Consistency: The database doesn't confirm a write until every replica has acknowledged it.
- Pros: You never get an "Old" result.
- Cons: High latency. If one replica is slow, your whole app waits.
Eventual Consistency: The database confirms the write as soon as the Primary has applied it. Replicas catch up shortly afterward, typically within milliseconds, although the lag can grow under heavy load.
- Pros: Extremely fast ingestion.
- Cons: A user might update their document and search for it 5ms later, but get the "Old" version from a replica that hasn't synced yet.
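The latency side of this trade-off can be sketched with a toy model. The function `ack_latency_ms` and all the timings below are hypothetical. The model assumes the Primary writes locally first and then contacts every replica in parallel.

```python
def ack_latency_ms(primary_ms, replica_ms, mode):
    """Time in milliseconds until the client sees "write confirmed".

    Assumption: the primary applies the write, then contacts all replicas in parallel.
    """
    if mode == "strong":
        # Wait for the slowest replica before confirming.
        return primary_ms + max(replica_ms, default=0)
    if mode == "eventual":
        # Confirm as soon as the primary has the write; replicas sync later.
        return primary_ms
    raise ValueError(f"unknown mode: {mode!r}")


replica_ms = [3, 4, 250]  # hypothetical timings: one replica is slow
strong = ack_latency_ms(2, replica_ms, "strong")      # 252
eventual = ack_latency_ms(2, replica_ms, "eventual")  # 2
```

Under strong consistency a single slow replica sets the confirmation time for the whole write. Under eventual consistency the client never waits for it, at the cost of possibly reading an old value from that replica.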

3. High Availability (HA) through Replication

Replication isn't just for durability; it also raises Search Throughput.

If a single copy of a shard can handle 50 QPS and you add 2 replicas, your cluster can now handle roughly 150 QPS, assuming queries are spread evenly across all three copies. Distributed vector databases like Pinecone can scale their "Read Performance" by adding replicas when traffic spikes.
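The arithmetic behind that estimate is simple enough to write down. This is a back-of-the-envelope sketch, assuming every copy (the original plus each replica) serves reads at the same rate and that throughput scales linearly. The function names are hypothetical.

```python
import math


def cluster_qps(qps_per_copy, replicas):
    """Read throughput when the original copy and every replica serve queries."""
    return qps_per_copy * (1 + replicas)


def replicas_needed(target_qps, qps_per_copy):
    """Smallest number of extra replicas that reaches the target throughput."""
    copies = math.ceil(target_qps / qps_per_copy)
    return max(copies - 1, 0)


print(cluster_qps(50, 2))        # 150
print(replicas_needed(400, 50))  # 7
```

Real clusters rarely scale perfectly linearly, so treat the result as an upper bound when sizing a deployment.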

4. Visualizing the Sync Loop

sequenceDiagram
    participant U as User
    participant P as Primary Shard
    participant R as Replica Shard
    
    U->>P: Upsert Vector_A
    P-->>U: Confirmed (Quick!)
    note right of P: Replication Lag (ms)
    P->>R: Async Sync Vector_A
    U->>R: Query Vector_A
    R-->>U: Return result

5. Summary and Key Takeaways

Replicate for Reliability: Never run a production app on a single-copy shard.
Replicate for Speed: More replicas = more searches per second.
Consistency Trade-off: In AI applications, "Eventual Consistency" is often the better choice, because low-latency ingestion matters for real-time feedback and a briefly outdated search result is usually acceptable.
Quorum: Advanced databases use a "Quorum" (Majority) to decide whether a write is successful: the write is confirmed once more than half of the copies have acknowledged it.
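The quorum rule can be sketched in a few lines. This assumes a simple majority quorum over all copies of a shard. The function names are hypothetical, and real databases often let you tune the required number of acknowledgments.

```python
def quorum_size(total_copies):
    """Smallest number of copies that forms a strict majority."""
    return total_copies // 2 + 1


def write_succeeded(acks, total_copies):
    """A write counts as successful once a majority has acknowledged it."""
    return acks >= quorum_size(total_copies)


print(quorum_size(3))         # 2
print(write_succeeded(2, 3))  # True
print(write_succeeded(1, 3))  # False
```

With three copies, the write is confirmed after two acknowledgments, so one slow or failed machine does not block ingestion.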

In the next lesson, we’ll see how to package this into a High Availability architecture.

Congratulations on completing Module 15 Lesson 3! You can now move data reliably.
