Batch vs Interactive Workloads

Optimize your infrastructure for real-time user chat vs large-scale automated data processing.

Not all RAG systems are chatbots. Depending on your use case, you might need to answer queries in real time or process data in massive background batches.

Interactive Workloads (Real-Time)

  • Goal: Low latency.
  • Example: A customer support agent asking a question and waiting on the answer.
  • Requirements: Direct API access to Chroma and streaming LLM outputs.
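The low-latency requirement mostly comes down to streaming: the user should see the first token as soon as it is generated, not after the full completion. A minimal sketch of that pattern, with a stand-in `fake_llm_tokens` generator in place of a real streaming LLM client:

```python
def fake_llm_tokens(prompt):
    # Stand-in for a streaming LLM client (hypothetical);
    # real clients yield tokens as they are generated.
    for token in ["Reset", " your", " password", " via", " Settings."]:
        yield token

def stream_answer(question, retrieved_docs):
    # Yield each token to the caller immediately instead of waiting
    # for the full completion: perceived latency drops from
    # whole-response time to time-to-first-token.
    prompt = question + "\n\n" + "\n".join(retrieved_docs)
    for token in fake_llm_tokens(prompt):
        yield token

tokens = list(stream_answer("How do I reset my password?", ["faq.md"]))
print("".join(tokens))  # -> Reset your password via Settings.
```

In a real deployment the generator would be wrapped in a streaming HTTP response (e.g. server-sent events) so the browser renders tokens as they arrive.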

Batch Workloads (Background)

  • Goal: High throughput and cost efficiency.
  • Example: Analyzing 10,000 past support tickets once a week to find common issues.
  • Requirements: A task queue (such as Celery, backed by a broker like RabbitMQ or Redis) and a separate database instance so ingestion never slows down production.

Implementation: The Task Queue

For RAG, use a task queue to handle migrations or large-scale re-indexing.

from celery import Celery

celery = Celery("rag_workers", broker="redis://localhost:6379/0")

@celery.task
def ingest_massive_folder(folder_path):
    # Process thousands of files in the background, then swap
    # the finished index into production in one atomic step.
    ...
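Celery hides the mechanics, but the underlying worker-queue pattern can be sketched with the standard library alone. The queue decouples producers (who enqueue tickets instantly) from workers (who grind through them at their own pace); the uppercase "processing" step below is a placeholder for embedding or analysis:

```python
import queue
import threading

def worker(jobs, results):
    # Each worker pulls tickets off the shared queue and processes
    # them independently of the interactive serving path.
    while True:
        ticket = jobs.get()
        if ticket is None:  # sentinel value: shut this worker down
            break
        results.append(ticket.upper())  # stand-in for embed/analyze

jobs = queue.Queue()
results = []
threads = [threading.Thread(target=worker, args=(jobs, results))
           for _ in range(4)]
for t in threads:
    t.start()

# Enqueueing is near-instant for the producer, regardless of
# how long each ticket takes to process.
for i in range(100):
    jobs.put(f"ticket-{i}")
for _ in threads:
    jobs.put(None)  # one sentinel per worker
for t in threads:
    t.join()

print(f"processed {len(results)} tickets")
```

Celery adds persistence, retries, and distribution across machines on top of this same idea.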

Resource Isolation

Never run a massive batch ingestion job (which can saturate CPU and GPU) on the same machine that is serving interactive user queries. Use separate database replicas or cloud clusters.
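One lightweight way to enforce that isolation in application code is to route each workload to its own endpoint. The hostnames below are hypothetical; in practice they would point at separate Chroma deployments or cloud clusters:

```python
# Hypothetical endpoints for two isolated deployments.
SERVING_DB = {"host": "chroma-serving.internal", "port": 8000}
BATCH_DB = {"host": "chroma-batch.internal", "port": 8000}

def db_for(workload: str) -> dict:
    # Interactive queries hit the serving replica; batch jobs hit an
    # isolated instance so heavy ingestion cannot starve user traffic.
    return SERVING_DB if workload == "interactive" else BATCH_DB

print(db_for("batch")["host"])  # -> chroma-batch.internal
```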

Metric     Interactive      Batch
Latency    Critical         Irrelevant
Cost       Pay-per-use      Spot instances (cheapest)
Scaling    Spike-based      Constant load

Exercises

  1. Why should you use "Spot Instances" for batch RAG ingestion?
  2. What is a "Message Queue," and how does it help with system stability?
  3. Design a system that handles 1,000 users chatting and a background job transcribing 500 hours of video.
