Online vs Batch: Choosing the Pattern


The architecture decision: when to use HTTP prediction versus batch jobs, and how to handle the cost/latency trade-offs.

The Latency Equation

The first question in serving is: "Does the user need the answer NOW?"


1. Online Prediction (Synchronous)

  • Experience: User clicks "Search". 100ms later, results appear.
  • Tech: Vertex AI Endpoints (REST/gRPC).
  • Cost: You pay for the node 24/7 (unless scaled to zero).
  • Format: JSON payload.
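The online path is a plain request/response exchange. Here is a minimal sketch of the JSON body an online endpoint expects; the `instances` key follows Vertex AI's prediction request format, while the feature names and the response values are hypothetical:

```python
import json

# Vertex AI online prediction expects a JSON body with an "instances" list;
# each instance is one record to score. Feature names here are hypothetical.
request_body = {
    "instances": [
        {"user_id": "u_123", "query": "running shoes", "session_length": 42}
    ]
}
payload = json.dumps(request_body)

# The endpoint replies with a matching "predictions" list, one entry per
# instance. This response is illustrative, not from a real endpoint.
response_body = json.loads('{"predictions": [{"score": 0.87}]}')
```

In production, `payload` would be POSTed to the endpoint's `:predict` URL over REST (or sent via gRPC); the structure is what matters here.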

2. Batch Prediction (Asynchronous)

  • Experience: Marketing team wants to "Score all 10 million users for Churn Risk" every Sunday night.
  • Tech: Vertex AI Batch Prediction Job.
  • Cost: You pay only for the minutes the job runs.
  • Format: GCS Files (CSV/JSONL) or BigQuery Tables.
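Batch jobs read their input from files rather than a live payload. A sketch of preparing a JSONL input file for a batch prediction job — one JSON object per line; the field names are hypothetical, and in production the file would live in GCS rather than a local temp directory:

```python
import json
import os
import tempfile

# Hypothetical per-user feature records to be scored overnight.
users = [
    {"user_id": f"u_{i}", "days_since_last_visit": i % 30}
    for i in range(5)
]

# Batch prediction input format: JSONL, one JSON object per line.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "batch_input.jsonl")
with open(path, "w") as f:
    for record in users:
        f.write(json.dumps(record) + "\n")

# The batch job config would point at this file's GCS URI (or a BigQuery
# table) as its input source and a GCS/BigQuery destination for results.
with open(path) as f:
    lines = f.readlines()
```

Because the job only spins up workers while it processes this file, you pay for minutes of compute instead of a 24/7 node.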

3. Streaming Prediction (The Hybrid)

  • Experience: IoT sensors send data via Pub/Sub. We need to detect anomalies in near real time.
  • Tech: Dataflow + Vertex AI Endpoint.
  • Pattern: Dataflow reads Pub/Sub, batches records (micro-batch), calls Vertex Endpoint, writes result to BigQuery.
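The key trick in the Dataflow step is micro-batching: grouping individual Pub/Sub messages so each endpoint call carries many records instead of one. A minimal sketch of that grouping logic, with a plain Python generator standing in for the Dataflow transform (the batch size of 32 is an assumption, not a Vertex AI limit):

```python
from typing import Dict, Iterable, Iterator, List

def micro_batch(records: Iterable[Dict], batch_size: int = 32) -> Iterator[List[Dict]]:
    """Group a stream of records into fixed-size batches, one endpoint call each."""
    batch: List[Dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

# Simulate a Pub/Sub stream of 70 sensor readings.
stream = ({"sensor_id": i, "value": i * 0.1} for i in range(70))
batches = list(micro_batch(stream, batch_size=32))
# 70 records at batch_size=32 -> batches of 32, 32, and 6.
```

Each yielded batch maps to one `instances` list in an online prediction request, cutting per-call overhead while keeping latency bounded; the results are then written to BigQuery.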

Knowledge Check

A retail company wants to recommend products to users. They have 100 million users. The recommendations only need to be updated once a day based on yesterday's browsing history. Which serving pattern minimizes cost?
