
Online vs Batch: Choosing the Pattern
The architecture decision: when to use HTTP prediction versus batch jobs, and how to weigh the cost/latency trade-offs.
The Latency Equation
The first question in serving is: "Does the user need the answer NOW?"
1. Online Prediction (Synchronous)
- Experience: User clicks "Search"; results appear ~100 ms later.
- Tech: Vertex AI Endpoints (REST/gRPC); see the sketch after this list.
- Cost: You pay for the serving node around the clock while the model stays deployed (a dedicated endpoint keeps at least one replica warm unless it supports scale-to-zero).
- Format: JSON request/response payloads.
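A minimal sketch of an online call with the Vertex AI Python SDK, assuming a model is already deployed to an endpoint. The project, region, endpoint ID, and instance fields are placeholders; the instance schema depends on your model.

```python
# pip install google-cloud-aiplatform
from google.cloud import aiplatform

# Placeholders: substitute your own project, region, and endpoint ID.
aiplatform.init(project="my-project", location="us-central1")
endpoint = aiplatform.Endpoint(endpoint_name="1234567890")

# One synchronous request; the caller blocks until the deployed
# model returns. Instance schema depends on your model.
response = endpoint.predict(
    instances=[{"query": "running shoes", "user_id": "u-42"}]
)
print(response.predictions)
```

The caller eats the full round-trip latency, which is why this pattern only makes sense when the user is actually waiting on the result.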
2. Batch Prediction (Asynchronous)
- Experience: Marketing team wants to "Score all 10 million users for Churn Risk" every Sunday night.
- Tech: Vertex AI Batch Prediction Job (see the sketch after this list).
- Cost: You pay only for the minutes the job runs.
- Format: GCS files (CSV/JSONL) or BigQuery tables.
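A minimal sketch of that Sunday-night scoring job, assuming the churn model is already uploaded as a Vertex AI Model resource and a BigQuery snapshot table holds yesterday's features. The model ID, project, and table names are placeholders.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Placeholder model resource ID; replace with your uploaded Model's ID.
model = aiplatform.Model(model_name="4567890123")

# The job provisions workers, scores every row, writes the results to
# BigQuery, and tears everything down; you pay only for the run.
job = model.batch_predict(
    job_display_name="weekly-churn-scoring",
    bigquery_source="bq://my-project.analytics.users_snapshot",
    bigquery_destination_prefix="bq://my-project.analytics",
    instances_format="bigquery",
    predictions_format="bigquery",
    machine_type="n1-standard-4",
    starting_replica_count=1,
    max_replica_count=10,
    sync=False,  # return immediately; poll job.state for completion
)
```

Because the workers only exist for the duration of the job, scoring 10 million rows once a week costs a fraction of keeping an endpoint warm all week.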
3. Streaming Prediction (The Hybrid)
- Experience: An IoT sensor streams readings over Pub/Sub; we need to detect anomalies within seconds of arrival.
- Tech: Dataflow + Vertex AI Endpoint.
- Pattern: Dataflow reads from Pub/Sub, groups records into micro-batches, calls the Vertex AI endpoint, and writes the results to BigQuery (sketched below).
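A minimal Apache Beam sketch of that pattern, assuming JSON-encoded sensor messages and a pre-existing BigQuery table. The subscription, endpoint ID, and table names are placeholders, and the prediction schema depends on your model.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud import aiplatform


class PredictBatch(beam.DoFn):
    """Scores a micro-batch of records against a Vertex AI endpoint."""

    def setup(self):
        # Placeholders: substitute your project, region, and endpoint ID.
        aiplatform.init(project="my-project", location="us-central1")
        self.endpoint = aiplatform.Endpoint(endpoint_name="1234567890")

    def process(self, batch):
        instances = [json.loads(record) for record in batch]
        response = self.endpoint.predict(instances=instances)
        for instance, score in zip(instances, response.predictions):
            yield {**instance, "anomaly_score": score}


options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/sensor-sub")
        | "Decode" >> beam.Map(lambda raw: raw.decode("utf-8"))
        # Micro-batching amortizes the per-call overhead of the endpoint.
        | "MicroBatch" >> beam.BatchElements(min_batch_size=10,
                                             max_batch_size=64)
        | "Predict" >> beam.ParDo(PredictBatch())
        | "WriteBQ" >> beam.io.WriteToBigQuery(
            "my-project:iot.anomaly_scores",  # table assumed to exist
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )
```

The micro-batch size is the key tuning knob: larger batches cut per-call overhead, smaller batches cut end-to-end detection latency.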
Knowledge Check
A retail company wants to recommend products to users. They have 100 million users. The recommendations only need to be updated once a day based on yesterday's browsing history. Which serving pattern minimizes cost?