Compute Hardware: GPUs, TPUs, and Edge

Choosing the right silicon. When to pay for A100s, when to use TPUs, and how to quantize models for mobile deployment.

The Hardware Menu

Google Cloud offers a candy store of chips.

  • CPU: General purpose. Slow at heavy math.
  • GPU (NVIDIA): Great at parallel math. Versatile.
  • TPU (Google): Matrix-math specialist. Extreme speed for XLA-compiled workloads.

The exam asks you to balance Cost vs Performance.


1. GPU Selection Guide

| GPU Type    | Use Case                            | Exam Keyword                                 |
|-------------|-------------------------------------|----------------------------------------------|
| A100 (80GB) | Massive LLMs, Foundation Models     | "Highest Performance", "Large VRAM needed"   |
| V100        | Standard High-Performance Training  | "Fast Training"                              |
| T4          | Inference (Serving)                 | "Inference", "Cost Effective", "Small"       |
| K80         | Legacy (Avoid)                      | "Old", "Slow"                                |

Rule: Use A100 for training massive models. Use T4 for serving web traffic (cheap/good enough).
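The rule above can be sketched as a tiny decision helper. This is purely illustrative (`pick_gpu` and the 16 GB VRAM cutoff are assumptions, not a Google Cloud API), though the returned strings match the Vertex AI accelerator-type names.

```python
# Illustrative sketch of the GPU selection rule of thumb above.
# pick_gpu and its threshold are hypothetical, not a Google Cloud API;
# the returned strings follow Vertex AI accelerator-type naming.
def pick_gpu(workload: str, model_size_gb: float = 0) -> str:
    """Suggest an accelerator for a workload ('training' or 'serving')."""
    if workload == "serving":
        return "NVIDIA_TESLA_T4"       # cheap, good enough for inference
    if model_size_gb > 16:             # large models need A100-class VRAM
        return "NVIDIA_TESLA_A100"
    return "NVIDIA_TESLA_V100"         # standard high-performance training

print(pick_gpu("serving"))                      # -> NVIDIA_TESLA_T4
print(pick_gpu("training", model_size_gb=40))   # -> NVIDIA_TESLA_A100
```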


2. Tensor Processing Units (TPUs)

TPUs are Google's custom ASICs. They are often faster and cheaper than GPUs, provided your workload fits the architecture (large batched matrix math, no custom ops).

  • Best For: TensorFlow / JAX models. Massive Matrix Multiplication (Transformers, CNNs).
  • Worst For: Custom CUDA ops, Models with lots of branching logic (If/Else).
  • Topology: TPUs are connected in a high-speed "Pod". You don't just get one; you get a "slice" of a pod.

Exam Tip: If the question mentions "Training time is too slow" or "Cost is too high" and the model is purely TensorFlow/JAX -> Switch to TPU.
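The "cost is too high" reasoning is worth making concrete: a chip with a higher hourly price can still be cheaper per job if it finishes faster. A back-of-envelope sketch (the prices and the 3x speedup below are illustrative assumptions, not real quotes):

```python
# Back-of-envelope cost comparison: for a fixed amount of work,
# total cost = (baseline hours / speedup) * hourly price.
# Prices and speedup are illustrative assumptions, not real quotes.
def training_cost(baseline_hours: float, price_per_hour: float, speedup: float = 1.0) -> float:
    """Cost of a job that would take `baseline_hours` on the baseline device."""
    return (baseline_hours / speedup) * price_per_hour

baseline_hours = 100  # job time on the baseline GPU
gpu_cost = training_cost(baseline_hours, price_per_hour=2.48)               # GPU baseline
tpu_cost = training_cost(baseline_hours, price_per_hour=4.50, speedup=3.0)  # pricier but ~3x faster

print(tpu_cost < gpu_cost)  # -> True: faster chip wins despite higher hourly rate
```

This is why "training is too slow / too expensive" plus "pure TensorFlow/JAX" points at TPUs on the exam.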


3. Edge Deployment (TensorFlow Lite)

Sometimes you can't run inference in the cloud.

  • Latency: An autonomous car can't wait for a signal to go to the cloud.
  • Privacy: Health data stays on the phone.

Quantization: To fit a model on a phone, we convert 32-bit floats to 8-bit integers.

  • This reduces size by 4x.
  • It slightly reduces accuracy.
  • Post-Training Quantization: Easy.
  • Quantization-Aware Training (QAT): Harder, but better accuracy.
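The float32-to-int8 idea can be shown end to end in plain Python. This is a minimal sketch of affine quantization (the same scheme TF Lite applies per tensor), not the TF Lite converter itself: map each float to a uint8 via a scale and zero-point, and dequantize to see the small accuracy loss.

```python
# Minimal sketch of post-training affine quantization (float32 -> uint8).
# Same idea as TF Lite's per-tensor quantization, written out by hand.
import struct

def quantize(values):
    """Map floats to uint8 using a scale and zero-point."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0           # guard against constant tensors
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zp = quantize(weights)

# 4x smaller: 4 bytes per float32 vs 1 byte per uint8
float_bytes = len(struct.pack(f"{len(weights)}f", *weights))
int_bytes = len(bytes(q))
print(float_bytes // int_bytes)  # -> 4

# Slight accuracy loss: error is at most about half a quantization step
recovered = dequantize(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
print(max_err < scale)  # -> True
```

QAT improves on this by simulating the rounding error during training so the weights adapt to it, rather than quantizing after the fact.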

4. Visualizing the Trade-off

```mermaid
graph TD
    Problem{Constraint?}

    Problem -->|Speed/Cost| Cloud{Cloud Compute}
    Problem -->|Offline/Privacy| Edge{Edge Device}

    Cloud -->|Versatility| GPU[NVIDIA GPU]
    Cloud -->|Max Throughput| TPU[Google TPU]

    Edge -->|Mobile| TFLite["TF Lite (Android/iOS)"]
    Edge -->|Browser| TFJS[TensorFlow.js]
    Edge -->|Embedded| Coral[Coral TPU]
```

5. Summary

  • T4 is the king of Inference (Serving).
  • A100/TPU are the kings of Training.
  • Quantization shrinks models for Edge deployment but risks accuracy loss.

In the next lesson, we take our trained model and put it on the internet. Model Serving.


Knowledge Check

You have trained a massive image segmentation model. You need to deploy it to a fleet of drones for real-time forest fire detection. The drones have limited battery and connectivity. What is the correct deployment strategy?
