
Compute Hardware: GPUs, TPUs, and Edge
Choosing the right silicon. When to pay for A100s, when to use TPUs, and how to quantize models for mobile deployment.
The Hardware Menu
Google Cloud offers a candy store of chips.
- CPU: General purpose. Slow for heavy matrix math.
- GPU (NVIDIA): Great at parallel math. Versatile.
- TPU (Google): Matrix-math specialist. Extreme speed for XLA-compiled workloads.
The exam asks you to balance cost against performance.
1. GPU Selection Guide
| GPU Type | Use Case | Exam Keyword |
|---|---|---|
| A100 (80GB) | Massive LLMs, Foundation Models | "Highest Performance", "Large VRAM needed" |
| V100 | Standard High-Performance Training | "Fast Training" |
| T4 | Inference (Serving) | "Inference", "Cost Effective", "Small" |
| K80 | Legacy (Avoid) | "Old", "Slow" |
Rule: Use A100 for training massive models. Use T4 for serving web traffic (cheap/good enough).
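The rule of thumb above can be sketched as a tiny decision helper. This is a hypothetical function (not a Google Cloud API); the names `pick_gpu`, `task`, and `model_size_gb` are illustrative, and the 16 GB threshold is an assumed cutoff for "fits in a V100's VRAM".

```python
# Hypothetical helper mirroring the exam rules of thumb in the table above.
def pick_gpu(task: str, model_size_gb: float = 0.0) -> str:
    """Return a GPU type for a workload (illustrative only)."""
    if task == "inference":
        return "T4"  # cheap, good enough for serving web traffic
    if task == "training":
        # Massive models need the A100's 80 GB of VRAM;
        # 16 GB is an assumed cutoff for what a V100 can hold.
        return "A100" if model_size_gb > 16 else "V100"
    raise ValueError(f"unknown task: {task}")

print(pick_gpu("inference"))                   # → T4
print(pick_gpu("training", model_size_gb=70))  # → A100
```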
2. Tensor Processing Units (TPUs)
TPUs are Google's custom ASICs. They are often faster and cheaper per training step than GPUs, provided your model suits the hardware.
- Best For: TensorFlow / JAX models. Massive Matrix Multiplication (Transformers, CNNs).
- Worst For: Custom CUDA ops, Models with lots of branching logic (If/Else).
- Topology: TPUs are connected in a high-speed "Pod". You don't just get one; you get a "slice" of a pod.
Exam Tip: If the question mentions "Training time is too slow" or "Cost is too high" and the model is purely TensorFlow/JAX -> Switch to TPU.
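The "branching logic" caveat is easiest to see in code. A minimal sketch, using NumPy as a stand-in for the XLA-compiled style: compilers like XLA want one branch-free tensor op (a vectorized select) rather than per-element Python if/else.

```python
import numpy as np

# TPU-unfriendly style: a Python-level branch per element.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
branchy = np.array([v if v > 0 else 0.1 * v for v in x])

# XLA-friendly style: one fused select over the whole tensor
# (this is a leaky ReLU expressed without control flow).
branchless = np.where(x > 0, x, 0.1 * x)

assert np.allclose(branchy, branchless)
```

The same pattern (`jnp.where`, `tf.where`) is how TensorFlow and JAX code stays compilable for TPUs.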
3. Edge Deployment (TensorFlow Lite)
Sometimes you can't run on the cloud.
- Latency: An autonomous car can't wait for a round trip to the cloud.
- Privacy: Health data stays on the phone.
Quantization: To fit a model on a phone, we convert 32-bit floats to 8-bit integers.
- This reduces model size by 4x.
- It can slightly reduce accuracy.
- Post-Training Quantization: Easy.
- Quantization-Aware Training (QAT): Harder, but better accuracy.
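A minimal sketch of the post-training idea, using NumPy rather than the real TF Lite converter: map float32 weights onto int8 with a single scale factor (symmetric, per-tensor), then dequantize to see the accuracy cost.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal(1000).astype(np.float32)

# One scale factor maps the largest weight onto the int8 range.
scale = np.abs(weights).max() / 127.0  # 127 = max int8 magnitude
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize to measure the rounding error we traded for size.
deq = q.astype(np.float32) * scale

print("size reduction:", weights.nbytes / q.nbytes)  # → 4.0
print("max abs error:", np.abs(weights - deq).max())
```

The 4x figure falls straight out of the storage types (4 bytes vs 1 byte per weight); the rounding error is bounded by half the scale, which is the "slight accuracy loss".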
4. Visualizing the Trade-off
```mermaid
graph TD
    Problem{Constraint?}
    Problem -->|Speed/Cost| Cloud{Cloud Compute}
    Problem -->|Offline/Privacy| Edge{Edge Device}
    Cloud -->|Versatility| GPU[NVIDIA GPU]
    Cloud -->|Max Throughput| TPU[Google TPU]
    Edge -->|Mobile| TFLite["TF Lite (Android/iOS)"]
    Edge -->|Browser| TFJS[TensorFlow.js]
    Edge -->|Embedded| Coral[Coral TPU]
```
5. Summary
- T4 is the king of Inference (Serving).
- A100/TPU are the kings of Training.
- Quantization shrinks models for Edge deployment but risks accuracy loss.
In the next lesson, we take our trained model and put it on the internet: Model Serving.