Compute Hardware: GPUs, TPUs, and Edge

Choosing the right silicon. When to pay for A100s, when to use TPUs, and how to quantize models for mobile deployment.

The Hardware Menu

Google Cloud offers a candy store of chips.

  • CPU: General purpose. Slow at heavy math.
  • GPU (NVIDIA): Great at parallel math. Versatile.
  • TPU (Google): Matrix-math specialist. Extreme speed for XLA-compiled workloads.

The exam asks you to balance Cost vs Performance.


1. GPU Selection Guide

| GPU Type    | Use Case                            | Exam Keyword                                 |
|-------------|-------------------------------------|----------------------------------------------|
| A100 (80GB) | Massive LLMs, Foundation Models     | "Highest Performance", "Large VRAM needed"   |
| V100        | Standard High-Performance Training  | "Fast Training"                              |
| T4          | Inference (Serving)                 | "Inference", "Cost Effective", "Small"       |
| K80         | Legacy (Avoid)                      | "Old", "Slow"                                |

Rule: Use A100 for training massive models. Use T4 for serving web traffic (cheap/good enough).
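The rule above can be sketched as a tiny decision helper. This is purely illustrative (`pick_gpu` and the 16 GB VRAM cutoff are assumptions, not a Google Cloud API), though the returned strings match the Vertex AI accelerator-type names.

```python
# Illustrative sketch of the GPU selection rule of thumb above.
# pick_gpu and its threshold are hypothetical, not a Google Cloud API;
# the returned strings follow Vertex AI accelerator-type naming.
def pick_gpu(workload: str, model_size_gb: float = 0) -> str:
    """Suggest an accelerator for a workload ('training' or 'serving')."""
    if workload == "serving":
        return "NVIDIA_TESLA_T4"       # cheap, good enough for inference
    if model_size_gb > 16:             # large models need A100-class VRAM
        return "NVIDIA_TESLA_A100"
    return "NVIDIA_TESLA_V100"         # standard high-performance training

print(pick_gpu("serving"))                      # -> NVIDIA_TESLA_T4
print(pick_gpu("training", model_size_gb=40))   # -> NVIDIA_TESLA_A100
```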


2. Tensor Processing Units (TPUs)

TPUs are Google's custom ASICs. They are often faster and cheaper than GPUs, provided your workload fits the architecture (large batched matrix math, no custom ops).

  • Best For: TensorFlow / JAX models. Massive Matrix Multiplication (Transformers, CNNs).
  • Worst For: Custom CUDA ops, Models with lots of branching logic (If/Else).
  • Topology: TPUs are connected in a high-speed "Pod". You don't just get one; you get a "slice" of a pod.

Exam Tip: If the question mentions "Training time is too slow" or "Cost is too high" and the model is purely TensorFlow/JAX -> Switch to TPU.
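The "cost is too high" reasoning is worth making concrete: a chip with a higher hourly price can still be cheaper per job if it finishes faster. A back-of-envelope sketch (the prices and the 3x speedup below are illustrative assumptions, not real quotes):

```python
# Back-of-envelope cost comparison: for a fixed amount of work,
# total cost = (baseline hours / speedup) * hourly price.
# Prices and speedup are illustrative assumptions, not real quotes.
def training_cost(baseline_hours: float, price_per_hour: float, speedup: float = 1.0) -> float:
    """Cost of a job that would take `baseline_hours` on the baseline device."""
    return (baseline_hours / speedup) * price_per_hour

baseline_hours = 100  # job time on the baseline GPU
gpu_cost = training_cost(baseline_hours, price_per_hour=2.48)               # GPU baseline
tpu_cost = training_cost(baseline_hours, price_per_hour=4.50, speedup=3.0)  # pricier but ~3x faster

print(tpu_cost < gpu_cost)  # -> True: faster chip wins despite higher hourly rate
```

This is why "training is too slow / too expensive" plus "pure TensorFlow/JAX" points at TPUs on the exam.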


3. Edge Deployment (TensorFlow Lite)

Sometimes you can't run inference in the cloud.

  • Latency: An autonomous car can't wait for a signal to go to the cloud.
  • Privacy: Health data stays on the phone.

Quantization: To fit a model on a phone, we convert 32-bit floats to 8-bit integers.

  • This reduces size by 4x.
  • It slightly reduces accuracy.
  • Post-Training Quantization: Easy.
  • Quantization-Aware Training (QAT): Harder, but better accuracy.
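The float32-to-int8 idea can be shown end to end in plain Python. This is a minimal sketch of affine quantization (the same scheme TF Lite applies per tensor), not the TF Lite converter itself: map each float to a uint8 via a scale and zero-point, and dequantize to see the small accuracy loss.

```python
# Minimal sketch of post-training affine quantization (float32 -> uint8).
# Same idea as TF Lite's per-tensor quantization, written out by hand.
import struct

def quantize(values):
    """Map floats to uint8 using a scale and zero-point."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0           # guard against constant tensors
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zp = quantize(weights)

# 4x smaller: 4 bytes per float32 vs 1 byte per uint8
float_bytes = len(struct.pack(f"{len(weights)}f", *weights))
int_bytes = len(bytes(q))
print(float_bytes // int_bytes)  # -> 4

# Slight accuracy loss: error is at most about half a quantization step
recovered = dequantize(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
print(max_err < scale)  # -> True
```

QAT improves on this by simulating the rounding error during training so the weights adapt to it, rather than quantizing after the fact.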

4. Visualizing the Trade-off

```mermaid
graph TD
    Problem{Constraint?}

    Problem -->|Speed/Cost| Cloud{Cloud Compute}
    Problem -->|Offline/Privacy| Edge{Edge Device}

    Cloud -->|Versatility| GPU[NVIDIA GPU]
    Cloud -->|Max Throughput| TPU[Google TPU]

    Edge -->|Mobile| TFLite["TF Lite (Android/iOS)"]
    Edge -->|Browser| TFJS[TensorFlow.js]
    Edge -->|Embedded| Coral[Coral TPU]
```

5. Summary

  • T4 is the king of Inference (Serving).
  • A100/TPU are the kings of Training.
  • Quantization shrinks models for Edge deployment but risks accuracy loss.

In the next lesson, we take our trained model and put it on the internet. Model Serving.


Knowledge Check

You have trained a massive image segmentation model. You need to deploy it to a fleet of drones for real-time forest fire detection. The drones have limited battery and connectivity. What is the correct deployment strategy?
