Compute Hardware: GPUs, TPUs, and Edge

Choosing the right silicon. When to pay for A100s, when to use TPUs, and how to quantize models for mobile deployment.

The Hardware Menu

Google Cloud offers a candy store of chips.

  • CPU: General purpose; slow for heavy math.
  • GPU (NVIDIA): Versatile; great at parallel math.
  • TPU (Google): Matrix-math specialist; extreme speed for XLA-compiled workloads.

The exam asks you to balance Cost vs Performance.


1. GPU Selection Guide

| GPU Type | Use Case | Exam Keyword |
| --- | --- | --- |
| A100 (80 GB) | Massive LLMs, foundation models | "Highest Performance", "Large VRAM needed" |
| V100 | Standard high-performance training | "Fast Training" |
| T4 | Inference (serving) | "Inference", "Cost Effective", "Small" |
| K80 | Legacy (avoid) | "Old", "Slow" |

Rule: Use A100 for training massive models. Use T4 for serving web traffic (cheap/good enough).
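The cost-vs-performance trade-off becomes concrete with a back-of-the-envelope calculation. The hourly prices and throughput numbers below are illustrative assumptions for this sketch, not real GCP pricing:

```python
# Back-of-the-envelope GPU cost comparison.
# Prices and steps/hour are ILLUSTRATIVE assumptions, not real GCP pricing.

gpus = {
    # name: (hypothetical $/hour, hypothetical training steps/hour)
    "A100": (3.67, 10_000),
    "V100": (2.48, 4_000),
    "T4":   (0.35, 800),
}

def cost_per_step(name):
    """Dollars per training step for a given GPU (lower is better)."""
    price, steps_per_hour = gpus[name]
    return price / steps_per_hour

# The cheapest GPU per hour is not always the cheapest per unit of work.
best = min(gpus, key=cost_per_step)
for name in gpus:
    print(f"{name}: ${cost_per_step(name):.6f} per step")
print("Best cost-per-step:", best)
```

With these (made-up) numbers the T4 is cheapest per hour, but the A100 is cheapest per training step, which is exactly why the exam's answer for massive training jobs is the A100 and for light serving traffic is the T4.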


2. Tensor Processing Units (TPUs)

TPUs are Google's custom ASICs. For workloads that fit their programming model, they are often faster and cheaper than GPUs.

  • Best For: TensorFlow / JAX models. Massive Matrix Multiplication (Transformers, CNNs).
  • Worst For: Custom CUDA ops; models with lots of branching logic (if/else).
  • Topology: TPUs are connected in a high-speed "Pod". You don't just get one; you get a "slice" of a pod.

Exam Tip: If the question mentions "Training time is too slow" or "Cost is too high" and the model is purely TensorFlow/JAX -> Switch to TPU.


3. Edge Deployment (TensorFlow Lite)

Sometimes the model can't run in the cloud at all.

  • Latency: An autonomous car can't wait for a signal to go to the cloud.
  • Privacy: Health data stays on the phone.

Quantization: To fit a model on a phone, we convert 32-bit floats to 8-bit integers.

  • This shrinks the model roughly 4x (32 bits → 8 bits per weight).
  • It slightly reduces accuracy.
  • Post-Training Quantization (PTQ): easy; applied after training, no retraining needed.
  • Quantization-Aware Training (QAT): harder; simulates quantization during training, so the final accuracy is better.
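The core arithmetic behind this is simple enough to sketch in plain Python. This is a toy illustration of affine quantization (the idea underlying TF Lite's scheme), using a made-up weight list and a uint8 range for simplicity:

```python
# Toy affine quantization: float32 weights -> 8-bit integers (uint8 here).
# Illustrates the idea behind TF Lite quantization; weights are made up.

weights = [-1.2, -0.4, 0.0, 0.3, 0.9, 2.1]

lo, hi = min(weights), max(weights)
scale = (hi - lo) / 255            # one 8-bit step, in float units
zero_point = round(-lo / scale)    # the integer that represents 0.0

def quantize(x):
    return max(0, min(255, round(x / scale) + zero_point))

def dequantize(q):
    return (q - zero_point) * scale

quantized = [quantize(w) for w in weights]
restored = [dequantize(q) for q in quantized]

# 32-bit floats -> 8-bit ints: 4x smaller.
print("size reduction: 4x")
# Rounding costs accuracy: the error is bounded by half a step (scale / 2).
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"max round-trip error: {max_err:.4f} (half step: {scale / 2:.4f})")
```

Note how both claims from the bullet list fall out of the code: the 4x size reduction comes purely from the bit width, and the accuracy loss is the rounding error, at most half a quantization step per weight.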

4. Visualizing the Trade-off

```mermaid
graph TD
    Problem{Constraint?}

    Problem -->|Speed/Cost| Cloud{Cloud Compute}
    Problem -->|Offline/Privacy| Edge{Edge Device}

    Cloud -->|Versatility| GPU[NVIDIA GPU]
    Cloud -->|Max Throughput| TPU[Google TPU]

    Edge -->|Mobile| TFLite["TF Lite (Android/iOS)"]
    Edge -->|Browser| TFJS[TensorFlow.js]
    Edge -->|Embedded| Coral[Coral Edge TPU]
```

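The same decision flow can be sketched as a small lookup function. The constraint labels and targets below are just this lesson's shorthand, not any official API:

```python
# Toy decision helper mirroring the hardware flow chart.
# The (location, requirement) labels are this lesson's shorthand, not an API.

DECISIONS = {
    ("cloud", "versatility"):    "NVIDIA GPU",
    ("cloud", "max_throughput"): "Google TPU",
    ("edge", "mobile"):          "TF Lite (Android/iOS)",
    ("edge", "browser"):         "TensorFlow.js",
    ("edge", "embedded"):        "Coral Edge TPU",
}

def pick_hardware(where, need):
    """Return the recommended target for a (cloud|edge, requirement) pair."""
    try:
        return DECISIONS[(where, need)]
    except KeyError:
        raise ValueError(f"no recommendation for {(where, need)!r}")

print(pick_hardware("cloud", "max_throughput"))  # Google TPU
print(pick_hardware("edge", "mobile"))           # TF Lite (Android/iOS)
```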
5. Summary

  • T4 is the king of Inference (Serving).
  • A100/TPU are the kings of Training.
  • Quantization shrinks models for Edge deployment but risks accuracy loss.

In the next lesson, we take our trained model and put it on the internet: model serving.

