Model Architecture Design: Choosing the Right Brain

CNNs, RNNs, Transformers, or XGBoost? Learn how to map business problems to model architectures, and how to define success metrics.

The Architect's Choice

Before you write code, you must choose the architecture. The exam will give you a scenario and ask: "Which algorithm usually performs best?"


1. The Algorithm Cheat Sheet

| Data Type | Problem | Best Architecture | Why? |
|---|---|---|---|
| Tabular | Classification / Regression | XGBoost / LightGBM (Trees) | Handles unscaled features well; interpretable. |
| Tabular | Very Complex / Massive | Deep Neural Network (Wide & Deep) | Can learn crossed features. |
| Image | Object Detection | CNN (ResNet, EfficientNet) | Spatial invariance (a cat is a cat even if rotated). |
| Text | Translation / Summary | Transformer (BERT, T5, Gemini) | Attention mechanism handles long-range context. |
| Time Series | Forecasting | LSTM / ARIMA+ | Remembers history (sequence dependence). |
| Recommendation | User Preference | Two-Tower Model / Matrix Factorization | Handles sparse data (users usually only buy 1% of items). |

2. Transfer Learning

Exam Rule: Never train from scratch if you can Fine-Tune.

  • Training from Scratch: Requires millions of images and can take weeks.
  • Transfer Learning: Take a pretrained model (e.g., one trained on ImageNet), freeze the early layers, and retrain only the last layer on your images. Requires ~100 images and takes minutes.

Vertex AI Model Garden is the library where you find these base models.
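To make the freeze-and-retrain idea concrete, here is a minimal NumPy sketch (illustrative only, not the actual Vertex AI workflow): a fixed random projection stands in for the frozen backbone, and only a small logistic "last layer" is trained on its features. All names and data here are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the FROZEN early layers of a pretrained network:
# a fixed random projection whose weights are never updated.
# (In practice you would load a real backbone from Model Garden
# and mark its layers as non-trainable.)
W_frozen = rng.normal(size=(64, 32))

def backbone(x):
    return np.maximum(x @ W_frozen, 0.0)  # frozen ReLU features

# Tiny labeled dataset (~100 examples, as in the lesson).
X = rng.normal(size=(100, 64))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Only the LAST layer is trained: a logistic head on frozen features.
feats = backbone(X)
feats = (feats - feats.mean(axis=0)) / feats.std(axis=0)  # standardize
w, b = np.zeros(32), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(feats @ w + b)))   # sigmoid
    w -= 0.1 * feats.T @ (p - y) / len(y)    # log-loss gradient step
    b -= 0.1 * np.mean(p - y)

acc = np.mean(((feats @ w + b) > 0) == (y == 1))
print(f"training accuracy (head only): {acc:.2f}")
```

Because `W_frozen` never changes, the only parameters being fit are the 33 numbers in the head, which is why transfer learning works with so little data.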


3. Designing for Interpretability

Sometimes accuracy isn't the only goal. In regulated industries like Banking and Health, Explainability can be a legal requirement.

  • White Box Models: Linear Regression, Decision Trees. (Easy to explain: "You were denied because Age < 18").
  • Black Box Models: Deep Neural Networks. (Hard to explain).

If the exam scenario says "Must be fully interpretable by non-technical auditors," avoid Deep Learning. Use BigQuery ML Linear Regression or Boosted Trees.
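Here is a toy NumPy example of what "white box" means in practice: a linear regression whose fitted coefficients are themselves the explanation. The features and numbers are invented, and the data is constructed so the target is exactly the sum of the two features.

```python
import numpy as np

# White-box model: ordinary least squares on toy housing-style data.
# Columns might be e.g. bedrooms and bathrooms (invented numbers).
X = np.array([[1, 2], [2, 3], [3, 5], [4, 4], [5, 6]], dtype=float)
y = np.array([3.0, 5.0, 8.0, 8.0, 11.0])  # constructed as col1 + col2

# Add an intercept column and solve the least-squares problem.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, w = coef[0], coef[1:]

# Each weight is a direct, auditable explanation:
# "predicted price rises by w[i] for each unit of feature i".
print("weights:", np.round(w, 2))  # → weights: [1. 1.]
```

An auditor can read these two weights directly; no saliency maps or post-hoc explainers are needed, which is exactly why regulators favor such models.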


4. Loss Functions

You must tell the model what "Success" looks like: the loss function guides training, while the metric measures the result.

| Problem | Loss Function | Metric |
|---|---|---|
| Binary Classification | Binary Cross-Entropy (Log Loss) | Accuracy, AUC, Precision/Recall |
| Multi-Class | Categorical Cross-Entropy | F1-Score |
| Regression | MSE (Mean Squared Error) | RMSE, MAE |
| Outlier-Heavy Regression | Huber Loss / MAE | MAE (robust to outliers, where MSE punishes them too hard) |
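A small NumPy sketch (toy residuals, not real model errors) shows why the table recommends Huber loss for outlier-heavy regression: one huge residual dominates MSE, because it gets squared, but contributes only linearly to Huber.

```python
import numpy as np

def mse(residuals):
    return np.mean(residuals ** 2)

def huber(residuals, delta=1.0):
    # Quadratic near zero, linear in the tails: large errors
    # contribute proportionally to |error|, not error squared.
    a = np.abs(residuals)
    return np.mean(np.where(a <= delta,
                            0.5 * a ** 2,
                            delta * (a - 0.5 * delta)))

typical = np.array([0.1, -0.2, 0.15, -0.1])   # errors on normal houses
with_mansion = np.append(typical, 100.0)      # one huge mansion miss

print(f"MSE:   {mse(typical):.4f} -> {mse(with_mansion):.1f}")
print(f"Huber: {huber(typical):.4f} -> {huber(with_mansion):.1f}")
```

With MSE, gradient descent would spend nearly all of its effort shrinking the single mansion error; with Huber, the normal houses still matter.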

5. Visualizing the "Wide & Deep" Model

This Google-originated architecture is frequently tested. It combines memorization (the wide, linear path) with generalization (the deep path).

```mermaid
graph TD
    Input[Input Features]

    subgraph "Wide (Memorization)"
    Linear[Linear Model]
    end

    subgraph "Deep (Generalization)"
    Dense1[Dense Layer] --> Dense2[Dense Layer]
    end

    Input --> Linear
    Input --> Dense1

    Linear --> Add[Combine]
    Dense2 --> Add
    Add --> Output[Sigmoid Output]
```
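The wiring above can be sketched as a single forward pass in plain NumPy. The weights are untrained and random, purely to show the two-branch structure; a real implementation would use a framework such as Keras.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=(4, 8))          # batch of 4 examples, 8 features

w_wide = rng.normal(size=(8, 1))     # Wide path: one linear model
W1 = rng.normal(size=(8, 16))        # Deep path: dense layer 1
W2 = rng.normal(size=(16, 1))        # Deep path: dense layer 2

wide_out = x @ w_wide                # memorization branch
deep_out = relu(x @ W1) @ W2         # generalization branch

# Combine both branches, then squash to a probability, as in the diagram.
y_hat = sigmoid(wide_out + deep_out)
print(y_hat.shape)                   # → (4, 1)
```

Note that both branches see the same input: the wide part memorizes sparse feature crosses directly, while the deep part learns dense generalizations, and their outputs are simply summed before the sigmoid.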

6. Summary

  • Tabular: Trees (XGBoost).
  • Perceptual (Vision/Text): Deep Learning (CNN/Transformers).
  • Scarcity: Use Transfer Learning.
  • Regulation: Use Interpretable Models.

In the next lesson, we scale up. How do we train these models on Terabytes of data? Distributed Training.


Knowledge Check

You are predicting house prices (Regression). Your dataset contains a few mansions that cost $100M, while most houses cost $500k. You notice your model is obsessing over getting the mansion prices right, ruining its accuracy for normal houses. Which Loss Function helps fix this?
