
Google Cloud Professional Machine Learning Engineer – Certification Prep
Course Curriculum
18 modules designed to help you master the subject.
Module 1: Exam Orientation
Understand the exam structure, requirements, and how to prepare effectively.
Module 2: BigQuery ML and ML APIs
Build models with SQL using BigQuery ML and leverage pre-trained Vision, NLP, and Speech APIs.
BigQuery ML: Machine Learning with SQL
Why move data when you can bring the model to the data? Learn to build Classification, Regression, and Time-Series models directly within BigQuery using standard SQL.
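For a sense of what this looks like, here is a minimal sketch that trains a logistic regression classifier entirely inside BigQuery via the Python client (the project, dataset, table, and column names are hypothetical):

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # hypothetical project ID

    # Train a logistic regression churn classifier without moving data out of BigQuery.
    sql = """
    CREATE OR REPLACE MODEL `my_dataset.churn_model`
    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
    SELECT * FROM `my_dataset.customer_training_data`
    """
    client.query(sql).result()  # blocks until training completes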
BigQuery ML: Feature Engineering
How to preprocess data using SQL. Learn to use the TRANSFORM clause, ML.BUCKETIZE, ML.STANDARD_SCALER, and one-hot encoding directly in BigQuery.
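As a rough illustration (dataset and column names are hypothetical), preprocessing declared inside TRANSFORM is stored with the model and re-applied automatically at prediction time:

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # hypothetical project ID

    sql = """
    CREATE OR REPLACE MODEL `my_dataset.churn_model`
    TRANSFORM(
      ML.QUANTILE_BUCKETIZE(age, 10) OVER () AS age_bucket,  -- full-pass bucketizing
      ML.STANDARD_SCALER(income) OVER () AS income_scaled,   -- z-score scaling
      plan_type,                        -- categorical columns are one-hot encoded automatically
      churned
    )
    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
    SELECT age, income, plan_type, churned FROM `my_dataset.customer_training_data`
    """
    client.query(sql).result()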
BigQuery ML: Predictions & Deployment
How to get answers. Using ML.PREDICT, ML.EXPLAIN_PREDICT, and exporting BQML models to Vertex AI for online serving.
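A minimal batch-scoring sketch, reusing the hypothetical model and tables from the examples above; each output row carries the predicted label plus per-class probabilities:

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # hypothetical project ID

    # Score new rows with the trained model directly in SQL.
    sql = """
    SELECT *
    FROM ML.PREDICT(MODEL `my_dataset.churn_model`,
                    (SELECT * FROM `my_dataset.new_customers`))
    """
    for row in client.query(sql).result():
        print(dict(row))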
Google Cloud ML APIs: AI Without Training
When to skip training altogether. A guide to the Vision, Natural Language, Translation, and Speech APIs. Learn the strategic advantage of pre-trained models.
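For example, labelling an image with the Vision API takes a few lines and no model training at all (the bucket path below is hypothetical):

    from google.cloud import vision

    client = vision.ImageAnnotatorClient()

    # Label an image stored in Cloud Storage with the pre-trained Vision API.
    image = vision.Image(source=vision.ImageSource(image_uri="gs://my-bucket/photo.jpg"))
    response = client.label_detection(image=image)

    for label in response.label_annotations:
        print(label.description, round(label.score, 3))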
Module 3: AutoML and Prebuilt ML
Train high-quality custom models with minimal code using Vertex AI AutoML.
AutoML: High Quality, Low Code
How to train custom models without writing training loops. We cover AutoML for Vision, Tables, and Text, and how to prepare your data for success.
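A minimal sketch of launching AutoML on tabular data with the Vertex AI SDK, assuming hypothetical project, bucket, and column names:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    # Create a managed tabular dataset from a CSV in Cloud Storage.
    dataset = aiplatform.TabularDataset.create(
        display_name="churn-data",
        gcs_source="gs://my-bucket/churn.csv",
    )

    # Launch AutoML training; no training loop or architecture code required.
    job = aiplatform.AutoMLTabularTrainingJob(
        display_name="churn-automl",
        optimization_prediction_type="classification",
    )
    model = job.run(
        dataset=dataset,
        target_column="churned",
        budget_milli_node_hours=1000,  # one node hour
    )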
AutoML: Evaluation & Debugging
Your AutoML model is trained. Is it good? Interpreting Confusion Matrices, Precision/Recall curves, and Feature Importance to fix underperforming models.
Module 4: Data Exploration & Preparation
Clean, visualize, and engineer features using Dataflow, Dataprep, and Vertex AI.
Data Preparation at Scale: Dataflow & Vertex AI
Data is 80% of ML. Learn how to execute ETL pipelines using BigQuery and Dataflow, and how to manage features using Vertex AI Feature Store.
Data Transformation: Cleaning & TF Transform
Dataflow is the engine, but what logic goes inside? Learn the difference between Instance-Level vs Full-Pass transformations and how to use TensorFlow Transform (TFT) to prevent skew.
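A short sketch of the distinction, using TensorFlow Transform (feature names are hypothetical): the same preprocessing_fn runs over the full dataset at training time and is embedded in the serving graph, which is what prevents skew.

    import tensorflow as tf
    import tensorflow_transform as tft

    def preprocessing_fn(inputs):
        """Applied identically at training (via Dataflow) and at serving (in the graph)."""
        return {
            # Full-pass transform: needs statistics computed over the whole dataset.
            "income_scaled": tft.scale_to_z_score(inputs["income"]),
            # Instance-level transform: depends only on the single example.
            "clicks_log": tf.math.log1p(tf.cast(inputs["clicks"], tf.float32)),
            "label": inputs["label"],
        }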
Vertex AI Feature Store: The Single Source of Truth
Stop duplicating feature engineering code. Learn how Feature Store unifies Online (Serving) and Offline (Training) feature access.
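As an illustrative sketch using the Vertex AI SDK's Featurestore resources (the featurestore, entity type, entity ID, and feature names are all hypothetical), an online read returns the same feature values the offline store serves for training:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    # Low-latency online read of the same features used at training time.
    fs = aiplatform.Featurestore(featurestore_name="customer_features")
    customer = fs.get_entity_type(entity_type_id="customer")
    df = customer.read(
        entity_ids=["cust_123"],
        feature_ids=["lifetime_value", "days_since_last_purchase"],
    )
    print(df)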
Module 5: Model Prototyping and Experimentation
Use Vertex AI Workbench for notebooks and Vertex AI Experiments to track runs.
Workbench: Jupyter on the Cloud
Why use Vertex AI Workbench? We cover Managed Notebooks vs User-Managed Notebooks, and how to choose the right one for your security and compute needs.
Development Environment: Scaling & Compute
Choosing the right hardware for development. When to use a local GPU vs a remote cluster, and how to define custom containers.
Source Control: Notebooks & Git
Notebooks are notoriously hard to version control. Learn patterns for nbdime, saving outputs, and refactoring to Python scripts.
Tracking Experiments: Vertex AI Experiments and Kubeflow
From messy notebooks to organized experiments. Learn how to use Vertex AI Experiments to log parameters and metrics, and how Kubeflow Pipelines can automate your experimentation process.
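A minimal logging sketch with the Vertex AI SDK (experiment, run, and metric names are hypothetical):

    from google.cloud import aiplatform

    aiplatform.init(
        project="my-project",
        location="us-central1",
        experiment="churn-experiments",
    )

    aiplatform.start_run("run-lr-001")
    aiplatform.log_params({"learning_rate": 0.01, "batch_size": 64})
    # ... train the model ...
    aiplatform.log_metrics({"val_accuracy": 0.93, "val_auc": 0.97})
    aiplatform.end_run()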
Module 6: Model Design and Architecture
Select the right model architecture, loss functions, and frameworks for the problem.
Model Architecture Design: Choosing the Right Brain
CNNs, RNNs, Transformers, or XGBoost? Learn how to map business problems to model architectures, and how to define success metrics.
Interpretability Deep Dive: Explainable AI
Understanding Feature Attributions, Integrated Gradients, and XRAI. How to satisfy regulatory constraints on 'Black Box' models.
Generative AI: Design Considerations
The new exam domain. When to use Model Garden, Vertex AI Agent Builder, and how to tune Foundation Models.
Module 7: Model Training
Train custom models at scale using Vertex AI Training, hyperparameter tuning, and distributed training.
Training Data Management: Strategies
How to feed the beast. GCS Bucket structure, Managed Datasets, and improving I/O performance.
Distributed Training: From One GPU to Thousands
How to break the memory limit. Learn about Data Parallelism, Model Parallelism, reduction servers, and how to use Vertex AI Custom Training jobs.
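A sketch of a multi-worker, multi-GPU Vertex AI custom training job. The script path, bucket, machine shapes, and the prebuilt container URI are illustrative; check the current prebuilt training image list before using one.

    from google.cloud import aiplatform

    aiplatform.init(
        project="my-project",
        location="us-central1",
        staging_bucket="gs://my-bucket/staging",
    )

    # Package a local training script and run it on 4 workers with 2 GPUs each.
    job = aiplatform.CustomTrainingJob(
        display_name="churn-distributed-training",
        script_path="trainer/task.py",
        container_uri="us-docker.pkg.dev/vertex-ai/training/tf-gpu.2-12.py310:latest",  # illustrative
    )
    job.run(
        replica_count=4,                       # data parallelism across workers
        machine_type="n1-standard-16",
        accelerator_type="NVIDIA_TESLA_V100",
        accelerator_count=2,
    )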
Hyperparameter Tuning: Finding the Magic Numbers
Stop guessing. Learn to use Vertex AI Vizier for Bayesian Optimization, and how to define your search space for efficient tuning.
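A sketch of a Vertex AI hyperparameter tuning job (container image, metric name, and search ranges are hypothetical; the training container is assumed to report "val_auc", for example via the cloudml-hypertune library):

    from google.cloud import aiplatform
    from google.cloud.aiplatform import hyperparameter_tuning as hpt

    aiplatform.init(
        project="my-project",
        location="us-central1",
        staging_bucket="gs://my-bucket/staging",
    )

    worker_pool_specs = [{
        "machine_spec": {"machine_type": "n1-standard-8"},
        "replica_count": 1,
        "container_spec": {"image_uri": "gcr.io/my-project/churn-trainer:latest"},
    }]
    trial_job = aiplatform.CustomJob(display_name="churn-trial", worker_pool_specs=worker_pool_specs)

    tuning_job = aiplatform.HyperparameterTuningJob(
        display_name="churn-tuning",
        custom_job=trial_job,
        metric_spec={"val_auc": "maximize"},
        parameter_spec={
            "learning_rate": hpt.DoubleParameterSpec(min=1e-4, max=1e-1, scale="log"),
            "hidden_units": hpt.IntegerParameterSpec(min=32, max=512, scale="linear"),
        },
        max_trial_count=20,       # Vizier proposes trials (Bayesian optimization by default)
        parallel_trial_count=4,
    )
    tuning_job.run()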
Troubleshooting Training: Common Failures
Why did my job fail? Debugging OOM errors, NaN losses, and 'Permission Denied' failures.
Module 8: Training Hardware and Compute Options
Optimize cost and performance using GPUs, TPUs, and choosing the right machine types.
Compute Hardware: GPUs, TPUs, and Edge
Choosing the right silicon. When to pay for A100s, when to use TPUs, and how to quantize models for mobile deployment.
Distributed Architectures: Parameter Server vs All-Reduce
How GPUs talk to each other. Understanding Ring All-Reduce, PS Strategy, and when to use NCCL.
Module 9: Model Serving Fundamentals
Deploy models for online and batch prediction using Vertex AI Prediction.
Online vs Batch: Choosing the Pattern
The Architecture Decision. When to use HTTP prediction vs batch jobs, and how to handle cost/latency trade-offs.
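For the batch side of the trade-off, a sketch of a Vertex AI batch prediction job (model resource name and bucket paths are hypothetical): asynchronous, with no endpoint to keep warm, so you pay only while the job runs.

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    model = aiplatform.Model("projects/my-project/locations/us-central1/models/1234567890")

    batch_job = model.batch_predict(
        job_display_name="churn-nightly-scoring",
        gcs_source="gs://my-bucket/batch/input.jsonl",
        gcs_destination_prefix="gs://my-bucket/batch/output/",
        machine_type="n1-standard-4",
    )
    batch_job.wait()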
Model Serving: Vertex AI Prediction
Batch vs. Online Prediction. How to deploy models to endpoints, manage versions, and optimize for latency.
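A minimal online-serving sketch with the Vertex AI SDK: register the model, deploy it to an autoscaling endpoint, and send one request. The artifact path and prebuilt serving image are illustrative.

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    model = aiplatform.Model.upload(
        display_name="churn-model",
        artifact_uri="gs://my-bucket/model/",
        serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest",  # illustrative
    )
    endpoint = model.deploy(
        machine_type="n1-standard-4",
        min_replica_count=1,
        max_replica_count=3,
    )
    print(endpoint.predict(instances=[[0.4, 1.2, 3.0]]).predictions)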
Model Registry & Versioning Strategies
Managing the lifecycle. Aliasing, Tagging, and Rollback strategies using Vertex AI Model Registry.
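As a sketch (resource names and paths are hypothetical), a new version can be uploaded under an existing registry entry and tagged with an alias, so promotion and rollback become a matter of moving the alias or the default version:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    v2 = aiplatform.Model.upload(
        display_name="churn-model",
        parent_model="projects/my-project/locations/us-central1/models/1234567890",
        artifact_uri="gs://my-bucket/model-v2/",
        serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest",  # illustrative
        is_default_version=False,       # keep the current default serving version
        version_aliases=["staging"],    # promote later by moving the alias
    )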
Module 10: Scaling Online Serving
Optimize latency, throughput, and autoscaling for production endpoints.
Scaling & Optimization: Handling the Load
How to survive Black Friday. Learn about Autoscaling, GPU Inference, TF-TRT, and optimizing latency for high-throughput serving.
Hardware Selection for Serving
Choosing the right hardware for serving. When to use CPUs vs GPUs for online prediction.
Feature Store Integration at Serving Time
How to use the Vertex AI Feature Store for low-latency feature lookups at serving time.
Performance Tuning and Latency Optimization
How to make your model faster. A guide to performance tuning and latency optimization for online prediction.
A/B Testing and Model Staging
How to safely deploy new models to production. A guide to A/B testing and model staging using Vertex AI Prediction.
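A canary-style sketch using endpoint traffic splitting (endpoint and model resource names are hypothetical): the challenger model receives 10% of traffic while the incumbent keeps the rest.

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    endpoint = aiplatform.Endpoint("projects/my-project/locations/us-central1/endpoints/987654321")
    challenger = aiplatform.Model("projects/my-project/locations/us-central1/models/1234567890")

    # Send 10% of traffic to the new model; 90% stays on the currently deployed model.
    endpoint.deploy(
        model=challenger,
        machine_type="n1-standard-4",
        traffic_percentage=10,
    )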
Module 11: End-to-End ML Pipelines
Orchestrate reproducible workflows using Vertex AI Pipelines and Kubeflow.
ML Pipeline Architectures: KFP, TFX, and Composer
The heart of MLOps. Learn how to design ML pipeline architectures using Kubeflow Pipelines (KFP), TensorFlow Extended (TFX), and Cloud Composer.
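A toy two-step pipeline sketch using KFP v2 and Vertex AI Pipelines (component logic, bucket paths, and names are hypothetical placeholders):

    from kfp import dsl, compiler
    from google.cloud import aiplatform

    @dsl.component(base_image="python:3.10")
    def validate_data(row_count: int) -> bool:
        return row_count > 1000

    @dsl.component(base_image="python:3.10")
    def train_model(data_ok: bool) -> str:
        # Placeholder step; a real component would launch a training job.
        return "gs://my-bucket/model/" if data_ok else ""

    @dsl.pipeline(name="churn-training-pipeline")
    def pipeline(row_count: int = 5000):
        check = validate_data(row_count=row_count)
        train_model(data_ok=check.output)

    compiler.Compiler().compile(pipeline, "pipeline.json")

    # Submit the compiled definition to Vertex AI Pipelines.
    aiplatform.init(project="my-project", location="us-central1")
    aiplatform.PipelineJob(
        display_name="churn-pipeline",
        template_path="pipeline.json",
        pipeline_root="gs://my-bucket/pipeline-root",
    ).submit()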
Validating Data and Models
How to ensure data quality and model performance across training and serving. A guide to TensorFlow Data Validation (TFDV) and TensorFlow Model Analysis (TFMA).
Pipeline Components and Triggers
How to break down your ML workflow into components and how to trigger your pipeline to run automatically.
Module 12: Automated Retraining and CI/CD
Implement MLOps with Cloud Build for continuous training and model delivery.
Defining Retraining Policies
When to retrain your model. A guide to defining retraining policies based on schedule, performance decay, and new data.
Integrating ML Pipelines with CI/CD Tools
How to automate your ML workflows using Cloud Build. A guide to integrating your ML pipelines with CI/CD tools.
Continuous Integration and Delivery for ML Models
How to safely and automatically deploy your models to production. A guide to continuous integration and delivery (CI/CD) for ML models.
Module 13: Metadata and Versioning
Track lineage and version models using Vertex AI Metadata and Model Registry.
Tracking and Comparing Datasets and Model Artifacts
How to track and compare datasets and model artifacts using Vertex AI ML Metadata.
Establishing Metadata Tracking and Lineage
How to establish metadata tracking and lineage for your ML workflows using Vertex AI ML Metadata.
Version Control for Artifacts and ML Assets
How to manage versions of your datasets, models, and other ML assets using the Vertex AI Model Registry and other tools.
Module 14: Responsible AI, Risk, and Explainability
Ensure fairness, interpretability (XAI), and security in ML systems.
Responsible AI: Security, Bias, and Fairness
How to build AI systems that are safe, fair, and transparent. A guide to responsible AI practices.
Model Readiness and Ethical Considerations
How to ensure that your model is ready for production and that it meets all your ethical requirements.
Explainable AI Methods on Vertex AI
How to use Vertex Explainable AI to understand your model's predictions. A guide to the different feature attribution methods available on Vertex AI.
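A sketch of requesting explanations from a deployed endpoint (the endpoint resource name and the instance payload are hypothetical, and the model is assumed to have been uploaded with an explanation spec, e.g. sampled Shapley or integrated gradients):

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    endpoint = aiplatform.Endpoint("projects/my-project/locations/us-central1/endpoints/987654321")
    response = endpoint.explain(instances=[{"age": 42, "income": 55000, "plan_type": "basic"}])

    for explanation in response.explanations:
        for attribution in explanation.attributions:
            print(attribution.feature_attributions)  # per-feature contribution to the prediction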
Module 15: Monitoring Performance and Drift
Detect training-serving skew and data drift using Vertex AI Model Monitoring.
Establishing Metrics and Baseline Monitoring
How to establish metrics and baseline monitoring for your ML models using Vertex AI Model Monitoring.
Detecting Training-Serving Skew
How to detect and prevent training-serving skew. A guide to using TensorFlow Data Validation (TFDV) to compare your training and serving data.
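A minimal TFDV skew-check sketch (file paths and the feature name are hypothetical): infer a schema from training statistics, set a skew threshold on a categorical feature, and validate serving statistics against it.

    import tensorflow_data_validation as tfdv

    # Compare statistics from training data against what the model sees in production.
    train_stats = tfdv.generate_statistics_from_csv("gs://my-bucket/train.csv")
    serving_stats = tfdv.generate_statistics_from_csv("gs://my-bucket/serving_logs.csv")

    schema = tfdv.infer_schema(train_stats)
    # Flag 'plan_type' if its distribution shifts too far (L-infinity distance).
    tfdv.get_feature(schema, "plan_type").skew_comparator.infinity_norm.threshold = 0.01

    anomalies = tfdv.validate_statistics(
        statistics=train_stats,
        schema=schema,
        serving_statistics=serving_stats,
    )
    print(anomalies)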
Monitoring Feature Drift and Model Performance
How to monitor your model's performance over time and detect feature drift. A guide to using Vertex AI Model Monitoring.
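A rough sketch, assuming hypothetical endpoint, feature names, thresholds, and alert address, of attaching a drift-monitoring job to a deployed endpoint with the Vertex AI SDK:

    from google.cloud import aiplatform
    from google.cloud.aiplatform import model_monitoring

    aiplatform.init(project="my-project", location="us-central1")
    endpoint = aiplatform.Endpoint("projects/my-project/locations/us-central1/endpoints/987654321")

    objective = model_monitoring.ObjectiveConfig(
        drift_detection_config=model_monitoring.DriftDetectionConfig(
            drift_thresholds={"age": 0.05, "income": 0.05},
        ),
    )

    aiplatform.ModelDeploymentMonitoringJob.create(
        display_name="churn-monitoring",
        endpoint=endpoint,
        objective_configs=objective,
        logging_sampling_strategy=model_monitoring.RandomSampleConfig(sample_rate=0.8),
        schedule_config=model_monitoring.ScheduleConfig(monitor_interval=1),  # hours
        alert_config=model_monitoring.EmailAlertConfig(user_emails=["ml-team@example.com"]),
    )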
Troubleshooting Common Errors in Training and Serving
How to troubleshoot common errors in training and serving. A guide to debugging your ML models.
Module 16: Cross-Cutting Concepts and Best Practices
Security, compliance, and architectural patterns for ML.
Security & Best Practices: The MLOps Fortress
VPC-SC, CMEK, Private Endpoints, and Custom Service Accounts. How to secure your ML infrastructure for the enterprise.
MLOps Fundamentals: Reproducibility, Automation, and Reliability
How to build and maintain a robust and reliable ML system. A guide to the key principles of MLOps.
Infrastructure Patterns for Scalable ML Systems
How to design and build scalable ML systems on Google Cloud. A guide to the most common infrastructure patterns.
Module 17: Exam Preparation Strategy
Review key domains and practice strategies for exam day.
Domain-by-Domain Review
A high-level review of the key concepts for each domain of the Google Cloud Professional Machine Learning Engineer exam.
Practice Question Patterns and Scenario Interpretation
How to deconstruct the exam questions. A guide to the most common question patterns and how to interpret the scenarios.
Time Management and Exam Tactics
How to make the most of your time on the exam. A guide to time management and exam tactics.
Checklist for Final Review
A checklist of the key concepts and topics to review before you take the exam.
Capstone Project
Design an end-to-end ML solution on Google Cloud.
Course Overview
Format: Self-paced reading
Duration: Approximately 6-8 hours