Infrastructure Patterns for Scalable ML Systems

This guide covers the most common infrastructure patterns for designing and building scalable ML systems on Google Cloud.

From Laptop to Planet-Scale

As your ML system grows, you will need to adopt new infrastructure patterns to handle the increased scale and complexity. This lesson covers four of them: serverless, Kubernetes, hybrid, and Dataflow.


1. The Serverless Pattern

The serverless pattern is a good choice for ML systems that have intermittent or unpredictable traffic. In this pattern, you use managed services like Cloud Functions, Cloud Run, and Vertex AI to build and deploy your ML system.

  • Pros:
    • No servers to manage.
    • Pay-per-use pricing.
    • Automatic scaling.
  • Cons:
    • Pay-per-use pricing can cost more than provisioned capacity when traffic is sustained and high.
    • Cold starts can add latency to the first requests after a period of inactivity.
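
The cost trade-off above can be made concrete with a rough break-even estimate. The sketch below is illustrative only: the function names are hypothetical and both prices are made-up placeholders, not actual Google Cloud rates. It assumes serverless cost grows linearly with request count while an always-on deployment has a flat monthly cost.

```python
import math


def monthly_serverless_cost(requests_per_month: int, price_per_million: float) -> float:
    """Pay-per-use: cost grows linearly with request volume."""
    return requests_per_month / 1_000_000 * price_per_million


def break_even_requests(fixed_monthly_cost: float, price_per_million: float) -> int:
    """Monthly request volume at which pay-per-use cost reaches the fixed cost."""
    return math.ceil(fixed_monthly_cost / price_per_million * 1_000_000)


# Hypothetical prices, for illustration only (not real Google Cloud pricing).
PRICE_PER_MILLION = 4.00     # serverless cost per one million requests
FIXED_MONTHLY_COST = 600.00  # always-on deployment cost per month

threshold = break_even_requests(FIXED_MONTHLY_COST, PRICE_PER_MILLION)
print(f"Serverless stays cheaper below about {threshold:,} requests per month")
```

Below the threshold, pay-per-use is the cheaper option; above it, provisioned capacity starts to win. A real comparison would also need to account for compute time per request, memory, and operational effort.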

2. The Kubernetes Pattern

The Kubernetes pattern is a good choice for ML systems that have high traffic or that require a lot of customization. In this pattern, you use Google Kubernetes Engine (GKE) to build and deploy your ML system.

  • Pros:
    • Highly scalable and reliable.
    • Portable to other clouds.
    • Can be more cost-effective than the serverless pattern for high-traffic systems.
  • Cons:
    • More complex to set up and manage than the serverless pattern.
    • Requires more expertise to operate.
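
One reason the Kubernetes pattern scales well is horizontal autoscaling. As a rough illustration, the sketch below reproduces the core replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). The function name and the min/max defaults are assumptions for this example, and the real autoscaler also applies a tolerance band and stabilization windows that are omitted here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Simplified HPA-style calculation, clamped to the allowed replica range."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Four serving pods averaging 90% CPU against a 60% target: scale out.
print(desired_replicas(4, current_metric=90, target_metric=60))
# Four serving pods averaging 30% CPU against a 60% target: scale in.
print(desired_replicas(4, current_metric=30, target_metric=60))
```

The clamp to `max_replicas` is what keeps a traffic spike from growing the cluster without bound, which is one of the tuning decisions that makes this pattern more demanding to operate.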

3. The Hybrid Pattern

The hybrid pattern is a combination of the serverless and Kubernetes patterns. In this pattern, you use serverless services for some parts of your system and Kubernetes for other parts.

  • Example: You might use a Cloud Function to trigger your ML pipeline, and then use GKE to run the training and serving components of your pipeline.
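
A minimal sketch of that hand-off might look like the following. It assumes the function is triggered by a new file in Cloud Storage and receives an event dictionary with `bucket` and `name` fields. The container image, the `--data` argument, and the `submit` callback are hypothetical placeholders: in a real system `submit` would call the Kubernetes API of your GKE cluster, whereas here it only prints the manifest.

```python
import re


def build_training_job(event: dict, image: str = "example.com/trainer:latest") -> dict:
    """Translate a storage event into a Kubernetes batch/v1 Job manifest."""
    bucket, name = event["bucket"], event["name"]
    # Kubernetes object names allow only lowercase alphanumerics and '-'.
    suffix = re.sub(r"[^a-z0-9]+", "-", name.lower())[:40].strip("-")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"train-{suffix}"},
        "spec": {
            "backoffLimit": 2,  # retry a failed training run at most twice
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "trainer",
                            "image": image,  # hypothetical training image
                            "args": [f"--data=gs://{bucket}/{name}"],
                        }
                    ],
                }
            },
        },
    }


def on_new_data(event: dict, submit=print) -> dict:
    """Serverless entry point: build the Job and pass it to the cluster."""
    job = build_training_job(event)
    submit(job)  # placeholder for a real Kubernetes API call
    return job


job = on_new_data({"bucket": "my-data", "name": "daily/2024-01-01.csv"})
```

The function itself stays small and stateless, which suits the serverless side, while the long-running training work is delegated to the cluster.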

4. The Dataflow Pattern

The Dataflow pattern is a good choice for ML systems that need to process large amounts of streaming data. In this pattern, you use Cloud Dataflow to build and deploy your data processing pipelines.

  • Pros:
    • Highly scalable and reliable.
    • Can process both batch and streaming data.
  • Cons:
    • Can be more expensive than simpler patterns when data volumes are low.
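
To illustrate what handling both batch and streaming data with one pipeline means, here is a plain-Python sketch. It is not the Dataflow or Apache Beam API (Dataflow pipelines are written with the Apache Beam SDK); it only shows the idea of one transformation, in this case counting events per fixed time window, applied unchanged to a bounded list and to a generator standing in for a stream. The 60-second window, the event shape, and the assumption that events arrive in timestamp order are all choices made for this example.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

WINDOW_SECONDS = 60  # assumed window size for this example

Event = Tuple[float, str]  # (timestamp in seconds, event type)


def count_per_window(events: Iterable[Event]) -> Iterator[Tuple[int, Counter]]:
    """Yield (window_start, counts) whenever a window closes.

    Assumes events arrive in timestamp order.
    """
    current = None
    counts: Counter = Counter()
    for timestamp, key in events:
        window = int(timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
        if current is not None and window != current:
            yield current, counts
            counts = Counter()
        current = window
        counts[key] += 1
    if current is not None:
        yield current, counts  # flush the final, still-open window


# Batch: a bounded, fully materialized dataset.
batch = [(0.0, "click"), (10.0, "view"), (59.0, "click"), (61.0, "click")]
batch_result = list(count_per_window(batch))


# Streaming: the same events produced one at a time.
def stream() -> Iterator[Event]:
    yield from batch


stream_result = list(count_per_window(stream()))
```

Because the transformation only iterates over its input, the same logic serves both cases. A managed service adds the parts this sketch leaves out, such as distributing work across machines and handling late or out-of-order data.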

Knowledge Check

You are building an ML system that will have intermittent and unpredictable traffic. You want a solution that is easy to manage and that has pay-per-use pricing. Which infrastructure pattern is the best choice?
