Pipeline Components and Triggers

How to break down your ML workflow into components and how to trigger your pipeline to run automatically.

From Scripts to Components

A well-designed ML pipeline is made up of a series of modular, reusable components. Each component should perform a single, well-defined task. This makes your pipeline easier to build, test, and maintain.
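
To make this concrete, here is a minimal sketch of a single-task component using the Kubeflow Pipelines (KFP v2) SDK, which Vertex AI Pipelines can run; the component name and its row-counting logic are illustrative assumptions, not a prescribed implementation.

```python
from kfp import dsl


# A minimal, single-purpose component: it performs one well-defined task
# (counting data rows) and nothing else. Name and logic are illustrative.
@dsl.component(base_image="python:3.10")
def count_rows(input_path: str) -> int:
    """Count the data rows in a CSV file, excluding the header row."""
    with open(input_path) as f:
        return sum(1 for _ in f) - 1
```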


1. Identifying Pipeline Components

When designing a pipeline, start by breaking your ML workflow down into a series of discrete steps. Each step is a candidate to become a pipeline component.

Some common pipeline components include:

  • Data Ingestion: Reading data from a source (e.g., BigQuery, Cloud Storage).
  • Data Validation: Checking the quality of your data with TensorFlow Data Validation (TFDV).
  • Data Preprocessing: Transforming your data into a format that can be used for training.
  • Model Training: Training your model.
  • Model Analysis: Evaluating your model's performance with TensorFlow Model Analysis (TFMA).
  • Model Deployment: Deploying your model to an endpoint for serving.
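
To show how steps like these fit together, here is a hedged sketch of a KFP v2 pipeline that chains a few of them; the component bodies are trivial stubs standing in for real logic (BigQuery reads, TFDV checks, training code), and all names are placeholder assumptions.

```python
from kfp import dsl


# Stub components standing in for the steps listed above.
@dsl.component
def ingest_data(table: str) -> str:
    return f"ingested:{table}"


@dsl.component
def preprocess_data(dataset: str) -> str:
    return f"preprocessed:{dataset}"


@dsl.component
def train_model(dataset: str) -> str:
    return f"model-trained-on:{dataset}"


@dsl.pipeline(name="training-pipeline")
def training_pipeline(source_table: str):
    # Each task consumes the previous task's output, which also fixes
    # the execution order of the steps.
    ingest_task = ingest_data(table=source_table)
    preprocess_task = preprocess_data(dataset=ingest_task.output)
    train_model(dataset=preprocess_task.output)
```

Passing one task's output into the next is what defines the dependency graph; steps with no data dependency can also be ordered explicitly with .after().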

2. Pipeline Triggers

Once you have a pipeline, you need to decide how to trigger it to run. There are two main types of triggers:

  • Manual Triggers: You can manually trigger a pipeline run from the Google Cloud Console, with the gcloud command-line tool, or programmatically (see the sketch after this list).
  • Automated Triggers: You can set up automated triggers to run your pipeline in response to certain events.
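
For the manual case, one common programmatic route is the Vertex AI Python SDK (google-cloud-aiplatform), sketched below under the assumption that the pipeline has already been compiled to a spec and uploaded; the project, region, bucket, and parameter names are placeholders.

```python
from google.cloud import aiplatform

# Placeholder project and region; adjust for your environment.
aiplatform.init(project="my-project", location="us-central1")

# Submit one run of a previously compiled pipeline spec.
job = aiplatform.PipelineJob(
    display_name="manual-training-run",
    template_path="gs://my-bucket/pipelines/training_pipeline.json",
    pipeline_root="gs://my-bucket/pipeline-root",
    parameter_values={"source_table": "my_dataset.training_data"},
)
job.run(sync=False)  # submit the run without blocking the caller
```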

Common Automated Triggers

  • Scheduled Triggers: Run your pipeline on a regular schedule (e.g., every day, every week).
  • Event-based Triggers: Run your pipeline in response to an event, such as:
    • New data: A new file is uploaded to a Cloud Storage bucket.
    • New code: A new commit is pushed to a Git repository.
    • New model: A new model is registered in the Vertex AI Model Registry.

You can use Cloud Functions or Cloud Pub/Sub to create event-based triggers.
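
As a sketch of the new-data case, the following Cloud Function (1st gen, Python runtime) could be deployed against a bucket's google.storage.object.finalize event so that every new upload submits a pipeline run; all names, paths, and the input_file pipeline parameter are illustrative assumptions.

```python
from google.cloud import aiplatform


def trigger_pipeline(event, context):
    """Background function fired for each finalized object in the bucket."""
    # The event payload carries the bucket and object names.
    new_file = f"gs://{event['bucket']}/{event['name']}"

    aiplatform.init(project="my-project", location="us-central1")
    job = aiplatform.PipelineJob(
        display_name="retrain-on-new-data",
        template_path="gs://my-bucket/pipelines/training_pipeline.json",
        pipeline_root="gs://my-bucket/pipeline-root",
        parameter_values={"input_file": new_file},
    )
    job.submit()  # fire-and-forget so the function returns quickly
```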


Knowledge Check

You want to automatically retrain your model whenever new data is uploaded to a Cloud Storage bucket. What is the best way to do this?
