
Pipeline Components and Triggers
How to break down your ML workflow into components and how to trigger your pipeline to run automatically.
From Scripts to Components
A well-designed ML pipeline is made up of a series of modular, reusable components. Each component should perform a single, well-defined task. This makes your pipeline easier to build, test, and maintain.
1. Identifying Pipeline Components
When designing a pipeline, start by breaking your ML workflow down into a series of discrete steps; each step is a candidate to become a pipeline component.
Some common pipeline components include:
- Data Ingestion: Reading data from a source (e.g., BigQuery, Cloud Storage).
- Data Validation: Checking the quality of your data using TensorFlow Data Validation (TFDV).
- Data Preprocessing: Transforming your data into a format that can be used for training.
- Model Training: Training your model.
- Model Analysis: Evaluating your model's performance using TensorFlow Model Analysis (TFMA).
- Model Deployment: Deploying your model to an endpoint for serving.
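To make this concrete, below is a minimal sketch of how two of these steps could be written as components with the Kubeflow Pipelines (KFP) v2 SDK, which Vertex AI Pipelines can run. The component names, base image, and placeholder bodies are illustrative assumptions rather than a complete workflow.

```python
from kfp import dsl, compiler


@dsl.component(base_image="python:3.10")
def ingest_data(source_uri: str) -> str:
    # Placeholder: read raw data from the source and return a URI to the
    # ingested dataset. A real component would write to Cloud Storage.
    return source_uri


@dsl.component(base_image="python:3.10")
def train_model(dataset_uri: str) -> str:
    # Placeholder: train a model on the prepared dataset and return a
    # reference to the trained artifact.
    return f"model trained on {dataset_uri}"


@dsl.pipeline(name="example-training-pipeline")
def training_pipeline(source_uri: str):
    # Wire the components together; each task runs as its own pipeline step.
    ingest_task = ingest_data(source_uri=source_uri)
    train_model(dataset_uri=ingest_task.output)


if __name__ == "__main__":
    # Compile the pipeline into a spec file that Vertex AI Pipelines accepts.
    compiler.Compiler().compile(
        pipeline_func=training_pipeline,
        package_path="training_pipeline.json",
    )
```

The compiled spec (training_pipeline.json here) is what you submit when you trigger a pipeline run, whether manually or automatically.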
2. Pipeline Triggers
Once you have a pipeline, you need to decide how to trigger it to run. There are two main types of triggers:
- Manual Triggers: You can trigger a pipeline run by hand from the Google Cloud Console or with the gcloud command-line tool.
- Automated Triggers: You can set up automated triggers that run your pipeline in response to certain events.
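For example, a compiled pipeline spec can be submitted manually with the Vertex AI Python SDK. The project, region, bucket, and parameter names below are placeholder assumptions; replace them with your own values.

```python
from google.cloud import aiplatform

# Placeholder project and region -- substitute your own.
aiplatform.init(project="my-project", location="us-central1")

job = aiplatform.PipelineJob(
    display_name="manual-training-run",
    template_path="training_pipeline.json",       # compiled pipeline spec
    pipeline_root="gs://my-bucket/pipeline-root",  # where run artifacts are stored
    parameter_values={"source_uri": "gs://my-bucket/data/train.csv"},
)

# submit() returns immediately; use job.run() to block until the run finishes.
job.submit()
```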
Common Automated Triggers
- Scheduled Triggers: Run your pipeline on a regular schedule (e.g., every day, every week).
- Event-based Triggers: Run your pipeline in response to an event, such as:
  - New data: A new file is uploaded to a Cloud Storage bucket.
  - New code: A new commit is pushed to a Git repository.
  - New model: A new model is registered in the Vertex AI Model Registry.
You can use Cloud Functions or Cloud Pub/Sub to create event-based triggers.
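As a sketch of the Cloud Storage case, a Cloud Function (2nd gen) can listen for object-finalized events on a bucket and submit a pipeline run that uses the new file as an input parameter. The project, region, bucket, spec path, and parameter name below are hypothetical placeholders, and the function's service account is assumed to have permission to launch Vertex AI pipeline runs.

```python
import functions_framework
from google.cloud import aiplatform

# Hypothetical values -- replace with your own project, region, and paths.
PROJECT = "my-project"
REGION = "us-central1"
PIPELINE_ROOT = "gs://my-bucket/pipeline-root"
PIPELINE_SPEC = "gs://my-bucket/specs/training_pipeline.json"


@functions_framework.cloud_event
def trigger_pipeline(cloud_event):
    """Runs when a new object is finalized in the watched bucket."""
    data = cloud_event.data
    new_file_uri = f"gs://{data['bucket']}/{data['name']}"

    aiplatform.init(project=PROJECT, location=REGION)
    job = aiplatform.PipelineJob(
        display_name="retrain-on-new-data",
        template_path=PIPELINE_SPEC,
        pipeline_root=PIPELINE_ROOT,
        parameter_values={"source_uri": new_file_uri},
    )
    # submit() returns immediately so the function is not held open while
    # the pipeline runs.
    job.submit()
```

The same pattern works for Pub/Sub: subscribe the function to a topic and submit the pipeline run from the message handler.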
Knowledge Check
You want to automatically retrain your model whenever new data is uploaded to a Cloud Storage bucket. What is the best way to do this?