
MLOps Fundamentals: Reproducibility, Automation, and Reliability
A guide to the key principles of MLOps: how to build and maintain a robust, reliable ML system.
The Three Pillars of MLOps
MLOps is a set of practices for building and maintaining high-quality ML systems. It rests on three pillars: reproducibility, automation, and reliability.
1. Reproducibility
Reproducibility is the ability to recreate the same results given the same inputs. In the context of ML, this means that you should be able to:
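- Reproduce a model: Given the same code, data, hyperparameters, and random seeds, you should be able to train a model that has the same weights. Bit-for-bit identical weights also depend on using the same library versions, the same hardware, and deterministic operations; some GPU kernels are nondeterministic by default, so results may otherwise match only within a small tolerance.
- Reproduce a prediction: Given the same model and the same input, you should be able to get the same prediction.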
To achieve reproducibility, you should:
- Version your code, data, and models.
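- Use a containerized environment with pinned dependency versions so that your code always runs in the same environment.
- Set and record a fixed random seed for every source of randomness, such as weight initialization, data shuffling, and train/test splits.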
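The seeding and versioning steps above can be illustrated with a minimal sketch. It uses only the Python standard library; the names `fingerprint`, `train_toy_model`, `DATA`, and `CONFIG` are hypothetical and chosen for illustration, and the toy linear model stands in for a real training job. A real project would also need to seed its ML framework and any numerical libraries it uses.

```python
import hashlib
import json
import random


def fingerprint(obj) -> str:
    """Return a stable SHA-256 hash of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def train_toy_model(data, seed, epochs=20, lr=0.01):
    """Fit y = w * x + b with SGD; all randomness comes from one seeded RNG."""
    rng = random.Random(seed)          # local RNG, no hidden global state
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    rows = list(data)
    for _ in range(epochs):
        rng.shuffle(rows)              # shuffle order is determined by the seed
        for x, y in rows:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


# Hypothetical toy dataset and run configuration (assumptions for this sketch).
DATA = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
CONFIG = {"seed": 42, "epochs": 20, "lr": 0.01}

run_a = train_toy_model(DATA, **CONFIG)
run_b = train_toy_model(DATA, **CONFIG)
print(run_a == run_b)  # True: same seed, data, and config in the same environment

# Store this hash next to the trained model to record what produced it.
print(fingerprint({"data": DATA, "config": CONFIG}))
```

Passing a local `random.Random(seed)` instance around, rather than seeding the global generator, keeps unrelated code from silently changing the sequence of random numbers.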
2. Automation
Automation is the process of using tools to perform tasks that would otherwise be done manually. In the context of ML, this means that you should automate:
- Data preprocessing and feature engineering.
- Model training and evaluation.
- Model deployment and monitoring.
To achieve automation, you should:
- Use a pipeline orchestration tool like Vertex AI Pipelines or Cloud Composer.
- Use a CI/CD tool like Cloud Build to automate your build, test, and deployment processes.
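To make the idea of an automated pipeline concrete, here is a minimal sketch of the stages chained in plain Python. It does not use the Vertex AI Pipelines, Cloud Composer, or Cloud Build APIs; the function names, the `RAW` data, and the `max_mae` threshold are all hypothetical. An orchestration tool would run each stage as a separate, tracked step, and a real pipeline would evaluate on held-out data rather than on the training rows used here for brevity.

```python
from statistics import mean


def preprocess(raw_rows):
    """Drop rows with missing values and cast the remaining fields to float."""
    return [(float(x), float(y)) for x, y in raw_rows
            if x is not None and y is not None]


def train(rows):
    """Fit y = w * x + b by ordinary least squares (closed form)."""
    xs, ys = zip(*rows)
    x_bar, y_bar = mean(xs), mean(ys)
    w = (sum((x - x_bar) * (y - y_bar) for x, y in rows)
         / sum((x - x_bar) ** 2 for x in xs))
    return {"w": w, "b": y_bar - w * x_bar}


def evaluate(model, rows):
    """Return the mean absolute error of the model on the given rows."""
    return mean(abs(model["w"] * x + model["b"] - y) for x, y in rows)


def run_pipeline(raw_rows, max_mae=0.5):
    """Run every stage in order and gate deployment on the evaluation metric."""
    rows = preprocess(raw_rows)
    model = train(rows)
    mae = evaluate(model, rows)  # simplification: a real pipeline uses held-out data
    return {"model": model, "mae": mae, "deploy": mae <= max_mae}


# Hypothetical raw input with one incomplete row.
RAW = [(1, 3), (2, 5), (None, 4), (3, 7), (4, 9)]
result = run_pipeline(RAW)
print(result["deploy"])  # True
```

The `deploy` flag is the kind of check a CI/CD step could use to decide whether a newly trained model is promoted without a manual review.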
3. Reliability
Reliability is the ability of a system to perform its intended function without failure. In the context of ML, this means that your system should be able to:
- Handle a large volume of requests without crashing.
- Recover from failures quickly and gracefully.
- Provide accurate and consistent predictions.
To achieve reliability, you should:
- Use a managed ML platform like Vertex AI.
- Use a monitoring tool like Vertex AI Model Monitoring to track your model in production and detect issues such as training-serving skew and prediction drift.
- Have a plan in place for rolling back to a previous version of your model if necessary.
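A rollback plan can be sketched as follows. The `ModelRegistry` class is a hypothetical in-memory stand-in, not the Vertex AI Model Registry API, and the prediction check is a placeholder for a real monitoring alert. On a managed platform, the equivalent step would typically shift serving traffic back to the earlier deployed version.

```python
class ModelRegistry:
    """Hypothetical in-memory registry that tracks which version serves traffic."""

    def __init__(self):
        self._versions = {}   # version name -> model (any callable)
        self._history = []    # promoted versions, oldest first

    def register(self, version, model):
        self._versions[version] = model

    def promote(self, version):
        """Make a registered version the live one."""
        if version not in self._versions:
            raise KeyError(f"unknown model version: {version}")
        self._history.append(version)

    def rollback(self):
        """Drop the live version and return to the previously promoted one."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()
        return self._history[-1]

    @property
    def live(self):
        return self._history[-1] if self._history else None

    def predict(self, x):
        return self._versions[self.live](x)


registry = ModelRegistry()
registry.register("v1", lambda x: 2 * x + 1)
registry.register("v2", lambda x: 2 * x + 100)  # pretend v2 is a bad release
registry.promote("v1")
registry.promote("v2")

# Stand-in for a monitoring alert: the known-good answer for x = 1.0 is 3.0.
if abs(registry.predict(1.0) - 3.0) > 1.0:
    registry.rollback()
print(registry.live)  # v1
```

Keeping the promotion history, rather than only the current version, is what makes the rollback a single fast operation instead of a retraining job.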
Knowledge Check
You are training a model to predict the price of a house. You want to be able to reproduce the exact same model every time you run your training script. What should you do?