
Defining Retraining Policies
When to retrain your model. A guide to defining retraining policies based on schedule, performance decay, and new data.
When to Retrain?
A model's performance can degrade over time as the data it sees in production drifts away from the data it was trained on. To maintain a high-quality model, you need a strategy for deciding when to retrain it.
1. Retraining Triggers
There are three main types of triggers for retraining your model:
- Scheduled Triggers: Retrain your model on a regular schedule (e.g., every day, every week). This is the simplest approach and is a good choice if your data distribution changes at a predictable rate.
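- Performance-based Triggers: Retrain your model when its performance drops below a threshold you set (e.g., accuracy falling below a target value). This is a more sophisticated approach that requires you to monitor your model's performance in production, which in turn requires ground-truth labels for production predictions.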
- Data-based Triggers: Retrain your model when new data becomes available. This is a good choice if your data is constantly changing and you want to keep your model as up-to-date as possible.
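The three trigger types above can each be expressed as a simple check. The following is a minimal sketch, not a definitive implementation: the function names, parameters, and example values are all hypothetical, and the performance check assumes a higher-is-better metric such as accuracy.

```python
from datetime import datetime, timedelta


def scheduled_trigger(last_trained: datetime, now: datetime, interval: timedelta) -> bool:
    """Fire when at least `interval` has passed since the last training run."""
    return now - last_trained >= interval


def performance_trigger(current_metric: float, threshold: float) -> bool:
    """Fire when a higher-is-better metric (assumed here) falls below the threshold."""
    return current_metric < threshold


def data_trigger(new_examples: int, min_new_examples: int) -> bool:
    """Fire when enough new training examples have accumulated."""
    return new_examples >= min_new_examples


# Example values are illustrative only.
last_trained = datetime(2024, 1, 1)
now = datetime(2024, 1, 8)

weekly_due = scheduled_trigger(last_trained, now, timedelta(days=7))
accuracy_dropped = performance_trigger(current_metric=0.87, threshold=0.90)
enough_new_data = data_trigger(new_examples=2_000, min_new_examples=10_000)
```

In practice these checks would be fed by a scheduler, a monitoring system, and a data pipeline respectively; the sketch only shows the decision logic.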
2. Defining a Retraining Policy
When defining a retraining policy, you should consider the following factors:
- The rate of data drift: How quickly does your data distribution change? If your data drifts quickly, you will need to retrain your model more frequently.
- The cost of retraining: Retraining a model can be expensive, so you need to balance the cost of retraining with the benefit of having a more accurate model.
- The importance of model freshness: How important is it for your model to be up-to-date? For some applications (e.g., fraud detection), it's crucial to have a model that is trained on the latest data.
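One way to combine these factors is to encode them as explicit limits in a single policy object. The sketch below is an illustration under stated assumptions: `RetrainingPolicy` and its fields are hypothetical names, a minimum interval stands in for retraining cost, a maximum model age stands in for freshness, and a threshold on a higher-is-better metric stands in for the effect of drift.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple


@dataclass
class RetrainingPolicy:
    min_interval: timedelta   # cost guard: never retrain more often than this
    max_age: timedelta        # freshness guard: always retrain once the model is this old
    metric_threshold: float   # drift proxy: retrain if the metric drops below this

    def should_retrain(
        self, last_trained: datetime, now: datetime, current_metric: float
    ) -> Tuple[bool, str]:
        """Return a decision and the reason for it."""
        age = now - last_trained
        # In this sketch the cost guard takes precedence over every trigger.
        if age < self.min_interval:
            return False, "too soon since last retrain"
        if age >= self.max_age:
            return True, "model exceeds maximum age"
        if current_metric < self.metric_threshold:
            return True, "metric below threshold"
        return False, "no trigger fired"


# Illustrative values only: at most daily, at least monthly, accuracy floor of 0.90.
policy = RetrainingPolicy(
    min_interval=timedelta(days=1),
    max_age=timedelta(days=30),
    metric_threshold=0.90,
)
decision, reason = policy.should_retrain(
    last_trained=datetime(2024, 1, 1),
    now=datetime(2024, 1, 4),
    current_metric=0.85,
)
```

Letting the cost guard win is a design choice. An application where freshness is critical, such as fraud detection, might instead shorten `min_interval` or check the metric first.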
Knowledge Check
You are training a model to predict the price of a stock. The stock market is constantly changing, and you want to keep your model as up-to-date as possible. Which retraining trigger is the best choice?