
The Lifecycle Phases: Training vs. Inference
From learning to predicting. Understand the distinct phases of the machine learning lifecycle and their impact on cost and performance.
The Two Lives of a Model
In the SageMaker world, a model has two distinct stages of life.
- The Learning Stage (Training).
- The Working Stage (Inference).
On the AWS Certified AI Practitioner exam, you will be asked about "Architecting for Cost" or "Choosing Instances" based on these two phases. If you use a "Training" setup for "Inference," you will burn money on powerful GPU hardware that sits idle between predictions!
1. Phase 1: Training (The Learning Phase)
This is where the algorithm looks at the data (Ground Truth) and adjusts its "Weights" until it understands the patterns.
- Compute: Extremely intensive. This requires GPUs (NVIDIA chips) or purpose-built AWS Trainium chips.
- Duration: Can take minutes, hours, or even weeks depending on the data size.
- Cost Structure: You want Ephemeral Compute, meaning servers that spin up, do the work, and shut down the moment the job finishes.
- SageMaker Tool: SageMaker Training Jobs.
Exam Strategy: If you hear "Finding patterns," "Updating weights," or "Learning from a dataset," the answer is Training.
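To make this concrete, here is a minimal sketch of launching a Training Job with the SageMaker Python SDK. The role ARN, bucket paths, and instance choice are placeholder assumptions, not a definitive recipe; substitute your own values.

```python
# Minimal sketch of an ephemeral SageMaker Training Job (placeholder ARNs/buckets).
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

estimator = Estimator(
    # A built-in algorithm container (XGBoost here, purely for illustration).
    image_uri=sagemaker.image_uris.retrieve(
        "xgboost", session.boto_region_name, version="1.7-1"
    ),
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder IAM role
    instance_type="ml.p3.2xlarge",  # GPU instance, provisioned only for this job
    instance_count=1,
    output_path="s3://my-bucket/model-artifacts/",  # placeholder bucket
    sagemaker_session=session,
)

# The cluster spins up, learns the weights from the S3 data, writes the model
# artifact to output_path, and shuts down automatically: ephemeral compute.
estimator.fit({"train": "s3://my-bucket/training-data/"})
```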
2. Phase 2: Inference (The Working Phase)
This is where the "Trained Model" is used to predict results from New Data.
- Compute: Less intensive than training, but needs to be Low Latency (Fast). Use AWS Inferentia chips for high-scale, low-cost inference.
- Duration: Happens in milliseconds.
- Cost Structure: You can use Real-time Endpoints (always on) or Serverless Inference (pay per use).
- SageMaker Tool: SageMaker Endpoints.
Exam Strategy: If you hear "Making a prediction," "Analyzing a new photo," or "Generating a response," the answer is Inference.
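For contrast, here is a minimal sketch of the inference side: calling an already-deployed endpoint with boto3. The endpoint name and CSV payload are hypothetical.

```python
# Minimal sketch of real-time inference against a deployed SageMaker endpoint.
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-fraud-endpoint",  # hypothetical endpoint name
    ContentType="text/csv",
    Body="42.0,1,0,199.99",  # a single new record: small input, fast answer
)

# The prediction comes back in milliseconds as the response body.
print(response["Body"].read().decode("utf-8"))
```

Note the shape of the call: one small record in, one prediction out, repeated thousands of times a second in production.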
3. The "Deployment" Bridge
Deployment is the process of taking the Model Artifact (the final file created during training) and "Hosting" it on a server so the world can use it via an API call.
Deployment Options in SageMaker:
- Real-time Inference: Best for low-latency requirements (e.g., a credit card fraud check).
- Batch Transform: Best for processing millions of records at once when you don't need the answer immediately (e.g., analyzing all sales logs from the previous month).
- Asynchronous Inference: Best for large requests (like a 5-minute video) that take a while to process.
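As a rough sketch (not a definitive recipe), here is how those options look with the SageMaker Python SDK, assuming `estimator` is the trained Estimator from the training sketch above; instance types and bucket names are placeholders.

```python
# Sketch: three ways to deploy the same model artifact (placeholder values).
from sagemaker.serverless import ServerlessInferenceConfig

# Option 1: Real-time endpoint -- an always-on instance for low-latency traffic.
realtime_predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",  # CPU here; Inferentia (Inf2) suits deep learning models
)

# Option 2: Serverless endpoint -- pay per request, good for spiky traffic.
serverless_predictor = estimator.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=2048,
        max_concurrency=5,
    ),
)

# Option 3: Batch Transform -- process a whole S3 dataset offline, no endpoint at all.
transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/batch-output/",  # placeholder bucket
)
transformer.transform(data="s3://my-bucket/last-month-sales/", content_type="text/csv")
```

Each `deploy()` call creates a separate endpoint, so in practice you would pick one option rather than run all three. (Asynchronous Inference has its own config object, omitted here for brevity.)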
4. Visualizing the Split
| Feature | Training | Inference |
|---|---|---|
| Input | Training Dataset (Large) | New Request (Single/Small) |
| Output | A trained Model File | A Prediction / Result |
| Hardware | GPUs (P3/P4 instances) | CPUs, GPUs, or Inferentia (C5/G5/Inf2) |
| Frequency | Once (or periodically) | Thousands of times a second |
```mermaid
graph LR
    A[Data in S3] -->|TRAINING JOB| B[GPU Cluster]
    B -->|Generates| C[MODEL ARTIFACT]
    C -->|DEPLOYMENT| D[Inference Endpoint]
    E[User Request] --> D
    D -->|RESULT| F[Prediction: '98% Spam']
```
5. Summary: Right-Sizing the Lifecycle
The most important takeaway for a Practitioner is: Training is about Throughput; Inference is about Latency.
- In Training, we want to chew through as much data as possible as fast as possible.
- In Inference, we want to give the human an answer as fast as possible.
Exercise: Identify the Phase
A medical imaging company has already built a model that can detect pneumonia in X-rays. They are now deploying the model to 50 hospitals so that when a doctor uploads a new X-ray, the AI returns a "Risk Score" in under 2 seconds. Which phase are they currently in?
- A. Training.
- B. Data Preparation.
- C. Inference.
- D. Evaluation.
The Answer is C! They are using the model to make Predictions (Risk Scores) on New Data (the new X-ray). This is the Inference phase.
Knowledge Check
In the ML lifecycle, what is the 'Training' phase?
What's Next?
We know how it works. But do we need it? In the next lesson, we look at the strategic decision: When is SageMaker appropriate?